Forecasting mental states in schizophrenia using digital phenotyping data
Thierry Jean, Rose Guay Hottin and Pierre Orban
PLOS Digital Health, 2025, vol. 4, issue 2, 1-20
Abstract:
Machine learning that successfully exploits digital phenotyping data to forecast mental states in psychiatric populations could greatly improve clinical practice. Previous research focused on binary classification and continuous regression, disregarding the often ordinal nature of prediction targets derived from clinical rating scales. In addition, mental health ratings typically show substantial class imbalance or skewness that needs to be accounted for when evaluating predictive performance. It also remains unclear which machine learning algorithm is best suited for forecasting tasks, with eXtreme Gradient Boosting (XGBoost) and long short-term memory (LSTM) being 2 popular choices in digital phenotyping studies. The CrossCheck dataset includes 6,364 mental state surveys using 4-point ordinal rating scales and 23,551 days of smartphone sensor data contributed by patients with schizophrenia. We trained 120 machine learning models to forecast 10 mental states (e.g., Calm, Depressed, Seeing things) from passive sensor data on 2 predictive tasks (ordinal regression, binary classification) with 2 learning algorithms (XGBoost, LSTM) over 3 forecast horizons (same day, next day, next week). A majority of ordinal regression and binary classification models performed significantly above baseline, with macro-averaged mean absolute error values between 1.19 and 0.77, and balanced accuracy between 58% and 73%, which corresponds to similar levels of performance when these metrics are scaled. Results also showed that metrics that do not account for imbalance (mean absolute error, accuracy) systematically overestimated performance, that XGBoost models performed on par with or better than LSTM models, and that performance decreased significantly, though only very slightly, as the forecast horizon expanded.
In conclusion, when using performance metrics that properly account for class imbalance, ordinal forecast models demonstrated comparable performance to the prevalent binary classification approach without losing valuable clinical information from self-reports, thus providing richer and easier to interpret predictions.
Author summary:
Symptoms associated with mental health disorders vary greatly over time. Periods of partial remission unfortunately alternate with relapses defined by a marked worsening of symptoms. Hence, assessing future risk and adopting preventive measures is a key challenge for clinical psychiatry. With their many sensors, smartphones can provide novel insights into human behavior outside the medical office. By using machine learning, a branch of artificial intelligence, it is possible to use such smartphone sensor data to predict future mental states and symptoms in psychiatric patients. The present work highlights the importance of predicting fine-grained levels of symptom severity, as commonly reported by patients using so-called ordinal rating scales. Such ordinal predictions were not less accurate than the simplified binary predictions (on/off, high/low) often reported in previous efforts. Besides, we underscore that severe mental states are rare compared to healthy ones, and that this imbalance brings methodological challenges that need to be taken into account to develop valid predictive models.
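Illustrative note: the abstract contrasts metrics that ignore class imbalance (mean absolute error, accuracy) with metrics that account for it (macro-averaged mean absolute error, balanced accuracy). The following is a minimal sketch of that contrast, not the authors' code. The toy ratings, the always-predict-0 model, the function names, and the binarization rule (rating of 1 or more counts as "present") are all assumptions made for illustration only.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Plain MAE: every sample has equal weight, so frequent classes dominate."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_averaged_mae(y_true, y_pred):
    """MAE computed per true class, then averaged so each class has equal weight."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors[t].append(abs(t - p))
    per_class = [sum(e) / len(e) for e in errors.values()]
    return sum(per_class) / len(per_class)


def accuracy(y_true, y_pred):
    """Fraction of exact matches over all samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Recall computed per true class, then averaged across classes."""
    hits = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        hits[t].append(t == p)
    per_class = [sum(h) / len(h) for h in hits.values()]
    return sum(per_class) / len(per_class)


# Made-up, heavily skewed ratings on a 4-point ordinal scale (0 to 3).
y_true = [0] * 80 + [1] * 10 + [2] * 6 + [3] * 4
# A trivial model that always predicts the most frequent rating.
y_pred = [0] * 100

# Assumed binarization for the classification view: rating >= 1 means "present".
b_true = [int(t >= 1) for t in y_true]
b_pred = [int(p >= 1) for p in y_pred]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("Macro-averaged MAE:", macro_averaged_mae(y_true, y_pred))
print("Accuracy (binary):", accuracy(b_true, b_pred))
print("Balanced accuracy (binary):", balanced_accuracy(b_true, b_pred))
```

On this toy data the trivial model looks acceptable under the imbalance-blind metrics (low MAE, high accuracy) but poor under the imbalance-aware ones, which is the kind of overestimation the abstract describes.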
Date: 2025
Downloads: (external link)
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000734 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 00734&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0000734
DOI: 10.1371/journal.pdig.0000734
More articles in PLOS Digital Health from Public Library of Science