Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the Longitudinal Ageing Study in India
John Tayu Lee, Sheng Hui Hsu, Vincent Cheng-Sheng Li, Kanya Anindya, Meng-Huan Chen, Charlotte Wang, Toby Kai-Bo Shen, Valerie Tzu Ning Liu, Hsiao-Hui Chen and Rifat Atun
PLOS Digital Health, 2025, vol. 4, issue 11, 1-16
Abstract:
Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms (Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks) alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on the remaining 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and through equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at the pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79–0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application.
Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.
Author summary: India now faces the paradox of widespread undernutrition alongside a rising tide of obesity among its older population. We asked whether state-of-the-art machine-learning models could accurately identify individuals at highest risk of underweight, overweight–obesity, and central adiposity while treating all social groups equitably. Using nationally representative data on more than 55,000 adults aged 45 years and above, we compared gradient-boosted decision trees, random forests, and other machine-learning approaches with conventional logistic regression. Overall, the modern algorithms produced the strongest predictions. Yet a closer look revealed systematic shortfalls for scheduled tribes, scheduled castes, and the lowest income quintile, even when the models achieved excellent accuracy in the population as a whole. We then applied several well-established bias-mitigation strategies, such as re-weighting the training data and post-processing the decision thresholds. These interventions reduced the performance gap for disadvantaged groups, albeit at a modest cost to overall accuracy. By combining careful fairness audits with Shapley-based interpretation of feature importance, we illuminate how socioeconomic and caste-related factors shape both nutritional risk and prediction error. Our findings underscore that fair, trustworthy decision-support systems in public health must be designed explicitly with equity objectives, rather than assuming that technical excellence alone will guarantee just outcomes.
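The two group-fairness criteria named in the abstract can be made concrete with a short Python sketch. It assumes binary labels, binary predictions, and a single group attribute; the function names and toy arrays are hypothetical illustrations, not LASI data or the authors' code. Demographic Parity compares the rate of positive predictions across groups, while Equalized Odds compares true-positive rates (sensitivity) and false-positive rates (1 minus specificity). Each function summarises a criterion as the largest between-group gap, which is one common convention and not necessarily the one used in the paper.

```python
import numpy as np


def group_rate(mask: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of positive predictions among the rows selected by mask (NaN if none)."""
    return float(y_pred[mask].mean()) if mask.any() else float("nan")


def demographic_parity_difference(y_pred, groups) -> float:
    """Largest between-group gap in P(prediction = 1 | group)."""
    y_pred, groups = np.asarray(y_pred), np.asarray(groups)
    rates = [group_rate(groups == g, y_pred) for g in np.unique(groups)]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups) -> float:
    """Larger of the between-group gaps in true-positive and false-positive rate."""
    y_true, y_pred, groups = (np.asarray(a) for a in (y_true, y_pred, groups))
    tprs, fprs = [], []
    for g in np.unique(groups):
        in_group = groups == g
        tprs.append(group_rate(in_group & (y_true == 1), y_pred))  # sensitivity in group g
        fprs.append(group_rate(in_group & (y_true == 0), y_pred))  # 1 - specificity in group g
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Hypothetical toy data: two groups of four people each.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(demographic_parity_difference(y_pred, groups))     # 0.5
print(equalized_odds_difference(y_true, y_pred, groups))  # 0.5
```

A value of 0 means the criterion holds exactly; larger values indicate a wider disparity between the best- and worst-served groups.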
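The author summary mentions re-weighting the training data as one pre-processing mitigation. A widely used scheme of this kind is the reweighing method of Kamiran and Calders, which gives every (group, label) cell the weight P(G=g) P(Y=y) / P(G=g, Y=y), so that group membership and outcome are statistically independent in the weighted training set. The Python sketch below assumes that scheme; the paper's exact weighting procedure is not described in this record, and the function name and toy data are hypothetical.

```python
import numpy as np


def reweighing_weights(y, groups) -> np.ndarray:
    """Kamiran-Calders style weights: w(g, y) = P(G=g) * P(Y=y) / P(G=g, Y=y)."""
    y, groups = np.asarray(y), np.asarray(groups)
    n = len(y)
    weights = np.ones(n, dtype=float)
    for g in np.unique(groups):
        p_g = np.mean(groups == g)              # marginal share of group g
        for label in np.unique(y):
            cell = (groups == g) & (y == label)
            if cell.any():                      # skip (group, label) combinations with no rows
                p_y = np.mean(y == label)       # marginal share of this label
                p_gy = cell.sum() / n           # observed joint share of (g, label)
                weights[cell] = p_g * p_y / p_gy
    return weights


# Hypothetical toy data: the outcome is more common in group A than in group B.
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
w = reweighing_weights(y, groups)

# After weighting, the weighted outcome rate is 0.5 in both groups
# (up to floating-point rounding).
for g in ("A", "B"):
    in_g = groups == g
    print(g, np.sum(w[in_g] * y[in_g]) / np.sum(w[in_g]))
```

The resulting weights can then be passed to any learner that accepts per-sample weights, for example through the sample_weight argument of fit() in scikit-learn's LogisticRegression.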
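Reject Option Classification, one of the post-processing methods that the abstract reports as moderately reducing disparities, changes decisions only for individuals whose predicted risk lies close to the decision threshold. The Python sketch below assumes a 0.5 threshold, a fixed margin of 0.1, a single hypothetical unprivileged group label, and that a positive prediction (being flagged for follow-up) is the favourable outcome. In practice the threshold and margin are usually tuned on validation data against a fairness constraint; the paper's settings are not given in this record.

```python
import numpy as np


def reject_option_classify(scores, groups, unprivileged, threshold=0.5, margin=0.1):
    """Sketch of Reject Option Classification with a fixed threshold and margin.

    Scores far from the threshold are classified as usual. Inside the low-confidence
    band |score - threshold| <= margin, members of the unprivileged group receive the
    favourable label (assumed to be 1) and all other individuals receive 0.
    """
    scores, groups = np.asarray(scores, dtype=float), np.asarray(groups)
    y_pred = (scores >= threshold).astype(int)          # standard threshold rule
    uncertain = np.abs(scores - threshold) <= margin    # critical region near the threshold
    y_pred[uncertain & (groups == unprivileged)] = 1
    y_pred[uncertain & (groups != unprivileged)] = 0
    return y_pred


# Hypothetical predicted risks for two groups; "G1" plays the unprivileged group.
scores = [0.90, 0.55, 0.45, 0.20, 0.90, 0.55, 0.45, 0.20]
groups = ["G1"] * 4 + ["G2"] * 4
print(reject_option_classify(scores, groups, unprivileged="G1"))
# [1 1 1 0 1 0 0 0]
```

Only the individuals inside the uncertainty band change label, which is how the method can narrow subgroup gaps while, as the abstract notes, sometimes lowering overall performance.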
Date: 2025
Downloads:
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000951 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 00951&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0000951
DOI: 10.1371/journal.pdig.0000951
More articles in PLOS Digital Health from Public Library of Science
Bibliographic data for series maintained by digitalhealth.