Regularized regression in ultra-small chemometric datasets: A methodological case study using FTIR spectra of Schiff bases
Khudhayr Abdullah Rashedi,
Tariq Saleh Alshammari,
Khalid Mohammed Alshammari,
Talal Abdulrahman Alanazi,
Javid Shabbir and
Tahir Mehmood
PLOS ONE, 2026, vol. 21, issue 6, 1-14
Abstract:
This study is not intended to establish a predictive framework for reaction yield. Instead, it is framed as a methodological investigation examining the statistical behavior and instability of regularized regression techniques when applied to ultra-small, high-dimensional chemometric datasets. The analysis is based on a curated dataset of Schiff base compounds (n = 21) for which post-synthesis Fourier Transform Infrared (FTIR) spectra and experimentally reported reaction yields are available. Structural information for all compounds is fully disclosed to ensure chemical transparency. Descriptive physicochemical properties, including molecular weight, physical appearance, retention factor (Rf), melting point, and reaction yield, are summarized to characterize the dataset; however, only yield (%) is used as the response variable in the subsequent statistical analyses. Baseline-corrected and normalized FTIR spectra were transformed into a high-dimensional explanatory matrix and analyzed using regularized regression approaches designed for high collinearity and p≫n settings, specifically sparse Partial Least Squares (sPLS) and Elastic Net regression. Model behavior was examined using leave-one-out cross-validation (LOOCV), which is more appropriate for extremely small datasets where conventional train–test splitting is unreliable. Given the severe sample-size limitation, the analysis is interpreted as a methodological illustration rather than a generalizable predictive framework. Model outputs are therefore discussed primarily in terms of coefficient sparsity, variability, and stability under regularization rather than predictive accuracy. Overall, the study demonstrates the practical challenges and statistical instability that arise when regression-based machine learning techniques are applied to ultra-small spectral datasets. 
The results highlight the importance of cautious interpretation and methodological transparency when chemometric models are developed under severe sample-size constraints.
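As an illustration of the workflow the abstract describes, the following is a minimal sketch of Elastic Net under leave-one-out cross-validation, tracking how often each coefficient is non-zero across folds. It is not the authors' code or data. The synthetic "spectra", the placeholder yields, the function name `loocv_elastic_net`, and the values of `alpha` and `l1_ratio` are all assumptions chosen for demonstration. sPLS is not covered here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler


def loocv_elastic_net(X, y, alpha=0.5, l1_ratio=0.5):
    """Fit an Elastic Net on every leave-one-out fold.

    Returns the held-out predictions (length n) and an
    (n_folds, n_features) array with the coefficients of each fold.
    alpha and l1_ratio are illustrative values, not tuned ones.
    """
    n, p = X.shape
    preds = np.empty(n)
    coefs = np.empty((n, p))
    for fold, (train, test) in enumerate(LeaveOneOut().split(X)):
        # Scale with training-fold statistics only, to avoid leakage.
        scaler = StandardScaler().fit(X[train])
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=50_000)
        model.fit(scaler.transform(X[train]), y[train])
        preds[test] = model.predict(scaler.transform(X[test]))
        coefs[fold] = model.coef_
    return preds, coefs


# Synthetic stand-in for the dataset (assumption): 21 samples, 300 "wavenumbers".
rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 21, 300
# Cumulative sums give smooth, strongly collinear columns, loosely like spectra.
X = np.cumsum(rng.normal(size=(n_samples, n_wavenumbers)), axis=1)
# Placeholder yields (%) drawn independently of X, so there is no true signal.
y = rng.uniform(50, 95, size=n_samples)

preds, coefs = loocv_elastic_net(X, y)

# Fraction of folds in which each coefficient is non-zero.
selection_freq = (coefs != 0).mean(axis=0)
ever_selected = selection_freq > 0
always_selected = selection_freq == 1
rmse = np.sqrt(np.mean((y - preds) ** 2))

print("features selected in at least one fold:", int(ever_selected.sum()))
print("features selected in every fold:", int(always_selected.sum()))
print("LOOCV RMSE:", round(float(rmse), 2))
```

Comparing the number of features selected in at least one fold with the number selected in every fold gives a simple view of coefficient stability: a large gap indicates that the selected set depends heavily on which single sample is left out.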
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0341850 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 41850&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0341850
DOI: 10.1371/journal.pone.0341850