On regression and classification with possibly missing response variables in the data
Majid Mojirsheibani,
William Pouliot and
Andre Shakhbandaryan
Additional contact information
Majid Mojirsheibani: California State University
Andre Shakhbandaryan: California State University
Metrika: International Journal for Theoretical and Applied Statistics, 2024, vol. 87, issue 6, No 1, 607-648
Abstract:
This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information can depend on both the predictors and the response variables. Our proposed approach involves two steps. In the first step, we construct a family of models (possibly infinite dimensional) indexed by the unknown parameter of the missing probability mechanism. In the second step, a search is carried out to find the empirically optimal member of an appropriate cover (or subclass) of the underlying family in the sense of minimizing the mean squared prediction error. The main focus of the paper is on the theoretical properties of these estimators. The issue of identifiability is also addressed. Our methods use a data-splitting approach which is quite easy to implement. We also derive exponential bounds on the performance of the resulting estimators in terms of their deviations from the true regression curve in general $L_p$ norms, where we allow the size of the cover or subclass to diverge as the sample size $n$ increases. These bounds immediately yield various strong convergence results for the proposed estimators. As an application of our findings, we consider the problem of statistical classification based on the proposed regression estimators and also look into their rates of convergence under different settings. Although this work is mainly stated for kernel-type estimators, it can also be extended to other popular local-averaging methods such as nearest-neighbor and histogram estimators.
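As a rough illustration of the two steps described in the abstract, the Python sketch below builds a family of Nadaraya-Watson-type estimators indexed by a scalar parameter phi, then picks the member with the smallest weighted squared prediction error on a held-out half of the data. Everything specific here is an assumption made for illustration and is not taken from the paper: the logistic missingness model P(delta = 1 | X = x, Y = y) = 1 / (1 + exp(g(x) + phi * y)), the Gaussian kernel, the fixed bandwidth h, the finite grid phis standing in for the cover, the inverse-probability-weighted selection criterion, and the function names nw_weights, fit_predict and select_phi. The sketch shows the mechanics only and does not address identifiability of phi.

```python
import numpy as np


def nw_weights(x_eval, x_train, h):
    """Row-normalised Gaussian kernel weights (Nadaraya-Watson)."""
    d = (x_eval[:, None] - x_train[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    return k / k.sum(axis=1, keepdims=True)


def fit_predict(phi, x_eval, x, y, delta, h):
    """Step 1: kernel estimate of E[Y | X] for one candidate phi.

    Assumed model (not from the paper): P(delta=1 | x, y) = 1 / (1 + exp(g(x) + phi*y)).
    Under it, E[Y | X] = E[dY | X] + E[1-d | X] * E[dY e^{phi Y} | X] / E[d e^{phi Y} | X].
    Also returns the estimate of exp(g(x)) = E[1-d | X] / E[d e^{phi Y} | X].
    """
    w = nw_weights(x_eval, x, h)
    y0 = np.where(delta == 1, y, 0.0)   # missing responses never enter the sums
    tilt = delta * np.exp(phi * y0)
    a = w @ (delta * y0)                # E[delta * Y | X]
    b = w @ (1.0 - delta)               # E[1 - delta | X]
    c = w @ (tilt * y0)                 # E[delta * Y * exp(phi Y) | X]
    d = np.maximum(w @ tilt, 1e-12)     # E[delta * exp(phi Y) | X], guarded
    return a + b * c / d, b / d


def select_phi(phis, x, y, delta, h, rng):
    """Step 2: data splitting; choose phi minimising a held-out weighted error."""
    idx = rng.permutation(len(x))
    tr, va = idx[: len(x) // 2], idx[len(x) // 2:]
    obs = va[delta[va] == 1]            # held-out points with an observed response
    best_phi, best_err = None, np.inf
    for phi in phis:
        m_hat, ratio = fit_predict(phi, x[obs], x[tr], y[tr], delta[tr], h)
        inv_pi = 1.0 + ratio * np.exp(phi * y[obs])   # estimated 1 / P(delta=1 | x, y)
        err = np.sum(inv_pi * (m_hat - y[obs]) ** 2) / len(va)
        if err < best_err:
            best_phi, best_err = phi, err
    return best_phi, best_err


# Synthetic usage example (all values hypothetical).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, n)
y_full = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)
pi = 1.0 / (1.0 + np.exp(-1.0 + 0.5 * y_full))        # g(x) = -1, phi = 0.5
delta = (rng.uniform(size=n) < pi).astype(float)
y = np.where(delta == 1, y_full, np.nan)              # unobserved responses are NaN
phis = np.linspace(-2.0, 2.0, 9)
phi_hat, err_hat = select_phi(phis, x, y, delta, h=0.1, rng=rng)
print(phi_hat, err_hat)
```

When every response is observed (delta identically 1), the correction term vanishes and fit_predict reduces to the ordinary Nadaraya-Watson estimator for any phi, which gives a quick sanity check on the construction.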
Keywords: Regression; Partially observed data; Kernel; Convergence; Classification; Margin condition; Primary 62G05; Secondary 62G08
Date: 2024
Downloads: (external link)
http://link.springer.com/10.1007/s00184-023-00923-3 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:metrik:v:87:y:2024:i:6:d:10.1007_s00184-023-00923-3
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/184/PS2
DOI: 10.1007/s00184-023-00923-3
Metrika: International Journal for Theoretical and Applied Statistics is currently edited by U. Kamps and Norbert Henze
More articles in Metrika: International Journal for Theoretical and Applied Statistics from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.