TNN: A transfer learning classifier based on weighted nearest neighbors
Haiyang Sheng and Guan Yu
Journal of Multivariate Analysis, 2023, vol. 193, issue C
Abstract:
Weighted nearest neighbors (WNN) classifiers are popular non-parametric classifiers. Despite the significant progress in WNN, most existing WNN classifiers are designed for traditional supervised learning problems where both training samples and test samples are assumed to be independent and identically distributed. However, in many real applications, it can be difficult or expensive to obtain training samples from the distribution of interest. Therefore, data collected from some related distributions are often used as supplementary training data for the classification task under the distribution of interest. It is essential to develop effective classification methods that can incorporate both training samples from the distribution of interest (if they exist) and supplementary training samples from a different but related distribution. To address this challenge, we propose a novel Transfer learning weighted Nearest Neighbors (TNN) classifier. As a WNN classifier, TNN adaptively determines the weights on the class labels of training samples for different test samples by minimizing an upper bound on the conditional expectation of the estimation error of the regression function. It puts decreasing weights on the class labels of successively more distant neighbors. To accommodate the difference between training samples from the distribution of interest and supplementary training samples, TNN adds a non-negative offset to the distance between each supplementary training sample and the test sample, thus constraining the excessive influence of the supplementary training samples on the prediction. Our theoretical studies show that, under certain conditions, TNN is consistent and minimax optimal (up to a logarithmic factor) in the covariate shift setting. In the posterior drift setting, or the more general setting where both covariate shift and posterior drift exist, the excess risk of TNN depends on the maximum posterior discrepancy between the distribution of the supplementary training samples and the distribution of interest. Both our simulation studies and an application to the land use/land cover mapping problem in geography demonstrate that TNN outperforms other existing methods and can serve as an effective tool for transfer learning.
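The offset idea described in the abstract lends itself to a short sketch. The Python function below is a minimal illustration, not the paper's method: the function name tnn_predict, the offset parameter delta, and the linearly decreasing neighbor weights are all illustrative assumptions. The actual TNN classifier chooses its weights adaptively by minimizing an upper bound on the estimation error, which this sketch does not reproduce; only the offset-adjusted distances for supplementary samples follow the abstract directly.

```python
import numpy as np

def tnn_predict(x, X_target, y_target, X_source, y_source, k=5, delta=0.5):
    """Illustrative offset-distance weighted k-NN prediction (hypothetical
    sketch of the TNN idea; not the paper's adaptive-weight construction)."""
    # Distances from the test point to target-distribution samples.
    d_target = np.linalg.norm(X_target - x, axis=1)
    # Supplementary (source) samples: same distances, inflated by a
    # non-negative offset to constrain their influence on the vote.
    d_source = np.linalg.norm(X_source - x, axis=1) + delta

    dists = np.concatenate([d_target, d_source])
    labels = np.concatenate([y_target, y_source])

    # k nearest neighbors under the offset-adjusted distances.
    order = np.argsort(dists)[:k]

    # Linearly decreasing weights on successively more distant neighbors
    # (an assumption; TNN derives its weights from an error upper bound).
    weights = np.arange(k, 0, -1, dtype=float)
    weights /= weights.sum()

    # Weighted vote over binary labels in {0, 1}.
    return int(np.dot(weights, labels[order]) >= 0.5)

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
X_t = rng.normal(size=(20, 2)); y_t = (X_t[:, 0] > 0).astype(int)
X_s = rng.normal(loc=0.3, size=(200, 2)); y_s = (X_s[:, 0] > 0).astype(int)
print(tnn_predict(np.array([1.0, 0.0]), X_t, y_t, X_s, y_s, k=7, delta=0.5))
```

With delta = 0 the sketch reduces to ordinary weighted k-NN on the pooled data, while a very large delta effectively ignores the supplementary samples, matching the abstract's description of the offset as a brake on their influence.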
Keywords: Binary classification; Minimax optimal; Nearest neighbor; Non-parametric classification; Transfer learning
Date: 2023
Downloads: http://www.sciencedirect.com/science/article/pii/S0047259X22001178 (full text for ScienceDirect subscribers only)
Persistent link: https://EconPapers.repec.org/RePEc:eee:jmvana:v:193:y:2023:i:c:s0047259x22001178
DOI: 10.1016/j.jmva.2022.105126