Training-Free Few-Shot Image Classification via Kernel Density Estimation with CLIP Embeddings
Marcos Sergio Pacheco dos Santos Lima Junior,
Juan Miguel Ortiz-de-Lazcano-Lobato and
Ezequiel López-Rubio
Additional contact information
All authors: Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Andalucía, Spain
Mathematics, 2025, vol. 13, issue 22, 1-28
Abstract:
Few-shot image classification aims to recognize novel classes from only a handful of labeled examples, a challenge in domains where data collection is costly or impractical. Existing solutions often rely on meta-learning, fine-tuning, or data augmentation, which introduce computational overhead, risk overfitting, or sacrifice efficiency. This paper introduces ProbaCLIP, a simple training-free approach that leverages Kernel Density Estimation (KDE) within the embedding space of Contrastive Language-Image Pre-training (CLIP). Unlike other CLIP-based methods, the proposed approach operates solely on visual embeddings and does not require text labels. Class-conditional probability densities are estimated from the few-shot support examples, and queries are classified by likelihood evaluation, with Principal Component Analysis (PCA) used for dimensionality reduction to compress the between-class dissimilarities of each episode. We further introduce an optional bandwidth optimization strategy and a cross-validation-based consensus decision mechanism, and we address the special case of one-shot classification with distance-based measures. Extensive experiments on multiple datasets demonstrate that our method achieves competitive or superior accuracy compared with state-of-the-art few-shot classifiers, reaching up to 98.37% accuracy in five-shot tasks and up to 99.80% in a 16-shot setting with ViT-L/14@336px. These results are obtained without gradient-based training, text supervision, or auxiliary meta-training datasets, underscoring the effectiveness of combining pre-trained embeddings with statistical density estimation for data-scarce classification.
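As a concrete illustration of the pipeline the abstract describes, the sketch below fits PCA on a support set of CLIP visual embeddings, estimates one Gaussian KDE per class, and labels each query by maximum log-likelihood. It is a minimal sketch, not the authors' implementation: it assumes embeddings are precomputed by a frozen CLIP image encoder, the function names (fit_episode, classify, classify_one_shot), the 95% retained-variance setting, and the bandwidth grid are illustrative choices, the cross-validated bandwidth search is only one plausible reading of the paper's optional bandwidth optimization, and the consensus decision mechanism is omitted.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def fit_episode(support_embeddings, support_labels, variance_kept=0.95):
    """Fit PCA on the support set, then one Gaussian KDE per class.

    support_embeddings : (n_support, d) array of CLIP visual embeddings.
    support_labels     : (n_support,) integer array of class labels.
    Assumes at least two shots per class; see classify_one_shot otherwise.
    """
    pca = PCA(n_components=variance_kept, svd_solver="full").fit(support_embeddings)
    reduced = pca.transform(support_embeddings)
    kdes = {}
    for cls in np.unique(support_labels):
        points = reduced[support_labels == cls]
        # Bandwidth chosen by cross-validated log-likelihood over a grid:
        # one plausible reading of the optional bandwidth optimization.
        search = GridSearchCV(
            KernelDensity(kernel="gaussian"),
            {"bandwidth": np.logspace(-2, 1, 20)},
            cv=min(3, len(points)),
        )
        search.fit(points)
        kdes[cls] = search.best_estimator_
    return pca, kdes

def classify(query_embeddings, pca, kdes):
    """Assign each query to the class whose KDE gives the highest log-likelihood."""
    reduced = pca.transform(query_embeddings)
    classes = sorted(kdes)
    log_lik = np.stack([kdes[c].score_samples(reduced) for c in classes], axis=1)
    return np.array(classes)[log_lik.argmax(axis=1)]

def classify_one_shot(query_embeddings, support_embeddings, support_labels):
    """One-shot fallback: nearest support embedding by cosine similarity,
    one possible distance-based measure for the single-example case."""
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    s = support_embeddings / np.linalg.norm(support_embeddings, axis=1, keepdims=True)
    return support_labels[(q @ s.T).argmax(axis=1)]

Scoring log-densities rather than plain distances lets classes with different spreads in embedding space compete on an equal footing, which is the usual appeal of a KDE rule over a nearest-centroid one.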
Keywords: few-shot; image classification; vision-language models; kernel density estimation; principal component analysis; vision transformer
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/22/3615/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/22/3615/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:22:p:3615-:d:1792021