Training-Free Few-Shot Image Classification via Kernel Density Estimation with CLIP Embeddings
Marcos Sergio Pacheco dos Santos Lima Junior,
Juan Miguel Ortiz-de-Lazcano-Lobato and
Ezequiel López-Rubio
Additional contact information
All authors: Department of Computer Languages and Computer Science, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Andalucía, Spain
Mathematics, 2025, vol. 13, issue 22, 1-28
Abstract:
Few-shot image classification aims to recognize novel classes from only a handful of labeled examples, a challenge in domains where data collection is costly or impractical. Existing solutions often rely on meta-learning, fine-tuning, or data augmentation, which introduce computational overhead, risk overfitting, or sacrifice efficiency. This paper introduces ProbaCLIP, a simple training-free approach that leverages Kernel Density Estimation (KDE) within the embedding space of Contrastive Language-Image Pre-training (CLIP). Unlike other CLIP-based methods, the proposed approach operates solely on visual embeddings and does not require text labels. Class-conditional probability densities are estimated from the few-shot support examples, and queries are classified by likelihood evaluation, with Principal Component Analysis (PCA) used for dimensionality reduction to compress the between-class dissimilarities of each episode. We further introduce an optional bandwidth optimization strategy and a cross-validation-based consensus decision mechanism, and we address the special case of one-shot classification with distance-based measures. Extensive experiments on multiple datasets demonstrate that our method achieves competitive or superior accuracy compared with state-of-the-art few-shot classifiers, reaching up to 98.37% accuracy in five-shot tasks and up to 99.80% in a 16-shot setting with ViT-L/14@336px. These results are obtained without gradient-based training, text supervision, or auxiliary meta-training datasets, underscoring the effectiveness of combining pre-trained embeddings with statistical density estimation for data-scarce classification.
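As a concrete illustration of the pipeline the abstract describes, the sketch below fits PCA on a support set of CLIP visual embeddings, estimates one Gaussian KDE per class, and labels each query by maximum log-likelihood. It is a minimal sketch, not the authors' implementation: it assumes embeddings are precomputed by a frozen CLIP image encoder, the function names (fit_episode, classify, classify_one_shot), the 95% retained-variance setting, and the bandwidth grid are illustrative choices, the cross-validated bandwidth search is only one plausible reading of the paper's optional bandwidth optimization, and the consensus decision mechanism is omitted.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def fit_episode(support_embeddings, support_labels, variance_kept=0.95):
    """Fit PCA on the support set, then one Gaussian KDE per class.

    support_embeddings : (n_support, d) array of CLIP visual embeddings.
    support_labels     : (n_support,) integer array of class labels.
    Assumes at least two shots per class; see classify_one_shot otherwise.
    """
    pca = PCA(n_components=variance_kept, svd_solver="full").fit(support_embeddings)
    reduced = pca.transform(support_embeddings)
    kdes = {}
    for cls in np.unique(support_labels):
        points = reduced[support_labels == cls]
        # Bandwidth chosen by cross-validated log-likelihood over a grid:
        # one plausible reading of the optional bandwidth optimization.
        search = GridSearchCV(
            KernelDensity(kernel="gaussian"),
            {"bandwidth": np.logspace(-2, 1, 20)},
            cv=min(3, len(points)),
        )
        search.fit(points)
        kdes[cls] = search.best_estimator_
    return pca, kdes

def classify(query_embeddings, pca, kdes):
    """Assign each query to the class whose KDE gives the highest log-likelihood."""
    reduced = pca.transform(query_embeddings)
    classes = sorted(kdes)
    log_lik = np.stack([kdes[c].score_samples(reduced) for c in classes], axis=1)
    return np.array(classes)[log_lik.argmax(axis=1)]

def classify_one_shot(query_embeddings, support_embeddings, support_labels):
    """One-shot fallback: nearest support embedding by cosine similarity,
    one possible distance-based measure for the single-example case."""
    q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
    s = support_embeddings / np.linalg.norm(support_embeddings, axis=1, keepdims=True)
    return support_labels[(q @ s.T).argmax(axis=1)]

Scoring log-densities rather than plain distances lets classes with different spreads in embedding space compete on an equal footing, which is the usual appeal of a KDE rule over a nearest-centroid one.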
Keywords: few-shot; image classification; vision-language models; kernel density estimation; principal component analysis; vision transformer
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/22/3615/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/22/3615/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:22:p:3615-:d:1792021