CISA: Context Substitution for Image Semantics Augmentation

Sergey Nesteruk, Ilya Zherebtsov, Svetlana Illarionova, Dmitrii Shadrin, Andrey Somov (), Sergey V. Bezzateev, Tatiana Yelina, Vladimir Denisenko and Ivan Oseledets
Additional contact information
Sergey Nesteruk: Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia
Ilya Zherebtsov: Voronezh State University of Engineering Technology (VSUET), 394036 Voronezh, Russia
Svetlana Illarionova: Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia
Dmitrii Shadrin: Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia
Andrey Somov: Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia
Sergey V. Bezzateev: Saint Petersburg State University of Aerospace Instrumentation (SUAI), 190000 Saint Petersburg, Russia
Tatiana Yelina: Saint Petersburg State University of Aerospace Instrumentation (SUAI), 190000 Saint Petersburg, Russia
Vladimir Denisenko: Voronezh State University of Engineering Technology (VSUET), 394036 Voronezh, Russia
Ivan Oseledets: Skolkovo Institute of Science and Technology (Skoltech), 121205 Moscow, Russia

Mathematics, 2023, vol. 11, issue 8, 1-24

Abstract: Large datasets have catalyzed the rapid expansion of deep learning and computer vision. At the same time, many domains suffer from a lack of training data, which can become an obstacle to the practical application of deep computer vision models. A popular way to overcome this problem is image augmentation. When a dataset contains instance segmentation masks, instance-level augmentation can be applied: an instance is cut out of the original image and pasted onto new backgrounds. This article addresses datasets in which the same objects appear in different domains. We introduce the Context Substitution for Image Semantics Augmentation (CISA) framework, which focuses on choosing suitable background images. We compare several ways to find backgrounds that match the context of the test set, including Contrastive Language–Image Pre-Training (CLIP) image retrieval and diffusion image generation. We show that our augmentation method is effective for classification, segmentation, and object detection across datasets of varying complexity and different model types. The average percentage increase in accuracy across all tasks on a fruit and vegetable recognition dataset is 4.95%. Moreover, we show that the Fréchet Inception Distance (FID) metric correlates strongly with model accuracy and can help choose better backgrounds without training a model. In our experiments, the average correlation between model accuracy and the FID between the augmented and test datasets is −0.55.
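As a rough illustration of the instance-level augmentation described in the abstract, the following minimal Python sketch (not the authors' released code; function and variable names are illustrative) cuts a masked instance out of a source image and pastes it onto a new background:

    # Hypothetical sketch of cut-and-paste instance augmentation.
    # Assumes a binary instance mask with the same height/width as the source image.
    import numpy as np
    from PIL import Image

    def paste_instance(src_img: Image.Image, mask: np.ndarray,
                       background: Image.Image) -> Image.Image:
        """Cut the masked instance from src_img and paste it onto background."""
        src = np.asarray(src_img.convert("RGB"))
        # Resize the background to the source image size and copy it so we can edit it.
        bg = np.asarray(background.convert("RGB").resize(src_img.size)).copy()
        m = mask.astype(bool)          # True where the instance pixels are
        bg[m] = src[m]                 # overwrite background pixels with the instance
        return Image.fromarray(bg)

In the CISA framework the background would then be chosen to match the target context, for example retrieved with CLIP or generated with a diffusion model, rather than picked arbitrarily; a real pipeline would typically also shift, scale, and blend the pasted instance.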

Keywords: image augmentation; computer vision; data collection; image retrieval; image generation; few-shot learning
JEL-codes: C
Date: 2023
References: View complete reference list from CitEc

Downloads: (external link)
https://www.mdpi.com/2227-7390/11/8/1818/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/8/1818/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:8:p:1818-:d:1121094

Access Statistics for this article

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Handle: RePEc:gam:jmathe:v:11:y:2023:i:8:p:1818-:d:1121094