#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

Gabriel Oliveira dos Santos, Esther Luna Colombini and Sandra Avila
Additional contact information
Gabriel Oliveira dos Santos: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil
Esther Luna Colombini: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil
Sandra Avila: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil

Data, 2022, vol. 7, issue 2, 1-27

Abstract: Automatically describing images using natural-language sentences is essential to the inclusion of visually impaired people on the Internet. This problem is known as Image Captioning. There are many datasets in the literature, but most contain only English captions, whereas datasets with captions in other languages are scarce. We introduce #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese. In contrast to popular datasets, #PraCegoVer has only one reference per image, and both the mean and the variance of its reference sentence length are markedly high, which makes the dataset linguistically challenging. We carry out a detailed analysis to find the main classes and topics in our data. We compare #PraCegoVer to the MS COCO dataset in terms of sentence length and word frequency. We hope that the #PraCegoVer dataset encourages more work on the automatic generation of descriptions in Portuguese.
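
The sentence-length and word-frequency comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `caption_stats`, the whitespace tokenization, and the toy captions are all assumptions made for illustration, and the real analysis may tokenize and normalize text differently.

```python
from collections import Counter
from statistics import mean, pvariance


def caption_stats(captions):
    """Return (mean length, population variance of length, word counts).

    Hypothetical helper: lengths are counted in whitespace-separated,
    lowercased tokens, which is an assumption, not the paper's method.
    """
    tokenized = [caption.lower().split() for caption in captions]
    lengths = [len(tokens) for tokens in tokenized]
    word_freq = Counter(word for tokens in tokenized for word in tokens)
    return mean(lengths), pvariance(lengths), word_freq


# Toy captions invented for illustration; they are not from the dataset.
toy_captions = ["um gato preto", "um cachorro correndo na praia"]
mean_len, var_len, word_freq = caption_stats(toy_captions)
print(mean_len, var_len, word_freq.most_common(1))
```

Running the same function over two caption collections would give comparable length and frequency summaries for each, under the tokenization assumption stated above.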

Keywords: #PraCegoVer; image captioning in Portuguese; image captioning; image-to-text
JEL-codes: C8 C80 C81 C82 C83
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2306-5729/7/2/13/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/2/13/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:2:p:13-:d:730414

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:7:y:2022:i:2:p:13-:d:730414