#PraCegoVer: A Large Dataset for Image Captioning in Portuguese
Gabriel Oliveira dos Santos,
Esther Luna Colombini and
Sandra Avila
Additional contact information
Gabriel Oliveira dos Santos: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil
Esther Luna Colombini: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil
Sandra Avila: Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil
Data, 2022, vol. 7, issue 2, 1-27
Abstract:
Automatically describing images using natural sentences is essential to visually impaired people’s inclusion on the Internet. This problem is known as Image Captioning. There are many datasets in the literature, but most contain only English captions, whereas datasets with captions in other languages are scarce. We introduce #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese. In contrast to popular datasets, #PraCegoVer has only one reference per image, and both the mean and the variance of reference sentence length are significantly high, which makes our dataset linguistically challenging. We carry out a detailed analysis to find the main classes and topics in our data. We compare #PraCegoVer to the MS COCO dataset in terms of sentence length and word frequency. We hope that the #PraCegoVer dataset encourages more work on the automatic generation of descriptions in Portuguese.
Keywords: #PraCegoVer; image captioning in Portuguese; image captioning; image-to-text
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2306-5729/7/2/13/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/2/13/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:2:p:13-:d:730414
Data is currently edited by Ms. Cecilia Yang
Bibliographic data for series maintained by MDPI Indexing Manager.