Fine-tuning image-to-text models on Liechtenstein tourist attractions
Pejman Ebrahimi (University of Liechtenstein) and
Johannes Schneider (University of Liechtenstein)
Electronic Markets, 2025, vol. 35, issue 1, No 55, 25 pages
Abstract:
Adjusting pre-trained artificial intelligence models to domain-specific problems is essential for many business applications, but domain-specific data is often scarce and expensive to collect. Moreover, fine-tuning on small datasets is challenging, as it carries risks of overfitting and catastrophic forgetting. This paper systematically investigates the effectiveness of fine-tuning pre-trained image-to-text models for domain-specific applications, emphasizing how model performance scales with dataset size. We compare two state-of-the-art architectures, Generative Image-to-Text (GIT) and Florence-2, using small and large datasets of Liechtenstein tourist attractions. Our analysis reveals a nuanced relationship between model architecture and data efficiency. On the small dataset, GIT outperformed Florence-2 on BLEU score (0.71 vs. 0.03); on the larger dataset, however, Florence-2 surpassed GIT by 33–37%. Similarly, CIDEr scores improved dramatically, from 0.00 to 0.97 for GIT and from 0.33 to 0.95 for Florence-2, underscoring the critical importance of data volume. Our results suggest that fine-tuned models can generate contextually accurate captions, capturing architectural details, historical context, and geographical information of tourist attractions, and may also benefit other domains such as cultural heritage preservation and education. Our methodology emphasizes computational efficiency, requiring less than 3 GB of GPU memory for both GIT and Florence-2, making these approaches accessible to organizations with limited resources. This research contributes both theoretical insights into model scaling properties and practical guidance on selecting appropriate architectures based on available data resources. The results demonstrate that while fine-tuning can enable reasonable performance even with limited domain-specific data, architecture selection should be informed by anticipated data availability; evaluating multiple candidate models is therefore highly recommended.
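The record itself contains no code, but the fine-tuning setup described in the abstract maps onto a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the Hugging Face transformers and evaluate libraries, the public microsoft/git-base checkpoint, and a hypothetical local image file (vaduz_castle.jpg) with a reference caption; the paper's actual hyperparameters, data pipeline, and Florence-2 variant will differ.

```python
import torch
import evaluate
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

# Load a pre-trained GIT checkpoint; Florence-2 follows a similar
# processor/model pattern but expects task-prompt inputs.
processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical single training pair; a real run would iterate a DataLoader
# over the full image-caption dataset.
image = Image.open("vaduz_castle.jpg").convert("RGB")
caption = "Vaduz Castle overlooks the capital of Liechtenstein."

inputs = processor(images=image, text=caption,
                   return_tensors="pt", padding=True).to(device)

model.train()
for step in range(3):  # a few steps for illustration; small datasets overfit quickly
    # For GIT, the caption token ids double as the labels.
    outputs = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    pixel_values=inputs.pixel_values,
                    labels=inputs.input_ids)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Generate a caption for the image and score it against the reference with BLEU.
model.eval()
with torch.no_grad():
    generated = model.generate(pixel_values=inputs.pixel_values, max_length=50)
prediction = processor.batch_decode(generated, skip_special_tokens=True)[0]

bleu = evaluate.load("bleu")
print(bleu.compute(predictions=[prediction], references=[[caption]]))
```

CIDEr, the other headline metric, is not among the evaluate hub's built-in metrics; scoring it would require a dedicated implementation such as the one in pycocoevalcap, which is omitted here for brevity.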
Keywords: Image-to-text models; Fine-tuning; Domain-specific applications; Evaluation metrics (BLEU; CIDEr; ROUGE); Liechtenstein tourist attractions; Data scaling
JEL-codes: C02 O3 R10 Y8 Z3
Date: 2025
Downloads: http://link.springer.com/10.1007/s12525-025-00806-7 (abstract, text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:elmark:v:35:y:2025:i:1:d:10.1007_s12525-025-00806-7
Ordering information: This journal article can be ordered from
http://www.springer. ... ystems/journal/12525
DOI: 10.1007/s12525-025-00806-7
Electronic Markets is currently edited by Rainer Alt and Hans-Dieter Zimmermann