Leveraging Computer Vision and Visual LLMs for Cost-Effective and Consistent Street Food Safety Assessment in Kolkata India
Alexey Chernikov,
Klaus Ackermann,
Caitlin Brown and
Denni Tommasi
Additional contact information
Alexey Chernikov: SoDa Labs & Department of Econometrics and Business Statistics, Monash University
Klaus Ackermann: SoDa Labs & Department of Econometrics and Business Statistics, Monash University
Caitlin Brown: Department of Economics, Université Laval
Denni Tommasi: Department of Economics, University of Bologna
No 2025-02, SoDa Laboratories Working Paper Series from Monash University, SoDa Laboratories
Abstract:
Ensuring street food safety in developing countries is crucial due to the high prevalence of foodborne illnesses. Traditional methods of food safety assessments face challenges such as resource constraints, logistical issues, and subjective biases influenced by surveyors' personal lived experiences, particularly when interacting with local communities. For instance, a local food safety inspector may inadvertently overrate the quality of infrastructure due to prior familiarity or past purchases, thereby compromising objective assessment. This subjectivity highlights the necessity for technologies that reduce human biases and enhance the accuracy of survey data across various domains. This paper proposes a novel approach based on a combination of Computer Vision and a lightweight Visual Large Language Model (VLLM) to automate the detection and analysis of critical food safety infrastructure in street food vendor environments at a field experiment in Kolkata, India. The system utilises a three-stage object extraction pipeline from the video to identify, extract and select unique representations of critical elements such as hand-washing stations, dishwashing areas, garbage bins, and water tanks. These four infrastructure items are crucial for maintaining safe food practices, irrespective of the specific methods employed by the vendors. A VLLM then analyses the extracted representations to assess compliance with food safety standards. Notably, over half of the pipeline can be processed using a user's smartphone, significantly reducing government server workload. By leveraging this decentralised approach, the proposed system decreases the analysis cost by many orders of magnitude compared to alternatives like ChatGPT or Claude 3.5. Additionally, processing data on local government servers provides better privacy and security than cloud platforms, addressing critical ethical considerations.
This automated approach significantly improves efficiency, consistency, and scalability, providing a robust solution to enhance public health outcomes in developing regions.
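As a rough illustration of the "identify, extract and select unique representations" idea in the abstract, the following minimal Python sketch filters detections to the four infrastructure classes and then greedily discards near-duplicate views of the same object. It is not the authors' implementation: the `Detection` record, the label strings, the `embedding` field (standing in for a cropped image's appearance descriptor), and both thresholds are hypothetical assumptions made only for this example.

```python
from dataclasses import dataclass
from math import sqrt

# The four infrastructure classes named in the abstract (label strings are assumed).
TARGET_LABELS = {"hand_washing_station", "dishwashing_area", "garbage_bin", "water_tank"}


@dataclass
class Detection:
    frame: int         # index of the video frame the object was seen in
    label: str         # class predicted by some object detector (hypothetical)
    confidence: float  # detector score in [0, 1]
    embedding: tuple   # hypothetical appearance descriptor of the extracted crop


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(detections, min_confidence=0.5):
    """Keep confident detections of the target classes (threshold is an assumption)."""
    return [d for d in detections
            if d.label in TARGET_LABELS and d.confidence >= min_confidence]


def select_unique(detections, similarity_threshold=0.9):
    """Greedy de-duplication: visit detections from most to least confident and
    drop any whose crop looks like an already kept crop of the same class."""
    kept = []
    for d in sorted(detections, key=lambda d: d.confidence, reverse=True):
        is_duplicate = any(
            k.label == d.label
            and cosine(k.embedding, d.embedding) >= similarity_threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(d)
    return kept


# Toy data: two views of one bin, a second bin, a water tank, and two rejects.
detections = [
    Detection(0, "garbage_bin", 0.91, (1.0, 0.0)),
    Detection(1, "garbage_bin", 0.85, (0.99, 0.05)),  # near-duplicate of frame 0
    Detection(5, "water_tank", 0.70, (0.0, 1.0)),
    Detection(6, "garbage_bin", 0.60, (0.0, 1.0)),    # a visually different bin
    Detection(7, "umbrella", 0.95, (0.5, 0.5)),       # not a target class
    Detection(8, "water_tank", 0.30, (0.0, 1.0)),     # below confidence threshold
]

unique = select_unique(identify(detections))
for d in unique:
    print(d.frame, d.label, d.confidence)
```

In a design along these lines, only the surviving crops would be passed on to the VLLM for compliance assessment, which is one plausible way the amount of server-side work could be kept small.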
Keywords: Food Safety; Visual Language Models; Survey Accuracy; Field Assessments; Bias Reduction
JEL-codes: C83 I18 O12 O33 Q18
Date: 2025-03
New Economics Papers: this item is included in nep-agr
Downloads: (external link)
http://soda-wps.s3-website-ap-southeast-2.amazonaw ... r/sodwps/2025-02.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:ajr:sodwps:2025-02
Ordering information: This working paper can be ordered from
https://www.monash.edu/business/soda-labs/home
Series: SoDa Laboratories Working Paper Series, SoDa Laboratories, Monash University, Victoria 3800, Australia.
Bibliographic data for series maintained by Ashani Amarasinghe.