The “Collections as ML Data” checklist for machine learning and cultural heritage
Benjamin Charles Germain Lee
Journal of the Association for Information Science & Technology, 2025, vol. 76, issue 2, 375-396
Abstract:
Within cultural heritage, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in galleries, museums, archives, and libraries at the organizational level, there remains a paucity of guidelines created for researchers embarking on machine learning projects with digital collections. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This article contributes to this need by formulating a detailed checklist with guiding questions and practices that can be employed while developing a machine learning project that utilizes cultural heritage data. I call the resulting checklist the “Collections as ML Data” checklist, which, when completed, can be published with the deliverables of the project. By surveying existing projects, including my own project, Newspaper Navigator, I justify the “Collections as ML Data” checklist and demonstrate how the formulated guiding questions can be employed by researchers.
Date: 2025
Downloads: https://doi.org/10.1002/asi.24765
Persistent link: https://EconPapers.repec.org/RePEc:bla:jinfst:v:76:y:2025:i:2:p:375-396
Ordering information: This journal article can be ordered from http://www.blackwell ... bs.asp?ref=2330-1635
Bibliographic data for series maintained by Wiley Content Delivery.