Relational Databases and Machine Learning for Qualitative Big Data
Erik Lakomaa and Christoffer Friedl
Additional contact information
Erik Lakomaa: Institute for Economic and Business History Research, Postal: Stockholm School of Economics, P.O. Box 6501, SE-113 83 Stockholm, Sweden
Christoffer Friedl: Stockholm School of Economics, Postal: Stockholm School of Economics, P.O. Box 6501, SE-113 83 Stockholm, Sweden
No 2026:2, SSE Working Paper Series in Economic History from Stockholm School of Economics
Abstract:
Recent advances in large-scale digitisation have created new opportunities for economic and business historians who work with substantial bodies of qualitative archival material. Although historical datasets of around 10,000 to 100,000 observations are modest compared to conventional big data, they present similar information-processing challenges and make it possible to apply a wide range of machine learning techniques. In this paper, we show how relational databases can provide the infrastructure needed to prepare, structure, and analyse large qualitative historical datasets, and how they support the effective use of machine learning tools, including Large Language Models (LLMs). We draw on a research program that has collected more than 114,000 digitised documents from over 30 archives. Our relational database design enables us to structure unstructured sources, standardise metadata, link documents to events and actors, and create longitudinal datasets that can be used for supervised learning, topic modelling, document classification, and embedding-based similarity searches. We also assess the value and limitations of LLMs in historical research. LLMs can accelerate tasks such as document triage, entity recognition, thematic grouping, and preliminary coding. At the same time, they introduce risks related to hallucinations, opaque reasoning processes, and difficulties in tracing the evidentiary basis of outputs. We argue that relational databases reduce these risks in three ways: they retain document-level traceability, they make the full set of consulted sources transparent, and they preserve the epistemological chain from tentative AI suggestions to subsequent researcher validation, which allows researchers to verify and reinterpret AI-assisted results.
Our contribution is an empirically grounded demonstration of how qualitative big data, relational databases, and machine learning methods can be combined to advance economic history, along with a discussion of the safeguards needed to ensure these tools are used responsibly.
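The abstract describes the database design only in outline. As a hedged illustration of the kind of structure it refers to, the following minimal sketch uses Python's built-in sqlite3 module; every table, column, and row is hypothetical and not taken from the authors' database. It links documents to archives, actors, and events, and stores each tentative AI suggestion separately from the researcher's validation, so that an accepted code can always be traced back to the archival document it rests on.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the authors' actual database design.
SCHEMA = """
CREATE TABLE archive  (archive_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE document (
    doc_id     INTEGER PRIMARY KEY,
    archive_id INTEGER NOT NULL REFERENCES archive(archive_id),
    shelfmark  TEXT NOT NULL,   -- archival reference, kept for traceability
    doc_date   TEXT,            -- standardised ISO 8601 date
    body       TEXT             -- transcribed or OCR text
);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE event (event_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
-- Many-to-many links between documents and actors or events.
CREATE TABLE document_actor (
    doc_id   INTEGER REFERENCES document(doc_id),
    actor_id INTEGER REFERENCES actor(actor_id),
    PRIMARY KEY (doc_id, actor_id)
);
CREATE TABLE document_event (
    doc_id   INTEGER REFERENCES document(doc_id),
    event_id INTEGER REFERENCES event(event_id),
    PRIMARY KEY (doc_id, event_id)
);
-- The AI suggestion and the researcher's verdict live in separate tables,
-- so the suggestion is never overwritten by the validation.
CREATE TABLE ai_suggestion (
    suggestion_id INTEGER PRIMARY KEY,
    doc_id        INTEGER NOT NULL REFERENCES document(doc_id),
    model         TEXT NOT NULL,
    proposed_code TEXT NOT NULL
);
CREATE TABLE validation (
    suggestion_id INTEGER NOT NULL REFERENCES ai_suggestion(suggestion_id),
    researcher    TEXT NOT NULL,
    accepted      INTEGER NOT NULL CHECK (accepted IN (0, 1)),
    final_code    TEXT
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database filled with invented example rows."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    con.executescript(SCHEMA)
    con.execute("INSERT INTO archive VALUES (1, 'Example archive')")
    con.execute(
        "INSERT INTO document VALUES "
        "(1, 1, 'Box 12, folder 3', '1952-05-14', 'Transcribed letter text')"
    )
    con.execute("INSERT INTO actor VALUES (1, 'Example firm')")
    con.execute("INSERT INTO document_actor VALUES (1, 1)")
    con.execute("INSERT INTO ai_suggestion VALUES (1, 1, 'some-llm', 'lobbying')")
    con.execute("INSERT INTO validation VALUES (1, 'researcher A', 1, 'lobbying')")
    return con


def validated_codes(con: sqlite3.Connection) -> list:
    """Return researcher-accepted codes with the archival source they rest on."""
    return con.execute(
        """
        SELECT d.shelfmark, a.name, s.proposed_code, v.final_code
        FROM validation v
        JOIN ai_suggestion s ON s.suggestion_id = v.suggestion_id
        JOIN document d      ON d.doc_id = s.doc_id
        JOIN archive a       ON a.archive_id = d.archive_id
        WHERE v.accepted = 1
        """
    ).fetchall()


if __name__ == "__main__":
    print(validated_codes(build_demo()))
```

Keeping suggestions and validations in separate tables, rather than overwriting a single code column, is one way to retain a record of both what the model proposed and what the researcher decided.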
Keywords: Qualitative methods; Artificial intelligence; Machine Learning; Big data; Economic History
JEL-codes: A00 N00 N01
Pages: 23
Date: 2026-02-11
Persistent link: https://EconPapers.repec.org/RePEc:hhs:haechi:2026_002