The Governance Challenge Posed by Large Learning Models
Susan Aaronson
Working Papers from The George Washington University, Institute for International Economic Policy
Abstract:
Only eight months have passed since ChatGPT and the large learning model underpinning it took the world by storm. This article focuses on the data supply chain: the data collected and then utilized to train large language models, and the governance challenges it presents to policymakers. These challenges include:
- How web scraping may affect individuals and firms that hold copyrights.
- How web scraping may affect individuals and groups who are supposed to be protected under privacy and personal data protection laws.
- How web scraping revealed the lack of protections for content creators and content providers on open-access websites; and
- How the debate over open- and closed-source LLMs reveals the lack of clear and universal rules to ensure the quality and validity of datasets.
As the US National Institute of Standards and Technology explained, many LLMs depend on "large-scale datasets, which can lead to data quality and validity concerns." "The difficulty of finding the 'right' data may lead AI actors to select datasets based more on accessibility and availability than on suitability... Such decisions could contribute to an environment where the data used in processes is not fully representative of the populations or phenomena that are being modeled, introducing downstream risks" (NIST: 2023, 80). In short, these are problems of quality and validity. The author uses qualitative methods to examine these data governance challenges. In general, this report discusses only those governments that adopted specific steps (actions, policies, new regulations, etc.) to address web scraping, LLMs, or generative AI. The author acknowledges that these examples do not comprise a representative sample based on income, LLM expertise, and geographic diversity. However, the author uses these examples to show that while some policymakers are responsive to rising concerns, they do not seem to be looking at these issues systemically.
A systemic approach has two components. First, policymakers recognize that these AI chatbots are complex systems with different sources of data, linked to other systems designed, developed, owned, and controlled by different people and organizations. Data and algorithm production, deployment, and use are distributed among a wide range of actors who together produce the system's outcomes and functionality. Hence accountability is diffused and opaque (Cobbe et al.: 2023). Second, as a report for the US National Academy of Sciences notes, the only way to govern such complex systems is to create "a governance ecosystem that cuts across sectors and disciplinary silos and solicits and addresses the concerns of many stakeholders." This assessment is particularly true for LLMs: a global product with a global supply chain and numerous interdependencies among those who supply data, those who control data, and those who are data subjects or content creators (Cobbe et al.: 2023).
Keywords: data; data governance; personal data; property rights; open data; open source; governance
JEL-codes: P51
Pages: 29 pages
Date: 2023-07
New Economics Papers: this item is included in nep-ain
Downloads: https://www2.gwu.edu/~iiep/assets/docs/papers/2023WP/AaronsonIIEP2023-07.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:gwi:wpaper:2023-07