EXPLORING COMPRESSION STRATEGIES FOR LARGE LANGUAGE MODELS TOWARDS EFFICIENT ARTIFICIAL INTELLIGENCE IMPLEMENTATIONS
Doinita Sendre,
Dana-Mihaela Petrosanu and
Alexandru Pirjan
Additional contact information
Doinita Sendre: Romanian-American University, Romania
Dana-Mihaela Petrosanu: National University of Science and Technology POLITEHNICA Bucharest, Romania
Alexandru Pirjan: Romanian-American University, Romania
Journal of Information Systems & Operations Management, 2024, vol. 18, issue 1, 225-260
Abstract:
The rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has accelerated innovation across many domains. Despite their widespread usefulness, LLMs are difficult to scale, primarily because of their substantial demands on computational and energy resources. This article explores the importance of developing and applying effective compression techniques to mitigate these challenges. Pruning, quantization, and knowledge distillation are analyzed for their potential to decrease an LLM's size and its associated computational demands while striving to preserve performance. Each technique presents its own trade-offs between model efficiency and accuracy, so applying it well requires a nuanced understanding of where it fits. We analyze in depth the complexities of implementing these techniques, highlighting the balance required between performance and compression and the effort involved in customizing them to specific LLM architectures. The article further analyzes the validation and testing phases needed to ensure that compressed models perform adequately in real-world applications. We also consider the future adaptability of compression techniques to evolving AI models and architectures. The study emphasizes the ongoing need for innovative research in model compression to make AI technologies more sustainable and accessible across sectors, thereby expanding their potential benefits while addressing the limitations and risks associated with their deployment.
Date: 2024
Downloads: (external link)
http://www.rebe.rau.ro/RePEc/rau/jisomg/SU24/JISOM-SU24-A16.pdf (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:rau:jisomg:v:18:y:2024:i:1:p:225-260