Linking Industry Sectors and Financial Statements: A Hybrid Approach for Company Classification
Guy Stephane Waffo Dzuyo,
Gaël Guibon,
Christophe Cerisara and
Luis Belmar-Letelier
Additional contact information
Guy Stephane Waffo Dzuyo: Forvis Mazars; LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications (Inria - Institut National de Recherche en Informatique et en Automatique, CentraleSupélec, UL - Université de Lorraine, CNRS - Centre National de la Recherche Scientifique); SYNALP - Natural Language Processing: representations, inference and semantics, NLPKD - Department of Natural Language Processing & Knowledge Discovery, LORIA
Gaël Guibon: LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications (Inria, CentraleSupélec, Université de Lorraine, CNRS); LIPN - Laboratoire d'Informatique de Paris-Nord (CNRS, Université Sorbonne Paris Nord); SYNALP - Natural Language Processing: representations, inference and semantics, NLPKD - Department of Natural Language Processing & Knowledge Discovery, LORIA
Christophe Cerisara: SYNALP - Natural Language Processing: representations, inference and semantics, NLPKD - Department of Natural Language Processing & Knowledge Discovery, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications (Inria, CentraleSupélec, Université de Lorraine, CNRS)
Luis Belmar-Letelier: Forvis Mazars
Post-Print from HAL
Abstract:
Identifying the financial characteristics of industry sectors is highly important in accounting audits, as it allows auditors to prioritize the most important areas during an audit. Existing company classification standards such as the Standard Industrial Classification (SIC) code map a company to a category based on its activities and products. In this paper, we explore the potential of machine learning algorithms and language models to analyze the relationship between those categories and companies' financial statements. We propose a supervised company classification methodology and analyze several types of representations for financial statements. Existing work addresses this task using only the numerical information in financial records. Our findings show that, beyond numbers, the textual information occurring in financial records can be leveraged by language models to match the performance of dedicated decision tree-based classifiers, while providing better explainability and more generic accounting representations. This work can serve as a preliminary step towards semi-automatic auditing. Models, code, and a preprocessed dataset are publicly available for further research at https://github.com/WaguyMz/hybrid company classification
Keywords: Machine Learning; Industry Sectors; Large Language Models; LLM Applications; Audit; Financial Statement
Date: 2025-02-25
New Economics Papers: this item is included in nep-acc, nep-ain and nep-big
Note: View the original document on HAL open archive server: https://hal.science/hal-05031499v1
Published in The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025), Feb 2025, Philadelphia (Pennsylvania), United States
Downloads: https://hal.science/hal-05031499v1/document (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:hal:journl:hal-05031499
Bibliographic data for series maintained by CCSD.