Interpretable Company Similarity with Sparse Autoencoders
Marco Molinari,
Victor Shao,
Vladimir Tregubiak,
Abhimanyu Pandey,
Mateusz Mikolajczak and
Sebastian Kuznetsov Ryder Torres Pereira
Papers from arXiv.org
Abstract:
Determining company similarity is a vital task in finance, underpinning hedging, risk management, portfolio diversification, and more. Practitioners often rely on sector and industry classifications to gauge similarity, such as SIC-codes and GICS-codes - the former being used by the U.S. Securities and Exchange Commission (SEC), and the latter widely used by the investment community. Since these classifications can lack granularity and often need to be updated, using clusters of embeddings of company descriptions has been proposed as a potential alternative, but the lack of interpretability in token embeddings poses a significant barrier to adoption in high-stakes contexts. Sparse Autoencoders (SAEs) have shown promise in enhancing the interpretability of Large Language Models (LLMs) by decomposing LLM activations into interpretable features. We apply SAEs to company descriptions, obtaining meaningful clusters of equities in the process. We benchmark SAE features against SIC-codes, Major Group codes, and Embeddings. Our results demonstrate that SAE features not only replicate but often surpass sector classifications and embeddings in capturing fundamental company characteristics. This is evidenced by their superior performance in correlating monthly returns - a proxy for similarity - and generating higher Sharpe ratio co-integration strategies, which underscores deeper fundamental similarities among companies.
Date: 2024-12, Revised 2024-12
New Economics Papers: this item is included in nep-ain, nep-big, nep-inv and nep-mac
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://arxiv.org/pdf/2412.02605 Latest version (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2412.02605
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().