How Much Can Machines Learn Finance from Chinese Text Data?

Zhou, Yang; Fan, Jianqing; Xue, Lirong

Yang Zhou, Jianqing Fan and Lirong Xue
Additional contact information
Yang Zhou: Institute for Big Data, Fudan University, Shanghai 200433, China; MOE Laboratory for National Development and Intelligent Governance, Fudan University, Shanghai 200433, China
Lirong Xue: Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

Management Science, 2024, vol. 70, issue 12, 8962-8987

Abstract: How much can we learn about finance directly from text data? This paper presents a new framework for learning from textual data, based on factor augmentation and sparsity regularization and called the factor-augmented regularized model for prediction (FarmPredict), that lets machines learn financial returns directly from news. FarmPredict lets the model extract information directly from articles without predefined inputs such as the dictionaries or pretrained models used in most studies. Augmenting the predictors with factors learned without supervision gives the method a “double-robust” feature: the machine learns to balance between individual words and text factors (topics). It also avoids the information loss that factor regression incurs through dimensionality reduction. We apply the model to the Chinese stock market, which has a large proportion of retail investors, using Chinese news data to predict financial returns. We show that news with positive sentiment, as scored by FarmPredict, generates on average 83 basis points (bps) of daily excess stock returns, whereas negative news has an adverse impact of 26 bps on the days of news announcements; both effects can last for a few days. This asymmetric effect aligns well with the short-sale constraints in the Chinese equity market. The results show that the machine-learned predictions provide sizable predictive power, with an annualized return of up to 54% under a simple investment strategy. FarmPredict significantly outperforms other statistical and machine learning methods in both model prediction and portfolio performance. Our study demonstrates the far-reaching potential of using machines to learn from text data.
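
The abstract names two ingredients, factor augmentation and sparsity regularization, without giving the estimation steps. The following is a minimal sketch of that general idea on synthetic data, not the authors' implementation. It assumes factors are extracted by principal components (via SVD), that factor coefficients are left unpenalized, and that an L1 penalty is applied to the idiosyncratic word components through a simple coordinate descent. All function names, the synthetic data, and the choices of k and lam are hypothetical.

```python
import numpy as np

def extract_factors(X, k):
    """Split a document-by-word matrix into k PCA factors and idiosyncratic residuals."""
    Xc = X - X.mean(axis=0)                      # center each word column
    left, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = left[:, :k] * s[:k]                      # n x k factor scores
    B = Vt[:k].T                                 # p x k word loadings
    U = Xc - F @ B.T                             # n x p idiosyncratic word components
    return F, U

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used in the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(U, r, lam, n_iter=200):
    """Coordinate descent for (1/2n)||r - U b||^2 + lam * ||b||_1."""
    n, p = U.shape
    b = np.zeros(p)
    col_sq = (U ** 2).sum(axis=0) / n
    resid = r.copy()                             # kept equal to r - U @ b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] < 1e-12:                # skip (near-)constant columns
                continue
            rho = U[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += U[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b

def fit_factor_augmented(X, y, k, lam):
    """Assumed two-step fit: least squares on factors, lasso on word residuals."""
    F, U = extract_factors(X, k)
    y_c = y - y.mean()
    gamma, *_ = np.linalg.lstsq(F, y_c, rcond=None)   # unpenalized factor coefficients
    b = lasso_cd(U, y_c - F @ gamma, lam)             # sparse word coefficients
    return F, U, gamma, b

# Synthetic stand-in for word counts and returns (purely illustrative).
rng = np.random.default_rng(0)
n, p, k = 200, 30, 2
F0 = rng.normal(size=(n, k))                     # latent "topics"
B0 = rng.normal(size=(p, k)) * 3.0               # strong word loadings
noise = rng.normal(size=(n, p))                  # word-specific variation
X = F0 @ B0.T + noise
# The response depends on one topic and on the word-specific part of word 3.
y = 1.0 * F0[:, 0] + 2.0 * noise[:, 3] + 0.1 * rng.normal(size=n)

F, U, gamma, b = fit_factor_augmented(X, y, k=k, lam=0.2)
y_hat = y.mean() + F @ gamma + U @ b             # in-sample fitted values
print("nonzero word coefficients:", np.flatnonzero(b))
```

In this sketch the fitted values can draw on the factor part (gamma), on individual words (b), or on both, which is one way to read the balance between words and topics that the abstract describes. Because the principal-component factors are orthogonal to the residuals by construction, fitting the two parts in sequence gives the same result as fitting them jointly with the penalty on b only.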

Keywords: machine learning; FarmPredict; factor model; sparse regression; textual analysis
Date: 2024

Downloads:
http://dx.doi.org/10.1287/mnsc.2022.01468 (application/pdf)


Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:70:y:2024:i:12:p:8962-8987
