Interpolation of non-random missing values in financial statements’ big data using CatBoost
Shouji Fujimoto, Takayuki Mizuno and Atushi Ishikawa
Additional contact information
Shouji Fujimoto: Kanazawa Gakuin University
Takayuki Mizuno: National Institute of Informatics
Atushi Ishikawa: Kanazawa Gakuin University
Journal of Computational Social Science, 2022, vol. 5, issue 2, No 7, 1301 pages
Abstract:
Financial statements’ big data have the characteristics of “Incompleteness” and “Nonrepresentative”. In this paper, employing the world’s largest commercial database on finance, ORBIS, we first find that the rate of missing data varies depending on the country, the type and size of financial items, and the year. Using information on missing data, we interpolate non-random missing financial variables from the previous- and/or next-year values of the same financial item, the values of other financial items, and the conditions of missing values determined by CatBoost. Because the distribution of financial values obeys Zipf’s law in the large-scale range and the mean and variance diverge, we employ an inverse hyperbolic function to convert the value of a financial item as a target variable. We introduce two types of missing-value interpolation models according to the two types of situations involving missing objective variables. After verifying the accuracies and stabilities of these models, we describe the properties of firm-scale variables in which non-random missing values are interpolated. In the final stage of this work, we combine these two models. From our observations, we confirm that the range in which Zipf’s law is established becomes wider than before interpolation.
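The transformation and feature layout described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the "inverse hyperbolic function" is taken to be the inverse hyperbolic sine (asinh), and the feature names (prev_asinh, next_is_missing, and so on) and the sample figures are hypothetical. It shows only how a heavy-tailed financial value might be rescaled and paired with missingness flags before being passed to a gradient-boosting model; it is not the authors' implementation.

```python
import math
from typing import Dict, Optional


def to_model_scale(value: float) -> float:
    # Assumption: asinh stands in for the abstract's "inverse hyperbolic function".
    # It is roughly linear near zero, grows like a logarithm for large |value|,
    # and is defined for zero and negative values (unlike a plain log).
    return math.asinh(value)


def from_model_scale(z: float) -> float:
    # Inverse of to_model_scale, used to map a prediction back to currency units.
    return math.sinh(z)


def build_row(series: Dict[int, Optional[float]], year: int) -> Dict[str, float]:
    # Build features for one target year from the previous and next year of the
    # same financial item. None marks a missing value. Each neighbour yields a
    # 0/1 missingness flag plus the transformed value (NaN when missing).
    row: Dict[str, float] = {}
    for name, neighbour in (("prev", year - 1), ("next", year + 1)):
        value = series.get(neighbour)
        missing = value is None
        row[f"{name}_is_missing"] = 1.0 if missing else 0.0
        row[f"{name}_asinh"] = float("nan") if missing else to_model_scale(value)
    return row


# Hypothetical sales series for one firm, with two missing years.
sales = {2018: 1.2e6, 2019: None, 2020: 1.5e6, 2021: None}
row_2019 = build_row(sales, 2019)  # both neighbours observed
row_2021 = build_row(sales, 2021)  # next year (2022) is absent
```

Keeping an explicit missingness flag next to each value lets a tree-based model condition on the pattern of missing data itself, which is the kind of information the abstract says is used when missingness is non-random.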
Keywords: Interpolation; Non-random missing; CatBoost; Big data; Firm financials; Machine learning
Date: 2022
Downloads: (external link)
http://link.springer.com/10.1007/s42001-022-00165-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:jcsosc:v:5:y:2022:i:2:d:10.1007_s42001-022-00165-9
Ordering information: This journal article can be ordered from
http://www.springer. ... iences/journal/42001
DOI: 10.1007/s42001-022-00165-9
Journal of Computational Social Science is currently edited by Takashi Kamihigashi
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.