EconPapers    
Economics at your fingertips  

Synthetic data generation method providing enhanced covariance matrix estimation

Seungkyu Kim, Johan Lim and Donghyeon Yu
Additional contact information
Seungkyu Kim: Seoul National University
Johan Lim: Seoul National University
Donghyeon Yu: Inha University

Computational Statistics, 2025, vol. 40, issue 7, No 23, 4007-4035

Abstract: Synthetic data generation is an important tool for ensuring data confidentiality. Various synthetic data generators have been developed in the literature, mostly for general purposes: they aim to generate data whose distribution is the same as that of the original data set, and the synthesized data are then used for whatever purpose each user has in mind. However, such general-purpose synthetic data may not serve every purpose well. In this paper, we study synthetic data generation tailored to a specific purpose. We are particularly interested in covariance matrix estimation, which is a key part of many multivariate statistical analyses. To this end, we first examine the connection between the sequential regression model and the modified Cholesky decomposition. We then devise a new synthetic data generator, named SynCov, that controls the error variances of the sequential regression model. We show that the sample covariance matrix of the synthetic data generated by SynCov is equivalent to a shrinkage covariance matrix estimator, which reduces the estimation error in the Frobenius norm. Our comprehensive numerical study shows that SynCov outperforms other synthetic data generation methods in covariance matrix estimation. Finally, we apply SynCov to two real data examples: (i) estimating the covariance matrix of selected variables of the Los Angeles City Employee Payroll data and (ii) classification with the Taiwanese Bankruptcy Data.
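The starting point described in the abstract, the link between the sequential regression model and the modified Cholesky decomposition, can be illustrated with a short NumPy sketch. It regresses each variable on the variables before it by least squares, collects the negated coefficients into a unit lower-triangular matrix T and the residual variances into D, checks that T S T' = D for the sample covariance S, and then generates synthetic rows by running the regressions forward. The Gaussian errors, the function names `modified_cholesky` and `synthesize`, and the `scale` argument are illustrative assumptions; this is a generic sequential-regression synthesizer, not the authors' SynCov generator.

```python
import numpy as np


def modified_cholesky(sample):
    """Fit the sequential regressions x_j ~ x_1, ..., x_{j-1} by least squares.

    Returns (T, d): T is unit lower triangular with -phi_{jk} below the
    diagonal, and d holds the residual variances (divisor n), so that
    T @ S @ T.T equals diag(d) for the sample covariance S (divisor n).
    """
    x = sample - sample.mean(axis=0)  # centring replaces an intercept term
    n, p = x.shape
    T = np.eye(p)
    d = np.empty(p)
    d[0] = x[:, 0] @ x[:, 0] / n  # the first variable has no predecessors
    for j in range(1, p):
        past = x[:, :j]
        phi, *_ = np.linalg.lstsq(past, x[:, j], rcond=None)
        resid = x[:, j] - past @ phi  # orthogonal to all earlier residuals
        T[j, :j] = -phi
        d[j] = resid @ resid / n
    return T, d


def synthesize(T, d, n_synth, rng, scale=1.0):
    """Run the sequential regressions forward with Gaussian errors.

    x_1 = e_1 and x_j = sum_k phi_{jk} x_k + e_j, e_j ~ N(0, scale * d_j).
    `scale` is a hypothetical knob that only shows where the error
    variances enter; it is NOT the SynCov rule.
    """
    e = rng.standard_normal((n_synth, len(d))) * np.sqrt(scale * d)
    # Each synthetic row x satisfies T x = e, so x = T^{-1} e.
    return np.linalg.solve(T, e.T).T


rng = np.random.default_rng(0)
original = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))
S = np.cov(original, rowvar=False, bias=True)

T, d = modified_cholesky(original)
print(np.allclose(T @ S @ T.T, np.diag(d)))  # True

synthetic = synthesize(T, d, 200_000, rng)
S_syn = np.cov(synthetic, rowvar=False, bias=True)
rel_err = np.linalg.norm(S_syn - S) / np.linalg.norm(S)
print(f"relative Frobenius error of synthetic covariance: {rel_err:.4f}")
```

With `scale=1.0` the synthetic data target the covariance T^{-1} D T^{-T}, which equals S, so this generator simply reproduces the original sample covariance. Changing the individual d_j changes that implied covariance; according to the abstract, SynCov controls these error variances so that the synthetic sample covariance behaves like a shrinkage estimator, but its specific rule is not reproduced here.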

Keywords: Covariance matrix estimation; Data confidentiality; Sequential regression model; Shrinkage estimator; Synthetic data generator
Date: 2025

Downloads: (external link)
http://link.springer.com/10.1007/s00180-025-01643-0 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:40:y:2025:i:7:d:10.1007_s00180-025-01643-0

Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2

DOI: 10.1007/s00180-025-01643-0

Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik

More articles in Computational Statistics from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Page updated 2025-07-14
Handle: RePEc:spr:compst:v:40:y:2025:i:7:d:10.1007_s00180-025-01643-0