ASIDS: A Robust Data Synthesis Method for Generating Optimal Synthetic Samples

Yukun Du, Yitao Cai, Xiao Jin, Hongxia Wang, Yao Li and Min Lu
Additional contact information
Yukun Du: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
Yitao Cai: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
Xiao Jin: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
Hongxia Wang: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
Yao Li: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China
Min Lu: School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China

Mathematics, 2023, vol. 11, issue 18, 1-15

Abstract: Most existing data synthesis methods are designed to address dataset imbalance, data anonymization, and insufficient sample size. Effective synthesis methods are lacking for cases in which the actual dataset has few data points but many features and contains unknown noise. In this paper, we therefore propose a data synthesis method named Adaptive Subspace Interpolation for Data Synthesis (ASIDS). The idea is to divide the original feature space into several subspaces, each containing an equal number of data points, and then to interpolate between data points in adjacent subspaces. The method can adaptively adjust the sample size of a synthetic dataset that contains unknown noise, and the generated samples typically contain minimal errors. Moreover, it adjusts the feature composition of the data points, which can significantly reduce the proportion of data points with large fitting errors. Furthermore, the method's hyperparameters have an intuitive interpretation and usually require little calibration. Results obtained on simulated data and on benchmark datasets demonstrate that ASIDS is a robust and stable data synthesis method.
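The abstract describes the core idea only at a high level: partition the feature space into subspaces holding equal numbers of points, then interpolate between points in adjacent subspaces. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' ASIDS implementation. The function names `equal_count_subspaces` and `interpolate_adjacent` are hypothetical, the subspaces are assumed to be equal-count slabs along a single feature, and each synthetic point is assumed to be a uniform random linear interpolation between one point drawn from each of two adjacent slabs. The adaptive sample-size and feature-composition steps mentioned in the abstract are not modeled.

```python
import random
from typing import List, Sequence

Point = List[float]


def equal_count_subspaces(data: Sequence[Point], k: int, axis: int = 0) -> List[List[Point]]:
    """Split data into k groups of (nearly) equal size by sorting along one feature.

    Assumption (hypothetical): subspaces are slabs along a single feature axis;
    the abstract does not specify ASIDS's actual partitioning rule.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of data points")
    ordered = sorted(data, key=lambda p: p[axis])
    n = len(ordered)
    # Boundaries i*n//k give group sizes that differ by at most one point.
    return [list(ordered[i * n // k:(i + 1) * n // k]) for i in range(k)]


def interpolate_adjacent(groups: List[List[Point]], per_pair: int, rng: random.Random) -> List[Point]:
    """Create synthetic points on segments joining points from adjacent groups.

    Assumption (hypothetical): x_new = a + lam * (b - a), lam ~ Uniform[0, 1),
    with a and b drawn at random from neighbouring groups.
    """
    synthetic: List[Point] = []
    for left, right in zip(groups, groups[1:]):
        for _ in range(per_pair):
            a = rng.choice(left)
            b = rng.choice(right)
            lam = rng.random()
            synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


# Usage: 20 random 3-feature points, 4 equal-count subspaces, 5 new points per adjacent pair.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]
groups = equal_count_subspaces(data, k=4)
new_points = interpolate_adjacent(groups, per_pair=5, rng=rng)
print(len(new_points))  # 3 adjacent pairs x 5 = 15
```

Sorting along one feature is simply the easiest way to obtain equal-count subspaces with an unambiguous notion of adjacency; the paper may well use a different partitioning and interpolation rule.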

Keywords: data synthesis; unknown noise; interpolation; sample optimization; robust and stable
JEL-codes: C
Date: 2023
Downloads:
https://www.mdpi.com/2227-7390/11/18/3891/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/18/3891/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:18:p:3891-:d:1238725

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:11:y:2023:i:18:p:3891-:d:1238725