EconPapers    

Child Health Dataset Publishing and Mining Based on Differential Privacy Preservation

Wenyu Li, Siqi Wang, Hongwei Wang and Yunlong Lu
Wenyu Li: School of Mathematics and Statistics, Beihua University, Jilin 132013, China
Siqi Wang: School of Mathematics and Statistics, Beihua University, Jilin 132013, China
Hongwei Wang: Departments of Mathematics and Physics, Texas A&M International University, Laredo, TX 78045, USA
Yunlong Lu: School of Mathematics and Statistics, Beihua University, Jilin 132013, China

Mathematics, 2024, vol. 12, issue 16, 1-11

Abstract: With the emergence and development of application requirements such as data analysis and publishing, it is particularly important to use differential privacy protection technology to provide more reliable, secure, and compliant datasets for research in the field of children’s health. This paper studies the differential privacy protection of an ultrasound examination health dataset of adolescents in southern Texas from three aspects: differential privacy protection of basic statistics via output perturbation, publication of differentially private marginal histograms and synthetic data, and a differentially private machine learning algorithm. Firstly, the output perturbation results show that the Laplace and Gaussian mechanisms for numerical data, as well as the exponential mechanism for non-numerical data, can achieve the goal of protecting privacy, with the exponential mechanism providing stronger privacy protection. Secondly, with an appropriate privacy budget, a differentially private marginal histogram over four attributes can be obtained that approximates the marginal histogram of the original data. To publish synthetic data, we construct a synthetic query to obtain the corresponding differentially private histogram for two attributes. A synthetic dataset can then be constructed that follows the data distribution of the original dataset, and the quality of the published synthetic data can be evaluated by the mean squared error and the error rate. Finally, we consider a differentially private logistic regression model to predict, as a binary classification task, whether children have fatty liver. The experimental results show that the model combined with quadratic perturbation achieves better accuracy and privacy protection.
This paper provides differential privacy protection models for different requirements, offering important data release and analysis options for data managers and research organizations and enriching research on child health data release and mining.
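The output-perturbation mechanisms and the noisy-histogram release named in the abstract can be illustrated with a short sketch. This is a minimal example using only the Python standard library, not the authors' implementation: the function names, the toy records, the neighbouring-dataset conventions (replace-one for the bounded mean, add/remove-one for histogram counts), the classic Gaussian calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon, and sampling synthetic records directly from the normalised noisy histogram are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """epsilon-DP release of a numeric query with L1 sensitivity `sensitivity`."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def gaussian_mechanism(true_value, sensitivity, epsilon, delta, rng):
    """(epsilon, delta)-DP release, L2 sensitivity, classic calibration (0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("classic Gaussian calibration assumes 0 < epsilon < 1")
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_value + rng.gauss(0.0, sigma)


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """Select a candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # shifting by the max score avoids overflow; probabilities are unchanged
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def dp_histogram(records, bins, epsilon, rng):
    """Laplace-noised counts; under add/remove-one neighbours each count has sensitivity 1."""
    counts = Counter(records)
    # Clipping negatives to 0 is post-processing and does not weaken the guarantee.
    return {b: max(0.0, counts[b] + laplace_noise(1.0 / epsilon, rng)) for b in bins}


def mean_squared_error(true_hist, noisy_hist):
    """Average squared difference between true and released counts over all bins."""
    return sum((true_hist[b] - noisy_hist[b]) ** 2 for b in true_hist) / len(true_hist)


def sample_synthetic(noisy_hist, n, rng):
    """Draw n synthetic records from the normalised noisy histogram (post-processing)."""
    cells = list(noisy_hist)
    weights = [noisy_hist[c] for c in cells]
    if sum(weights) <= 0:
        return rng.choices(cells, k=n)  # every count clipped to 0: fall back to uniform
    return rng.choices(cells, weights=weights, k=n)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical toy records (sex, ultrasound finding); NOT the paper's dataset.
    records = [("F", "normal"), ("M", "fatty liver"), ("F", "normal"), ("M", "normal"),
               ("F", "fatty liver"), ("M", "normal"), ("F", "normal"), ("M", "fatty liver")]
    bins = [(s, f) for s in ("F", "M") for f in ("normal", "fatty liver")]
    counts = Counter(records)
    true_hist = {b: float(counts[b]) for b in bins}
    noisy = dp_histogram(records, bins, epsilon=1.0, rng=rng)
    print("MSE:", mean_squared_error(true_hist, noisy))
    print("synthetic:", sample_synthetic(noisy, len(records), rng))

    # Exponential mechanism: privately release the most common finding
    # (utility = count of that finding, which has sensitivity 1).
    findings = [f for _, f in records]
    print("private mode:",
          exponential_mechanism(["normal", "fatty liver"], findings.count, 1.0, 1.0, rng))

    # Hypothetical ages assumed bounded in [10, 18]; with n known, replacing one
    # record moves the mean by at most (18 - 10) / n.
    ages = [12, 14, 15, 11, 17, 13, 16, 14]
    true_mean = sum(ages) / len(ages)
    sens = (18 - 10) / len(ages)
    print("Laplace mean:", laplace_mechanism(true_mean, sens, 1.0, rng))
    print("Gaussian mean:", gaussian_mechanism(true_mean, sens, 0.5, 1e-5, rng))
```

The histogram sensitivity of 1 holds only under the add/remove-one convention; if neighbouring datasets differ by replacing one record, two counts can change and the Laplace scale would have to double to 2/epsilon.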

Keywords: children health; synthetic data release; differential privacy; marginal histograms; logistic regression
JEL-codes: C
Date: 2024
Citations: 1 (in EconPapers)

Downloads:
https://www.mdpi.com/2227-7390/12/16/2487/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/16/2487/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:16:p:2487-:d:1454731

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:12:y:2024:i:16:p:2487-:d:1454731