Overview of the “Syntucky” data for the participants of the Data literacy & Evidence building class by NYU/Accenture/UMD/KYStats/Coleridge Initiative
Anna-Carolina Haensch
No 4u6we, SocArXiv from Center for Open Science
Abstract:
This document was written for participants of the Data Literacy & Evidence Building class by NYU/Accenture/UMD/KYStats/Coleridge Initiative and provides a focused overview of the synthetic data used in the class. The instructors decided to use synthetic data for its accessibility and relevance. This document does not delve into the complexities of synthetic data; rather, it covers the key aspects that matter to participants when using synthetic data in class. It includes a glossary of terms related to synthetic data and privacy to help participants navigate these topics. In the following sections, we first summarize the focus of the class, since this helps in understanding both the chosen dataset and the decision to use synthetic data. We then describe the data in more detail, beginning with an overview of the original data structure, sourced from the Kentucky Postsecondary Education Data System (KPEDS). Next, we focus on our target sample, which comprises cohorts that began studying in Kentucky between 2013 and 2015, with a cross-sectional snapshot taken in 2015. Subsequently, we describe the data preprocessing steps used to clean and organize the dataset for the class. Lastly, we discuss the synthetic data generation methods.
Date: 2023-07-31
Downloads:
https://osf.io/download/64c6f933c7ab290e91d4e05e/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:4u6we
DOI: 10.31219/osf.io/4u6we