Exploratory Data Analysis on Large Data Sets: The Example of Salary Variation in Spanish Social Security Data

Catia Nicodemo and Albert Satorra
Additional contact information
Albert Satorra: Universitat Pompeu Fabra

No 13459, IZA Discussion Papers from Institute of Labor Economics (IZA)

Abstract: New challenges arise in data visualization when the analysis draws on a sizable database. With many data points, classical scatterplots become uninformative because the points clutter. By contrast, simple plots such as the boxplot, which are of limited use in small samples, offer great potential for group comparison in large samples. This paper presents Exploratory Data Analysis (EDA) methods that are useful when a large dataset is involved. EDA methods, introduced by Tukey in his seminal 1977 book, encompass a set of statistical tools aimed at extracting information from data using simple graphical tools. In this paper, some EDA methods, such as the boxplot and the scatterplot, are revisited and enhanced using modern graphical computational devices (e.g., the heat-map), and their use is illustrated with Spanish Social Security data. We explore how earnings vary across several factors such as age, gender, type of occupation, and type of contract; in particular, the gender gap in salaries is visualized in various dimensions relating to the type of occupation. The EDA methods are also applied to assessing competing regressions with earnings as the dependent variable. The methods discussed should help researchers assess heterogeneity in data, across-group variation, and classical diagnostic plots of residuals from alternative model fits.

Keywords: heat-maps; EDA Analysis; large dataset; ggplot; R
JEL-codes: C55 C80 J01 J08 Y10
Pages: 32
Date: 2020-07
New Economics Papers: this item is included in nep-lab

Published in: BRQ Business Research Quarterly, 2022, 25 (3), 283–294

Downloads: (external link)
https://docs.iza.org/dp13459.pdf (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:iza:izadps:dp13459

Ordering information: This working paper can be ordered from
IZA, Margard Ody, P.O. Box 7240, D-53072 Bonn, Germany

Bibliographic data for series maintained by Holger Hinte.

 
Page updated 2025-03-30
Handle: RePEc:iza:izadps:dp13459