Working with Large and Complex Datasets
Thomas W. MacFarland and Jan M. Yates
Additional contact information
Thomas W. MacFarland: Nova Southeastern University Fort Lauderdale, Senior Research Associate, Office of Institutional Effectiveness
Jan M. Yates: Nova Southeastern University Fort Lauderdale, Professor Emerita, Abraham S. Fischler College of Education
Chapter 8 in Using R for Biostatistics, 2021, pp 585-882 from Springer
Abstract:
The purpose of this lesson on working with large and complex datasets is to provide a realistic demonstration of how R is used for challenging analyses, challenging because the dataset is fairly large, challenging because there are many variables requiring attention, challenging because there are missing data, challenging because certain subjects require special accommodation, challenging because selected data in the original dataset need to be put into filtered subsets, etc. Fortunately, R can accommodate these challenges, as demonstrated in this lesson and the accompanying addenda. Give special attention to the way different Boolean-type selection processes are used to address focused inquiries of importance for subjects in selected breakout groups, as opposed to large-scale analyses against the entire dataset.
Keywords: Analysis of variance (ANOVA); Association; Association plot; Bagplot or bivariate boxplot; Bar plot or bar chart; Beanplot; Beeswarm plot; Big data; Boolean selection; Boxplot or box-and-whiskers plot; Box-percentile plot; Coefficient of correlation; Color gradient plot; Correlation; Correlogram; Density plot; Dotplot or dotchart; Engelmann–Hecker (EH) plot; Exploratory data analysis (EDA); Gantt chart; Graphical themes; Hexbin plot; Histogram; Interaction plot; International classification of diseases (ICD); Line chart or line graph; Long format data; Mosaic plot; Nonparametric; Normality; Parametric; Pearson’s r; Pie chart; Pirate chart; Probability; Quantile-quantile (Q-Q) plot; Regression; Scatter plot or scatter diagram; Scatterplot matrix (SPLOM); Staircase plot; Stem-and-leaf plot; Stripchart; Sunflower scatterplot; Tidyverse; Trellis graphics; Triangular plot for 3-D representation; Violin plot; Waffle chart or squared pie chart; Wide format data
Date: 2021
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-030-62404-0_8
Ordering information: This item can be ordered from
http://www.springer.com/9783030624040
DOI: 10.1007/978-3-030-62404-0_8