What is data wrangling? Intro, Motivation, Outline, Setup -- Pt. 1 Data Wrangling Introduction
Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, make data manipulation tasks easier. These videos introduce you to both tools and show how they keep your R code clean and clear and reduce the cognitive load of common but often complex data science tasks.
Pt. 1: What is data wrangling? Intro, Motivation, Outline, Setup https://youtu.be/jOd65mR1zfw
- 01:44 Intro and what’s covered
Ground Rules
- 02:40 What’s a tibble
- 04:50 Use View
- 05:25 The pipe operator: `%>%`
- 07:20 What do I mean by data wrangling?
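As a rough illustration of the ground rules above, here is a minimal sketch of a tibble and the pipe. The `scores` data and its column names are made up for this example and do not come from the video.

```r
library(tibble)
library(dplyr)

# A small made-up tibble (illustrative data only, not from the video)
scores <- tibble(
  student = c("a", "b", "c"),
  score   = c(90, 72, 85)
)

# Nested calls: you have to read them inside-out
nested <- head(arrange(scores, desc(score)), 2)

# The same steps with the pipe: read top to bottom
piped <- scores %>%
  arrange(desc(score)) %>%
  head(2)

# In an interactive RStudio session, View(scores) opens the data
# in a spreadsheet-style viewer.
```

Both versions should give the same two rows; the piped one simply lists the steps in the order they happen.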
Pt. 2: Tidy Data and tidyr https://youtu.be/1ELALQlO-yM
- 00:48 Goal 1: Making your data suitable for R
- 01:40 `tidyr` “Tidy” Data introduced and motivated
- 08:15 `tidyr::gather`
- 12:38 `tidyr::spread`
- 15:30 `tidyr::unite`
- 15:30 `tidyr::separate`
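The four `tidyr` functions listed above can be sketched on a toy table. The `wide` data and all column names here are assumptions chosen for illustration, not the dataset used in the video.

```r
library(tidyr)
library(tibble)

# Made-up wide data: one column per year (illustrative only)
wide <- tibble(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22)
)

# gather: wide -> long, one row per country-year pair
long <- gather(wide, key = "year", value = "cases", `2019`, `2020`)

# spread: long -> wide, undoing the gather above
wide_again <- spread(long, key = year, value = cases)

# unite: paste two columns into one
united <- unite(long, "country_year", country, year, sep = "_")

# separate: split that column back into two
separated <- separate(united, country_year,
                      into = c("country", "year"), sep = "_")
```

`gather()` and `spread()` still work in current `tidyr`, though the package documentation now marks them as superseded by `pivot_longer()` and `pivot_wider()`.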
Pt. 3: Data manipulation tools: `dplyr` https://youtu.be/Zc_ufg4uW4U
- 00:40 Setup
- 02:00 `dplyr::select`
- 03:40 `dplyr::filter`
- 05:05 `dplyr::mutate`
- 07:05 `dplyr::summarise`
- 08:30 `dplyr::arrange`
- 09:55 Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
- 11:45 `dplyr::group_by`
- 15:00 `dplyr::group_by`
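A minimal sketch of how the verbs above chain together with the pipe. The `sales` table, its columns, and the `units > 2` threshold are hypothetical and only serve to show each verb once.

```r
library(dplyr)

# Made-up data (illustrative only, not from the video)
sales <- tibble(
  region = c("east", "east", "west", "west"),
  units  = c(3, 5, 2, 8),
  price  = c(10, 10, 12, 12)
)

result <- sales %>%
  # select: keep columns by name
  select(region, units, price) %>%
  # filter: keep rows that match a condition
  filter(units > 2) %>%
  # mutate: add a computed column
  mutate(revenue = units * price) %>%
  # group_by: make the following verbs work per region
  group_by(region) %>%
  # summarise: collapse each group to one row
  summarise(total = sum(revenue)) %>%
  # arrange: sort rows, largest total first
  arrange(desc(total))
```

Each verb takes a data frame and returns a data frame, which is what lets them compose this way.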
Pt. 4: Working with Two Datasets: Binds, Set Operations, and Joins https://youtu.be/AuBgYDCg1Cg
Combining two datasets together
- 00:42 `dplyr::bind_cols`
- 01:27 `dplyr::bind_rows`
- 01:42 Set operations
`dplyr::union`, `dplyr::intersect`, `dplyr::setdiff`
- 02:15 Joining data
`dplyr::left_join`, `dplyr::inner_join`, `dplyr::right_join`, `dplyr::full_join`
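The two-table operations above can be sketched as follows. The tables `x`, `y`, `a`, and `b` and the `id` key are made up for illustration.

```r
library(dplyr)

# Two made-up tables sharing an `id` key (illustrative only)
x <- tibble(id = c(1, 2, 3), name = c("ann", "bob", "cy"))
y <- tibble(id = c(2, 3, 4), age  = c(30, 41, 25))

# Binds: stack rows, or paste columns side by side by position
stacked <- bind_rows(x, x)
side    <- bind_cols(x, tibble(flag = c(TRUE, FALSE, TRUE)))

# Set operations compare whole rows, so both inputs need the same columns
a <- tibble(id = c(1, 2, 3))
b <- tibble(id = c(2, 3, 4))
u <- union(a, b)      # rows in either table, duplicates dropped
i <- intersect(a, b)  # rows in both tables
d <- setdiff(a, b)    # rows in a that are not in b

# Joins: match rows of x and y on the key column
left  <- left_join(x, y, by = "id")   # every row of x
inner <- inner_join(x, y, by = "id")  # only ids present in both
right <- right_join(x, y, by = "id")  # every row of y
full  <- full_join(x, y, by = "id")   # every row of x and of y
```

Unmatched rows in `left`, `right`, and `full` get `NA` in the columns that come from the other table.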
______________________________________________________________
Cheatsheets: https://www.rstudio.com/resources/cheatsheets/
Documentation:
`tidyr` docs: https://tidyr.tidyverse.org/reference/
- `tidyr` vignette: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
`dplyr` docs: http://dplyr.tidyverse.org/reference/
- `dplyr` one-table vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- `dplyr` two-table (join operations) vignette: https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html
______________________________________________________________
New York Times, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”, by Steve Lohr, Aug. 17, 2014 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html