Javier Luraschi | Datasets in Reproducible Research with 'pins' | RStudio (2020)
Open source code is an essential piece in making science reproducible. Tools like 'rmarkdown' and GitHub facilitate running and sharing outcomes with colleagues and with the broad scientific community at large. However, it is less clear what tools should be used to retrieve, store and share datasets; while it is possible to make datasets part of your workflows today, it is usually hard and we are often left with manually sharing or downloading links to datasets. Not only that, but it's also hard to share or discover datasets. In this talk we will introduce for the first time the 'pins' package. A package designed to: pin, discover and share resources. Meaning that, you can use 'pins' to simplify your data science workflows by easily fetching resources from GitHub, Kaggle, CRAN and RStudio Connect. We will present a 'pin' as a generic resource that can contain tabular datasets like CSVs, unstructured data like JSON files, image archives as ZIP files and so on. This talk will be highly interactive showing you how to get started by installing 'pins' from CRAN, retrieve and cache resources, share and discover useful and fun data resources to improve and enhance your day-to-day workflows
rmarkdown
rstudio
Rstudio::conf(2020)
Javier Luraschi
RStudio
Data Science
Machine Learning
Python
Stats
Tidyverse
Data Visualization
Data Viz
Ggplot
Technology
Coding
Connect
Server Pro
Shiny
Rmarkdown
Package Manager
CRAN
Interoperability
Serious Data Science
Dplyr
Forcats
Ggplot2
Tibble
Readr
Stringr
Tidyr
Purrr
Github
Data Wrangling
Tidy Data
Odbc
Rayshader
Plumber
Blogdown
Gt
Lazy Evaluation
Tidymodels
Statistics
Debugging
Programming Education
Rstats
Open Source
Oss
Reticulate