Neal Richardson | Bigger Data With Ease Using Apache Arrow | RStudio
The Apache Arrow project enables data scientists using R, Python, and other languages to work with large datasets efficiently and with interactive speed. Arrow is so fast at some workflows that it seems to defy reality--or at least the limits of R's capabilities. This talk examines the unique characteristics of the Arrow project that enable it to redefine what is possible in R. The talk also highlights some of the latest developments in the arrow R package, including how you can query and manipulate multi-file datasets, and it presents strategies for speeding up workflows by up to 100x.
About Neal:
Currently Director of Engineering at Ursa Labs / RStudio. Previously led product and engineering at Crunch.io. Ph.D. in Political Science from the University of California, Berkeley
rstudio
RStudio
Data Science
Machine Learning
Python
Stats
Tidyverse
Data Visualization
Data Viz
Ggplot
Technology
Coding
Connect
Server Pro
Shiny
Rmarkdown
Package Manager
CRAN
Interoperability
Serious Data Science
Dplyr
Ggplot2
Tibble
Readr
Stringr
Tidyr
Purrr
Github
Data Wrangling
Tidy Data
Odbc
Rayshader
Plumber
Blogdown
Gt
Lazy Evaluation
Tidymodels
Statistics
Debugging
Forcats
Rstats
Open Source
Oss
Reticulate
Neal Richardson
Apache
Apache Arrow
Big Data