I’m excited to announce that vroom 1.1.0 is now available on CRAN!
vroom reads rectangular data, such as comma separated (csv), tab separated (tsv) or fixed width files (fwf) into R.
It performs similar roles to functions like readr::read_csv(), data.table::fread() or read.csv().
But for many datasets vroom::vroom() can read them much, much faster (hence the name).
Get the latest version from CRAN with install.packages("vroom").
Then attach the package by running library(vroom).
Improvements in this release include: a hex logo, support for big integer data, improved delimiter guessing, including delimiters in specifications, and streamlined reading from standard input.
See the change log for a full list of changes and bug fixes in this version.
Hex logo
Thanks to Allison Horst we now have an awesome hex logo for vroom!
Big integer support
R’s standard integers are stored in 32 bits of binary data, which means that the largest value they can store is 2,147,483,647 (2^31 - 1).
In most operations that mix integers with doubles, R implicitly converts the integers to 64-bit floating point values, which is why you may not have noticed this limitation before.
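As a minimal sketch of that limit in base R (no vroom required): adding the integer 1L to the largest integer overflows to NA, while adding the double 1 promotes the result to a double, so the limit goes unnoticed.

```r
# Largest value a 32-bit R integer can hold
max_int <- .Machine$integer.max
max_int
#> [1] 2147483647

# integer + integer stays an integer, so this overflows to NA (R also warns)
overflowed <- suppressWarnings(max_int + 1L)
is.na(overflowed)
#> [1] TRUE

# integer + double is promoted to double, so no overflow occurs
promoted <- max_int + 1
typeof(promoted)
#> [1] "double"
promoted
#> [1] 2147483648
```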
However, even 64-bit floating point values can only store consecutive integers up to 9,007,199,254,740,992 (2^53) without losing precision.
You can observe this by adding 1 to this number: the result is the same number.
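The following base R sketch illustrates the precision limit; the variable name `big` is only for illustration.

```r
# 2^53 is the last point up to which doubles represent every integer exactly
big <- 2^53
format(big, scientific = FALSE)
#> [1] "9007199254740992"

# Adding 1 cannot be represented, so the result rounds back to 2^53
big + 1 == big
#> [1] TRUE

# Below the limit, neighbouring integers are still distinguishable
big - 1 == big
#> [1] FALSE
```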
To store consecutive integers bigger than this you need to use a 64-bit integer type. R does not have native support for 64-bit integers, but the bit64 package provides it. Because these integers are so large, they rarely occur in real world data, but they often turn up in generated data, such as database identifiers.
vroom 1.1.0 now supports reading these big integers into the integer64 type provided by bit64 with a new col_big_integer() column type (shortcut ‘I’).
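Here is a hedged sketch of how the new column type might be used. The file contents and the column names `id` and `name` are made up for illustration, and the example assumes the bit64 package is installed.

```r
library(vroom)

# Hypothetical data: identifiers just above 2^53, which doubles cannot tell apart
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name", "9007199254740993,a", "9007199254740995,b"), tmp)

# Read `id` as a 64-bit integer rather than letting vroom guess a double
ids <- vroom(
  tmp,
  delim = ",",
  col_types = cols(id = col_big_integer(), name = col_character())
)

# The column should carry the integer64 class provided by bit64
class(ids$id)
```

The compact form of the same request would use the shortcut mentioned above, for example col_types = "Ic" for one big integer column followed by one character column.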
Improved delimiter guessing
The code to guess delimiters has been rewritten, which should make it more robust to most inputs. Previous versions of vroom would fall back to using a newline delimiter if a delimiter could not be guessed. vroom 1.1.0 instead throws an error.
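The sketch below shows one way to cope with the new behaviour. It assumes a hypothetical single-column file, for which there is no delimiter to guess; whether guessing actually fails for a given input depends on the data, so the error is caught rather than assumed.

```r
library(vroom)

# Hypothetical single-column file: no delimiter appears anywhere in it
tmp <- tempfile(fileext = ".txt")
writeLines(c("x", "1", "2", "3"), tmp)

# Guessing may fail here; catch the error instead of stopping the script
guessed <- tryCatch(
  vroom(tmp, col_types = cols()),
  error = function(e) {
    message("Delimiter guessing failed: ", conditionMessage(e))
    NULL
  }
)

# Supplying the delimiter explicitly avoids guessing altogether
explicit <- vroom(tmp, delim = ",", col_types = cols())
nrow(explicit)
```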
Delimiters in the specification
vroom now includes the delimiter in the specification object, which means you no longer have to separately provide the delimiter if you are using an existing specification.
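A minimal sketch of reusing a specification, assuming the mtcars.csv example file bundled with vroom; the object names are only for illustration.

```r
library(vroom)

path <- vroom_example("mtcars.csv")

# First read: state the delimiter once and let vroom guess the column types
first <- vroom(path, delim = ",", col_types = cols())

# The specification extracted from the result carries the delimiter with it
mtcars_spec <- spec(first)

# Second read: reuse the specification, with no separate `delim` argument
second <- vroom(path, col_types = mtcars_spec)

identical(names(first), names(second))
```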
Reading from standard input
vroom makes it straightforward to read from the C standard input, like you would do when calling R from the terminal command line.
Simply use stdin() as your input. Let’s say you want to take the first few lines of the mtcars file and find the average horsepower.
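One way this could look is sketched below. The helper `mean_hp()` and the script name in the comments are hypothetical, and the bundled example file stands in for piped data so that the block runs on its own.

```r
library(vroom)

# Hypothetical helper: average the `hp` column of any input vroom can read
mean_hp <- function(input) {
  cars <- vroom(input, delim = ",", col_types = cols())
  mean(cars$hp)
}

# In a script fed from a pipe, the input would be standard input, e.g.
#   head -n 6 mtcars.csv | Rscript mean_hp.R   (hypothetical script name)
# with the script ending in: mean_hp(stdin())

# Here the example file shipped with vroom stands in for the piped data
mean_hp(vroom_example("mtcars.csv"))
```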
Acknowledgements
This release also contains a number of bug fixes and improvements which should make it more robust than previous releases. See the change log for full details.
A big thanks to all contributors of code, issues and documentation to this release, including many who helped out at the tidyverse developer day in Toulouse, France!
@2005m, @atomman, @batpigandme, @blairj09, @Chris-M-P, @chsafouane, @CriscelyLP, @DyfanJones, @ecoquant, @edzer, @ericbrownaustin, @estroger34, @frm1789, @georgevbsantiago, @guiastrennec, @hadley, @HenrikBengtsson, @henry090, @jaapwalhout, @jimhester, @jonaszierer, @kiernann, @martindut, @meta00, @mgirlich, @mllg, @osiris08, @Plebejer, @R3myG, @randomgambit, @sanromd, @Shians, @stephen-hayne, @vjcitn, @wlattner, and @xiaodaigh.
