We’re tickled pink to announce the release of version 0.8.0 of dplyr, the grammar of data manipulation in the tidyverse.
This is a major update that has kept us busy for almost a year. We take the coincidence of a Valentine’s Day release as a sign of continuous ❤️ for dplyr’s approach to tidy data manipulation.
Important changes are discussed in detail in the pre-release post. We are grateful to members of the community for their feedback over the last couple of months, which has been tremendously useful in making the release process smoother.
The bulk of the changes are internal, and part of an ongoing effort to make the codebase more robust and less surprising. This is an investment that will continue to pay off for years, and serve as a foundation for more innovations in the future.
For a comprehensive list of changes, please see the NEWS for the 0.8.0 release. The sections below discuss the main changes.
Group hug
Grouping has always been at the center of what dplyr is about. This release expands on the
existing group_by() with a set of experimental functions that offer a variety of
perspectives on the notion of grouping.
We believe they offer unique new possibilities, but we welcome community feedback and use cases
before we put a 💍 on them. Let’s illustrate them with a subset of the
well-known gapminder data.
- group_nest() is similar to tidyr::nest(), but focuses on the variables to nest by instead of the nested columns.
- group_split() is a tidy version of base::split(). In particular, it respects a group_by()-like grouping specification, and refuses to name its result.
- group_map() and group_walk() offer a way to iterate on groups of a grouped data frame.
- group_data(), group_rows(), and group_keys() expose the grouping information, which has been restructured as a tibble.
- group_by() gains a .drop argument which you can set to FALSE to respect empty groups associated with factors (more on this below).
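The functions listed above can be tried together in a short sketch. It assumes the gapminder package is installed, and the Oceania subset is an arbitrary choice made here for illustration, not necessarily the subset the post had in mind.

```r
library(dplyr)

# Assumption: the gapminder package is installed. The Oceania subset
# (Australia and New Zealand) is an arbitrary choice for illustration.
oceania <- gapminder::gapminder %>%
  filter(continent == "Oceania")

# group_nest(): one row per country, the other columns nested in `data`
nested <- oceania %>% group_nest(country)

# group_split(): an unnamed list with one tibble per country
pieces <- oceania %>% group_split(country)

by_country <- oceania %>% group_by(country)

# group_keys(): one row per group, holding only the grouping variables
keys <- group_keys(by_country)

# group_rows(): a list of integer row positions, one element per group
rows <- group_rows(by_country)

# group_walk(): call a function on each group for its side effects;
# .x is the group's data and .y a one-row tibble with the group's key
by_country %>%
  group_walk(~ cat(as.character(.y$country), ":", nrow(.x), "rows\n"))

# group_map(): apply a function that returns a tibble to each group
counts <- by_country %>% group_map(~ tibble(n_rows = nrow(.x)))
```

The shape of `counts` depends on the installed version: in later dplyr releases group_map() returns a list, and the tibble-returning behaviour lives in group_modify().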
Give factors some love
The internal grouping algorithm has been redesigned to make it possible to
better respect factor levels and empty groups. To limit the disruption, we have not made
this the default behaviour. To keep empty groups,
you have to set group_by()’s .drop argument to FALSE.
This can make data manipulation more predictable and reliable, because when factors are involved, the groups are based on the levels of the factors rather than on which levels happen to have data points.
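A minimal sketch of the difference, using the built-in iris data; the variable names are ours and only illustrative.

```r
library(dplyr)

# Only setosa rows survive the filter, but Species keeps its three levels
setosa <- iris %>% filter(Species == "setosa")

# Default: groups are built from the values present in the data
dropped <- setosa %>% group_by(Species) %>% tally()

# .drop = FALSE: one group per factor level, including the empty ones
kept <- setosa %>% group_by(Species, .drop = FALSE) %>% tally()

dropped
kept
```

With the default, only the setosa group appears. With `.drop = FALSE`, versicolor and virginica are kept as groups with a count of 0.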
Let’s illustrate this with our favourite flowers 💐,
and a function, species_count(), that counts the number of each species after
a filter(), and structures it as a tibble with one column per species.
Because we use .drop = FALSE we get one column per level of the factor,
even when there’s no data associated with a level:
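One possible way to write such a function is sketched below. It is an assumption about what species_count() could look like, not necessarily the post's original definition, and it relies on tidyr being installed for spread().

```r
library(dplyr)

# Hypothetical implementation of species_count(): count rows per Species,
# keeping empty factor levels, then spread the counts into one column
# per species. Assumes the tidyr package is installed.
species_count <- function(.tbl) {
  .tbl %>%
    group_by(Species, .drop = FALSE) %>%
    tally() %>%
    tidyr::spread(Species, n)
}

all_species <- species_count(iris)

# After the filter, versicolor and virginica have no rows left,
# yet they still get a column
only_setosa <- iris %>%
  filter(Species == "setosa") %>%
  species_count()

all_species
only_setosa
```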
Getting columns of 0 instead of missing columns makes it easier to combine multiple results:
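As a sketch, two filtered results can be stacked with bind_rows(). The species_count() definition here is a hypothetical one (the post's own version may differ) and assumes tidyr is installed.

```r
library(dplyr)

# Hypothetical species_count(): counts per Species with empty levels kept,
# reshaped to one column per species (assumes tidyr is installed)
species_count <- function(.tbl) {
  .tbl %>%
    group_by(Species, .drop = FALSE) %>%
    tally() %>%
    tidyr::spread(Species, n)
}

# Each result has the same three columns, so the rows line up
combined <- bind_rows(
  species_count(filter(iris, Species == "setosa")),
  species_count(filter(iris, Species == "virginica"))
)

combined
```

Because every result carries a column for each factor level, the combined tibble has no NA entries to patch up afterwards.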
Thanks
Thanks to all contributors for this release.
@abouf , @adisarid , @adrfantini , @aetiologicCanada , @afdta , @albertomv83 , @alistaire47 , @aloes2512 , @andresimi , @antaldaniel , @AnthonyEbert , @ArtemSokolov , @AshesITR , @bakaburg1 , @batpigandme , @bbachrach , @bbolker , @behrman , @BenjaminLouis , @bifouba , @billdenney , @bnicenboim , @BobMuenchen , @brooke-watson , @CarolineBarret , @cbailiss , @CerebralMastication , @cfhammill , @cfry-propeller , @choisy , @ChrisBeeley , @chrsigg , @clauswilke , @ClaytonJY , @colearendt , @ColinFay , @coolbutuseless , @Copepoda , @cpsievert , @dah33 , @damianooldoni , @DanChaltiel , @danyal123 , @DavisVaughan , @Demetrio92 , @dewoller , @dfalbel , @DiogoFerrari , @dirkschumacher , @dmenne , @dmvianna , @dongzhuoer , @earowang , @echasnovski , @eddelbuettel , @EdwinTh , @eijoac , @elbersb , @Eli-Berkow , @EmilHvitfeldt , @epetrovski , @erblast , @etienne-s , @foundinblank , @FrancoisGuillem , @geotheory , @ggrothendieck , @GoldbergData , @gowerc , @grayskripko , @GrimTrigger88 , @grizzthepro64 , @hadley , @hafen , @heavywatal , @helix123 , @henrikmidtiby , @hpeaker , @htc502 , @hughjonesd , @ignacio82 , @igoldin2u , @igordot , @ilarischeinin , @Ilia-Kosenkov , @IndrajeetPatil , @ipofanes , @jasonmhoule , @jayhesselberth , @jennybc , @jepusto , @jflynn264 , @jialu512 , @JiaxiangBU , @jimhester , @jkylearmstrongibx , @jnolis , @JohnMount , @jonkeane , @jonthegeek , @jschelbert , @jsekamane , @jtelleria , @kendonB , @kevinykuo , @krlmlr , @langbe , @ldecicco-USGS , @leungi , @libbieweimer , @lionel- , @liz-is , @lloven , @ltrgoddard , @luccastermans , @maicel1978 , @Make42 , @MalditoBarbudo , @markdly , @markvanderloo , @mattbk , @maxheld83 , @melissakey , @mem48 , @mgirlich , @mikmart , @MilesMcBain , @minhsphuc12 , @mkoohafkan , @momeara , @moodymudskipper , @move[bot] , @nealpsmith , @NightWinkle , @o1iv3r , @PascalKieslich , @petermeissner , @peterzsohar , @philstraforelli , @PMassicotte , @PPICARDO , @privefl , @prokulski , @quartin , @rabutler-usbr , @ramongallego , 
@randomgambit , @rappster , @rensa , @reshmamena , @richard987 , @richierocks , @RickPack , @riship2009 , @RobertMyles , @romainfrancois , @rontomer , @roumail , @rozsoma , @rundel , @rupesh2017 , @s-fleck , @S-UP , @salmansyed0709 , @schloerke , @seasmith , @sharlagelfand , @shizidushu , @simon-anasta , @skaltman , @skylarhopkins , @sowla , @statsccpr , @stenhaug , @streamline55 , @stuartE9 , @stufield , @suzanbaert , @sverchkov , @thackl , @the-knife , @ThiAmm , @thisisnic , @tinyheero , @tmelconian , @tobadia , @tonyelhabr , @torbjorn , @trueNico , @tungmilan , @TylerGrantSmith , @ukkonen , @vincentanutama , @vnijs , @wanfahmi , @waynelapierre , @wch , @wdenton , @wgrundlingh , @wmayner , @wolski , @yiqinfu , @yutannihilation , @Zanidean , @Zedseayou , @zslajchrt , @zx8754 , and @zzygyx9119 .
