We’re exceedingly happy to announce the release of forcats 0.5.0 on CRAN. The goal of the forcats package is to provide a suite of tools that solve common problems with factors, including changing the order of levels or the values.

This release includes improvements to several existing functions, as well as a division of fct_lump() into four new functions: fct_lump_min(), fct_lump_prop(), fct_lump_n(), and fct_lump_lowfreq(). For a complete inventory of updates in this version, please see the Change log.

You can install forcats with:

install.packages("forcats")

Attach the package by running:

library(forcats)

New features#

fct_lump() function family#

Lumping seems like a popular activity, and there are many interesting variants. Splitting fct_lump() into pieces makes it much easier for this collection to grow over time.

  • fct_lump_min() lumps levels that appear fewer than min times.
  • fct_lump_prop() lumps levels that appear fewer than prop * n times.
  • fct_lump_n() lumps all levels except for the n most frequent (or least frequent, if n < 0).
  • fct_lump_lowfreq() lumps together the least frequent levels, ensuring that "Other" is still the smallest level.

x <- factor(rep(LETTERS[1:8], times = c(40, 10, 5, 27, 3, 1, 1, 1)))

x %>% table()
#> .
#>  A  B  C  D  E  F  G  H 
#> 40 10  5 27  3  1  1  1

x %>% fct_lump_min(5) %>% table()
#> .
#>     A     B     C     D Other 
#>    40    10     5    27     6

x %>% fct_lump_prop(0.10) %>% table()
#> .
#>     A     B     D Other 
#>    40    10    27    11

x %>% fct_lump_n(3) %>% table()
#> .
#>     A     B     D Other 
#>    40    10    27    11

x %>% fct_lump_lowfreq() %>% table()
#> .
#>     A     D Other 
#>    40    27    21
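
The negative-n case of fct_lump_n() does not appear in the output above. As a small sketch using the same x, a negative n should keep the least frequent levels and lump everything else together:

```r
library(forcats)

# Same factor as in the example above
x <- factor(rep(LETTERS[1:8], times = c(40, 10, 5, 27, 3, 1, 1, 1)))

# n < 0: keep the 3 least frequent levels (F, G, H) and lump the rest
y <- fct_lump_n(x, -3)

levels(y)  # F, G, H, Other
table(y)   # F = 1, G = 1, H = 1, Other = 85
```

Here n = -3 was chosen because F, G, and H are tied at one observation each, so the cut-off does not fall in the middle of a tie.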

New arguments and helpers#

fct_collapse() now has an other_level argument, which lets you specify the name of the Other level. When other_level is not NULL, factors are now collapsed correctly and the Other level is made the last level.
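
As a minimal sketch of other_level (the factor and the group names mid and high are made up for illustration), levels that are not named in any group are gathered into the level given by other_level, which ends up last:

```r
library(forcats)

# Hypothetical factor with levels a, b, c, d, e
x <- factor(c("a", "b", "c", "d", "e", "a", "b"))

# Collapse b/c and d/e; the unmentioned level "a" goes to other_level
y <- fct_collapse(
  x,
  mid = c("b", "c"),
  high = c("d", "e"),
  other_level = "Other"
)

levels(y)  # mid, high, Other
table(y)   # mid = 3, high = 2, Other = 2
```

Note that "a" is the first level of x, yet the level that absorbs it is still placed after mid and high.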

fct_reorder2() now has a helper function, first2(), which sorts .y by the first value of .x.
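
To make the helper concrete, here is a small sketch with made-up data (the vectors grp, tm, and val are hypothetical). For each level, first2() picks the .y value at the smallest .x, and fct_reorder2() then orders the levels by that value, in descending order by default:

```r
library(forcats)

# Hypothetical data: three groups observed at two time points
grp <- factor(c("a", "a", "b", "b", "c", "c"))
tm  <- c(1, 2, 1, 2, 1, 2)
val <- c(1, 9, 5, 2, 3, 4)

# first2() on its own: the .y value that belongs to the smallest .x
first_val <- first2(c(3, 1, 2), c(30, 10, 20))
first_val  # 10

# Order levels by the value at the first time point (a = 1, b = 5, c = 3)
by_first <- fct_reorder2(grp, tm, val, .fun = first2)
levels(by_first)  # b, c, a

# For comparison, the default helper last2() uses the last time point
# (a = 9, b = 2, c = 4)
by_last <- fct_reorder2(grp, tm, val)
levels(by_last)  # a, c, b
```

This ordering can be useful when a plot legend should line up with where the lines start rather than where they end.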

Acknowledgements#

A special thanks goes out to everyone who contributed to forcats during Tidyverse developer day: Kelly Bodwin, Layla Bouzoubaa, Scott Brenstuhl, Jonathan Carroll, Monica Gerber, John Goldin, Laura Gomez, Mitchell O’Hara-Wild, Riinu Pius, and Emily Robinson.

We’re extremely grateful to all 48 people who helped with this release: @808sAndBR, @adisarid, @alejandroschuler, @AmeliaMN, @AndrewKinsman, @avishaitsur, @batpigandme, @bczucz, @billdenney, @bxc147, @cuttlefish44, @dan-reznik, @dpprdan, @dylanjm, @GegznaV, @ghost, @gralgomez, @gtm19, @hadley, @hongcui, @jamiefo, @jburos, @jimhester, @johngoldin, @jonocarroll, @jtr13, @jwilliman, @jzadra, @kbodwin, @kei51e, @kyzphong, @labouz, @ledbettc, @lwjohnst86, @martinjhnhadley, @melissakey, @mitchelloharawild, @monicagerber, @mstr3336, @riinuots, @robinsones, @sgschreiber, @sinarueeger, @sindribaldur, @stelsemeyer, @VincentGuyader, @yimingli, and @zkamvar.