We’re delighted to announce the release of three new tidymodels packages. These are “parsnip-adjacent” packages that add new models to the tidymodels framework.
baguette#
This package contains basic functions and parsnip wrappers for bagging (aka bootstrap aggregating
) ensemble models. Right now, there are parsnip wrappers called bag_tree() and bag_mars() although more are planned, especially for rule-based models.
One nice feature of this package is that the resulting model objects are smaller than they would normally be. Two separate operations are used to do this:
-
The butcher package is used to remove object elements that are not crucial to using the models. For example, some models contain copies of the training set or model residuals when created. These are removed so that space is saved.
-
For ensembles whose base models use a formula method, there is is a built-in redundancy because each model has an identical
termsobject. However, each one of these takes up separate space in memory and can be quite large when there are many predictors. baguette fixes this by replacing eachtermsobject with the object from the first model in the ensemble. Since the othertermsobjects are not modified, we get the same functional capabilities using far less memory to save the ensemble. A similar trick is used for the resampling method sinmodelrandrsample.
The models also return aggregated variable importance scores.
Here’s an example:
|
|
poissonreg#
The parsnip package has methods for linear, logistic, and multinomial models. poissonreg extends this to data where the outcome is a count. There are engines for glm, rstanarm, glmnet, hurdle, and zeroinfl. The latter two enable zero-inflated Poisson models from the pscl
package.
Here is an example using a log-linear model for analyzing a three dimensional contingency table using the data from Agresti (2007, Table 7.6):
|
|
One interesting thing about the zero-inflated Poisson models is that there can be different predictors for the usual linear predictor as well as others for the probability of a zero count (see Zeileis et al (2008) for more details). For example:
|
|
plsmod#
This package has parsnip methods for Partial Least Squares (PLS) regression and classification models based on the work in the Bioconductor mixOmics package. This package facilitates ordinary PLS models as well as sparse versions. Additionally, it can also be used for multivariate models.
Let’s take the meats data from the modeldata package. Spectroscopy was used to estimate the percentage of protein, fat, and water from different meats. The predictors are a set of 100 highly correlated spectra values that would come from an instrument. The model can be used to estimate the three percentages simultaneously:
|
|
This model used 5 PLS components for each of the outcomes. The use of num_terms enables effect sparsity where the 20 most influential predictors (out of 100) are used for each of the 5 PLS components. Different predictors can be used for each component. While this is not feature selection, it does offer the possibility of simpler models than ordinary PLS techniques.
Other notes#
Each of these models come fully enables to be used with the tune package; their model parameters can be optimized for performance.
There are one or two other parsnip-adjacent packages that are around the corner. One is for mixed- and hierarchical models and another is for rule-based machine learning models (e.g. cubist, RuleFit, etc.) currently on GitHub in the rules repo .

