We are happy to announce that luz version 0.3.0 is now on CRAN. This
release brings a few improvements to the learning rate finder
first contributed by Chris McMaster. As we didn’t have a
0.2.0 release post, we will also highlight a few improvements that
date back to that version.
What’s luz?
Since luz is a relatively new package, we are
starting this blog post with a quick recap of how it works. If you
already know what luz is, feel free to move on to the next section.
luz is a high-level API for torch that aims to encapsulate the training
loop into a set of reusable pieces of code. It reduces the boilerplate
required to train a model with torch, avoids the error-prone
zero_grad() - backward() - step() sequence of calls, and also
simplifies the process of moving data and models between CPUs and GPUs.
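For comparison, here is a minimal sketch of the kind of hand-written torch loop that luz is meant to replace. The toy linear model, the random data, and the hyperparameters are all illustrative assumptions, not taken from luz itself.

```r
library(torch)

torch_manual_seed(1)

# Toy regression data, purely illustrative.
x <- torch_randn(128, 10)
y <- torch_randn(128, 1)
dl <- dataloader(tensor_dataset(x, y), batch_size = 32, shuffle = TRUE)

# Device handling is done by hand, for the model and for every batch.
device <- if (cuda_is_available()) "cuda" else "cpu"
model <- nn_linear(10, 1)
model$to(device = device)
optimizer <- optim_sgd(model$parameters, lr = 0.01)

for (epoch in 1:3) {
  model$train()
  coro::loop(for (batch in dl) {
    input <- batch[[1]]$to(device = device)
    target <- batch[[2]]$to(device = device)

    optimizer$zero_grad()                        # reset accumulated gradients
    loss <- nnf_mse_loss(model(input), target)   # forward pass
    loss$backward()                              # compute gradients
    optimizer$step()                             # update parameters
  })
}
```

Forgetting `zero_grad()`, or calling the three steps in the wrong order, does not raise an error; it just trains the wrong thing, which is why it is convenient to have the loop written once and reused.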
With luz you can take your torch nn_module(), for example the
two-layer perceptron defined below:
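A minimal sketch of such a module follows. The hidden size of 50, the dropout rate of 0.4, and the name `modnn` are illustrative choices.

```r
library(torch)

# Two-layer perceptron: a hidden linear layer followed by an output layer.
# Layer sizes and the dropout rate are illustrative assumptions.
modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 50)
    self$activation <- nn_relu()
    self$dropout <- nn_dropout(0.4)
    self$output <- nn_linear(50, 1)
  },
  forward = function(x) {
    x <- self$hidden(x)
    x <- self$activation(x)
    x <- self$dropout(x)
    self$output(x)
  }
)
```

Note that `modnn` is a module generator: the network is only instantiated once `input_size` is supplied.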
and fit it to a specified dataset like so:
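One way this could look is sketched below, using random R matrices as stand-in data. The data, the choice of loss, optimizer and metric, and the number of epochs are assumptions made only so the example runs end to end. The native pipe requires R 4.1 or later.

```r
library(torch)
library(luz)

torch_manual_seed(1)
set.seed(1)

# Compact two-layer perceptron (illustrative sizes).
modnn <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 50)
    self$dropout <- nn_dropout(0.4)
    self$output <- nn_linear(50, 1)
  },
  forward = function(x) {
    x <- nnf_relu(self$hidden(x))
    self$output(self$dropout(x))
  }
)

# Synthetic regression data standing in for a real dataset.
x_train <- matrix(rnorm(200 * 10), ncol = 10)
y_train <- matrix(rnorm(200), ncol = 1)
x_valid <- matrix(rnorm(50 * 10), ncol = 10)
y_valid <- matrix(rnorm(50), ncol = 1)

fitted <- modnn |>
  setup(
    loss = nn_mse_loss(),
    optimizer = optim_rmsprop,
    metrics = list(luz_metric_mae())
  ) |>
  set_hparams(input_size = 10) |>   # arguments passed to initialize()
  fit(
    data = list(x_train, y_train),
    valid_data = list(x_valid, y_valid),
    epochs = 5,
    verbose = FALSE
  )
```

`setup()` attaches the loss, optimizer and metrics to the module, `set_hparams()` supplies the arguments for `initialize()`, and `fit()` runs the training loop.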
luz will automatically train your model on the GPU if it’s available,
display a nice progress bar during training, and handle logging of metrics,
all while making sure evaluation on validation data is performed in the correct way
(e.g., disabling dropout).
luz can be extended at many different layers of abstraction, so you can
improve your knowledge gradually, as you need more advanced features in your
project. For example, you can implement custom metrics, callbacks,
or even customize the internal training loop.
To learn about luz, read the getting started
section on the website, and browse the examples gallery.
What’s new in luz?
Learning rate finder
In deep learning, finding a good learning rate is essential to be able to fit your model. If it’s too low, you will need too many iterations for your loss to converge, and that might be impractical if your model takes too long to run. If it’s too high, the loss can explode and you might never be able to arrive at a minimum.
The lr_finder() function implements the algorithm detailed in Cyclical Learning Rates for
Training Neural Networks (Smith 2015), popularized in the FastAI framework
(Howard and Gugger 2020). It takes an nn_module() and some data, and produces
a data frame with the losses and the learning rate at each step.
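As a rough sketch, a call could look like the following. The small network, the random data, and the chosen range of learning rates are assumptions for illustration; the argument names follow the `lr_finder()` documentation as we understand it, so check the reference page for your installed version.

```r
library(torch)
library(luz)

torch_manual_seed(1)

# Small illustrative network.
net <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 16)
    self$output <- nn_linear(16, 1)
  },
  forward = function(x) {
    self$output(nnf_relu(self$hidden(x)))
  }
)

# Random data wrapped in a dataloader with enough batches for all steps.
x <- torch_randn(512, 10)
y <- torch_randn(512, 1)
train_dl <- dataloader(tensor_dataset(x, y), batch_size = 8, shuffle = TRUE)

model <- net |>
  setup(loss = nn_mse_loss(), optimizer = optim_adam) |>
  set_hparams(input_size = 10)

records <- lr_finder(
  object = model,
  data = train_dl,
  steps = 50,        # number of learning rates to try
  start_lr = 1e-6,   # smallest learning rate tried
  end_lr = 1         # largest learning rate tried
)

head(records)
```

Each row of `records` pairs one learning rate with the loss observed at that step.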
You can use the built-in plot method to display the exact results, along with an exponentially smoothed value of the loss.
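In practice, calling plot() on the data frame returned by lr_finder() is all that is needed. To illustrate what an exponentially smoothed loss curve is, here is a base R sketch on simulated values. The helper `smooth_ema()`, its smoothing factor `alpha`, and the fake loss curve are all hypothetical; the exact smoothing luz applies may differ.

```r
# Exponential moving average of a numeric vector.
# `alpha` is a hypothetical smoothing factor, not necessarily what luz uses.
smooth_ema <- function(x, alpha = 0.1) {
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

set.seed(1)

# Simulated learning rate sweep: log-spaced rates and a noisy loss with a dip.
lr <- 10^seq(-6, 0, length.out = 100)
loss <- 2.3 - 1.5 * exp(-(log10(lr) + 2)^2) + rnorm(100, sd = 0.1)
smoothed <- smooth_ema(loss)

# Raw losses in grey, smoothed curve on top, learning rate on a log axis.
plot(lr, loss, log = "x", type = "l", col = "grey60",
     xlab = "Learning rate", ylab = "Loss")
lines(lr, smoothed, col = "red", lwd = 2)
```

The smoothed curve makes the overall trend easier to read than the raw, noisy per-step losses.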
If you want to learn how to interpret the results of this plot and learn
more about the methodology, read the learning rate finder article on the
luz website.
Data handling
In the first release of luz, the only kind of object that was allowed to
be used as input data to fit was a torch dataloader(). As of version
0.2.0, luz also supports R matrices/arrays (or nested lists of them) as
input data, as well as torch dataset()s.
Supporting low level abstractions like dataloader() as input data is
important, as with them the user has full control over how input
data is loaded. For example, you can create parallel dataloaders,
change how shuffling is done, and more. However, having to manually
define the dataloader seems unnecessarily tedious when you don’t need to
customize any of this.
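To make the trade-off concrete, this is roughly what the manual route involves when starting from R matrices. The data and the batch size are illustrative, and how luz builds its own dataloader internally may differ from this sketch.

```r
library(torch)

set.seed(1)

# Plain R matrices, as they might come out of a data frame.
x <- matrix(rnorm(100 * 4), ncol = 4)
y <- matrix(rnorm(100), ncol = 1)

# Manual route: convert to tensors, wrap in a dataset, then in a dataloader.
ds <- tensor_dataset(torch_tensor(x), torch_tensor(y))
dl <- dataloader(ds, batch_size = 32, shuffle = TRUE)

# Pull one batch to inspect it: a list with the inputs and the targets.
batch <- dataloader_next(dataloader_make_iter(dl))
dim(batch[[1]])
```

This is the place to customize batching and shuffling when you need to; when you do not, passing the matrices directly saves these steps.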
Another small improvement from version 0.2.0, inspired by Keras, is that
you can pass a value between 0 and 1 to fit’s valid_data parameter, and luz will
take a random sample of that proportion from the training set, to be used for
validation data.
Read more about this in the documentation of the fit() function.
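Both improvements can be combined, as in this sketch: R matrices go straight into fit(), and 20% of the rows are held out for validation. The tiny network, the random data, and the 0.2 proportion are assumptions chosen for illustration.

```r
library(torch)
library(luz)

torch_manual_seed(1)
set.seed(1)

# Small illustrative network.
net <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 16)
    self$output <- nn_linear(16, 1)
  },
  forward = function(x) {
    self$output(nnf_relu(self$hidden(x)))
  }
)

# Plain R matrices; no dataset() or dataloader() is created by hand.
x <- matrix(rnorm(200 * 4), ncol = 4)
y <- matrix(rnorm(200), ncol = 1)

fitted <- net |>
  setup(loss = nn_mse_loss(), optimizer = optim_adam) |>
  set_hparams(input_size = 4) |>
  fit(
    data = list(x, y),
    valid_data = 0.2,   # random 20% of the training rows used for validation
    epochs = 3,
    verbose = FALSE
  )

# Metrics are recorded separately for the training and validation sets.
get_metrics(fitted)
```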
New callbacks
In recent releases, new built-in callbacks were added to luz:
- luz_callback_gradient_clip(): Helps avoid loss divergence by clipping large gradients.
- luz_callback_keep_best_model(): Each epoch, if there’s improvement in the monitored metric, we serialize the model weights to a temporary file. When training is done, we reload weights from the best model.
- luz_callback_mixup(): Implementation of ‘mixup: Beyond Empirical Risk Minimization’ (Zhang et al. 2017). Mixup is a nice data augmentation technique that helps improve model consistency and overall performance.
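Callbacks are passed to fit() as a list. The sketch below combines gradient clipping with keeping the best model; the network, the random data, and the values `max_norm = 1` and `monitor = "valid_loss"` are illustrative assumptions. The mixup callback is left out here because it targets classification and needs a matching loss setup.

```r
library(torch)
library(luz)

torch_manual_seed(1)
set.seed(1)

# Small illustrative network.
net <- nn_module(
  initialize = function(input_size) {
    self$hidden <- nn_linear(input_size, 16)
    self$output <- nn_linear(16, 1)
  },
  forward = function(x) {
    self$output(nnf_relu(self$hidden(x)))
  }
)

x <- matrix(rnorm(200 * 4), ncol = 4)
y <- matrix(rnorm(200), ncol = 1)

fitted <- net |>
  setup(loss = nn_mse_loss(), optimizer = optim_adam) |>
  set_hparams(input_size = 4) |>
  fit(
    data = list(x, y),
    valid_data = 0.2,
    epochs = 5,
    verbose = FALSE,
    callbacks = list(
      # Rescale gradients whose overall norm exceeds 1.
      luz_callback_gradient_clip(max_norm = 1),
      # Restore the weights from the epoch with the lowest validation loss.
      luz_callback_keep_best_model(monitor = "valid_loss")
    )
  )
```

Because keeping the best model monitors a validation metric, validation data has to be supplied, here through `valid_data = 0.2`.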
Final remarks
You can see the full changelog here.
In this post we would also like to thank:
- @jonthegeek for valuable improvements in the luz getting-started guides.
- @mattwarkentin for many good ideas, improvements and bug fixes.
- @cmcmaster1 for the initial implementation of the learning rate finder and other bug fixes.
- @skeydan for the implementation of the Mixup callback and improvements in the learning rate finder.
Thank you!
Howard, Jeremy, and Sylvain Gugger. 2020. “Fastai: A Layered API for Deep Learning.” Information 11 (2): 108. https://doi.org/10.3390/info11020108.
Smith, Leslie N. 2015. Cyclical Learning Rates for Training Neural Networks. https://doi.org/10.48550/ARXIV.1506.01186.
Zhang, Hongyi, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2017. Mixup: Beyond Empirical Risk Minimization. https://doi.org/10.48550/ARXIV.1710.09412.
