
Tomasz Kalinowski - Keras 3: Deep Learning made easy
Keras 3 is a ground-up rewrite of Keras 2 that refines the API and reintroduces multi-backend support. In this talk, learn about all the features (new and old) in Keras that make it easy to build, train, evaluate, and deploy deep learning models. Talk by Tomasz Kalinowski GitHub Repo: https://github.com/t-kalinowski/posit-conf-2024
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hello, everyone. I'm very excited to be here to tell you about Keras 3, our latest release of Keras, which is a framework that you can use to do deep learning in R.
I'm going to start high level with an introduction to Keras, and then I'll talk about what's new in Keras 3.
So deep learning is just one kind of machine learning, which is just one approach to building an AI system. And deep learning is not new. It's been around for over 50 years. And for most of that time, it was relatively niche. But today it's the dominant approach in a number of fields, including computer vision, speech recognition, and now with LLMs, natural language processing.
And I think it's worthwhile to pause and reflect and ask, why does it seem like deep learning is outpacing all these other ML techniques? There are lots and lots of ways to answer this question. I want to give three answers.
The first is well articulated, I think, by this essay published a couple of years ago by Rich Sutton called The Bitter Lesson. In this, it's a short essay, you should go read it. In this essay, he reflects on the past decades of AI research and makes the observation that in the long term, because of Moore's law, it's the techniques that scale with compute that end up winning. And one of the techniques that scales the best that we know of is deep learning.
So over time, as computers have gotten faster, compute has become more available and deep learning has become relevant in more contexts. And that includes both the context of supercomputers and the context of your laptops. And I think we can safely expect this trend to continue: computers will get faster, compute will become more available, and we'll see deep learning become relevant in more and more contexts over time.
So that's one reason why deep learning is where it is today. Another reason is it's a general purpose technique. It's very versatile. You can easily apply it to all kinds of data, including text, image, audio, video, tables, time series, graphs. And you can apply it to all kinds of tasks, not just classification and regression, but also the many, many flavors of content generation.
And the third reason is that deep learning is forgiving and reliable. And this is where frameworks like Keras come in, because they help make it so.
What's in the Keras kit
You can think of Keras as a kit for doing deep learning, kind of like a Marble Run toy set. If you haven't played with one of these before, when you first get it and open the box, it's a little bit overwhelming. There are so many parts and some of them are truly complicated, or they seem complicated. But pretty quickly you figure out that all these parts compose with each other via a very simple interface. They just stack right on top of each other. It's plug and play.
And when you are composing these parts together, what you're doing is you're creating a path for marbles to flow through. And Keras is like this. It has a lot of parts, but they all compose with each other very nicely. And when you're composing them, you're creating a path for data to go through. And because everything works so seamlessly, the whole thing is a delight to use. It's fun to design your models and build them, and it's fun to feed them data and watch them learn.
So what's in the kit? Keras comes with over 100 built-in layers. These are the things that you'll use to actually build your model. It also has a lot of things that go outside the model. You have metrics to track model performance during training, loss functions and optimizers which you'll use to actually train your model, data loading utilities to help you work with data that doesn't fit in memory, and a family of operations which I'll talk about later. And all these pieces fit together in a high-level training API where you just call fit.
Hello World of deep learning
To illustrate how all these pieces fit together, I'm going to walk through the Hello World of Deep Learning, which is based on the MNIST image dataset of handwritten digits that look like this. One of the things you can do with a dataset like this is build a classifier: given an image, predict which digit it is.
The way you would do this in Keras is you would start by creating an input object where you specify the shape, and then you take that input object and you pipe it through a couple of layers to get back an output object. And once you have your input and your output, you have all you need to create your model, and you have a model.
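In the keras3 R package, that flow looks roughly like this (a minimal sketch; the specific layers here are illustrative, not the only choice):

```r
library(keras3)

# Create an input object, specifying the shape of the images
# (28x28 grayscale digits for MNIST)
inputs <- keras_input(shape = c(28, 28, 1))

# Pipe the input through a couple of layers to get an output object
outputs <- inputs |>
  layer_flatten() |>
  layer_dense(units = 10, activation = "softmax")

# Inputs plus outputs are all you need to create the model
model <- keras_model(inputs, outputs)
```

Printing `model` at the console then gives you the summary table described below.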
One of the things you can do with a model is print it, which gives you a colorful summary table showing what's in there. You can also plot it to see a simple flowchart of the path your data will take through the model, or plot it as a not-so-simple flowchart full of annotations.
And then once you've created your model and you're ready to train it, this is a two-step process. You call compile, where you specify the loss function, the optimizer, and the metrics. And then you call fit with your dataset, and your model starts learning.
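Those two steps look roughly like this, assuming `x_train` and `y_train` hold the MNIST images and labels:

```r
# Step 1: compile, specifying the loss, the optimizer, and the metrics
model |> compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = "accuracy"
)

# Step 2: fit with your dataset; the model starts learning
model |> fit(x_train, y_train, epochs = 5, validation_split = 0.2)
```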
Now this example model that I'm showing here is the simplest practical model you could make. It only has one trainable layer, it has no fancy tricks, but this remarkably simple model already trains to 92% accuracy on the MNIST image dataset. And this is what I mean when I say that deep learning is forgiving, because there are plenty of contexts I can think of where 92% accuracy is already plenty useful. And you don't have to be a deep learning expert to walk away with a model that's useful.
But of course we can get better. One of the things that Keras makes easy is experimenting with different model architectures. So let's say we insert one additional dense layer into this model. Now we get up to 97% accuracy. Put a 2D convolutional layer in front of that, and you get to 98% accuracy. Make that two 2D convolutional layers, and you get to 99% accuracy. And this is pretty close to state-of-the-art for MNIST.
If you're wondering what all this looks like when you're working interactively in the IDE, you get this colorful progress bar, and you also have a live metrics viewer where you can see your model learn in real time. And I think it's kind of fun to watch, it's a little bit like watching marbles go down a marble run.
Progressive disclosure of complexity
Okay, so that gives you a flavor for what first steps with Keras look like. There is a lot of effort to make sure there is a smooth initial experience, and that might give you the impression that Keras is like a set of training wheels, really helpful at first but constraining later on. But that's not the case. Keras is a framework that you can continue to use as your skills and expertise develop. And in my opinion, where Keras truly shines is in research environments, when you're trying to do something new and custom, because of how flexible the API is there. And that's because Keras ultimately has its roots in research.
The way Keras can accommodate these different user profiles is through this idea that's baked into the API called progressive disclosure of complexity, which is another way of saying there is no one true way to do anything. For any given task, Keras offers multiple ways to do it, each of them striking a different point in the balance between flexibility and exposed complexity.
So I can't illustrate all the ways that this is true in the API, the API surface is large, but to illustrate the point, I want to show just one. Let's say you want to create a custom layer. Keras comes with over 100 layers, but none of them do exactly what you want, so you want to make your own. The simplest way to do this is to call layer_lambda() and pass in a function that takes a tensor, and you can do whatever work you want on that tensor in that function.
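For example, a sketch of a custom layer that squares its input (the surrounding model here is just illustrative):

```r
library(keras3)

inputs <- keras_input(shape = c(10))

# layer_lambda() wraps an arbitrary tensor -> tensor function as a layer
outputs <- inputs |>
  layer_lambda(\(x) x^2) |>
  layer_dense(units = 1)

model <- keras_model(inputs, outputs)
```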
Another way is to use another model as a layer. So models are layers too, which means that you can compose layers both sequentially and recursively.
The next step up in complexity is to subclass the base layer class. And here you can implement some or all of the methods that are called at different points in the layer's lifecycle. And a custom layer created in this way has identical semantics to the built-in layers. And as your code base grows, you can use inheritance to help manage your code. So you can subclass not just the base layer class, but also other built-in layers and other custom layers.
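A minimal sketch of a subclassed layer, assuming the keras3 `Layer()` constructor and the `op_*` functions discussed later in the talk:

```r
library(keras3)

# A hand-rolled dense layer defined via the subclassing API
layer_my_dense <- Layer(
  classname = "MyDense",

  initialize = function(units, ...) {
    super$initialize(...)
    self$units <- units
  },

  # build() runs once, when the input shape is first known
  build = function(input_shape) {
    input_dim <- tail(input_shape, 1)
    self$w <- self$add_weight(shape = shape(input_dim, self$units),
                              initializer = "glorot_uniform")
    self$b <- self$add_weight(shape = shape(self$units),
                              initializer = "zeros")
  },

  # call() defines the forward pass, using backend-agnostic ops
  call = function(inputs) {
    op_matmul(inputs, self$w) + self$b
  }
)
```

A layer defined this way composes just like the built-ins, e.g. `inputs |> layer_my_dense(units = 32)`.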
And this subclassing API exists not just for layers, but for all the object types that you might work with in Keras, including training callbacks, metrics, learning rate schedules, and so on.
For R users, along this path of progressive disclosure of complexity is the fact that under the hood, Keras in R is implemented using reticulate. And reticulate is an R package that embeds Python in R. What this means is that everything available in Python is also available in R. So you can choose to use Keras from R or Keras from Python, and because of reticulate, that choice is simply a matter of preference for language syntax and development environment. You don't have to worry about one interface getting out of sync with the other, or one having more features than the other, or one being faster than the other, because when you're training your model, it's literally the same exact code that's running.
Reticulate itself has had a steady stream of updates over the past few years. There are improved error tracebacks, improved autocomplete, improved support for operators and generics, and probably most exciting of all, updated Python environment discovery and management tools.
Quick aside: multiple people have told me that they work primarily in Python, but they use reticulate to set up their Python environments, which I just think is so neat.
What's new in Keras 3
Back to Keras 3, what's new in Keras 3? The biggest thing to know is the return of multi-backend support, and that comes with a set of new features that let you write backend agnostic code.
A backend is something that Keras uses to compute gradients and to interface with a GPU. And for most of its life, up to about three years ago, Keras supported multiple backends, including TensorFlow, Theano, MXNet, and CNTK. And then over time, those last three were retired, and three years ago, TensorFlow became the only supported backend when Keras became an official part of the TensorFlow project.
Now with Keras 3, Keras is once again a standalone package that's not part of TensorFlow. When you first load the package, you can call use_backend() with the name of your backend, and the supported backends today are Jax, PyTorch, and TensorFlow. You can also use NumPy in inference-only mode, and there's ongoing work to add MLX as a backend.
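Selecting a backend is a one-liner, called before you start creating tensors or models:

```r
library(keras3)

# Choose the backend for this session
use_backend("jax")   # or "tensorflow", "torch", or "numpy" (inference only)
```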
The benefits of multi-backend support are that it lets you write future-proof code. The high-level Keras API insulates you a little bit from some of the churn in the low-level framework space. You can easily switch backends at any time in a project, which means that if you encounter a bug with one backend, you can just sidestep it by switching to another. And more generally, it enables workflows where you can take advantage of the best that each backend has to offer.
So for example, you can develop your model with Torch as your backend, which gives you a nice interactive experience. And then you can switch to Jax to train it, which will give you the fastest performance typically. And then when you're getting ready to deploy, you can switch to TensorFlow, which has the most mature deployment options.
One of the things that makes this possible is the new saving and loading API that comes with a new backend agnostic file format. So say you train your model with Jax, you can then save that to a file with the .keras extension, and then in a different R session with a different backend, the TensorFlow backend, you can load that model, and you've effectively converted your Jax model to a TensorFlow model.
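Sketched out, with a hypothetical filename:

```r
# In a session using the jax backend: train, then save
model |> save_model("digits.keras")

# Later, in a different session using the tensorflow backend:
model <- load_model("digits.keras")
# The model is now, effectively, a TensorFlow model
```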
One reason you might want to do this is because you are getting ready to deploy. And one of the easiest ways to deploy a Keras model is with Posit Connect. Connect recently gained the ability to serve Keras 3 models using TensorFlow Serving, which gives you access to scalable, concurrent serving with no R or Python runtime, with just two function calls.
The Ops family
Another thing that comes with the multi-backend support in Keras is the Ops family. This is a suite of 200 lower-level functions for working with multidimensional arrays. These functions all work the same regardless of whether you're working with TensorFlow, Torch, or Jax tensors, and whether you're working in eager mode or in graph mode, and you can use them to write backend-agnostic code.
You can use these anywhere you operate with tensors, including at the top-level REPL, and in custom layers, custom losses, custom metrics, and so on. You can also use them as standalone layers. The Ops API surface aims to be comprehensive. It includes the NumPy API, an NN API in the spirit of tf.nn, a set of linear algebra operations, image operations, and more.
I think R users in particular are going to appreciate the Ops family because it provides a more ergonomic alternative to using the backend through reticulate. You get things like consistent one-based indexing, automatic coercion of arguments, and best of all, built-in R documentation with lots and lots of R examples.
Another reason why I think R users will like it is a little more abstract: all the functions are pure, so they have no side effects. That makes for a much more consistent experience. It encourages experimentation at the REPL, and it aligns better with the mental model that R encourages.
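A small taste of what working with the `op_*` functions looks like at the REPL (a sketch; the same calls run under any backend):

```r
library(keras3)

# Convert an R matrix to a backend tensor
x <- op_convert_to_tensor(matrix(1:12, nrow = 3) * 1.0)

op_shape(x)                     # the tensor's shape: (3, 4)
op_mean(x, axis = 1)            # reduce over the first axis (one-based)
op_matmul(x, op_transpose(x))   # linear algebra ops compose freely
```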
Other new features and resources
There's a lot of other stuff that's new in Keras 3 that I don't have time to cover in detail, but I just want to mention briefly. There's the subclassing API, which I showed a little bit of earlier. There's a whole new API for building models that ingest tabular data, things like data frames. That's very nice.
There's a new API for generating random numbers inside your model, with support for usage in pure functions by passing a seed generator. There's a new API for doing distributed training that now supports not just data parallelism but also model parallelism. There's support for quantizing models, which is important when you're dealing with very, very large models, like large language models. And there are expanded entry points for customizing training, whether you plug in custom training steps or write custom training loops.
To learn more, please visit keras3.posit.co. There you'll find three kinds of content, guides, examples, and references. Guides are long-form explanations of Keras concepts. Examples demonstrate Keras use to solve a particular task. And the reference section comprehensively documents all of the functions and arguments and classes and methods and attributes, including lots of R examples.
And finally, to learn more about deep learning, please check out the second edition of Deep Learning with R. Even though this book was written for Keras 2, it is still very much relevant for Keras 3, and it is still the best way to get started with deep learning.
Q&A
People are already wondering where to get your slides. Where will they get your slides? Oh, I can put them up on my GitHub. I don't have a link yet, because I haven't made the repo public, but I will. There is interest.
Okay. The next one, I swear it is not my question, but does Keras play nicely with tidymodels? Yes, I believe so. I don't think it's been updated for Keras 3 yet, but it will be shortly.
How much work will it be to rewrite my old Keras code for Keras 3? That depends on how much custom stuff you did that's backend-specific. If you were mostly using the high-level layer functions and the built-in losses and metrics and optimizers, then there's very little work for you to do. It might even work out of the box.
But if you were writing custom loss functions using the TensorFlow API directly, you're going to have to translate that to the Ops family. And more generally, if you're doing a lot of fancy serialization, the saving and serialization API is quite comprehensive now and very different than before.
It wouldn't be 2024 without an LLM question. Can Keras be used to fine-tune LLMs? Yes, it can. I did not include that in my talk, but there is a KerasNLP package which you can use via reticulate, and there is support for either training from scratch, if you have the resources, or fine-tuning.
Why should a data scientist working exclusively with tabular data use deep learning through Keras instead of regression or tree-based approaches? So, you don't have to switch to deep learning. Deep learning is like duct tape: it's not always the best tool for the job, but more often than not, it's good enough.
Somebody asks if there are any concerns using Keras for commercial use. Not that I know of. It's used by plenty of businesses in commercial contexts.
Any plans for Keras 4 that didn't get implemented in Keras 3? No, not yet. Keras 3 is just out; you take a breath.

