Resources

Daniel Falbel | What's new in TensorFlow for R | RStudio (2020)

TensorFlow is the most popular open-source platform for machine learning, and its ecosystem is evolving incredibly fast. In this talk we explore what's new in TensorFlow 2.0, how to build data pre-processing pipelines using the tfdatasets package, and how to use pre-trained models with tfhub.


Transcript

This transcript was generated automatically and may contain errors.

Hello, I'm Daniel, and today I'm going to talk about what's new in TensorFlow for R.

What is TensorFlow?

First, a quick recap on what TensorFlow is. Well, TensorFlow is an open-source platform for machine learning. It's especially useful for deep learning: it has fast implementations of the most common operations in deep learning, like convolutions, for example, and it's very efficient on CPUs, GPUs, and even TPUs, Google's custom hardware for deep learning. It provides automatic differentiation, which is also very useful for deep learning. And it's really production-ready: inside the TensorFlow ecosystem, there are multiple ways to deploy your models, even to mobile devices and cloud platforms.

Well, TensorFlow is mostly a Python project with a large ecosystem around it. I'm going to talk about TensorFlow for R, which is spread across multiple R packages.

There is the tensorflow R package, which provides basic access to the TensorFlow module, installation functions, and things like that. There is the keras R package, which wraps the tf.keras module; it's a higher-level API and the recommended way to start using TensorFlow. And tfdatasets wraps the tf.data module, which is used to load and preprocess data for deep learning. We also have a lot of other packages in the ecosystem, like tfhub, tfprobability, tfjs, tfautograph, and autokeras, which I'm going to talk about.

TensorFlow 2.0 and eager execution

So, what's new? There are a lot of packages, and things are moving fast. First, we have support for TensorFlow 2.0. Before 2.0, TensorFlow worked by having you define your whole computation graph first and then execute it, a programming model that is usually hard to think about. With 2.0 it's much easier because of eager execution: before, you wrote all the computation and then executed it; now operations run immediately, and tensors work much like normal R arrays.
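For instance, with eager execution a tensor can be created and used much like an R vector. A minimal sketch, assuming the tensorflow R package is installed and configured:

```r
library(tensorflow)

# With eager execution (the default in TF 2.0), operations run immediately:
x <- tf$constant(c(1, 2, 3))
y <- x * 2          # evaluated right away, no session or graph needed
as.numeric(y)       # convert back to a plain R vector
```

There is no separate "build graph, then run session" step anymore; the result of `x * 2` is available as soon as the line executes.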

Other changes in 2.0 include an API cleanup and things like that. There is also a new package by Tomasz Kalinowski called tfautograph, which lets us write conditionals and loops, with if, while, and for statements, just like plain R, and still have them run efficiently in TensorFlow. Before, you needed to write code with functional conditionals; now you can just write an if statement, and it will be converted to TensorFlow code.
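As a sketch of the difference, assuming the tfautograph package, a plain R `if` can be translated for use on symbolic tensors instead of writing functional conditionals by hand:

```r
library(tensorflow)
library(tfautograph)

# autograph() translates ordinary R control flow into TensorFlow ops,
# so the same function works on symbolic tensors inside a graph
double_if_positive <- autograph(function(x) {
  if (x > 0)
    x * 2
  else
    x - 1
})

double_if_positive(tf$constant(3))
```

The function names here follow the tfautograph package; the example itself is illustrative.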

Feature spec interface in tfdatasets

In tfdatasets, which lets you load and preprocess data, we have the new feature spec interface. This is useful for working with tabular data in deep learning models. It has an interface similar to recipes, from the tidymodels ecosystem: you define which transformations you want to apply to your tabular data, and then you fit the specification, which means it will find the vocabularies for categorical variables and the normalizing constants for your numeric variables, for example. Then you can just use the layer_dense_features() function in Keras, and all the transformations happen directly in the TensorFlow graph, so it's much easier to deploy this kind of model.
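A sketch of that workflow; the data frame `train_df` and its columns are hypothetical:

```r
library(tfdatasets)
library(keras)

# Hypothetical tabular data: numeric `age`, categorical `occupation`,
# outcome `income`
spec <- feature_spec(train_df, income ~ .) %>%
  step_numeric_column(age, normalizer_fn = scaler_standard()) %>%
  step_categorical_column_with_vocabulary_list(occupation) %>%
  step_indicator_column(occupation) %>%
  fit()   # computes normalizing constants and vocabularies from the data

# layer_dense_features() applies the fitted transformations
# directly in the TensorFlow graph
model <- keras_model_sequential() %>%
  layer_dense_features(feature_columns = dense_features(spec)) %>%
  layer_dense(units = 1)
```

Because the transformations live in the graph, the exported model can be deployed without shipping the preprocessing code separately.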

Next, we have some minor changes in tfdatasets. If you ever used dataset_map(), you previously needed to write a full function to map over the dataset; now you can use purrr-style lambda functions, which simplifies the code a bit. Also, you previously needed the make_iterator_one_shot() function to create an iterator over a TensorFlow dataset; now you can pass the dataset directly to Keras, and it will just work.
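Something like this, with a purrr-style lambda, as a small sketch:

```r
library(tfdatasets)

ds <- tensor_slices_dataset(1:100) %>%
  dataset_map(~ .x * 2L) %>%   # purrr-style lambda instead of function(x) x * 2L
  dataset_batch(10)

# Previously you needed make_iterator_one_shot(ds) before training;
# now `ds` can be passed directly to keras::fit()
```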

tf.hub and pre-trained models

Another cool new package is tfhub, which lets us use pre-trained models from the tfhub.dev website. People train models on large datasets and publish the pre-trained models there in a reusable format, and you can then use one as a Keras layer. You take the URL for a model, in this case the MobileNet model, and the layer_hub() function just takes that URL. This particular model takes an image and returns a feature vector representing it, so you can plug in a dense layer, for example, to build a classifier on top of the pre-trained model. tfhub.dev also includes models for text and video.
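In code, this looks roughly like the following; the tfhub.dev URL and the classifier head are illustrative:

```r
library(keras)
library(tfhub)

# A MobileNet feature-vector module from tfhub.dev (URL is illustrative)
mobilenet <- "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4"

model <- keras_model_sequential() %>%
  layer_hub(handle = mobilenet, input_shape = c(224, 224, 3)) %>%
  layer_dense(units = 10, activation = "softmax")   # classifier on top
```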

tfhub also provides integration with tidymodels recipes. If you are familiar with recipes, you know the step functions, and there's a new one, step_pretrained_text_embedding(), that lets you plug in a pre-trained text model. The example here uses a model pre-trained on the Google News dataset, a very large corpus of news text. It converts raw text to a feature vector that you can use in any machine learning model, for instance by fitting a logistic regression on that feature vector.
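A sketch of such a recipe; the handle URL and the data frame `train_df` are illustrative:

```r
library(recipes)
library(tfhub)

# step_pretrained_text_embedding() maps raw text to a feature vector
# using a pre-trained embedding from tfhub.dev
rec <- recipe(label ~ text, data = train_df) %>%
  step_pretrained_text_embedding(
    text,
    handle = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
  )

# The prepared recipe can then feed any model,
# e.g. a logistic regression on the embedding columns
```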

New Keras text preprocessing layers

Okay, in Keras we have new text pre-processing layers, starting in TensorFlow 2.1. Before, the Keras pre-processing functions were all based on SciPy and NumPy, so to deploy a model that used text pre-processing in Keras, you also needed a Python runtime. Now the pre-processing is built into the TensorFlow graph: you can build models using layer_text_vectorization(), and deployment becomes much easier. It works just like a normal Keras layer. You define the main parameters, and there's a new step called adapt, which will, for example, find all the distinct words in your text data, or the maximum length of each text, and so on. Then you just plug the layer into your Keras model, so the model takes raw strings as input, like a normal Keras model.
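A minimal sketch of that workflow; the parameter values and the character vector `train_texts` are illustrative:

```r
library(keras)

# The vectorization layer runs inside the TensorFlow graph,
# so no Python runtime is needed at deployment time
vectorize <- layer_text_vectorization(
  max_tokens = 10000,
  output_sequence_length = 100
)

# adapt() scans the training text to build the vocabulary
adapt(vectorize, train_texts)

# The model can now take raw strings as input
model <- keras_model_sequential() %>%
  vectorize() %>%
  layer_embedding(input_dim = 10001, output_dim = 16) %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 1, activation = "sigmoid")
```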

tf.probability, AutoKeras, and TFDS

Another cool new package, by Sigrid Keydana, is tfprobability, which provides a lot of statistical and probabilistic functions. It's built on top of TensorFlow, so most things come with CPU, GPU, and even TPU implementations, which is great and fast. I want to show you the distribution layer, which you can plug into your Keras model like a usual Keras layer. Instead of predicting a single value for each observation, you can, for example, predict a normal distribution for each observation, so you can calculate a standard deviation and so on. This opens up a lot of scope for deep learning.
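For example, a regression model whose output layer is a distribution might be sketched like this; the parameterization is illustrative:

```r
library(keras)
library(tensorflow)
library(tfprobability)

# The last layer outputs a normal distribution per observation,
# instead of a single point estimate
model <- keras_model_sequential() %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 2) %>%   # one output for the mean, one for the scale
  layer_distribution_lambda(function(t)
    tfd_normal(
      loc = t[, 1, drop = FALSE],
      scale = tf$math$softplus(t[, 2, drop = FALSE])  # keep scale positive
    )
  )
```

Training such a model against the distribution's log-likelihood gives you an uncertainty estimate alongside each prediction.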

There is autokeras, a package by Juan Cruz Rodriguez. It interfaces R to the AutoKeras package in Python, which uses AutoML techniques to build machine learning models. Instead of defining all your Keras layers and how they are connected, you can just use model_image_classifier() and say how many different models you want to try, and it will search for a good model for your dataset.
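Roughly like this, assuming the autokeras package and some image data `x_train`, `y_train`:

```r
library(autokeras)

# Search over candidate architectures instead of hand-designing one;
# max_trials is the number of different models to try
clf <- model_image_classifier(max_trials = 10)

clf %>% fit(x_train, y_train, epochs = 5)
```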

There is also tfds, a new and still very experimental package, which lets you load public datasets in the TensorFlow Datasets format. For example, you can load ImageNet with tfds without having to figure out yourself how to download and preprocess, I don't know, 100 gigabytes of images, so it's much easier to just test your machine learning model. It also provides a split API that lets you say directly that you want to split the data into training, validation, and test sets. It's a nice package when you are learning new things or trying out a deep learning model.
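A sketch of the split API; since tfds is experimental, the exact function names may have changed:

```r
library(tfds)

# Load MNIST, splitting the original training set into
# 80% train / 10% validation / 10% test
splits <- tfds_load(
  "mnist",
  split = c("train[:80%]", "train[80%:90%]", "train[90%:]")
)
```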

Model packages and community contributions

We are also providing some model packages: we took some commonly used deep learning models and packaged R wrappers for them. For example, there's the gpt2 package by Javier Luraschi. The GPT-2 model is a deep learning model by OpenAI that takes a prompt string and can complete it with text that really makes sense, which is pretty incredible.

We also implemented some deep learning models using raw Keras layers, so it's easier to learn and you can see advanced modeling code in Keras. There's U-Net, which is an image segmentation model, and DenseNet, a convolutional neural network architecture that was well known maybe two years ago; deep learning moves very fast. There are also community-contributed models, like RBERT by Jonathan Bratt and Jon Harmon, which provides an implementation of BERT, a Google model for text embedding.

We also have the TensorFlow for R blog, where Sigrid writes many, many articles showing the state of the art in TensorFlow and deep learning, with very detailed explanations.

Q&A

So, one of the questions on Slido is: classification and regression problems are very accessible via R's interface to TensorFlow. How about survival problems, such as time-to-event outcomes?

Usually, deep learning is very flexible, so these time-to-event models can be handled just by changing your loss function, and you can cover this with the keras package. There's also a blog post by Sigrid on this. Actually, that one is not deep learning; it uses tfprobability and Monte Carlo methods. What is it called? Anyone know? Censored data something. Yeah, censored data, right? You wrote that blog post. It uses TensorFlow Probability, so not deep learning, but also cool stuff.

What are RStudio's plans to support Torch or PyTorch? I don't know. I have experimented with Torch, like, a year ago with the C++ API, but I'm not sure if we are going to move forward with Torch for now.

Well, someone wants to know what we are working on right now. Yeah, TensorFlow is a large ecosystem; there's always a lot of stuff we want to work on. There's the dopamine library for reinforcement learning, which is something we would like to work on, and there's also still a lot of work to do on tfprobability.

All right, one more. Is the plan to track the Python interface to TensorFlow, or will the R extras mean that Python users won't recognize R-based TensorFlow code? Yeah, the keras R package tries to keep track of all Python changes, for example these new text preprocessing layers, and there will be image preprocessing layers and that kind of thing. We will always try to track the Keras API. But we also like to add the R way of doing things, like the feature spec interface, which doesn't exist in the Python implementation.