Resources

Sigrid Keydana | Why TensorFlow eager execution matters | RStudio (2019)

In current deep learning with Keras and TensorFlow, when you've mastered the basics and are ready to dive into more involved applications (such as generative networks, sequence-to-sequence models, or attention mechanisms), you may find that, surprisingly, the learning curve doesn't get much flatter. This is largely due to restrictions imposed by TensorFlow's traditional static graph paradigm. With TensorFlow eager execution, available since last summer and announced to be the default mode in the upcoming major release, model architectures become more flexible, readable, composable, and, last but not least, debuggable. In this session, we'll see how with eager execution we can code sophisticated architectures like generative adversarial networks (GANs) and variational autoencoders (VAEs) in a straightforward way. View materials: https://github.com/skeydan/rstudio_conf_2019_eager_execution


Transcript

This transcript was generated automatically and may contain errors.

Hi. So when I looked at my slides again before this talk, I was wondering: okay, so I say why TensorFlow eager execution matters, but I don't really start with what TensorFlow eager execution even is, right?

So what is TensorFlow eager execution? Normally, when we write Keras models, the code gets compiled into a static TensorFlow graph, which can make for pretty inflexible code. Eager execution means that the code gets executed dynamically, and this will be the default operating mode of the new TensorFlow 2.0, which is going to come in, well, I don't know, let's say two months or so.

At the end of this month, we're going to have TensorFlow 1.13, and the Google folks are already extremely busy preparing us for 2.0. And the cool thing is that we've already been able to use eager execution from R since about last summer, so we've been preparing for it, so we're ready. And in this talk, I want to show you why it's so cool that we can do that.
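As a rough sketch, switching on eager execution from R looked like this in the TF 1.x era (API names as used on the TensorFlow for R blog at the time; this assumes the keras and tensorflow R packages are installed):

```r
library(keras)
library(tensorflow)

# eager execution needs the tf.keras implementation of Keras
use_implementation("tensorflow")
tfe_enable_eager_execution(device_policy = "silent")

x <- tf$constant(c(1, 2, 3))
x * 2  # evaluates immediately: a concrete tensor, not a symbolic graph node
```

In TensorFlow 2.0 this becomes the default, so the explicit enabling call goes away.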

Deep learning basics and generative models

Okay. So I don't know how many of you were at the deep learning workshop yesterday and the day before, and perhaps, having been to that workshop, what you're thinking is: deep learning is easy, right?

Because, yeah, what is deep learning? It's like we're going to differentiate between these two cute creatures: do we have a red panda or a giant panda? Do we have a cat or a dog? Do we have a boy or a girl? This is the standard task we see when we look at blog posts or whatever in the area of deep learning. And actually, yeah, that is pretty easy.

So we have the Keras sequential API. If you were in the workshop, you've seen a thousand times how we code that: we define our model, we compile it, and then we run the training.
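That define–compile–train workflow can be sketched like this for a two-class image task (shapes and hyperparameters here are purely illustrative, not from the talk):

```r
library(keras)

# define the model
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu",
                input_shape = c(128, 128, 3)) %>%
  layer_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")

# compile it
model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                  metrics = "accuracy")

# then run the training (x_train / y_train assumed to exist)
# model %>% fit(x_train, y_train, epochs = 10)
```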

So straightforward, right? Even if we want to do slightly more complex things. A popular question is always: who do I look like? I actually tried that on a site by Microsoft called celebslike.me. And it turns out I mostly look like this actress there, and like three guys. So, whatever.

So this is kind of a complex question, but it's still pretty doable to code: we have the Keras functional API, which lets us specify several inputs and several outputs.
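A minimal sketch of what the functional API makes possible — several inputs, several outputs (all names and shapes here are illustrative assumptions):

```r
library(keras)

# two inputs: an image and some metadata
image_input <- layer_input(shape = c(64, 64, 3))
meta_input  <- layer_input(shape = c(10))

features <- image_input %>%
  layer_conv_2d(filters = 16, kernel_size = 3, activation = "relu") %>%
  layer_flatten()

combined <- layer_concatenate(list(features, meta_input))

# two outputs: an identity classification and a numeric prediction
identity_output <- combined %>% layer_dense(units = 100, activation = "softmax")
score_output    <- combined %>% layer_dense(units = 1)

model <- keras_model(inputs = list(image_input, meta_input),
                     outputs = list(identity_output, score_output))
```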

But once we leave straightforward classification and regression problems, we may run into more complex code. For example, what's getting all the hype nowadays is generative models. Perhaps you've seen that — I have seen it on Twitter — like BigGAN. You see generated faces in quite high resolution nowadays which look like real people, yeah?

And the main models for this are either generative adversarial networks (GANs) or variational autoencoders (VAEs). If you were in the distributed session — I don't know if he's here, but Kevin, he loves GANs. I love VAEs. So this is cool, cool stuff.

Now, normally when we generate stuff, what you see used is MNIST, these digits zero to nine, because it's just the easiest thing you can do. But what I'm trying here is: I want to generate penguins. There's a cool dataset by Google, Quick, Draw!, with different categories of images, and this is the input for my network. So I'm going to try to generate penguins.

How GANs work

Now, a generative adversarial network is a really pretty cool thing. It's game-theoretic, in a way: it's based on two antagonistic actors. So I have two agents in my drama. I have a generator which, based on noise, is going to generate real-looking images. So it really gets random data, I run a network over it, and I get something real-looking.

And on the other hand, of course, I need someone who's going to judge whether the result is good or not. This is the discriminator, and the discriminator is trained to tell apart real images and generated images. And so these two are going to fight each other, and each is going to get better over time: as the discriminator gets better at discerning between fake and real images, the generator is going to generate better images.

So I think the concept is pretty graspable for us humans, right? It makes sense. Now, if we want to code it in that static-graph way, which we have been using until now, this is really not so easy, because I need to define different models, and at every step I need to think about what I'm actually training and when I need to freeze weights. And actually, the code is pretty complicated. I wrote it down here as a kind of list of rules, but really, if you asked me to live-code it now, I'd probably just look it up in JJ's book, right?

Coding GANs with eager execution

So can we do that in an easier way? Yes, we can. With eager execution, we can actually code in a way which is much closer to how we think of things conceptually.

And there's one thing I need to introduce, which applies to eager execution as well as static execution: we've had Keras custom models since, let's say, last summer, approximately. Here you see the way we define them: we give them a set of layers, and then we define a call method, where we say how the layers should be chained one after the other. You can use that with static execution too, but it's really central in eager execution.
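The general shape of such a custom model in R, as a minimal illustrative sketch (using `keras_model_custom` from the keras R package; the layer choices here are arbitrary):

```r
library(keras)

simple_model <- function(name = NULL) {
  keras_model_custom(name = name, function(self) {

    # the set of layers this model owns
    self$dense1 <- layer_dense(units = 32, activation = "relu")
    self$dense2 <- layer_dense(units = 10, activation = "softmax")

    # the call method: how inputs flow through the layers
    function(inputs, mask = NULL) {
      inputs %>% self$dense1() %>% self$dense2()
    }
  })
}

model <- simple_model()
```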

And now, when I do this, I can write my code just like the drama that's actually going on. First I have my generator, the guy who's going to try to produce the fake images, and I define it as such a custom model. So I set it up.

Then I go ahead and define my discriminator, also as such a custom model, with its own logic. And now, again, in this antagonistic setting, both have different objectives. We always have a loss function in deep learning, and here each one has its own loss function, which you see coded here.
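The two antagonistic losses can be sketched roughly like this, assuming the discriminator outputs raw logits (TF 1.x-era API, following the pattern of the RStudio eager-GAN blog post):

```r
# the discriminator wants real images classified as 1, fakes as 0
discriminator_loss <- function(real_output, generated_output) {
  real_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$ones_like(real_output),
    logits = real_output)
  generated_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$zeros_like(generated_output),
    logits = generated_output)
  real_loss + generated_loss
}

# the generator wins when the discriminator says "real" (1) to its fakes
generator_loss <- function(generated_output) {
  tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$ones_like(generated_output),
    logits = generated_output)
}
```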

And after I've done that, I can already start the action. I'm going to iterate over the number of epochs, and then iterate over the dataset — so I always have these two nested loops — and then I execute the action. When you look at code with eager execution, it's always going to look like this. So I'm iterating. Then I create this gradient tape, which is recording what the actors are doing. I call the actors, let them do their thing, I calculate their losses, and I do backprop.
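That nested loop can be sketched as follows (following the RStudio eager-GAN blog post; `generator`, `discriminator`, the two optimizers, `train_dataset`, `batch_size`, `noise_dim` and `num_epochs` are assumed to be defined, and the iterator helpers come from the tfdatasets R package):

```r
for (epoch in seq_len(num_epochs)) {
  iter <- make_iterator_one_shot(train_dataset)
  until_out_of_range({
    batch <- iterator_get_next(iter)
    noise <- tf$random_normal(c(batch_size, noise_dim))

    # the tapes record what the actors do, so we can backprop afterwards
    with(tf$GradientTape() %as% gen_tape, {
      with(tf$GradientTape() %as% disc_tape, {
        generated_images <- generator(noise)
        real_output      <- discriminator(batch)
        generated_output <- discriminator(generated_images)
        gen_loss  <- generator_loss(generated_output)
        disc_loss <- discriminator_loss(real_output, generated_output)
      })
    })

    # backprop: compute gradients and apply them to each actor's weights
    gradients_of_generator <-
      gen_tape$gradient(gen_loss, generator$variables)
    gradients_of_discriminator <-
      disc_tape$gradient(disc_loss, discriminator$variables)
    generator_optimizer$apply_gradients(
      purrr::transpose(list(gradients_of_generator, generator$variables)))
    discriminator_optimizer$apply_gradients(
      purrr::transpose(list(gradients_of_discriminator, discriminator$variables)))
  })
}
```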

It's always a loop like this, and it pretty much matches how we picture the action mentally. And what happens if I do that? Well, we can take a look here. This is what I get after the first epoch. It doesn't look random, but not so good, really. But after eight epochs, I get pretty nice-looking penguins. I think they perhaps even look better than the input. So it actually works.

Variational autoencoders

Similarly, with this other approach, variational autoencoders, I don't have this antagonistic setup of the generator and discriminator working against each other, but I still have an encoder and a decoder: the encoder is trying to produce some latent code, and the decoder is trying to reconstruct the input.

In this case, I was thinking: I've generated these penguins, so I want to generate some snowflakes for them. Unfortunately, it's really hard to get training data for snowflakes. I even emailed a book author who produced some nice photos of snowflakes, but in the end I had to generate my own training data, with a method using cellular automata, okay? It doesn't really matter. This is my input data.

And then, again, I'm constructing my model, and you'll see the same principles as before. I have my encoder model, a standalone model I can define like that. The decoder model, I can define like that. And with variational autoencoders, the action really is in the loss function, because on the one hand, I want to reconstruct the input; on the other hand, I have a regularization term, which differs between different types of autoencoders. Here I have one specific loss; it could be a different one. But still, this ends up being about five lines of code, and I'm all set up.
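The two-part loss can be sketched like this (names are illustrative, not the talk's actual code; `%<-%` is zeallot's destructuring operator as re-exported by the keras R package, and the encoder is assumed to return the mean and log-variance of the latent code):

```r
compute_loss <- function(encoder, decoder, x) {
  c(mean, logvar) %<-% encoder(x)

  # reparameterization trick: sample the latent code z
  eps <- tf$random_normal(shape = tf$shape(mean))
  z <- mean + eps * tf$exp(logvar * 0.5)
  preds <- decoder(z)

  # part 1: reconstruction -- how well do we rebuild the input?
  reconstruction_loss <- tf$reduce_sum(
    tf$nn$sigmoid_cross_entropy_with_logits(labels = x, logits = preds))

  # part 2: regularization -- keep the latent code close to a standard
  # normal (here the analytic KL divergence; other variants exist)
  kl_loss <- -0.5 * tf$reduce_sum(
    1 + logvar - tf$square(mean) - tf$exp(logvar))

  reconstruction_loss + kl_loss
}
```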

And after I've set it up like that, again, I'm just doing the same kind of iteration in the code. The first time you see it, perhaps it looks strange, but it always ends up the same. And now let's see the snowflakes. So here they are — they don't look much worse than the input, actually.

Benefits of eager execution

So these were examples of models which are quite hard to code with static execution, as we have been doing until now, but which get a lot easier with eager execution. But there's more to it — more stuff, but important stuff, when doing the coding. For example, what's really hard is to see what's going on in deep learning.

If I think of the GitHub issues we tend to get, what may happen is that people want to code their own loss function or their own metric. You can do that in Keras, okay. But then you get a shape error somewhere in there — the shapes don't match; it can happen. Right now, people tend to get stuck because they just don't know how to debug this. And with eager execution, what you can do is just print it out. Like printf debugging. I mean, you're going to remove it later, right? So why not do it?

That's one thing. Another thing is you can have more modular code, because you can isolate logic in small custom models. And then, sometimes we want architectures where you can't just chain the layers one after the other; you need some interleaving of logic. I'm going to show that. So these are three examples of how eager execution actually makes things easier.

Now, easy debugging, for example. Here you have this loop again, where I'm recording the actions, calculating the loss, and then doing the backprop. And what really happens when you code this in practice is that quite often you get non-matching shapes in deep learning. What I can do here is just print things out. And I can print out not just the shape — I can print out the actual images. So when I print the generated images, I really see float tensors, yeah? Like a 4D array of floats. And this is invaluable for debugging, actually, when you're writing that code.
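Sketched, the printf-style debugging looks like this: inside an eager training loop every tensor already has a value, so base R `print()` just works (all names here are illustrative):

```r
with(tf$GradientTape() %as% tape, {
  generated_images <- generator(noise)
  print(generated_images$shape)  # check the shape directly...
  print(generated_images)        # ...or the actual 4D float tensor
  loss <- generator_loss(discriminator(generated_images))
})

gradients <- tape$gradient(loss, generator$variables)
print(gradients)                 # inspect for vanishing or exploding values
```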

The same goes for the gradients. If your network just doesn't train, one thing that can happen is that the gradients vanish or get too big, for example. So what I can do is just print them out. And there I have it.

Another thing: modular code. What we often see in longer deep learning examples is lots of layers — for example, in a convnet. And all these convolutional layers want their parameters specified, so it gets long, with lots of duplicated stuff. What I can do here is take part of the logic and put it in a custom model of its own. What we often do is either upsampling or downsampling, so convolution or deconvolution. And a typical downsampling module, for example, could be a conv layer, some batch normalization, and dropout, yeah? And perhaps another conv layer, another batch normalization, another dropout. Now I don't have to type that again and again and again; I can just define it as its own submodule, so to say.
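One way to package such a reusable downsampling block as its own custom model, following the same `keras_model_custom` pattern (layer choices and names here are illustrative):

```r
library(keras)

downsample <- function(filters, size, name = NULL) {
  keras_model_custom(name = name, function(self) {

    # conv + batch norm + dropout, bundled once instead of typed repeatedly
    self$conv      <- layer_conv_2d(filters = filters, kernel_size = size,
                                    strides = 2, padding = "same")
    self$batchnorm <- layer_batch_normalization()
    self$dropout   <- layer_dropout(rate = 0.3)

    function(x, mask = NULL, training = TRUE) {
      x %>%
        self$conv() %>%
        self$batchnorm(training = training) %>%
        self$dropout(training = training)
    }
  })
}

# reuse it wherever a downsampling step is needed
down1 <- downsample(filters = 32, size = 3)
down2 <- downsample(filters = 64, size = 3)
```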

And then I can go on and use that. If I go back to my generator — this one was from the variational autoencoder — it does a bunch of downsamplings and a bunch of upsamplings again. And here I use the custom module I defined on the previous slide. So code gets a lot more readable like that, too.

Attention mechanisms

And then, yeah, a nice example of this interleaving logic: so, like one or two years ago — actually since 2013, so a bit longer — a mechanism was described where the network learns where to direct its attention. Let's say I have this task of generating image captions. The network is shown pictures and is asked to generate a description, like "the bird flies over the lake" or something, yeah?

And the architecture is: I have a convnet to extract the features, yeah? Normal image processing. And then I have an RNN, which is supposed to generate the caption. Now, the idea is that as the RNN goes on generating this caption, one word after another, a different place in the image becomes important, right? First it's the bird, and then it's the lake. So the network shifts its attention around different locations in the image.

This you can code in static TensorFlow, but you can't code it in straightforward Keras — and it gets a bit nasty in TensorFlow. But with eager execution, what you can just do is: you have your decoder, which is the RNN that's going to generate the sentence, and as this RNN generates every word, it calls the attention module to direct its attention. Things like that have been really difficult to do without this.
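A minimal sketch of that interleaving, loosely following the image-captioning examples on the TensorFlow for R blog — every name here (`attention_module`, the list-based call signature, the layer sizes) is an illustrative assumption, not the talk's actual code:

```r
library(keras)

rnn_decoder <- function(embedding_dim, gru_units, vocab_size, name = NULL) {
  keras_model_custom(name = name, function(self) {

    self$attention <- attention_module(gru_units)  # assumed defined elsewhere
    self$embedding <- layer_embedding(input_dim = vocab_size,
                                      output_dim = embedding_dim)
    self$gru <- layer_gru(units = gru_units, return_sequences = TRUE,
                          return_state = TRUE)
    self$fc <- layer_dense(units = vocab_size)

    function(inputs, mask = NULL) {
      c(word, features, hidden) %<-% inputs

      # fresh attention on every generated word: where to look next?
      c(context_vector, attention_weights) %<-%
        self$attention(list(features, hidden))

      x <- self$embedding(word)
      x <- k_concatenate(list(k_expand_dims(context_vector, 2L), x))
      c(output, state) %<-% self$gru(x)

      list(self$fc(output), state, attention_weights)
    }
  })
}
```

The key point is simply that the attention call sits inside the decoder's call method, interleaved with the RNN step — which is awkward to express as a static layer chain.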

Documentation and what's next

And now, okay — first, documentation. We already have lots of examples for that on the blog. These are all links to blog articles using eager execution for things like image-to-image translation, image captioning, neural style transfer, and neural machine translation. And of course, we also have documentation on how to construct an eager execution model.

Now finally, next up, so to say: we already have one article on getting started with TensorFlow Probability. TensorFlow Probability is a separate module which is integrated with TensorFlow, can run distributed and on GPU, and covers things ranging from distributions at the bottom level, up through probabilistic network layers, to variational inference, MCMC, and the like. And it goes together very, very nicely with eager execution, with the same benefits: I can print out stuff and immediately see what's going on. We're going to cover that in quite some detail in the near future.

Q&A

Thanks a lot for the great talk. I wonder if TensorFlow eager execution also integrates with TensorBoard, so you can maybe have real-time TensorBoard chart monitoring.

That will for sure work together, because TensorBoard is such a central element of the whole, let's say, TensorFlow environment that I'm pretty sure the Google folks will take care that it works fine.

How helpful would eager execution be for Keras users? Would it be useful for debugging? Absolutely — that's an excellent question. Because right now, when we say Keras, we can mean two things, right? We can mean standalone Keras, or we can mean the TF$Keras (tf.keras) implementation. These examples presuppose that you use the tf.keras implementation, which as of today is not the default implementation we're using, but you can already use it. And probably we're going to switch to that as the default implementation. And you can do all of what I said with that tf.keras implementation.