Resources

Sigrid Keydana | Why TensorFlow eager execution matters | RStudio (2019)

In current deep learning with Keras and TensorFlow, when you've mastered the basics and are ready to dive into more involved applications (such as generative networks, sequence-to-sequence models, or attention mechanisms), you may find that, surprisingly, the learning curve doesn't get much flatter. This is largely due to restrictions imposed by TensorFlow's traditional static graph paradigm. With TensorFlow eager execution, available since last summer and announced to be the default mode in the upcoming major release, model architectures become more flexible, readable, composable, and, last but not least, debuggable. In this session, we'll see how with eager execution we can code sophisticated architectures like generative adversarial networks (GANs) and variational autoencoders (VAEs) in a straightforward way. View materials: https://github.com/skeydan/rstudio_conf_2019_eager_execution


Transcript

This transcript was generated automatically and may contain errors.

Hi. So when I looked at my slides again before this talk, I was wondering: okay, so I say why TensorFlow eager execution matters, but I don't really start with what TensorFlow eager execution even is, right?

So what is TensorFlow eager execution? Normally, when we write Keras models, the code gets compiled into a static TensorFlow graph, which can make for pretty inflexible code. Eager execution means that the code gets executed dynamically, and this will be the default operating mode of the new TensorFlow 2.0, which is going to come in, well, I don't know, let's say two months or so.

At the end of this month, we're going to have TensorFlow 1.13, and the Google folks are already extremely busy preparing us for 2.0. And the cool thing is that we've already been able to use eager execution from R since about last summer, so we've been preparing for it, so we're ready. And in this talk, I want to show you why it's so cool that we can do that.
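As a rough sketch, switching on eager execution from R looked like this in the TF 1.x era (API names as used on the TensorFlow for R blog at the time; this assumes the keras and tensorflow R packages are installed):

```r
library(keras)
library(tensorflow)

# eager execution needs the tf.keras implementation of Keras
use_implementation("tensorflow")
tfe_enable_eager_execution(device_policy = "silent")

x <- tf$constant(c(1, 2, 3))
x * 2  # evaluates immediately: a concrete tensor, not a symbolic graph node
```

In TensorFlow 2.0 this becomes the default, so the explicit enabling call goes away.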

Deep learning basics and generative models

Okay. So I don't know how many of you were at the deep learning workshop yesterday and the day before, and perhaps, having been to that workshop, what you're thinking is: deep learning is easy, right?

Because, yeah, what is deep learning? It's like we're going to differentiate between these two cute creatures: do we have a red panda or a giant panda? Do we have a cat or a dog? Do we have a boy or a girl? This is the standard task we see when we look at blog posts or whatever in the area of deep learning. And actually, yeah, that is pretty easy.

So we have the Keras sequential API. If you were in the workshop, you've seen a thousand times how we code that: we define our model, we compile it, and then we run the training.
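That define–compile–train workflow can be sketched like this for a two-class image task (shapes and hyperparameters here are purely illustrative, not from the talk):

```r
library(keras)

# define the model
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = 3, activation = "relu",
                input_shape = c(128, 128, 3)) %>%
  layer_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")

# compile it
model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                  metrics = "accuracy")

# then run the training (x_train / y_train assumed to exist)
# model %>% fit(x_train, y_train, epochs = 10)
```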

So straightforward, right? Even if we want to do slightly more complex things. A popular question is always: who do I look like? I actually tried that on a site by Microsoft called celebslike.me. And it turns out I mostly look like this actress there, and like three guys. So, whatever.

So this is kind of a complex question, but it's still pretty doable to code: we have the Keras functional API, which lets us specify several inputs and several outputs.
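A minimal sketch of what the functional API makes possible — several inputs, several outputs (all names and shapes here are illustrative assumptions):

```r
library(keras)

# two inputs: an image and some metadata
image_input <- layer_input(shape = c(64, 64, 3))
meta_input  <- layer_input(shape = c(10))

features <- image_input %>%
  layer_conv_2d(filters = 16, kernel_size = 3, activation = "relu") %>%
  layer_flatten()

combined <- layer_concatenate(list(features, meta_input))

# two outputs: an identity classification and a numeric prediction
identity_output <- combined %>% layer_dense(units = 100, activation = "softmax")
score_output    <- combined %>% layer_dense(units = 1)

model <- keras_model(inputs = list(image_input, meta_input),
                     outputs = list(identity_output, score_output))
```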

But once we leave straightforward classification and regression problems, we may run into more complex code. For example, what's getting all the hype nowadays is generative models. Perhaps you've seen that — I have seen it on Twitter — like BigGAN. You see generated faces in quite high resolution nowadays which look like real people, yeah?

And the main models for this are either generative adversarial networks (GANs) or variational autoencoders (VAEs). If you were in the distributed session — I don't know if he's here, but Kevin, he loves GANs. I love VAEs. So this is cool, cool stuff.

Now, normally when we generate stuff, what you see used is MNIST, these digits zero to nine, because it's just the easiest thing you can do. But what I'm trying here is: I want to generate penguins. There's a cool dataset by Google, Quick, Draw!, with different categories of images, and this is the input for my network. So I'm going to try to generate penguins.

How GANs work

Now, a generative adversarial network is a really pretty cool thing. It's game-theoretic, in a way: it's based on two antagonistic actors. So I have two agents in my drama. I have a generator which, based on noise, is going to generate real-looking images. So it really gets random data, I run a network over it, and I get something real-looking.

And on the other hand, of course, I need someone who's going to judge whether the result is good or not. This is the discriminator, and the discriminator is trained to tell apart real images and generated images. And so these two are going to fight each other, and each is going to get better over time: as the discriminator gets better at discerning between fake and real images, the generator is going to generate better images.

So I think the concept is pretty graspable for us humans, right? It makes sense. Now, if we want to code it in that static-graph way, which we have been using until now, this is really not so easy, because I need to define different models, and at every step I need to think about what I'm actually training and when I need to freeze weights. And actually, the code is pretty complicated. I wrote it down here as a kind of list of rules, but really, if you asked me to live-code it now, I'd probably just look it up in JJ's book, right?

Coding GANs with eager execution

So can we do that in an easier way? Yes, we can. With eager execution, we can actually code in a way which is much closer to how we think of things conceptually.

And there's one thing I need to introduce, which applies to eager execution as well as static execution: we've had Keras custom models since, let's say, last summer, approximately. Here you see the way we define them: we give them a set of layers, and then we define a call method, where we say how the layers should be chained one after the other. You can use that with static execution too, but it's really central in eager execution.
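The general shape of such a custom model in R, as a minimal illustrative sketch (using `keras_model_custom` from the keras R package; the layer choices here are arbitrary):

```r
library(keras)

simple_model <- function(name = NULL) {
  keras_model_custom(name = name, function(self) {

    # the set of layers this model owns
    self$dense1 <- layer_dense(units = 32, activation = "relu")
    self$dense2 <- layer_dense(units = 10, activation = "softmax")

    # the call method: how inputs flow through the layers
    function(inputs, mask = NULL) {
      inputs %>% self$dense1() %>% self$dense2()
    }
  })
}

model <- simple_model()
```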

And now, when I do this, I can write my code just like the drama that's actually going on. First I have my generator, the guy who's going to try to produce the fake images, and I define it as such a custom model. So I set it up.

Then I go ahead and define my discriminator, also as such a custom model, with its own logic. And now, again, in this antagonistic setting, both have different objectives. We always have a loss function in deep learning, and here each one has its own loss function, which you see coded here.
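The two antagonistic losses can be sketched roughly like this, assuming the discriminator outputs raw logits (TF 1.x-era API, following the pattern of the RStudio eager-GAN blog post):

```r
# the discriminator wants real images classified as 1, fakes as 0
discriminator_loss <- function(real_output, generated_output) {
  real_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$ones_like(real_output),
    logits = real_output)
  generated_loss <- tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$zeros_like(generated_output),
    logits = generated_output)
  real_loss + generated_loss
}

# the generator wins when the discriminator says "real" (1) to its fakes
generator_loss <- function(generated_output) {
  tf$losses$sigmoid_cross_entropy(
    multi_class_labels = tf$ones_like(generated_output),
    logits = generated_output)
}
```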

And after I've done that, I can already start the action. I'm going to iterate over the number of epochs, and then iterate over the dataset — so I always have these two nested loops — and then I execute the action. When you look at code with eager execution, it's always going to look like this. So I'm iterating. Then I create this gradient tape, which is recording what the actors are doing. I call the actors, let them do their thing, I calculate their losses, and I do backprop.
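That nested loop can be sketched as follows (following the RStudio eager-GAN blog post; `generator`, `discriminator`, the two optimizers, `train_dataset`, `batch_size`, `noise_dim` and `num_epochs` are assumed to be defined, and the iterator helpers come from the tfdatasets R package):

```r
for (epoch in seq_len(num_epochs)) {
  iter <- make_iterator_one_shot(train_dataset)
  until_out_of_range({
    batch <- iterator_get_next(iter)
    noise <- tf$random_normal(c(batch_size, noise_dim))

    # the tapes record what the actors do, so we can backprop afterwards
    with(tf$GradientTape() %as% gen_tape, {
      with(tf$GradientTape() %as% disc_tape, {
        generated_images <- generator(noise)
        real_output      <- discriminator(batch)
        generated_output <- discriminator(generated_images)
        gen_loss  <- generator_loss(generated_output)
        disc_loss <- discriminator_loss(real_output, generated_output)
      })
    })

    # backprop: compute gradients and apply them to each actor's weights
    gradients_of_generator <-
      gen_tape$gradient(gen_loss, generator$variables)
    gradients_of_discriminator <-
      disc_tape$gradient(disc_loss, discriminator$variables)
    generator_optimizer$apply_gradients(
      purrr::transpose(list(gradients_of_generator, generator$variables)))
    discriminator_optimizer$apply_gradients(
      purrr::transpose(list(gradients_of_discriminator, discriminator$variables)))
  })
}
```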

It's always a loop like this, and it pretty much matches how we picture the action mentally. And what happens if I do that? Well, we can take a look here. This is what I get after the first epoch. It doesn't look random, but not so good, really. But after eight epochs, I get pretty nice-looking penguins. I think they perhaps even look better than the input. So it actually works.

Variational autoencoders

Similarly, with this other approach, variational autoencoders, I don't have this antagonistic setup of the generator and discriminator working against each other, but I still have an encoder and a decoder: the encoder is trying to produce some latent code, and the decoder is trying to reconstruct the input.

In this case, I was thinking: I've generated these penguins, so I want to generate some snowflakes for them. Unfortunately, it's really hard to get training data for snowflakes. I even emailed a book author who produced some nice photos of snowflakes, but in the end I had to generate my own training data, with a method using cellular automata, okay? It doesn't really matter. This is my input data.

And then, again, I'm constructing my model, and you'll see the same principles as before. I have my encoder model, a standalone model I can define like that. The decoder model, I can define like that. And with variational autoencoders, the action really is in the loss function, because on the one hand, I want to reconstruct the input; on the other hand, I have a regularization term, which differs between different types of autoencoders. Here I have one specific loss; it could be a different one. But still, this ends up being about five lines of code, and I'm all set up.
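The two-part loss can be sketched like this (names are illustrative, not the talk's actual code; `%<-%` is zeallot's destructuring operator as re-exported by the keras R package, and the encoder is assumed to return the mean and log-variance of the latent code):

```r
compute_loss <- function(encoder, decoder, x) {
  c(mean, logvar) %<-% encoder(x)

  # reparameterization trick: sample the latent code z
  eps <- tf$random_normal(shape = tf$shape(mean))
  z <- mean + eps * tf$exp(logvar * 0.5)
  preds <- decoder(z)

  # part 1: reconstruction -- how well do we rebuild the input?
  reconstruction_loss <- tf$reduce_sum(
    tf$nn$sigmoid_cross_entropy_with_logits(labels = x, logits = preds))

  # part 2: regularization -- keep the latent code close to a standard
  # normal (here the analytic KL divergence; other variants exist)
  kl_loss <- -0.5 * tf$reduce_sum(
    1 + logvar - tf$square(mean) - tf$exp(logvar))

  reconstruction_loss + kl_loss
}
```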

And after I've set it up like that, again, I'm just doing the same kind of iteration in the code. The first time you see it, perhaps it looks strange, but it always ends up the same. And now let's see the snowflakes. So here they are — they don't look much worse than the input, actually.

Benefits of eager execution

So these were examples of models which are quite hard to code with static execution, as we have been doing until now, but which get a lot easier with eager execution. But there's more to it — more stuff, but important stuff, when doing the coding. For example, what's really hard is to see what's going on in deep learning.

If I think of the GitHub issues we tend to get, what may happen is that people want to code their own loss function or their own metric. You can do that in Keras, okay. But then you get a shape error somewhere in there — the shapes don't match; it can happen. Right now, people tend to get stuck because they just don't know how to debug this. And with eager execution, what you can do is just print it out. Like printf debugging. I mean, you're going to remove it later, right? So why not do it?

That's one thing. Another thing is you can have more modular code, because you can isolate logic in small custom models. And then, sometimes we want architectures where you can't just chain the layers one after the other; you need some interleaving of logic. I'm going to show that. So these are three examples of how eager execution actually makes things easier.

Now, easy debugging, for example. Here you have this loop again, where I'm recording the actions, calculating the loss, and then doing the backprop. And what really happens when you code this in practice is that quite often you get non-matching shapes in deep learning. What I can do here is just print things out. And I can print out not just the shape — I can print out the actual images. So when I print the generated images, I really see float tensors, yeah? Like a 4D array of floats. And this is invaluable for debugging, actually, when you're writing that code.
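Sketched, the printf-style debugging looks like this: inside an eager training loop every tensor already has a value, so base R `print()` just works (all names here are illustrative):

```r
with(tf$GradientTape() %as% tape, {
  generated_images <- generator(noise)
  print(generated_images$shape)  # check the shape directly...
  print(generated_images)        # ...or the actual 4D float tensor
  loss <- generator_loss(discriminator(generated_images))
})

gradients <- tape$gradient(loss, generator$variables)
print(gradients)                 # inspect for vanishing or exploding values
```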

The same goes for the gradients. If your network just doesn't train, one thing that can happen is that the gradients vanish or get too big, for example. So what I can do is just print them out. And there I have it.

Another thing: modular code. What we often see in longer deep learning examples is lots of layers — for example, in a convnet. And all these convolutional layers want their parameters specified, so it gets long, with lots of duplicated stuff. What I can do here is take part of the logic and put it in a custom model of its own. What we often do is either upsampling or downsampling, so convolution or deconvolution. And a typical downsampling module, for example, could be a conv layer, some batch normalization, and dropout, yeah? And perhaps another conv layer, another batch normalization, another dropout. Now I don't have to type that again and again and again; I can just define it as its own submodule, so to say.
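One way to package such a reusable downsampling block as its own custom model, following the same `keras_model_custom` pattern (layer choices and names here are illustrative):

```r
library(keras)

downsample <- function(filters, size, name = NULL) {
  keras_model_custom(name = name, function(self) {

    # conv + batch norm + dropout, bundled once instead of typed repeatedly
    self$conv      <- layer_conv_2d(filters = filters, kernel_size = size,
                                    strides = 2, padding = "same")
    self$batchnorm <- layer_batch_normalization()
    self$dropout   <- layer_dropout(rate = 0.3)

    function(x, mask = NULL, training = TRUE) {
      x %>%
        self$conv() %>%
        self$batchnorm(training = training) %>%
        self$dropout(training = training)
    }
  })
}

# reuse it wherever a downsampling step is needed
down1 <- downsample(filters = 32, size = 3)
down2 <- downsample(filters = 64, size = 3)
```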

And then I can go on and use that. If I go back to my generator — this one was from the variational autoencoder — it does a bunch of downsamplings and a bunch of upsamplings again. And here I use the custom module I defined on the previous slide. So code gets a lot more readable like that, too.

Attention mechanisms

And then, yeah, a nice example of this interleaving logic: so, like one or two years ago — actually since 2013, so a bit longer — a mechanism was described where the network learns where to direct its attention. Let's say I have this task of generating image captions. The network is shown pictures and is asked to generate a description, like "the bird flies over the lake" or something, yeah?

And the architecture is: I have a convnet to extract the features, yeah? Normal image processing. And then I have an RNN, which is supposed to generate the caption. Now, the idea is that as the RNN goes on generating this caption, one word after another, a different place in the image becomes important, right? First it's the bird, and then it's the lake. So the network shifts its attention around different locations in the image.

This you can code in static TensorFlow, but you can't code it in straightforward Keras — and it gets a bit nasty in TensorFlow. But with eager execution, what you can just do is: you have your decoder, which is the RNN that's going to generate the sentence, and as this RNN generates every word, it calls the attention module to direct its attention. Things like that have been really difficult to do without this.
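A minimal sketch of that interleaving, loosely following the image-captioning examples on the TensorFlow for R blog — every name here (`attention_module`, the list-based call signature, the layer sizes) is an illustrative assumption, not the talk's actual code:

```r
library(keras)

rnn_decoder <- function(embedding_dim, gru_units, vocab_size, name = NULL) {
  keras_model_custom(name = name, function(self) {

    self$attention <- attention_module(gru_units)  # assumed defined elsewhere
    self$embedding <- layer_embedding(input_dim = vocab_size,
                                      output_dim = embedding_dim)
    self$gru <- layer_gru(units = gru_units, return_sequences = TRUE,
                          return_state = TRUE)
    self$fc <- layer_dense(units = vocab_size)

    function(inputs, mask = NULL) {
      c(word, features, hidden) %<-% inputs

      # fresh attention on every generated word: where to look next?
      c(context_vector, attention_weights) %<-%
        self$attention(list(features, hidden))

      x <- self$embedding(word)
      x <- k_concatenate(list(k_expand_dims(context_vector, 2L), x))
      c(output, state) %<-% self$gru(x)

      list(self$fc(output), state, attention_weights)
    }
  })
}
```

The key point is simply that the attention call sits inside the decoder's call method, interleaved with the RNN step — which is awkward to express as a static layer chain.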

Documentation and what's next

And now, okay — first, documentation. We already have lots of examples for that on the blog. These are all links to blog articles using eager execution for things like image-to-image translation, image captioning, neural style transfer, and neural machine translation. And of course, we also have documentation on how to construct an eager execution model.

Now finally, next up, so to say: we already have one article on getting started with TensorFlow Probability. TensorFlow Probability is a separate module which is integrated with TensorFlow, can run distributed and on GPU, and covers things ranging from distributions at the bottom level, up through probabilistic network layers, to variational inference, MCMC, and the like. And it goes together very, very nicely with eager execution, with the same benefits: I can print out stuff and immediately see what's going on. We're going to cover that in quite some detail in the near future.

Q&A

Thanks a lot for the great talk. I wonder if TensorFlow eager execution also integrates with TensorBoard, so you can maybe have real-time TensorBoard chart monitoring.

That will for sure work together, because TensorBoard is such a central element of the whole, let's say, TensorFlow environment that I'm pretty sure the Google folks will take care that it works fine.

How helpful would eager execution be for Keras users? Would it be useful for debugging? Absolutely — that's an excellent question. Because right now, when we say Keras, we can mean two things, right? We can mean standalone Keras, or we can mean the TF$Keras (tf.keras) implementation. These examples presuppose that you use the tf.keras implementation, which as of today is not the default implementation we're using, but you can already use it. And probably we're going to switch to that as the default implementation. And you can do all of what I said with that tf.keras implementation.