Resources

Heather & Jacqueline Nolis | Push straight to prod: API development with R and Tensorflow | RStudio

Talk from rstudio::conf(2019)

When tasked with creating the first customer-facing machine learning model at T-Mobile, we were faced with a conundrum. We had been told time and time again that to deploy machine learning models in production you had to use Python, but our very best data scientists were fluent in building neural networks in R with Keras and TensorFlow. Determined to avoid double work, we decided to use R in production for our machine learning models. After months of work, wrangling our containers to meet cloud security compliance and conforming to DevOps standards, we succeeded in creating a containerized API solution using the keras and plumber R packages and Docker. Today R is actively powering tools that our customers directly interact with, and we have open sourced our methods. In this talk, we'll walk through how to deploy R models as container-based APIs, the struggles and triumphs we've had using R in production, and how you can design your teams to optimize for this sort of innovation.

About Heather Nolis: Heather Nolis is a founding member of the AI @ T-Mobile team, focusing on the conversion of cutting-edge analyses into real-time, scalable data-driven products. She began her career in neuroscience, but once she realized how heavily that field relied on software built by other people, she pivoted, deciding to make software herself. You can find her @heatherklus on Twitter, where she speaks about diversity in technology, the ethical implications of data, and cats.

About Jacqueline Nolis: Dr. Jacqueline Nolis is a co-founder of Nolis, LLC, a data science consulting firm. She has over a decade of experience using data to help companies including DSW, Union Bank, Microsoft, and Airbnb. She has a PhD from Arizona State University, where her research focused on electric vehicle route optimization. For fun, she likes to use machine learning for humor.


Transcript

This transcript was generated automatically and may contain errors.

Hi, I'm Jacqueline Nolis. I'm a data scientist. I'm joined here today by Heather Nolis, machine learning engineer from T-Mobile, and this talk is about how we did something we're so excited about: how we got R working in production environments.

So what does it mean to put something into production? And we heard about that a little bit already during the keynotes. But we think about it like this. With data science, there are two types of data science. There's the analysis type, where you're trying to figure out an idea, do an exploration, and your deliverable is an idea.

And then there's the building type, where you make a machine learning model, and you want to run it. You want to run it over and over, like if you're designing a product recommendation engine for a website.

And so at T-Mobile, we really define putting code into production as making it so that customers can interact with it. And this morning, we heard about making dashboards that many users can use at once, maybe 20. Well, we have 70 million customers. And we just merged with Sprint, who has another 60 million customers. So we have lots and lots of people.

And when we're talking about machine learning, we're really talking about making machine learning models that all of those people can use. Hopefully not all at once, but all of them could use.

The AI at T-Mobile project

And so the project was called AI at T-Mobile. So T-Mobile has been doing data science for many years. And we have lots of people who are very good at things like that exploration, building churn models on our customers, that sort of reporting work.

But we didn't have much experience putting the machine learning models we developed into production environments where our customers could use them. So this project was to actually take machine learning and get it in front of our customers in a way that really improves the customer experience. And I say that because T-Mobile makes a really big deal about being the un-carrier.

So we're not going to make you have to go deal with chat bots and have bad experiences. We want things to be pleasant for our customers.

So our first scope was customer care messaging. Right now, today, you can go on your phone and text T-Mobile. And you can say something like, my coverage is bad. And a human being at a desk will respond back to you and help you try to diagnose your problem. We can do that through text message, we can do that through Facebook Messenger (you can slide into T-Mobile's DMs and a human being will respond back to you), or you can use our in-app messaging. So we have all of this text data, all of this stuff that's ripe for natural language processing. And our objective was: let's get some machine learning around that that makes things better for the customer.

So for our first particular use case, consider that our customers talk in all sorts of weird ways. Suppose one particular customer came to us and said, "this high bill shall not pass," which is an utterance we had never seen before. If we have an agent at a computer responding to these messages, maybe with eight conversations going on at once, our goal is to prep them: it would be really helpful if that agent, when they started the conversation, already knew the status of the customer's bill.

So what is our method? We're going to build a classification engine with machine learning. In this case, "this high bill shall not pass" would be classified as a bill breakdown message. And further, we can improve that by actually using customer data. If we know something about this customer, like their recent account activity, their current signal strength, or their bill status (maybe it's overdue), that may change what we want to show the agent. You can imagine that if a customer got their account suspended 20 minutes ago, and they come to T-Mobile and say "hi," well, we probably know what they're talking about.

Building models in R

So we can use that data to help inform our machine learning prediction. So how do we create the models in R? Well, our workflow was like this. One, we would start by using R Markdown for exploratory analysis. We would take those many conversations between customers and agents and try to build an understanding of what happens in them. What are the things people talk about? Then, once we had a better idea, we would build a machine learning model in R and save those models to flat files. In particular, we would do the model building in R Markdown so that we could have a log of exactly how it was built, what data went in, and how well the fit went, and then we'd save all of that. And then, before we even get to actually deploying anything, we need to get the business people excited about this.

So we show our model off with a Shiny demo. After we did the exploratory analysis, we would start building a model. We ended up building neural networks using Keras. Keras is a really, really cool R package from RStudio that lets you very easily build deep learning models. For us, we ended up using a convolutional neural network, which is a particular type of neural network, to do the text parsing and understand the topic of the text. But we also had to bring in all those other data types too, like what's the order status, or is their account currently suspended.

And so we had more traditional neural networks parse that data. And then we put it all together with one final neural network that outputs the classification. So in this case, we would train a neural network that would learn that "unlock my phone," with a recent order, has an 80% chance of being categorized as unlock.
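As a rough sketch of the two-input architecture described above, here is a toy version built with the keras R package. All of the sizes here (vocabulary, input widths, layer units, number of classes) are made-up placeholders, not T-Mobile's actual architecture:

```r
library(keras)

# Text branch: embedded tokens -> one convolutional layer -> pooling
# (the talk notes the real network was similarly shallow)
text_input <- layer_input(shape = 50, name = "message_tokens")
text_branch <- text_input %>%
  layer_embedding(input_dim = 10000, output_dim = 64) %>%
  layer_conv_1d(filters = 64, kernel_size = 3, activation = "relu") %>%
  layer_global_max_pooling_1d()

# Customer-data branch: account activity, signal strength, bill status, etc.
account_input <- layer_input(shape = 8, name = "customer_features")
account_branch <- account_input %>%
  layer_dense(units = 16, activation = "relu")

# Combine both branches into one softmax over the message classes
output <- layer_concatenate(list(text_branch, account_branch)) %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 12, activation = "softmax", name = "message_class")

model <- keras_model(list(text_input, account_input), output)
model %>% compile(optimizer = "adam", loss = "categorical_crossentropy")

# Save to a flat file, as in the workflow described above
save_model_hdf5(model, "message_classifier.h5")
```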

So we built these networks, and we were pretty excited about them. But we had this problem: we'd go into meetings with the business people and be like, guys, look at these ROC curves, 80.8! And they're like, whatever. Then we had one meeting with some business people, and 10 minutes before, we're like, let's just throw it in a Shiny demo. And this changed the game. The fact that we actually had Shiny demos where a business person could type in "I want to unlock my phone" and actually watch it be classified as unlock, this got us, and I'm not exaggerating, millions of dollars of funding and multiple people added to our team. And it's because we could take this, show it to a business person, who showed it to their director, who showed it to the VP, until eventually we ended up showing this stuff to the CIO, all because we had a really nice Shiny demo that we built around our models.

This changed the game. The fact that we actually had Shiny demos where a business person could type in "I want to unlock my phone" and actually watch it be classified as unlock, this got us, and I'm not exaggerating, millions of dollars of funding and multiple people added to our team.
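To illustrate how little code such a demo takes, here is a toy Shiny app in the same spirit. The `classify_message()` function is a stand-in keyword matcher for illustration only, not the real neural network:

```r
library(shiny)

# Stand-in for calling the real Keras model; purely illustrative
classify_message <- function(text) {
  if (grepl("unlock", tolower(text))) "unlock"
  else if (grepl("bill", tolower(text))) "bill breakdown"
  else "other"
}

ui <- fluidPage(
  titlePanel("Message classifier demo"),
  textInput("msg", "Type a customer message:"),
  textOutput("label")
)

server <- function(input, output) {
  output$label <- renderText({
    paste("Predicted topic:", classify_message(input$msg))
  })
}

# shinyApp(ui, server)  # uncomment to launch the demo locally
```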

Putting R into production

So I'm a data scientist, and I'm very comfortable doing all the things I just showed, but putting things in production, deployments, when I started this project, a lot of that stuff was new for me. But thankfully, our team had machine learning engineers to help.

Okay, am I on? Hello? Can anybody hear me? There we go. Okay. So when it came time to put our models in production, I was handed these R files, and I was a Java developer, and I was like, I really don't know what to do with this or how to get it in a place that can interface with our 70-plus million customers. Every single person had told our team, if you want to do machine learning in production, you have to use Python. But I had just been handed a really cool R project, so hiring data scientists to redo all this work in Python and redo the analysis seemed silly. Since the model was already built, and it was very good, and the business people really liked it, I could have converted it to Python and then put it in production, but that seemed silly too. So I was like, what if we just do it all in R, and take out this double work that people assume you have, where you have to rewrite things in Python for 70 million people?

So, yeah, the radical idea here was: what if we treat R like a real programming language? Because it is a programming language. Here's how things work on our team, and this is the way any of our Java or Node projects would work. We have all of our stuff in repositories, and as soon as you make a commit to a repository, some software called Jenkins builds it up into its final project. Then Marathon, which is kind of like off-brand Kubernetes, orchestrates the container deployment on Mesos hosts and replicates those containers for you. So we have a really nice system in place for Java projects where you can just make a commit and then instantly something's in production and facing your customers. I was like, well, let's make R do that and avoid this whole Python trip entirely.

The radical idea here was what if we treat R like a real programming language? Because it is a programming language.

And so the first thing that we had to do was convert our R models into an API. What's an API? Well, right now, if I were to go on my phone and type weather.com, it would return a website to me, and that's an API: it's just a way of using the Internet to send a message. "I want to go to weather.com and get some information back." "Here's the entire website, that's weather.com." You can also use it for model input and output. "I want you to predict how long this stem length is": send it to the model via an API, and it comes back with "okay, I have a prediction for you." So it's very, very easy. Once you have a REST controller, which defines all the little slashes that you would put on your website URL, you just start plumber on your R model, and now you can go into your browser, type something in, and immediately get information back from the model despite not being anywhere close to it, which is really cool. That's how all of our services talk to each other in engineering: via API calls. This way we can have Java talking to Python talking to R, and it all makes sense because it all uses the Internet.
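A minimal plumber file in the spirit of what's described might look like the following. The endpoint paths and the `preprocess()` helper are hypothetical, sketched for illustration rather than taken from the actual T-Mobile code:

```r
# plumber.R
# Sketch: load the saved model once at startup, then serve predictions
# over HTTP. preprocess() is a hypothetical helper that would tokenize
# and pad the raw text into the shape the model expects.
library(keras)

model <- load_model_hdf5("message_classifier.h5")

#* Health check so the platform can verify the container is alive
#* @get /healthcheck
function() {
  list(status = "ok")
}

#* Classify a customer message
#* @post /classify
#* @param message:character The raw message text
function(message) {
  x <- preprocess(message)        # hypothetical preprocessing step
  probs <- predict(model, x)
  list(confidence = max(probs))
}
```

The API is then started with something like `plumber::plumb("plumber.R")$run(host = "0.0.0.0", port = 8000)`, after which any HTTP client, in any language, can hit those endpoints.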

Containerizing with Docker

And then the way that we package everything up on our team is with containers. So what are containers? If I were to go to Best Buy right now, buy a laptop, and I really wanted to put Jacqueline's model on it, I would have to go home, sit down, set up my Internet, install Outlook, then install RStudio, and then all of the packages needed for RStudio. It gets quite cumbersome to set up the machine, and then of course I'm using a more recent version of a package than she was using, so her model still won't run even though she emailed it to me and I downloaded it. What Docker is, essentially, is just a way to write down all of those setup instructions so that they're repeatable forever. That way, any time a new virtual machine is spun up, a little container on it can just replay those Docker instructions and have the exact same thing that I was looking at on my laptop, or the same thing Jacqueline was looking at on hers. So I can take my code to someone who doesn't have R installed on their computer, and if they have Docker, it will run on their machine.
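A stripped-down Dockerfile in the spirit of this setup might look like the sketch below. The base image tag, system library list, and file paths are illustrative assumptions, not T-Mobile's production file:

```dockerfile
# Start from a version-pinned R image maintained by the rocker project
FROM rocker/r-ver:3.5.2

# System libraries commonly needed by plumber and keras (illustrative list)
RUN apt-get update && apt-get install -y \
        libssl-dev libcurl4-openssl-dev libsodium-dev && \
    rm -rf /var/lib/apt/lists/*

# R packages, plus the Python/TensorFlow backend that R's keras calls into
RUN R -e "install.packages(c('plumber', 'keras'))" && \
    R -e "keras::install_keras()"

# The API definition and the saved model from the modeling workflow
COPY plumber.R message_classifier.h5 /app/

EXPOSE 8000
CMD ["R", "-e", "plumber::plumb('/app/plumber.R')$run(host='0.0.0.0', port=8000)"]
```

Anyone with Docker can then build and run this image and get the same environment, whether or not they have R installed locally.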

And so the first problem that we actually had: I have this container, and we used a rocker-based image. rocker is an organization that makes R-based Docker images, so you can just copy the work that they've done and start from there. So I have R in a Docker image and I have Jacqueline's model, but the big problem is that R's Keras actually requires Python, since it calls Python under the hood. So we're like, okay, we have this Docker image that contains R, it contains RStudio, it contains our model; let's put in the actual Python that's required to make this run.

So, okay, we finally get Python installed, and everything should be good, but it's not, because plumber doesn't support HTTPS, and we're a big enterprise, so we have to have really secure connections between things. HTTP was not sufficient. So inside of our Docker container, we added an Apache 2 server that sits alongside our R code, and all it does is take HTTPS requests, convert them to HTTP, let the model do its processing, and pass the result back through.

So now we have R, and we have the model, and we have Python, and we have this Apache 2 server, and it's getting quite hefty. That was actually the next trouble we ran into. We deployed it, it went through our pipeline, it went out perfect, but it was nearly 5 gigs in size by the time you get all the dependencies and everything you need. Here's a quick breakdown of what our container actually looked like: if you look, the bulk of the container ends up being Python. And our DevOps team was like, you guys can't keep running 5-gig containers, you've taken down entire T-Mobile clusters, huge problems are occurring because you're throwing this out there. So we said, okay, we'll do some work. We switched out the full version of Anaconda Python, we removed RStudio from the Docker image, and we spent a lot of time removing all these unnecessary Linux and R and Python libraries to get the container size small enough that DevOps would say: okay, we're totally fine with you using R now, because you've made it small enough, it contains all of our security requirements, it contains everything it needs to run, it looks good to go.

And we did it. We deployed a model in R, T-Mobile's first customer-facing deep learning model, and it was completely written in R. It's an R-native API, it has TensorFlow, it's super, super small, and it's just as secure as everybody wants and requires it to be. So, really cool.

Lessons learned

But we learned a lot in doing this. I think the first thing we learned is, you know, people came to us in the beginning and said, you have to write this in Python, we won't support you doing it in R. But we tested our APIs at 50 times T-Mobile's maximum load, and R performed just fine. For us, using R and plumber was at very close performance parity with Python and Flask. R was also really good for the quick data exploration that gave us the idea to build this model in the first place. The Shiny demos are what really, really sold our project, and there's not an equivalent for Python; you can't just quickly throw together a few lines of code, take it to a business person, and have it passed around like that. And most importantly, language was never the fail point for our project.

It was always security concerns, or the container being too big, or Python being giant, or some dependency problem, but it was never R that was the reason we were struggling at any point.

And the next thing we learned is that it's really, really critical to work in a flat team. Traditionally, how this works is the data scientists build the model and send it over to the engineers, the engineers rewrite it in Python, and then they deliver it to the business. But the business wants it a little bit different, so it goes back to the data scientists. We stopped that game of telephone and said, what if we all just sit together? So our team was literally two machine learning engineers, three data scientists, a Java engineer, and our business person, all together at the same table every single day. That way the machine learning engineers can be developing the API, and when the data scientists find something new they want to do, they can just talk to each other instead of waiting two months for a finished product and then trying to iterate on that. It made for a very, very agile structure.

Where things stand today

So with all that work, and it was a lot of work, very exciting, fun work, where are we today?

The first thing is that we have R in production right now, in a way that any iPhone user in the audience can hit right now: you can make R code run, which is, I don't know, I never thought there would be that day, never mind that I'd be part of it.

So in particular, there's a product called Apple Business Chat. So about a year ago, Apple released this way for companies to actually use iMessage. So if you search for T-Mobile on your iPhone, you can start a chat with T-Mobile, and it's like a nicely colored branded box, but T-Mobile doesn't get to know your telephone number. All they know is that you're an anonymous person.

And so the first thing T-Mobile used to do was ask you, hey, are you a customer or not? Because they needed to know whether to route you to customer care or sales. So we'd ask the customer and get a yes or no back. And to us, this was ridiculous, right? If someone says, "why is my bill so high?", of course they should go to customer care. If someone says, "I want to switch to T-Mobile," they should not go to customer care. So we built a machine learning model around that. Now, every time someone in iMessage starts a new conversation, it's sent to the R machine learning model. If the model is confident they're a customer, or confident they're not, then they don't see this picker at all. If we're unsure, like if they just say "hi," then they still get the picker like before. So we've actually used machine learning and AI to make the customer experience more streamlined, rather than building an annoying chatbot that's like, "please say unlock," you know?
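The routing decision described above can be sketched as a simple threshold rule. The cutoffs and names below are invented for illustration, not T-Mobile's actual values:

```r
# Decide where a new iMessage conversation goes, based on the model's
# estimated probability that the sender is an existing customer.
# Thresholds here are hypothetical placeholders.
route_conversation <- function(p_customer,
                               customer_cutoff = 0.90,
                               prospect_cutoff = 0.10) {
  if (p_customer >= customer_cutoff) {
    "customer_care"   # confident they're a customer: skip the picker
  } else if (p_customer <= prospect_cutoff) {
    "sales"           # confident they're not a customer: skip the picker
  } else {
    "picker"          # unsure (e.g. the message was just "hi"): ask
  }
}

route_conversation(0.97)  # "customer_care"
route_conversation(0.50)  # "picker"
```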

So we're really proud of that.

And you can have some of this work, too. We took a lot of what we've done, packaged it up, and made it open source. We've written several blog posts on how to get started with plumber, Docker, and our actual code. And we created a repository that is a full Docker container running a TensorFlow neural network in R; you can download it and run it, and it just works. The neural network we used for sample data generates pet names. So it's a lot of fun. I recommend you try it out.

If you're interested in how this works, I highly encourage you to look at it, because you may learn a couple of things that make it easier for you, things we had to learn ourselves over a lot of time.

So thank you so much. The materials from this talk are all located at nolisllc.com: the blog posts, the GitHub repository, these slides. So that's really useful. Our Twitter handles are up there; we love being tweeted at. And thank you so much. I'd also like to say that while Heather and I are up here, James Ellison, our product person, is in the audience, and there are so many other T-Mobile people who really helped with this project, too.

So we like to do a lot of talking, but it was not just us who did this project. And also we could talk about this literally all day. We were like, how can we condense it down into just 20 minutes?

And then also, we are hiring. So if you think that what we're doing sounds really cool, we need really good data scientists, and we prefer R users, obviously. This seems like a really great place to announce that, and you can hit us up on Twitter, whatever you feel like.

Q&A

So you said you show business people Shiny apps to demo your model. Do you have an area where you deploy those kinds of Shiny apps for projects?

So our Shiny projects actually go through the same pipeline that I just showed you. We have one single container right now that contains all of our different Shiny apps. And we have it up so that any time James needs to take this to a business person and show off something that we built, the link is up and always ready for him.

I would say we are going to switch to having a Docker container that has each app in it separately. Because we've run into trouble where like we've updated one app and it's broken another. So going forward, we're switching to one container for each.

Because this specific instance is a Keras model that isn't necessarily specific to R or Python, do you need R or Python at all? Is it because of like the API wrapper or text preprocessing?

So I think that, from my perspective, it was that I was handed a model in R. Why would I rebuild that? It already exists. And the exploration was done in R. So that's why we decided to support the R-native model.

So what we do, you know, with us, we first take the conversation and then we remove the weird characters. And then we do this. And then we feed it to the model. And, you know, some of our engineers are like, wow, we could technically run TensorFlow in Java. But then we'd have to figure out what to do with all those steps. And it's like, well, we'd have to keep them in R anyway.

So I think that normally the argument that people put forth is like, oh, well, it's more scalable and we support Java. And it's like we just used all the Java tools that exist and did it in R and it worked just fine. And then you eliminate the double work.

I have a question about your neural network. Based on the application, it looks to me like it's natural language processing, so it should be a sequence model. But you mentioned a convolutional neural network. Can you give me some explanation here?

So we actually started with a recurrent neural network, and we found that the recurrent neural network did not perform as well as a convolutional one. Our suspicion is that it's because of the nature of the classification: usually someone says "bill so high," and for those three words, it doesn't matter what's around them. That's our suspicion for why a convolutional neural network works slightly better. But we were actually surprised at how shallow the optimal networks ended up being. We were expecting to need five layers and recurrence, but it's really one convolutional layer, one or two dense layers, and that's the whole network.

You spoke about making your app secure. Can you speak a bit about that? Two-factor authentication, and how were you able to achieve that using R?

So all of our stuff right now is accessed from within our network, so we don't have to do Apigee authentication. But if we did, it would be very simple; we would just make the relevant API calls. One thing that we did do is that all of our messages have to be passed encrypted. There's a library called sodium that's really good at encryption, and you can decrypt in Java. So we were being passed things from Java that we then had to decrypt and re-encrypt in R, and we did all of that using the sodium library, which is AES-256 encryption. So it's pretty secure.
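A minimal encrypt/decrypt round trip with the sodium R package, in the spirit of what's described, might look like the following. The key derivation here is purely illustrative; in practice the key would be shared securely with the Java services rather than derived inline:

```r
library(sodium)

# Illustrative 32-byte symmetric key. A real deployment would exchange
# this securely with the Java side, not hard-code a passphrase.
key <- sha256(charToRaw("shared-secret-for-illustration"))

nonce <- random(24)  # fresh nonce per message

plaintext <- charToRaw("why is my bill so high")
cipher <- data_encrypt(plaintext, key, nonce)

# Decrypt (the Java side would do the equivalent with its crypto library)
recovered <- rawToChar(data_decrypt(cipher, key, nonce))
identical(recovered, "why is my bill so high")  # TRUE
```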