Diversify Your Career with Shiny for Python - posit::conf(2023)

Transcript

This transcript was generated automatically and may contain errors.

So if you're here at this conference and you're like me, you probably feel that R has a first-class place in any modern data science team, right? You probably use a lot of R, and R was most of my career. But the thing is, a lot of the time that decision is not really up to you. You can find yourself in a position where, for whatever reason, you're prevented from using the tools you know and love to accomplish your goals, right?

I want to start by talking about one of my larger career mistakes; aside from going to law school, it's maybe my largest. I was hired by a company called Socure to build Shiny apps and improve their R tooling, right? R functions to help people use R better, and I was good at that. I was lucky enough to be at Socure during a time of really extraordinary growth: the company hit at exactly the right time and roughly doubled revenue every year for about three years. With that growth came one important challenge. The way our business was structured, every marginal increase in revenue required more data scientists, so by the end I think we were hiring something like 40 a year. And finding data scientists who knew something about identity fraud and were also good enough at R was very difficult.

So at some point the company decided to standardize on Python to get access to a larger hiring pool. Right around this time my boss came up to me and said, "Gordon, I love the work you've done with R. Can you do the same thing with Python?" What I should have said was, "Yes, of course. It's Python; it's not that far away. I know this problem domain very well, I have a good intuition about how people analyze data, and I can do that." But I didn't say that. I had a lot of doubt about my own intuition for building good Python tools, and just a basic lack of familiarity. My R skills were so good at that point that whenever I tried using Python, I felt like I was all thumbs.

And the final thing is that it proves you know Python. I think this is something really unique about building web applications: they serve as a kind of visual proof, to other people and to yourself, that you know how to provide value in Python, right?

Why Shiny is better than other Python frameworks

The thing about that proof is that I really do think what you produce, especially if you have a little experience with Shiny for R, is going to be really impressive to somebody who has only looked at Python web application frameworks. That's because I think Shiny is better. I'm probably saying that more starkly than I have in other contexts, and "better," of course, means different things to different people. But I work on the Shiny for Python team, and I wouldn't be doing that if I didn't think it was the best, right? So I want to clarify what I mean by better.

And for this I have another story. After we made the transition to Python, we put together a web API for people to navigate our machine learning feature store: get information about features, change them, and so on. Centralizing this was really important, because the feature store was something we dealt with all the time and it caused a huge amount of friction. And the API needed a front end. We looked around and said, "Okay, we're a Python shop now. How are we going to build this front end?" We looked at various options, and the decision was to do it in React: a web application with a JavaScript front end, super professional, right? But we didn't have any React developers on the team. So this implied hiring, hiring implied budget, and budget implied approval. Ten months later we still hadn't realized that value; we didn't have the product, right? That's the thing I personally really hate whenever I've worked on a data science team. I want my team to be able to deliver the whole project, right? I don't want to be in a situation where we built the model, we built the API, and then there's another team we have to wait on, cajole, get approval from, or get roadmap space from to finish the project and actually show it to the user. I want us to be able to do the whole thing. That's what Shiny does, and I think it's what makes Shiny so special.

The way I talk about this is an idea you might call programmatic range: a framework that can do a lot of things, and do them smoothly. A lot of the other web application frameworks say, "Look, we can do simple things and we can do complicated things," but when you dig in, you realize they do the simple thing and the complicated thing in totally different ways. It's almost as if they're several frameworks in a trench coat, you know? You have this simple thing, and then, oh, you're a little more complicated, let's learn a whole new pattern; and you're a little more complicated than that, oh, we're going to do a whole new thing. Shiny is not like that. Every single Shiny app uses the same pattern: reactivity. I taught a workshop on Saturday, and the things we taught that day to people experiencing Shiny for the first time are the same things used in the most complicated Shiny apps. So Shiny can serve you from very, very simple things to very, very complicated things. For that API example, if we had wanted to build a prototype in Shiny just to get us through the 10 months until we had a React developer who could build something "real," that works, right? It's fast enough and easy enough to do that. But if it turns out we don't actually want to hire that person, or we can't, the prototype can grow smoothly into the product, and Shiny has all the tools for doing that. So you don't need to hire anyone. That's my definition of better, I guess.
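
To make "the same pattern for every app" concrete, here is a toy, standard-library-only model of the reactive idea. It is a sketch under stated assumptions, not Shiny's actual implementation: the `Value` and `Calc` classes are hypothetical. Inputs are values, a shared intermediate result is a cached calculation, and changing one input recomputes only what depends on it. In Shiny for Python itself, the corresponding pieces are inputs, `@reactive.calc` functions, and render functions such as `@render.text`.

```python
# Toy reactive model (hypothetical; NOT Shiny's implementation).
_active = []  # stack of Calcs currently computing, used for implicit dependency tracking


class Value:
    """A reactive input. Reading it inside a Calc records that Calc as a dependent."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def __call__(self):
        if _active:
            self._dependents.add(_active[-1])
        return self._value

    def set(self, value):
        if value != self._value:
            self._value = value
            dependents, self._dependents = self._dependents, set()
            for calc in dependents:
                calc.invalidate()


class Calc:
    """A cached derived value; recomputes lazily only after something upstream changes."""

    def __init__(self, fn):
        self._fn = fn
        self._valid = False
        self._cache = None
        self._dependents = set()
        self.runs = 0  # number of times fn has actually executed

    def __call__(self):
        if _active:
            self._dependents.add(_active[-1])
        if not self._valid:
            _active.append(self)
            try:
                self._cache = self._fn()
            finally:
                _active.pop()
            self._valid = True
            self.runs += 1
        return self._cache

    def invalidate(self):
        if self._valid:
            self._valid = False
            dependents, self._dependents = self._dependents, set()
            for calc in dependents:
                calc.invalidate()


threshold = Value(50)  # like a slider input
scale = Value(1)       # a second, independent input

rows = Calc(lambda: [x for x in range(100) if x >= threshold()])  # shared intermediate
count = Calc(lambda: len(rows()))                                 # output 1
total = Calc(lambda: sum(rows()) * scale())                       # output 2

print(count(), total())  # 50 3725
scale.set(2)             # only `total` reads scale
print(count(), total())  # 50 7450  (rows and count reuse their cached results)
threshold.set(90)        # invalidates rows, and through it count and total
print(count(), total())  # 10 1890
print(rows.runs, count.runs, total.runs)  # 2 2 3
```

Note that `count` never reruns when only `scale` changes. The same three ingredients (inputs, cached intermediates, outputs) are all a one-slider app needs, and they are also what a large app is built from.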

There are about 20 different Python web application frameworks, so I'm not going to go through all of them, but I want to go through the ones I've seen used most commonly in the businesses I've talked to.

Streamlit is a very popular one. It's a wonderful framework, and it's optimized for very, very simple applications. That's their lodestar: let's get the simple stuff to work, and we will sacrifice anything for that goal. The way they've done this is by rerunning the entire script from top to bottom on every user interaction. It's a little bit like rendering a parameterized R Markdown report, where every time anything changes, the whole thing runs. If you think about it, that probably seems pretty inefficient, and it is, but that's because their design space sits very, very close to the beginner experience of getting something working. In my opinion, though, it means complex apps are not really possible. At some point you reach a cliff with Streamlit where you have to throw the app away; it can't grow into a product. That's not to say you can't do impressive things with it, but there's a line where it stops working. And simple things are oddly difficult, like updating a slider or changing a button color: these require JavaScript, or at least state management.
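
The difference between the two execution models can be sketched with a toy call counter. This is hypothetical, standard-library-only code, not real Streamlit or Shiny code: `rerun_script` mimics rerunning a whole script on every interaction, and `DependencyAwareApp` is a hand-wired stand-in for a framework that tracks which outputs depend on which inputs.

```python
# Hypothetical toy model: counts how often each step runs under two execution styles.
calls = {"load": 0, "summary": 0, "plot": 0}


def load_data():
    """Stand-in for an expensive step, such as reading a large file."""
    calls["load"] += 1
    return list(range(1, 101))


def summarize(data):
    calls["summary"] += 1
    return sum(data) / len(data)


def plot_points(data, n):
    calls["plot"] += 1
    return data[:n]


def rerun_script(n):
    """Rerun-everything style: each interaction executes the whole script."""
    data = load_data()
    mean = summarize(data)
    return mean, plot_points(data, n)


class DependencyAwareApp:
    """Hand-wired stand-in for dependency tracking: only the slider's output reruns."""

    def __init__(self):
        self.data = load_data()
        self.mean = summarize(self.data)

    def on_slider(self, n):
        return self.mean, plot_points(self.data, n)


def count_calls(run):
    """Reset the counters, run a scenario, and return how often each step ran."""
    for key in calls:
        calls[key] = 0
    run()
    return dict(calls)


slider_moves = [10, 20, 30, 40, 50]  # five user interactions


def rerun_scenario():
    for n in slider_moves:
        rerun_script(n)


def dependency_scenario():
    app = DependencyAwareApp()
    for n in slider_moves:
        app.on_slider(n)


rerun_counts = count_calls(rerun_scenario)
dependency_counts = count_calls(dependency_scenario)
print(rerun_counts)       # {'load': 5, 'summary': 5, 'plot': 5}
print(dependency_counts)  # {'load': 1, 'summary': 1, 'plot': 5}
```

Over five slider moves, the rerun style loads and summarizes the data five times, while the dependency-aware style does that work once. Streamlit does offer caching decorators such as `st.cache_data` to avoid repeating expensive steps; the sketch only shows the default rerun model described above.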

Dash is built around the idea of statelessness, which means each graph, table, and component of your application has to be able to do everything independently. It's very difficult to share data between them. So if you had a CSV that generated 20 plots, a usual way of doing that in Dash (not the only way, but a usual way) is to read the CSV in once for each plot, 20 times. You can share data, but it involves caching it in the browser, or storing it in a database and rereading it, and lots of things like that. So imagine Shiny without reactive expressions, reactive values, or server state: it would be a pretty flat experience, right? You don't have the idea of caching an intermediate value and using it in many places. To be fair, statelessness does solve an important problem: Dash apps are incredibly easy to scale horizontally. One app can be served by many processes on a server, or even by many servers, which is really good. But I've never had that problem, and my guess is that not many people actually do.
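
The trade-off can be sketched in plain Python, assuming a hypothetical four-row CSV and treating each of the 20 "plots" as a simple transformation of the parsed rows. This is not Dash code: `stateless_plot` mimics a callback that depends only on its own inputs, while `shared_plots` mimics a server-side cached intermediate, which is what a reactive expression provides in Shiny.

```python
import csv
import io

# Hypothetical data; in a real app this would be a file on disk.
CSV_TEXT = "region,sales\nnorth,10\nsouth,20\neast,30\nwest,40\n"
reads = 0  # how many times the CSV has been parsed


def read_csv():
    global reads
    reads += 1
    return list(csv.DictReader(io.StringIO(CSV_TEXT)))


def make_plot(rows, i):
    """Stand-in for building plot number i from the parsed rows."""
    return [int(row["sales"]) * i for row in rows]


def stateless_plot(i):
    """Stateless style: depends only on its inputs, so it re-reads the data every time."""
    return make_plot(read_csv(), i)


def shared_plots(n_plots):
    """Stateful style: parse once, then reuse the shared intermediate for every plot."""
    rows = read_csv()
    return [make_plot(rows, i) for i in range(1, n_plots + 1)]


reads = 0
stateless = [stateless_plot(i) for i in range(1, 21)]
print(reads)  # 20

reads = 0
shared = shared_plots(20)
print(reads)  # 1

print(stateless == shared)  # True: identical plots either way
```

The stateless version parses the file 20 times but keeps nothing between calls, which is exactly what lets any process on any server answer any request. The shared version parses once, but the parsed rows have to live somewhere every plot can reach.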

And finally, there's the Django, Flask, and FastAPI approach: an API layer written in Python combined with some separate front end. This is really good for many larger applications, but it has a very steep learning curve, and you need to write and maintain your own front end.
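
For a sense of what the "API layer" half looks like, here is a minimal sketch using only the standard library's WSGI interface (the interface Flask and Django apps are commonly served through; FastAPI uses the async ASGI interface instead). The `/features` endpoint and the `FEATURES` data are hypothetical, loosely echoing the feature-store example. Note what it does not include: any user interface. The HTML and JavaScript front end that calls this endpoint would be a separate codebase to write and maintain.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory "feature store" standing in for a real database.
FEATURES = {"email_age_days": {"owner": "fraud-team", "dtype": "int"}}


def app(environ, start_response):
    """Minimal WSGI API: GET /features returns JSON; everything else is a 404."""
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/features":
        status, payload = "200 OK", FEATURES
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(path):
    """Exercise the app in-process, without a server, and decode its JSON reply."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD="GET" and other defaults
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


print(call("/features"))
print(call("/missing"))
```

Everything a user would actually see (tables, forms, buttons) still has to be built on top of this JSON, which is the front-end work the talk refers to.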