Diversify Your Career with Shiny for Python - posit::conf(2023)

Presented by Gordon Shotwell.

A few years ago my company made a sudden shift from R to Python, which was quite bad for my career because I didn't really know Python. The main issue was that I couldn't find a niche that allowed me to use my existing knowledge while learning the new language. Shiny for Python is a great niche for R users because none of the Python web frameworks can do what Shiny can do. Additionally, almost all of your knowledge of the R package is applicable to the Python one. This talk will provide an overview of the Python web application landscape and articulate what Shiny adds to this landscape. It will then go through the five things that R users need to know before developing their first Shiny for Python application.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Data science with Python. Session Code: TALK-1138

Transcript

This transcript was generated automatically and may contain errors.

So I think if you're here at this conference and you're like me, you probably feel that R has a first-class place within any modern data science team, right? You know, much like Posit, I used a lot of R, and that was most of my career. But the thing is that a lot of the time that decision is not really up to you. You can find yourself in a position where you're prevented, for whatever reason, from using the tools you know and love to accomplish your goals, right?

I wanted to start by talking about one of my larger career mistakes, and aside from going to law school, this is maybe my largest. I was hired by a company called Socure to build Shiny apps and to improve their R tooling, right? R functions to help teach people to use R better, and I was good at that. I was lucky enough to be at Socure during a time of really extraordinary growth. The company hit at exactly the right moment and basically doubled revenue every year for about three years. With that growth came one important challenge. The way our business was structured, every marginal increase in revenue required more data scientists. By the end I think we were hiring something like 40 a year, and finding data scientists who knew something about identity fraud and were also good enough at R was very difficult.

So the company at some point decided that we're gonna standardize on Python to have access to a larger hiring pool. Right around this time my boss came up to me and said, Gordon, you know, I love the work you've done with R, can you do the same thing with Python? What I should have said was, yes, of course, it's Python. You know, it's not that far away. I know this problem domain very well. I feel like I have a good intuition about how people analyze data. I can do that. But I didn't do that. I didn't do it because I had a lot of doubt about my own intuition about how to build good Python tools, and just kind of like a basic lack of familiarity. My R skills were so good at that point that whenever I tried using Python I felt like I was all thumbs.

Why "learn Python" is bad advice

And since then I've been wondering what it was that prevented me from having that confidence. What was it that I didn't have? Of course, people have been telling me to learn Python my whole career. And I tried. I'd taken online courses, built little projects, etc. I think there are a couple of reasons why it's really difficult, or why I found it really difficult.

The first is that this goal is just terrible advice. "Learn Python" in the generic sense is terrible advice, because of course there are many Pythons. You talk to somebody who's a data scientist and they say the important skill is pandas. You talk to somebody else who builds web services in Python and they'll tell you that pandas is a horrible dependency that'll ruin your life, right? So the phrase doesn't have any particular meaning. And because it doesn't, you can't answer a lot of basic questions. When are you done? What particular value does it give you in the short term? And how do you prove it? All of these things keep you from making progress. It's not a good goal.

Secondly, R and Python are extremely close substitutes. Especially in the data science space, they do almost identical things. We just heard that in the plotnine talk, right? You have ggplot2 in R and plotnine in Python. They're substitutes, and you can do the same things with them. I would say that for almost every task you can point me to in Python, I can do it in R, and vice versa.

What that means is that there's an incredible awkwardness when you're trying to learn something that's almost exactly like what you already know. The two are close enough that you feel like you should already know how to do this thing, but far enough apart that you really don't. And because they're such close substitutes, learning the other one doesn't feel valuable in the short run. When you first start programming, or when you pick up a language that's truly different from the one you know, you get a particular kind of feedback: the sense that you've gained a new ability you didn't have before. With close substitutes you don't get that. The second thing I learned to program in was actually React Native, when I wrote a mobile application, and that was easy to learn, right? Because I was able to do something I couldn't do before: write something that runs on somebody's phone. That was amazing. I couldn't do that in R.

Value-based learning

There are a lot of different ways of coming up with a better goal for learning these systems. You often hear SMART: Specific, something, something, Actionable, I can never remember. Lots of different acronyms. But the one I gravitate towards is really simple, and you might call it value-based learning. Whenever you're learning a new thing, you need to find the part of it that provides you with real value, really quickly.

"Real value" is, I think, crucial. A lot of the time when you learn a new thing, you use a toy example, or one from some generic, off-the-shelf course. That doesn't really do the job, because you're not stressing your understanding. You're not trying to accomplish a goal that you personally care about. In a work context, real value usually means something that generates money or saves time. It doesn't have to be that, but most of the time it is. So pick a part of that project that you can show to somebody else in your life, like your boss or your co-worker, and have them be impressed. They should see some value in it for themselves and their organization.

The second part is that you need to realize that value extremely quickly. I think in a matter of weeks you should be able to take something from that learning experience, show it to someone else, and justify the time you've spent on it. I don't know about you, but I have a fairly busy job and two young children, and all of my time is spoken for. So I need something that gives me that value right away.

Why Shiny for Python fits the bill

So why is Shiny for Python a good option under this framework? Explaining that is what most of this talk is about. If you're an R developer who wants to diversify your career (don't end up like Gordon), you want something in your back pocket in case you ever end up in this situation. Shiny for Python is the ideal thing, and I think there are three reasons for this. The first reason is that it gives you something you don't have. The second is that it's better than other Python web application frameworks. And finally, you already mostly know it.

So what do you get? I should specify that the R Shiny framework is not going anywhere. It's the lead framework on our team, and it's where features will probably land first for a while. We have a huge community of R users building R Shiny applications in incredible ways and writing incredible extensions, and they aren't going anywhere either. So feature-wise, it's not like you get a bunch of features that you don't have in R. But you do get a couple of things.

The first is much easier deployment in lots of different environments. This has come up again and again over my career as a Shiny developer. I build something that's useful, great even, impressive. Then I go to deploy it and run into all kinds of barriers, and undeployed apps that live on my laptop are mostly worthless, right? R deployments are a little bit fraught because R is a niche language. Deploying an R app often adds a new dependency to your deployment environments, and that can be pretty difficult. So you run into a lot of questions. Whether you think they're valid or not, people believe them, and they prevent you from deploying your apps. DevOps doesn't know how to use R. Will this scale? What happens if you leave? Who will I hire to maintain this? Shiny for Python doesn't really have these problems because it's built on foundational Python web application frameworks. So people will at least think they can deploy and manage it. A DevOps person will look at it and say, oh, it's just an ASGI Python application, I can probably handle that. With an R application of any kind, you often run into those problems instead.
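To make "it's just an ASGI Python application" concrete, here is a minimal sketch of the ASGI calling convention in plain Python, with no framework. This is not Shiny code. It only illustrates the interface that ASGI servers such as uvicorn expect: an async callable that takes `scope`, `receive`, and `send`. A Shiny for Python `App` object implements that same interface, which is why it can be served like any other ASGI app (for example `uvicorn app:app`, assuming the app lives in `app.py`). The function names `app` and `call_once` below are illustrative only.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI application: one async callable taking scope, receive, send."""
    if scope["type"] == "lifespan":
        # Minimal lifespan handling so a real server can start and stop the app.
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                await send({"type": "lifespan.shutdown.complete"})
                return
    if scope["type"] != "http":
        return  # this sketch ignores websockets (which Shiny itself relies on)
    # An HTTP response is a start message followed by one or more body messages.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from an ASGI app"})


async def call_once():
    """Drive the app by hand for one GET request, the way an ASGI server would."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    await app(scope, receive, send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(call_once()):
        print(message["type"])
```

Because the interface is this small and standardized, whatever process manager or server an ops team already uses for Python web apps can host it. That familiarity is the point being made about deployment.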

The second thing is that it's much easier to integrate with the Python ecosystem. Especially over the last year, so many incredible things have come out of Python: large language models, things like that. You can integrate these things into R with something like reticulate a lot of the time, maybe 95% of the time, but the other 5% can be really painful. Because reticulate runs Python inside R, it adds extra layers on top of your framework, and you can end up in edge cases that are pretty difficult. Not everything works, and environments can be challenging. And especially in industry, it's very hard to get help. Say you're having trouble installing somebody's janky internal Python package, and they come over to your desk and you tell them you're trying to install it in R. Whether it's R's problem or theirs, they're going to say it's R's problem, right? So having a single language at runtime is really helpful for integrating those things. One language is simpler.

And the final thing is that it proves you know Python. I think this is something really unique about building web applications. They serve as visual proof, to other people and to yourself, that you know how to provide value in Python. Your code might not be the best, and you might not feel like you really know how you did it. But the existence of that artifact proves to the people in your life that you know how to do this, and it validates that for yourself too.

Why Shiny is better than other Python frameworks

And about that proof: I really do think that what you produce will be really impressive to somebody who has only looked at Python web application frameworks. That's especially true if you have some experience with Shiny for R. The reason is that I think Shiny is better. I'm probably saying this more starkly than I have in other contexts, and "better" of course means different things to different people. But I work on the Shiny for Python team, and I wouldn't be doing that if I didn't think it was the best. So I want to clarify what I mean by better.

And for this I have another story. After we made the transition to Python, we put together a web API for people to navigate our machine learning feature store. It let you get information about features, change them, and so on. It was an important tool for centralizing something we dealt with all the time, something that caused a huge amount of friction. And it needed a front end. We looked around and said, okay, we're a Python shop now, how are we going to build this front end? We looked at various options, and the decision was to do it in React, with a JavaScript front end for the web application. Super pro, right? We didn't have any React developers on the team, so this implied hiring. Hiring implied budget, and budget implied approval. Ten months later we still hadn't realized that value, and we didn't have the product.

That's the thing I personally hate whenever I've worked on a data science team. I want my team to be able to deliver the whole project. I don't want a situation where we build the model and the API, and then there's another team we have to wait on. I don't want to have to cajole them, get approval, or get roadmap space before we can finish the project and actually show it to the user. I want us to be able to do the whole thing. This is what Shiny does, and I think it's what makes Shiny so special.

The way I talk about this is an idea you might call programmatic range: a framework that can do a lot of things, and do them smoothly. A lot of the other web application frameworks you hear about say, look, we can do simple things and we can do complicated things. But when you dig in, you realize they're doing the simple thing and the complicated thing in totally different ways. It's almost as if they're several frameworks in a trench coat. You have one simple pattern. Then, oh, your app is a little more complicated, so let's learn a whole new pattern. And if it's a little more complicated than that, you learn a whole new thing again.

Shiny is not like that. Shiny has the same pattern for every single Shiny app: reactivity. I taught a workshop on Saturday to people experiencing Shiny for the first time. The things we taught that day are the same things used in the most complicated Shiny apps. So Shiny can serve you all the way from very, very simple things to very, very complicated things. Take that API example. We could have built a prototype in Shiny just to carry us through the ten months until we had a React developer who could build something real. That works: Shiny is fast and easy enough for that. But if it turns out we don't actually want to hire that person, or we can't, that prototype can grow smoothly into the product, because Shiny has all the tools for doing that. So you don't need to hire anyone. That's my definition of better, I guess.
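As a rough illustration of that one pattern, here is a toy model of a reactive value feeding a cached derived value. In Shiny terms, think of `reactive.value` feeding `@reactive.calc` in Python, or `reactiveVal()` feeding `reactive()` in R. The `Value` and `Calc` classes are hypothetical and far simpler than Shiny's real implementation: there is no scheduler, no outputs, and no observers. The sketch only shows the core behavior. A derived value is cached, and it recomputes only when something it read has changed.

```python
class Value:
    """Toy reactive value: holds data and remembers which calcs have read it."""

    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self, reader=None):
        # Register the reading calc so a later set() can invalidate it.
        if reader is not None:
            self._dependents.add(reader)
        return self._value

    def set(self, value):
        self._value = value
        # Invalidate everything that read the old value; each calc
        # re-registers itself the next time it recomputes.
        for calc in self._dependents:
            calc.invalidate()
        self._dependents.clear()


class Calc:
    """Toy cached computation, loosely modelled on a reactive expression."""

    def __init__(self, fn):
        self._fn = fn          # fn receives the Calc so it can register as a reader
        self._cached = None
        self._valid = False
        self.runs = 0          # how many times fn actually executed

    def invalidate(self):
        self._valid = False

    def get(self):
        # Recompute only if something this calc read has changed.
        if not self._valid:
            self._cached = self._fn(self)
            self._valid = True
            self.runs += 1
        return self._cached


n = Value(3)
squared = Calc(lambda me: n.get(me) ** 2)

print(squared.get())  # computes: 9
print(squared.get())  # served from the cache: 9
n.set(4)              # invalidates `squared`
print(squared.get())  # recomputes: 16
print(squared.runs)   # 2
```

The same read, cache, and invalidate loop scales from a one-slider app to a large one. That is the "same pattern for every app" idea in miniature.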

There are about 20 different Python web application frameworks, so I'm not going to go through all of them. Instead, I want to go through the ones I've seen used most commonly in the businesses I've talked to.

Streamlit is a very popular one. It's a wonderful framework, optimized for very, very simple applications. That's their lodestar: get the simple stuff to work, and sacrifice anything for that goal. They've done this by running the entire script from top to bottom on every user interaction. It's a little like rendering a parameterized R Markdown report, where every time anything changes, the whole thing reruns. If you think about it, that probably seems pretty inefficient, and it is. That's because their design space sits very close to the beginner experience of getting something working. It means that complex apps, in my opinion, aren't really possible. At some point you reach a cliff with Streamlit where you have to throw the app away, because it can't grow into a product. That's not to say you can't do impressive things with it, but there's a line past which it doesn't work. And some simple things are oddly difficult, like updating a slider or changing a button color. These require JavaScript, or at least state management.
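Here is a toy simulation of the execution model described above. It is plain Python, not Streamlit's API. The whole script reruns on every interaction, so the expensive load runs once per interaction. That happens even when the only change is an unrelated input like a title. The names `expensive_load`, `script`, and `simulate_rerun_model` are made up for illustration. Streamlit does provide caching decorators to soften this cost, and the sketch deliberately leaves them out.

```python
def expensive_load(counter):
    """Stand-in for reading a large file or querying a database."""
    counter["loads"] += 1
    return list(range(1_000))


def script(state, counter):
    """The whole hypothetical app, written as one top-to-bottom script."""
    data = expensive_load(counter)       # runs on every rerun
    subset = data[: state["slider"]]     # depends only on the slider
    return {"title": state["title"], "total": sum(subset)}


def simulate_rerun_model(interactions):
    """Rerun the entire script after every interaction and count the loads."""
    counter = {"loads": 0}
    state = {"slider": 10, "title": "Report"}
    script(state, counter)               # initial page load
    for key, value in interactions:
        state[key] = value
        script(state, counter)           # any input change reruns everything
    return counter["loads"]


clicks = [("slider", 20), ("title", "Q3 report"), ("slider", 50)]
print(simulate_rerun_model(clicks))     # 4: one initial run plus one per interaction
```

The cost grows with every interaction, not with what actually changed. In the reactive model sketched earlier, a title change would leave a cached data load untouched.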

Dash is built around the idea of statelessness. Each graph, table, and component of your application has to be able to do everything independently, and it's very difficult to share data between them. So say you had a CSV that generated 20 plots. A usual way of handling that in Dash (not the only way, but a usual way) is to read that CSV once per plot, 20 times. You can share data, but it involves caching it in the browser, or storing it in a database and rereading it, and things like that. Imagine Shiny without reactive expressions, reactive values, or any server state. It would be a pretty flat experience, with no way of caching an intermediate value and using it in many places. Statelessness does solve an important problem, though: Dash apps are incredibly easy to scale horizontally. One app can be served by many processes on a server, or even by many servers, which is really good. That said, I've never had that problem, and I'm not sure many people actually do.
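Here is a plain-Python sketch of that data-sharing difference. It is not Dash or Shiny code. In the stateless style, each of the 20 outputs parses the CSV itself. In the shared style, the CSV is read once into an intermediate value that every output reuses, which is the job a reactive expression does in Shiny. `Source`, `stateless_outputs`, `shared_outputs`, and `CSV_TEXT` are hypothetical names, and `len(rows)` stands in for drawing a plot.

```python
import csv
import io

CSV_TEXT = "x,y\n1,2\n3,4\n5,6\n"  # hypothetical data


class Source:
    """Wraps the CSV text and counts how many times it gets parsed."""

    def __init__(self, text):
        self.text = text
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(csv.DictReader(io.StringIO(self.text)))


def stateless_outputs(source, n_plots):
    """Each output re-reads the data independently (no shared server state)."""
    return [len(source.read()) for _ in range(n_plots)]


def shared_outputs(source, n_plots):
    """Read once into an intermediate value, then reuse it for every output."""
    rows = source.read()  # plays the role of a reactive expression's cached value
    return [len(rows) for _ in range(n_plots)]


stateless = Source(CSV_TEXT)
stateless_outputs(stateless, 20)
print(stateless.reads)  # 20

shared = Source(CSV_TEXT)
shared_outputs(shared, 20)
print(shared.reads)     # 1
```

The trade-off cuts both ways. The stateless version keeps nothing between calls, which is exactly what makes it easy to spread across many processes or servers.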

And finally, there's the Django, Flask, or FastAPI approach: an API layer written in Python combined with a separate front end. This works well for many larger applications. But it has a very steep learning curve, and you need to write and maintain your own front end.

You already know Shiny for Python

So those are the benefits. You get to do something you couldn't do before, and you give your boss or your company something they can't currently do. But the cost side is also really low, because I think most people who are familiar with R Shiny already know Shiny for Python. When you're learning Shiny, it feels like you're learning an R package, because at the time there was only an R package. But most of the concepts you're actually dealing with are generic parts of the framework. You can think of Shiny for R and Shiny for Python as two client libraries for the same general framework. So most of what you know from Shiny for R transfers: how reactivity works, when to use reactives versus observers, when to use a module, and how to style applications.

And the wonderful thing about today's world is that ChatGPT can fill in the rest. Maybe you know how to use Shiny but don't know how to draw a plot in Python, like me a little while ago. You can use ChatGPT to translate your ggplot2 code into seaborn or plotnine or whatever you want. So I think this is a perfect value-based learning tool to get you over that hump and give you the confidence that you can actually build it. It gives you value and integration options. It gives your employer value because it does things they can't really do right now. And the time to value is very short, because most of the knowledge you already have from the R framework carries over to the Python framework. If you're an expert in Shiny for R, I really do believe that you're already, today, an expert in Shiny for Python.

So everybody go tell Jared Leonard that. So we're still in the early stages of sort of developing this project. So I just wanted to give a little bit of call out for things that you can do if you want to jump in. So joining the Discord is great. Porting an R Shiny extension over to Python. This is a very easy thing to do and provides a lot of value because you already have the API that's working. You already have the JavaScript which you can use. And it's really just translating the R functions to the Python functions. And if you're at an organization that has a large Python group or just a small Python group and wants to talk to some of the Shiny for Python team, feel free to contact me and book a webinar. And thank you.