
Sean Lopp | R & Python: Going Steady | RStudio
While there has been a lot of excitement about the R and Python love story, there are still misconceptions that individuals, teams, or organizations must choose between R and Python. This talk will explain why this false choice exists, debunk the myths that cause teams to be stuck with only one tool, and clarify how data scientists can use both languages to be more effective. We will explore this love story's blossoming relationship by looking at updates to RStudio's packages and products that make it easier to develop and collaborate in R and Python. This talk is for individuals who want to uncover the benefits of multilingual data science, IT professionals who are skeptical their life can get better by supporting more languages, and data science managers interested in enabling their teams instead of forcing their data superheroes to be subservient to particular tools. About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes, and he is a proud Colorado native.
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Sean. I'm a product manager at RStudio. As a product manager, I get to talk to data science teams, both small and large, all over the world. And unfortunately, in interacting with these teams, we often hear that they're confronted by a false choice to either pick R or Python.
This choice might occur when they're writing a job description to hire a new data scientist. It might occur when they're trying to decide what to learn as individuals who want to upskill themselves, or it might occur when they're interacting with IT, asking for resources to support a project.
So why do teams face this false choice? Do we really need to pick between one language or another? Well, the answer is no. And today we're going to talk about why. We're going to look at some of the common myths that lead to this choice, how to debunk those myths, and ultimately how data science teams can be most effective when they choose to use R and Python together.
The screwdriver analogy
Now, to set some context, imagine you're a craftsman, someone who is handy and builds things. As a craftsman, you'd probably be familiar with screwdrivers. I mean, who hasn't used a screwdriver to put something together? And you might be aware that screwdrivers come in a variety of different shapes and forms. You can have a flathead screwdriver, a star bit, a Phillips head, and each of these screwdrivers is designed to serve a specific purpose.
Now imagine as a craftsman, if you were told that you have to choose for the rest of your career between using one type of screwdriver or another, you'd probably look at that person and say, that's crazy. There's no way I can practically make this choice. For some projects, I'm going to need a flathead. For other projects, I'll need a Phillips. It doesn't make sense for me to pick one screwdriver to use for the rest of my career.
Instead, what you might do as a craftsman is to say, I want to opt in to using a smarter tool, a tool that's going to be more powerful and allow me to take advantage of all these different bits that are out there. Specifically, as a craftsman, you might be interested in something like a drill, a tool that regardless of what bit you're going to need for a project, allows you to work faster and to accomplish more and allows you to work in an easier fashion.
So craftsmen have this drill. What about data science teams? Well, I believe as data scientists, we should refuse that same false choice between R and Python and other languages, just as the craftsman refuses to pick one type of screwdriver. Instead, we should work alongside of folks in IT and the leaders of our teams to build something like a drill, something that regardless of what language we use, is going to give us the power to accomplish our projects faster and easier.
Debunking myth one: more languages means more work
But first, I want to address some of these common objections that you'll hear. People that say, no, no, no, there's no such thing as a drill for data science. We have to pick a single language or a single screwdriver. So where do these objections come from? What's the biggest objection that data science teams face?
Well, the first one is this belief that if we are to support more than one language, we'll end up doing a lot of duplicative work. So for example, if I were a team that wanted to use R and Python, IT might be worried that now I have two times the amount of work to do. Instead of supporting one language, I now have to support two. That means twice the number of installs, twice the number of support tickets, twice the money spent on IT resources.
And luckily, this line of thinking simply isn't accurate. The reason for that is because regardless of what language you use, the core things that IT needs to support are the same. Things like computation, logs, authentication, security, data access. These provide a common core, that drill, that we can invest in regardless of which drill bit, which language, we end up using.
Development: RStudio Server Pro
So the first thing I want to take a look at is in the development space. When we as data scientists go to write code in either R or Python, often what we want is to choose an editor that's purpose-built for one of those languages. So in R, that might be the RStudio IDE. In Python, that might be something like a Jupyter Notebook, or more recently, JupyterLab, or perhaps I want to use a full-fledged development environment like VS Code.
Now IT might be thinking, oh no, that means we have to support all these different editors, all these different environments. But luckily, that's not true. With tools like RStudio Server Pro, you can install a single infrastructure that supports those different editors. And so what you're seeing on the screen here is the homepage of RStudio Server. When a data scientist enters through that common front door, they're able to pick the different editors that they might want to use for a certain project.
So IT only has to stand up one server environment, they only have to do one set of configuration, they only have to supply data access to one common entry point, but a data scientist is still able to use whatever editor makes the most sense for their project. You'll also see here that we're taking advantage of cloud-native tools like Kubernetes to provide elastic scale, and a Docker backend to provide explicitly those dependencies that we might need for a project.
Production: RStudio Connect
Now that's development. What about production? What about when it comes time to create these different artifacts and share those with others? Well, luckily, the same concept applies. So at RStudio, what that looks like is a tool like RStudio Connect. So RStudio Connect allows you to deploy a wide variety of data artifacts, regardless of what language they're written in.
So for web applications in R, that might be something like Shiny. For web applications in Python, that might be something like Dash, Streamlit, or Bokeh. But regardless of which of those engines you choose to use, RStudio Connect allows you to quickly deploy them onto a web server so that you get a URL that you can share with others. And Connect takes care of things like authentication, security, logging, and scale.
Similarly, if you wanted to create an API, say you have a model that you want to expose to other services, well, that exposure might take place in R through a package like Plumber, or might take place in Python through a tool like Flask. But either way, Connect provides that common core infrastructure, that drill. Finally, the same thing applies for automated reporting. So in the Python side, that might be Jupyter Notebooks. On the R side, it might be something like R Markdown. Regardless, again, we're able to deploy those things to Connect.
And Connect can handle things like scheduling those notebooks to be re-rendered on a regular basis, emailing stakeholders with the new results, and even customizing those emails so that you can send exactly what you need to a stakeholder's inbox, regardless of what language you're using for a project. And again, IT is only setting up this infrastructure once. They're not doubling their work just because you're multiplying the number of different data products you can use five- or tenfold.
And as a data scientist, that flexibility is key. When it comes to communicating your work, getting buy-in to the modeling that you're doing, or working alongside a domain expert in real time in a meeting, you need to have the flexibility to use all these different tools to be an effective data science team.
Debunking myth two: multilingual teams can't collaborate
Let's talk about a second one, which is that multilingual teams can't collaborate. Imagine you are a data science team leader, and you have three or four data scientists on your team that have built out something really awesome in Python. In fact, I was speaking to a data science team leader named Wayne recently, who was exactly in this situation. But he needed to hire someone new. His context was a marketing project. He really wanted to hire someone with marketing domain knowledge.
But unfortunately, he couldn't find that candidate who had both the Python expertise and the marketing background. But he did know a few people who had strong marketing chops and knew a little bit of R. Is he presented with a false choice here? I think so, because it turns out that even if your code base is mostly in Python, you can bring someone who has a background in R up to speed really quickly. And again, it comes down to having those right tools, a common drill, to allow for that type of inter-team collaboration.
And so I want to quickly show you what investments we've made in RStudio to make that type of collaboration possible. So what we're looking at here is the RStudio IDE. And the new version of the IDE has a what-you-see-is-what-you-mean editor for R Markdown. And inside of that editor, you're able to incorporate both R and Python code chunks. So here you're looking at Python code that we're able to run, but that code takes advantage of objects coming from R. See the r.cars object.
So once we have that Python code executed, it's actually going to write objects back into the environment that we can see in the IDE. So in the environment pane in the object explorer, we can view that pandas dataframe we created. Once we're happy with the result that we have in Python, we can use the what-you-see-is-what-you-mean editor to insert an R code chunk that takes advantage of that Python dataframe. So just like we used the R object in Python, we can use the Python object in R and create a plot with ggplot2 of that Python data.
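The Python half of that round trip can be sketched as below. This is a minimal illustration, not the code from the demo: the helper name `add_dist_per_speed` and the `speed` and `dist` columns are assumptions (they match R's built-in `cars` dataset, which the `r.cars` object appears to refer to). The demo produced a pandas dataframe; this sketch returns a plain dictionary of lists so that it runs with no extra packages.

```python
# Hypothetical helper for a Python chunk in an R Markdown document.
# Assumption: `cars` exposes "speed" and "dist" columns, like R's built-in
# `cars` dataset. A dict of lists and a pandas DataFrame both work here,
# because both support cars["speed"] and iteration over the column values.

def add_dist_per_speed(cars):
    """Return the columns plus stopping distance per unit of speed."""
    speeds = list(cars["speed"])
    dists = list(cars["dist"])
    return {
        "speed": speeds,
        "dist": dists,
        "dist_per_speed": [d / s for s, d in zip(speeds, dists)],
    }

# Stand-in data so the sketch runs outside RStudio. Inside an R Markdown
# Python chunk executed through reticulate, the call would instead be
# add_dist_per_speed(r.cars).
sample_cars = {"speed": [4, 7, 8, 10], "dist": [2, 4, 16, 18]}
cars_py = add_dist_per_speed(sample_cars)
print(cars_py["dist_per_speed"])
```

In a later R chunk, reticulate exposes objects from the Python session through `py`, so `py$cars_py` would hand this result to ggplot2 after conversion to a data frame.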
Now, this interoperability extends beyond R Markdown through the reticulate package. So here we have a Shiny application that's going to take a Python function and actually source it and make it available to the R engine without any conversion. It's pretty magical. We can then call that Python function inside of our Shiny app, and that allows us to build something like this, which is a Shiny application where the front end, all the inputs and controls, are driven by Shiny, but the backend simulation is occurring in those Python functions.
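As a rough sketch of the Python side of such an app, the file below defines one simulation function. Everything here is hypothetical and chosen for illustration: the file name `simulate.py`, the function `simulate_walk`, and the random-walk model are not taken from the demo. On the R side, `reticulate::source_python("simulate.py")` would make the function callable from the Shiny server code.

```python
# simulate.py (hypothetical file name): a backend function that a Shiny app
# could load with reticulate::source_python("simulate.py").
import random


def simulate_walk(n_steps, seed=None):
    """Simulate a 1-D random walk and return the position after each step."""
    # R numerics such as 10 arrive in Python as floats unless written as 10L,
    # so coerce the step count to an integer before using it in range().
    n_steps = int(n_steps)
    rng = random.Random(seed)  # private generator keeps runs reproducible
    position = 0
    path = []
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # step left or right by one unit
        path.append(position)
    return path


# Quick standalone check; the Shiny app would call simulate_walk() directly.
demo_path = simulate_walk(10, seed=42)
print(len(demo_path))
```

Keeping the function free of any user-interface code is what lets the R front end treat it like an ordinary function: Shiny inputs go in as arguments, and the returned list comes back to R for plotting.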
So we're able, as a multilingual team, to really glue these two things together, whether we're bilingual or not. We can take advantage of a code base written in either language to achieve a really unique outcome. And that's what debunking these false choices is all about. It's finding ways to combine the strengths, the best of both languages, as well as the best of our team to come together to create something faster and easier than we could if we were in isolated silos with either language.
Optimizing for people, not tools
And that leads me to my final point, which is that we need, as data scientists, to optimize for people, not for tools. What do I mean by that? Well, think about the story of Wayne trying to hire a data scientist (as it turned out, her name was Vicky) coming from the R space instead of the Python background. Because Wayne was able to set aside the distinction in tools, he could come up with a data science team that together was more effective than if he had limited his job criteria to only people that knew Python.
And so we see that story play out time and time again. At the end of the day, when we talk to data science team leaders, their most important asset and their most expensive isn't a particular tool or platform. It's the data scientists that they're hiring to solve complex problems.
Said another way, one of the big reasons organizations invest in data science in the first place is because black-box solutions like Tableau or Power BI simply don't meet the needs of the rich, complex real-world problems that they're facing. So why would it make any sense for those same data science teams that have recognized they need the flexibility and power of code to achieve outcomes to then sabotage themselves by limiting themselves to only one language? It simply doesn't. It's a false choice.
And so today, my final plea is to pick the people that will make your data science team effective and then supply them with what they need. Don't make people subservient to tools. It should be the other way around. Allow data science teams to pick whatever language or tools can be most effective. And to help you do that at RStudio, we've invested in building out drills so that regardless of what drill bit you need, you can effectively build something faster and easier. Thank you so much for your time, and I look forward to your questions.
