Resources

Sean Lopp & Lou Bajuk | R & Python: A Data Science Love Story | RStudio (2020)

Many data science teams today leverage both R and Python in their work, but struggle to use them together. Data science leaders and their business partners find it difficult to make key data science content easily discoverable and available for decision-making, while IT admins and DevOps engineers grapple with how to efficiently support these teams without duplicating infrastructure. Even experienced data scientists familiar with both languages often struggle to combine them without painful context switching and manual translations.

In this webinar, you will learn how RStudio helps organizations tackle these challenges, with a focus on some of the recent additions to our products that have helped deepen the happy relationship between R and Python:

- Easily combine R and Python in a single data science project using a single IDE.
- Leverage a single infrastructure to launch and manage Jupyter Notebooks, JupyterLab, VS Code, and the RStudio IDE, while giving your team easy access to Kubernetes and other resources.
- Share and manage access to R- and Python-based interactive applications, dashboards, and APIs, all in a single place.

Webinar materials: https://rstudio.com/resources/webinars/r-python-a-data-science-love-story/

About Lou: Lou is a passionate advocate for data science software, and has had many years of experience in a variety of leadership roles in large and small software companies, including product marketing, product management, engineering, and customer success. In his spare time, his interests include books, cycling, science advocacy, great food, and theater.

About Sean: Sean has a degree in mathematics and statistics and worked as an analyst at the National Renewable Energy Lab before making the switch to customer success at RStudio. In his spare time he skis and mountain bikes, and is a proud Colorado native.

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Today, Sean and I are going to be talking about R and Python, a data science love story and why we're very excited to bring this news to you.

So starting off, giving a bit of an overview, we'll be talking about why we're really excited to bring R and Python together in RStudio and provide it to the data science teams that we work with.

In this webinar specifically, what we're going to want to show you is how data scientists can use R and Python together on a single data science project, how DevOps and IT can support this bilingual group of data scientists in their development efforts in a single development infrastructure, and how these bilingual data science teams can efficiently collaborate and share their work with their business stakeholders so that the insights that they generate can actually be used to make better data driven decisions.

We'll wrap up then talking a little bit about our ongoing community investment from RStudio in both the R and Python communities. And then we'll talk about how you can get more information on any of the points we discussed today, as well as answer as many questions as we can.

RStudio's focus on integration

So over the years, the focus of RStudio has been how do we help organizations really realize the full value of their data science investments, complementing the large investments in open source data science that we've made since the beginning of our company.

As part of that, it's been in our DNA to really focus on how do we integrate with all the tools and environments that data scientists need to use to do their work.

Early on, we introduced Shiny, which is a way of building interactive web-based visual applications using pure R code. We also have done a lot of integration with Spark through our sparklyr package. And we've also done a lot of work to make it easier to access data, use SQL and integrate with databases. So all along, we've had this focus on integrating with all the tools that a data scientist might need.

In the last few years, as we've talked to our data science customers, we've seen that there are a lot of bilingual data science teams, a lot of teams that really need to use both R and Python together in order to exploit the best of both environments.

And so over the last couple of years, we've added a focus on integrating with various capabilities on the Python side. That started off with integration with TensorFlow, which led to our development of the reticulate package as a way of providing general integration with Python from the R language, which Sean will be talking about in a moment.

And then most recently, over the last year or so, we've introduced a number of new capabilities around Python, including the support for Jupyter Notebooks and JupyterLab within our commercial products, which Sean will be talking about.

Challenges for bilingual data science teams

So in talking to our customers and other data science teams out there, we've seen a lot of common challenges for these bilingual data science teams, a lot of the same problems that they struggle with.

Data scientists, for example, are focused on using and finding the best tool for the job. And when they're trying to use R and Python together, that means they often need to switch context between these multiple different environments. And they struggle with the cognitive cost of switching back and forth.

To support that switching, DevOps and IT have to spend time and resources to maintain, manage, and scale these separate environments for R and Python in a cost-effective way, often struggling because they may not have deep experience with the open source tools that underlie both environments.

Data science leaders, the people who manage and really try and maximize the value and efficiency of these data science teams, wrestle with how to share results consistently and deliver value to the larger organization because ultimately the data science team is there to provide insights so they can help the organization make better decisions.

The data science leaders also want to focus on how do we provide the tools for collaboration between these R and Python users? Because if these users are siloed, then they might end up reinventing the same algorithms, the same processes, the same analyses in these different environments and wasting time and effort doing that.

And then the business stakeholders are ultimately not interested typically in the underlying details in whether or not the data science products that they receive are based on R and Python. They just want to make sure that data science is credible and that they're able to leverage those insights in order to make better decisions.

So these are the types of challenges that we've seen throughout many, many conversations with different data science teams. And so we're very excited today to bring you a view into how we help solve those challenges. And with that, I'll hand it off to Sean.

Combining R and Python as a data scientist

Awesome. Thank you so much, Lou. And again, thanks everyone for joining. We're going to start by looking at some of the ways that a data scientist might combine R and Python. And this is something that's near and dear to my heart because before joining RStudio, I sat exactly in this seat.

I was working with some really amazing engineers who were doing all of their stuff in Python. I came from more of a stats background, so I was more comfortable with R. And I found myself spending a lot of time on pretty tedious tasks, running Python scripts, saving data to disk, loading it back up in R to make a plot, debugging when type conversions didn't work, just things that weren't very fun.

And I'm excited today to show you, as a data scientist, how in RStudio with the open source reticulate package, you can really work around a lot of that and very efficiently combine these two languages together.

So we're going to start with a couple of examples. So I'm going to switch over to RStudio and actually want to show you a shiny application. And this is an application I inherited from that prior life.

Basically what it allows someone to do is compare the fuel economy of two different vehicles going from point A to point B. And so maybe as an example, we could look at a Ford Explorer versus a Toyota Highlander hybrid.

A simulation occurs behind the scenes here. And then we get the result. In this case, the Ford Explorer uses more gas than the hybrid.

So this is an interesting application because all of that simulation is occurring in Python. In fact, the code that I inherited from my engineers is this really huge thousands of lines of Python code that handles the simulations here.

And what we struggled with before the reticulate package was how to make this code accessible to others. No one on our team had the skill to build out a full-fledged web application framework in Python. And so what we would end up doing is taking requests from different stakeholders. We would run these functions manually, and then we would return the results. And it was a really time-intensive and laborious process.

So when the reticulate package came out, I was excited to be able to build using the amazing tools available in R around Shiny. I was excited to build this type of application that would allow those stakeholders to run experiments interactively themselves.

So I want to show you a little bit of the code just so you get a feel for what this looks like. Inside of the application, we're using the reticulate package to source this Python file.

And this Python file has a whole bunch of functions inside of it that do all the heavy lifting. You can see one of them here, a function called sim_drive.

And once you've sourced that file, all you have to do is call it, call that function as if it were an R function. You can see that call here.

And the reticulate package takes care of all the dirty work for you. So you don't have to worry about managing a subprocess yourself. You don't have to deal with type conversions. Our data frames on the R side are going to become those dictionaries that the Python function needs. And then the Python function is going to return results that are nicely handled back into a tidy data frame.

And so a lot of that heavy lifting is done for you. And it becomes really seamless to get started calling that Python code from the R interface.

Now, when you're doing that type of work, combining R and Python together, being able to do it seamlessly isn't enough; you also typically don't want to have to context switch.

And so that leads to the second thing I want to show you, which is that inside of RStudio, again, based on the open source work that we've done with the reticulate package, you're able to interact with Python files directly. And so this is a pretty simple Python file.

And what I can do inside of RStudio is execute the code kind of line by line. And in this case, we're using NumPy and matplotlib to create a bivariate distribution plot. You can see the matplotlib image shows up in the plots panel.

Our Python code is being executed line by line in the console. And we even get some of the really nice ergonomics that you would expect from an IDE, such as autocomplete in the Python context, as well as help.
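The exact script from the webinar isn't shown, but a minimal stand-in with the same shape, a NumPy and matplotlib bivariate distribution plot, might look like this. The file name, seed, and covariance values are illustrative assumptions:

```python
# bivariate.py -- sketch of the kind of Python file stepped through line by
# line in the demo. In RStudio the figure would appear in the Plots pane;
# here we use a non-interactive backend and save it to disk instead.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside an IDE
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Draw correlated samples from a 2-D Gaussian.
mean = [0.0, 0.0]
cov = [[1.0, 0.6],
       [0.6, 1.0]]
x, y = rng.multivariate_normal(mean, cov, size=2000).T

# Render the bivariate distribution as a 2-D density (hexbin) plot.
fig, ax = plt.subplots()
ax.hexbin(x, y, gridsize=30)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("bivariate.png")
```

Each of these lines can be sent to the Python console individually from the RStudio editor, which is what the line-by-line execution in the demo shows.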

And this is really exciting because if you are inheriting those Python files or gluing them together with R code, it can be really nice to be able to interact with the Python files in one place without having to switch contexts to a totally different editor or a different tool.

Now, RStudio is by no means a full-fledged IDE for Python development at this time. But in my experience as a data scientist working with both languages, it meets my needs and has a lot of the things I'm looking for.

So that's how you might interact with a Python file directly. There's one other integration that we built into the IDE to make working with R and Python together pretty seamless, and that's through R Markdown.

R Markdown, if you're not familiar with it and you're coming from the Python side of the world, it's very similar in its goal to a Jupyter notebook. It allows you to do what we call literate programming, where you're going to combine text and prose alongside the output of code, all in a single scientific notebook or computational document.

That's what R Markdown allows you to do, but it's a little bit of a misnomer, R Markdown, because you can actually use a whole bunch of different languages inside of an R Markdown document. Of course, as you might guess, we're talking about Python today, so what I'm going to show you is how you can use Python and R inside of this document.

And we'll start by inserting a Python code chunk, and then let's load some data from the Seaborn Python library. And specifically, we're going to look at our favorite iris dataset.

You can see the IDE's autocomplete at work, helping me get these function calls right. And then we're going to use some Pandas syntax to subset this dataset that we've loaded, and specifically, we're going to look at one species of iris called setosa.

Actually, let's not overload our variable names here. We'll call this setosa. And we can execute this right inside of RStudio.

In the Python context, we can double check that we got things right here. And so, this does look like the iris dataset with the setosa species. So, we're looking good so far.
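In plain Python, that chunk looks roughly like the sketch below. The webinar loads iris via seaborn's `load_dataset`, which fetches the data over the network; a tiny inline stand-in is used here so the sketch is self-contained, with column names matching seaborn's:

```python
import pandas as pd

# Tiny inline stand-in for seaborn.load_dataset("iris").
iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "sepal_width":  [3.5, 3.2, 3.3, 3.0],
    "species":      ["setosa", "versicolor", "virginica", "setosa"],
})

# Pandas boolean indexing to subset to one species, as in the demo.
setosa = iris[iris["species"] == "setosa"]
```

From an R chunk in the same R Markdown document, this Pandas data frame would then be reachable as `py$setosa`.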

What I want to show you, and where things get kind of magical and really exciting, is that from our R code chunk, we can now access everything we're just doing in the Python environment.

So, specifically, let's load the ggplot2 library. And we're going to take a look at plotting this dataset.

And the way this works is there's a special object in our environment called py that gives us access to the Python environment and all the variables or data in that environment. So, here py$setosa is going to give us access to that Pandas data frame.

And we can go ahead and plot it. From there, we can do everything as if we were in R.

So, we'll just make a simple plot here.

And what's really interesting about this is it allows us to use the languages for what they're really good at. So, as an example here, we're using Pandas, which is a pretty powerful tool for data manipulation. And then ggplot2, which is really excellent for visualization.

And so, we can run this R code. And, of course, we get an error here. Let's maybe see if we can figure out what's going on with this error real quick.

One of the things that's really nice is with that Python environment, I can actually use some of the IDE's features to explore what's going on. So, as an example, I can view the dataset that's in Python using the view command. And it looks like this is the dataset we saw. It's the iris data filtered down to the setosa species.

And actually, I think I know what I did wrong. It looks like on the Python version of this dataset, we have underscores. I used periods. So, if we replace those periods with underscores, there we go. We get our nice ggplot2 plot of the relationship between length and width.

So, those are just a couple of ways that a data scientist can combine R and Python inside of RStudio. All of that uses open source tools that are going to be available wherever you're running RStudio. At the heart of it is that reticulate package.

The reticulate package uses the Python and R C++ APIs. So, the data conversions are pretty fast. It does use a little bit more memory than if you were using a single language. But it allows you to take advantage of tools in each language. So, you can use R for what it's really good at and Python for what it's really good at.

And just to kind of recap what we saw there, there is a combination of R and Python in R Markdown that we just looked at. We also saw at the very beginning how you can use Python functions and execute Python code in the context of R scripts or functions, which is really useful if you want to build like a shiny app that pulls in Python code. And then you can also just edit and play with Python code directly in RStudio as the IDE.

Supporting bilingual teams with RStudio Server

Now, that's really useful for a data scientist to glue these two languages together. But what we see is that often it's not just a data scientist who has to worry about R and Python and the combination of the two. Especially in organizations, there's another really important team that gets involved with these two languages.

And that's the DevOps or IT administration group that is responsible for creating that environment where data scientists can be productive and have access to data and collaborate with one another.

What we tend to see working with a lot of these organizations as they're adopting R is that it can be a challenge to create an infrastructure that meets the needs of all these different languages.

And what happens in the worst case scenario is that data scientists will just continue doing work on their own desktops where they're most comfortable. And that becomes really hard to troubleshoot.

We've talked to IT admins who have hundreds of tickets in their system along the lines of this version of a Python package isn't installing and so I can't share a notebook with my colleague.

And so what we're excited to show you is RStudio's professional products, RStudio Server and RStudio Connect, which together kind of make up RStudio Team. And these professional products allow the DevOps and IT group to provide that single infrastructure where data science teams can use R and Python together. And they can do it in their favorite tools, whether that's RStudio or Jupyter without resorting back to their desktop.

That makes it easier for them to collaborate and work together. But it also means that IT only has to configure, maintain, scale a single environment instead of dealing with different tools for different languages across different teams.

So I want to show you what that looks like. I'm actually going back to RStudio and one of the things that those of you with really astute eyes might have noticed is that I'm actually using RStudio inside of a web browser, which might be a little bit different, especially if you're used to using RStudio as a desktop application.

The reason that you see RStudio presented in a web browser is because all the computation here is happening on a server. And what that means is that Lou, myself, and the other data scientists at RStudio are all working on a consistent platform. So it's easier for us to share work with one another. It's easier for us to get access to the production data that we use at RStudio.

And kind of one of my favorite parts is that I have a single entry point for all the different types of projects that I work on. So this is the RStudio server homepage. You can see the Shiny application, that project we were just in, is sitting idle now on the server. I have a job that I kicked off a couple of hours ago that's kind of running in the background. I don't have to worry about desktop updates interfering with that job.

And the kind of new thing that we wanted to showcase is that when you go to start a new session on RStudio, you now get to select what editor you want. So, of course, you can choose the RStudio IDE, and RStudio server will run that for you. But you can also pick between things like JupyterLab and the Jupyter Notebook.

I'll select Jupyter Notebook, and we'll kick this off in a second. But before we do that, I also want to call out another choice that's really powerful, which is where you want the computation to actually run.

So traditionally, you would normally run R and Python kind of on the same environment in the same location where RStudio server was running. But in one of our goals to make the server and this infrastructure more extensible and easier for IT, we've added the ability to offload some of that computation to other locations.

And so specifically what you're looking at here is an integration between RStudio Server and a Kubernetes cluster. Our Kubernetes cluster happens to be running on AWS, but you can be running Kubernetes on any of the cloud providers or even on premise yourself.

And what this allows me to do as a data scientist is without learning any new tools or new work benches, right from within RStudio server, I can pick the profile of tasks that I'm going to have. So what CPUs I need, the size of memory that I want. And then my IT group has put together a couple of images, Docker images that will control the environment where this Jupyter notebook is going to run.

So this gives a lot of flexibility for our team to specify exactly what we're looking for while still scaling elastically. And that's all inside of RStudio. We don't have to jump between different tools or different environments.

And so if I start this session, you'll see both the RStudio IDE session and the Jupyter Notebook on the homepage. I can fire up this Jupyter notebook and it's going to give me the same files, the same access that I had from RStudio.

And so like any data scientist, I didn't do a great job cleaning up my workspace. I have a whole bunch of things going on here, but I can open up a specific set of examples that I created for today.

All these examples in code are going to be online and we're happy to share them after the webinar. But I just want to show you what it looks like to run a Jupyter notebook here.

And it actually might be a little bit of a letdown because what it looks like is a Jupyter notebook. You're able to execute the code chunks inside of Jupyter. You can see the results. And we've tried really hard at RStudio to give you as short a path as possible to the tool that you're comfortable with.

So we don't require a steep learning curve around anything else. We just give you access to Jupyter right away.

Inside of this notebook, though, we are still in that server side environment, which is really nice because it means that the Python kernels that we have are kind of shared and can be shared amongst different data scientists. So if you're onboarding someone new to Python, they're going to have access to that shared environment that your IT group has set up, in this case, that shared Docker image, so that we're all kind of playing from the same playbook.

And while that's desirable a lot of the time, for advanced Python power users, you still have the ability to create your own kernels, set up your own virtualenv or conda environments for specific projects. So all of that same tooling is available to you.

The only difference between this Jupyter notebook launched through RStudio Server and a Jupyter notebook you might run yourself is two buttons that you'll see here. So one is this RStudio icon, and you can tell it really is a love story. The two icons are even sitting next to each other. And this just takes you back, if you want, to your homepage, where you can see all the different projects and contexts that you're working on.

Inside the Jupyter notebook, if we go back to that running notebook, the other icon that you'll notice is this publish button, and we'll talk about that in just a second. So remember that icon.

So to summarize, for DevOps and IT, one of the things that's really critical when you have R and Python users is being able to provide them a consistent entry point that has their favorite tools without introducing a whole bunch of headaches and hurdles.

And RStudio Team is an easy way to do that. In a single product, only configured and integrated once, you're able to give folks access to the RStudio IDE on the R side, which folks know and love, as well as their favorite tools on the Python side, such as Jupyter notebooks.

And we'll be extending that suite of editors as well, and that corresponds to an extension in the number of backends, too. So I showed you Kubernetes as one way to run and execute this content. There's also plugins if you want to run these different editors on more traditional HPC infrastructure that you might have at your organization.

Sharing results with RStudio Connect

So we've covered data scientists. We've covered IT and ops. There's a third set of important folks that Lou mentioned at the beginning who care about R and Python, and that's the data science team leaders or data science managers and ultimately the business stakeholders.

And we combine these together on a single slide because we believe their concerns are largely two sides of the same coin.

On the one hand, data science leaders, they're looking at how do I onboard new users into these different ecosystems where they can collaborate with the team without having to reinvent wheels or spend time doing translation.

I was talking to one manager at a pharmaceutical company, and they're trying to get more than 500 SAS users into these open source tools. So it becomes really important that as you do those trainings and onboardings, it's a seamless process.

Business stakeholders, meanwhile, they share some of the same goals of the data science leader. They really want to see a return on their investment in the data science team. And ultimately what they care about more so than whether you use R and Python is just getting insights from the team so they can make good decisions and not rely only on their intuition.

And critical to doing that is their ability to interact with the team quickly. We want to avoid slow iteration times and emails back and forth asking for requests between the different team members.

And so the last thing I'll kind of show you is how RStudio team in combination with R and Python allows you to address some of these challenges. And specifically, we're going to look at what it means to take some of those Python and R data products and put them into production on a tool that we call RStudio Connect.

And so I'm going to switch back here to the Shiny application that we started the demo with. And this application, as we mentioned at the beginning, one of the things that's really nice about a Shiny application is that it gives you the ability to allow stakeholders who may have specific domain expertise to play around with your code and run experiments.

The challenge here is that right now this application is kind of trapped on my development environment. It's not easy to share. If I close out of RStudio, the application goes away.

And so what we need to do is move it into production, into a place where others are going to be able to interact with it and rely on it. And in RStudio, it's easy to do that through that publish icon we were talking about.

So all I have to do to deploy this to my production Connect server is click publish. That server is running on premise; I'm not locked into a platform as a service, and I'm not sending my data outside my firewall. I'm just deploying to another server with a production intent. And what's going to happen is that the R and Python environments are going to be identified, all the different files and packages that are in use are going to be listed, and then sent to RStudio Connect, where that environment is going to be restored and the application is going to be run.

So I'll kind of, like those old cooking shows, this is the ingredients. And I'm going to jump to the end, pulling the final product out of the oven. But this is where the application will live and what it'll look like after it's deployed to RStudio Connect.

And so the application here will load for us. That same application we've been playing with the whole time, but now it's in a production context. And so what does that mean? Well, very critically, it means that I have a stable URL I can share with stakeholders, and then they can go and access this dashboard and not even need to know that it was written in R or in Python. They don't need to know the details. It just becomes a web application.

I also have a lot of those things that IT is going to care about. So I can specify specific users so that this environment is secure and locked down to specific users or groups. I can look at the logs so I don't have to email someone in IT to try to download a log file and send it to me. I can actually look at those and kind of iterate and debug in real time.

And then also, very critically, I can scale. So we've used RStudio Connect to scale to more than 10,000 users at a single time. So 10,000 concurrent users looking at a dashboard that involves both R and Python. So it's really good at managing all of those processes and connections for you.

So that's an example of a Shiny application that is going to allow business stakeholders to get value really quickly and to be able to iterate and run their own experiments. But dashboards and applications aren't the only thing that data science teams create.

In fact, one of the things that we were talking about at the beginning was those Jupyter notebooks. And so you can publish Jupyter notebooks to RStudio Connect. And that becomes a really powerful way for teams to share their work. You can have a notebook that documents your experiment, your thinking, and that'll be available to everyone.

And we have that support for Jupyter notebooks, but also for R Markdown. So it doesn't matter what language you're using. Your work can be shared in a single place that becomes a ground truth for the knowledge that the data science team has.

The other thing that you can do with these notebooks that gets really powerful is once they're deployed to Connect, Connect can run them on a schedule. And so I actually want to show you an example of an R Markdown document. This document is scheduled to run twice a week.

We can look at the historical versions of this document that have been published over time. But one of the things that's really interesting about this R Markdown document is that it's designed to communicate with stakeholders. So essentially what it does twice a week, on Mondays and Wednesdays, is check inventory supplies. It runs a forecast and a burndown analysis. And then if there's a discrepancy between the inventory and the forecast, it uses the RStudio Connect scheduler to send an email to stakeholders, in this case, the supply team.
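The core of that scheduled check, independent of which language implements it, is a simple conditional alert. A Python sketch, where the function name, data shapes, and tolerance are illustrative assumptions rather than the code from the demo:

```python
def inventory_alert(inventory, forecast, tolerance=0.1):
    """Return items whose on-hand inventory falls short of forecast demand.

    inventory, forecast: dicts mapping item name -> units.
    tolerance: allowed relative shortfall before alerting.

    A scheduled report on RStudio Connect could email exactly this
    shortfall list to the supply team whenever it is non-empty.
    """
    shortfalls = {}
    for item, needed in forecast.items():
        on_hand = inventory.get(item, 0)
        if on_hand < needed * (1 - tolerance):
            shortfalls[item] = needed - on_hand
    return shortfalls
```

An empty return value would mean "skip the email this run," which is how conditional alerting avoids spamming stakeholders when everything is on track.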

And that email is fully customized through the code. So you can include results, you can include plots. And those are going to show up in your stakeholders' inboxes so they can look at it right on their phone. And that's helpful in those cases where people want to get alerts or monitor something without having to constantly refer back to a dashboard.

That's available for R Markdown documents that are, again, allowing you to combine Python and R together. So it's a powerful way to operationalize some of that work that the team is doing.

But one other piece of content or pillar of content that I want to show you on Connect is how you might go about deploying a model. So often data science teams are responsible for creating models. And we've kind of seen how models could be shared with people in something like a dashboard or a notebook.

But often you also need to share those models with other services. Maybe it's a website, maybe it's an application written in something like Java. But those tools want to consume the smart insights that you've created in R and Python. And the way that you can do that easily is by hosting your model as a RESTful API.

So I just want to give you an example of what that looks like on RStudio Connect. The code for this API, so you can get a sense for what it looks like to actually write it, is something we'll share after the webinar.

But in this case, our model is a sentiment predictor. So essentially, the model takes as an input a string, so maybe something like the word debugging. And we call this model on Connect using a RESTful request. It looks like this. So this is something a software engineer would be really familiar with.

And then the model returns the sentiment: zero being negative if the phrase was not very happy, or one being really positive. In this case, "debugging" is not very positive. If we send the model something like "great beer," it turns out that "great beer" is quite a bit more positive, much closer to one.
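From the consuming side, a call like the one just described is an ordinary HTTP POST with a JSON body. Here is a hedged sketch of what that might look like from Python; the endpoint URL and the exact JSON field names are hypothetical (a real Connect deployment documents its own), but the request/response pattern is the same one a Java service or website would use.

```python
import json
import urllib.request

def score_sentiment(text, endpoint="https://connect.example.com/sentiment"):
    """POST a phrase to the (hypothetical) sentiment endpoint and return
    its score: values near 0 are negative, values near 1 are positive."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["sentiment"]

# The response body is plain JSON, so any client can parse it.
# A canned example response (values are illustrative):
canned_response = '{"sentiment": 0.91}'   # e.g. for "great beer"
score = json.loads(canned_response)["sentiment"]
# A score close to 1 means the phrase reads as positive.
```

Because the interface is just HTTP and JSON, the consuming service doesn't need to know or care that the model behind it was built in R and Python.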

And so this model is using R, and it's also using Python; a Python library called spaCy is actually what's giving us the sentiment scores. And just like we saw with the Shiny application, RStudio Connect is handling the access controls, the logs, and scaling the number of processes to handle requests. So there's a lot of flexibility here as you operationalize these models.

And so to kind of summarize, really what we're after is making it easier for people to make important decisions with data. And that's going to require teams to use R and Python together.

And there are some considerations for team leaders and stakeholders when they set out to do that. The big ones are being able to access those insights consistently and reliably, which is really what production means. And for teams, it's being able to deliver that content regularly and seamlessly.

And so that's what RStudio Team, and specifically RStudio Connect, is designed to do: to help those decision makers really rely on those data insights and to ultimately get a return on the investment from the data scientists' work in any language.

So with that, I'm going to hand things back over to Lou to kind of recap what we've talked about and look ahead into the future a little bit.

Recap and community investment

Thank you, Sean. I appreciate that. That was a great series of demos illustrating the way that we help our customers tackle the major challenges of bilingual teams.

So just recapping what Sean just showed: through our products, we allow data scientists to combine R and Python in a single project using the reticulate package. We make it easy for these data scientists to launch Jupyter Notebooks or JupyterLab from the same infrastructure where they launch the RStudio IDE.

From the DevOps and IT perspective, we make it possible to provide all this capability in a single infrastructure for both R and Python and Jupyter, which means users can continue to use their favorite tools, whether it's the RStudio IDE or whether it's a Jupyter Notebook.

And very importantly, from the DevOps perspective, they're making their users happy while making their own lives simpler and less expensive by configuring, integrating, scaling, and securing a development infrastructure just once, as opposed to multiple times.

From the data science leader's point of view, we've shown how it's easy now to collaborate across your team, sharing your R and Python work between those team members, maximizing their productivity, and how to really repeatably, reproducibly deliver value to the business by delivering regularly updated reports, custom results, and self-serve applications through a single portal or directly in your stakeholders' inbox via email.

And from the business stakeholders' point of view, it's possible for them to access all these up-to-date interactive analyses, dashboards, and emails, making sure that these results are current since they can be updated on a scheduled basis, and so that they can get the answers, the insights they need when they want them in order to make better decisions.

All these capabilities around integrating with Jupyter Notebooks, supporting Jupyter environments, and supporting these diverse bilingual teams are available in our commercial products. Our commercial products are bundled together in RStudio Team, made up of three main components: RStudio Server Pro, which provides the development infrastructure for R and Python; RStudio Connect, which is a platform that allows data scientists to publish their results to business users and other collaborators so they can use those insights; and RStudio Package Manager, which manages all the complexity around R packages for the other two platforms.

And to date, well over 1,000 large organizations use our commercial products to solve their day-to-day data science challenges and really scale up their data science work into production, so they can leverage the value of their data science team and maximize the return on all the different data science investments they've made, whether it's R or Python or Spark or Kubernetes or something else.

Leverage all that value to get better answers, to make better decisions, and really maximize the impact of their data science work.

We've talked now a lot about what's available in our products, both open source and commercial. In addition to that, RStudio has for many years been a major supporter of the R community, and now we're supporters of the Python community as well.

The RStudio community is a great portal on our website for asking and answering questions around open source data science, R and Python, and our products. Great place to get information.

Every year, we sponsor rstudio::conf, which is the biggest gathering of open source data science users in the world. We've got our next conference coming up in just a couple of weeks, if any of you are going to be in the San Francisco area at that time; we have a few slots still available. If not, we're also going to be live streaming the presentations, and we'll provide that link at the end of the presentation.

The RStudio education team is devoted to helping train the next million open source data science users. We provide not only pointers to other learning materials and our own learning materials, but also a train-the-trainer certification, so that we can certify people in the open source data science community to teach R and really encourage them to scale out and provide training to others, because we feel that's the best way to spread the capabilities for using R across the community.

We're also a member of a number of cross-vendor groups that help support the data science community. We were one of the founders of the R Consortium, which focuses on delivering valuable infrastructure and supporting working groups for the R community.

We're now major sponsors of NumFOCUS as well, which is another cross-vendor group supporting investment in open source data science. They're the umbrella organization that provides the primary funding for the Jupyter project.

And we're helping incubate Ursa Labs, providing operational support and infrastructure for this industry-funded development group, which specializes in developing open source data science tools that cross languages, including most recently the Apache Arrow project.

Resources and Q&A

And so before we wrap up and head to questions, just a few links on where you can find more information, to get an overview of what we talked about today and links to a lot more detailed information. Your one-stop landing page is rstudio.com/python. Lots of information there.

You can also contact us to learn more. That's a great way if you want to get a detailed demo, if you want to get some questions answered or just do a deeper dive. That's the best way to do it. We will get to as many questions as we can today, but there are a ton, which is awesome. We appreciate the engagement.

If we don't get to your question today, we will follow up. The link to the webinar recording, as well as the slides and the scripts that Sean used today, will all be provided to the attendees. We've had several questions on that. As I mentioned, our conference is coming up in two weeks.

In the first section of his demo, Sean focused on using the reticulate package to call Python. And so here's a couple of links on the website for the package, samples, and a deeper dive webinar for using that.

We also got a number of questions around configuration and versioning and whatnot. And I'll hand a couple of those off to Sean in a minute. But for some of the deeper information there, again, our documentation is a great source for that. We can provide you information. And as always, the RStudio community is a great place to ask questions.

So with that, we'll dive into questions. As I said, we will do our best to get to all your questions. If we don't, we'll follow up. If you'd like to set up a conversation, this link is also in the slides, and clicking on it will give us a chance to directly set up a follow-up conversation with you.

So diving into the questions, as I said, there's a ton, and I appreciate it. I also want to apologize to everyone who suffered any audio problems. We had a handful of people with issues on that. If you had any audio problems, the full recording will be sent out along with the slides.

So Sean, we've got a number of questions here. One that comes up a few times is, can you clarify the difference between what's available in our open source products and the work we provide there and what's unique to our commercial products around support for Python and Jupyter?

Yeah, absolutely. So we're dedicated to making open source tools available for data scientists. And so everything