Jonathan McPherson | New language features in RStudio | RStudio (2019)

Transcript

This transcript was generated automatically and may contain errors.

Good afternoon everyone, by good afternoon I of course mean good morning. My name is Jonathan McPherson, like she said, I am a software engineer on the RStudio IDE and today I'm going to talk to you about some of the new language features that we've built into RStudio 1.2.

Before I do that though, a bit of background. So it's been our experience that as people work with R, they are often working with more than one tool at the same time, and they're creating projects that have bits and pieces that are composed of different languages. For example, if you're working with an R project in RStudio, you might need to get some data from a SQL database, and so you need to open up another window to create a SQL query and look at the data there so you can analyze it in R. And perhaps by the time you're done with this query, you think of the fine work that Wes McKinney did on Pandas and think maybe you'll use that to analyze the data, so you open up PyCharm. And then you remember that you've actually been asked to visualize this data in a bubble chart, which you can't think of an R package to do, so you start visualizing that in D3. And then you start thinking about modeling, and you start browsing Wikipedia for modeling languages like Stan, and then you play that game on Wikipedia. Have you ever played that game where you click on the very first link of every article and you always wind up at philosophy?

Every time. Try it out. And then the next thing you know, you're at the zoo and you are shaving a yak.

So this is, this will not do. So this is the kind of problem, one of the kinds of problems we are trying to solve with RStudio 1.2.

So our goals for RStudio 1.2 are many but one of them is to be a more comprehensive workbench for your R projects. So we're going to embrace some of the languages that we commonly see people use in their R data science projects and we're going to make the interoperability between those languages a lot more seamless so that you're not doing so much context switching. If you think about that slide that I just showed with all those windows, like every time you have to switch between tools, switch between windows, every time you have to import data from one tool, export it into another tool, that takes a lot of your time and mental energy and kind of breaks you out of the state of flow while you're working on your data science project. So we're hoping that we can take some of these workflows and make them really easy to do seamlessly right inside of RStudio.

We also have some non-goals. We don't want to make one of these things. We are not trying to become a general purpose IDE. We're not trying to fully replace your dedicated tools or lose focus on R. So RStudio is and always will be first and foremost a workbench for data science with R.


So here is our agenda. We're going to cover just a few languages. They are as follows: SQL, Python, we'll do D3, we'll do Stan, and if we have time at the end, I'm going to show you a couple of the other fun things we've added to RStudio 1.2.

SQL demo

So for most of the rest of this talk, I'm just going to do a live demo of some of these new language features in RStudio, and I'm going to start with SQL. This will look pretty familiar if you went to the keynote this morning. So one of the things that we built into RStudio is data connectivity, and here I'm connecting to a database for a record store, for those of you who still remember what a record store is.

So this database has information about all of the records that we sell, all the artists who made those records, and there is information about customers, our employees, how much people have paid for the records, et cetera. So I'm going to start out by creating a simple SQL query report that tells me all of the albums that I sell and which artists created those albums. As you can see, this data is currently in separate tables.

So I'm going to start by pressing this new SQL button you'll see here. And we'll call this Albums, okay, and you'll notice right away that RStudio has suddenly become an interface that should be pretty familiar to anyone who has spent time working on SQL queries. Notice I have a list of tables over here, I have a query over here, and then I have the results of the query right here. And it's quite easy for me to work on the query and to get real-time feedback about the results of the query here.

So I just selected from the artist table as well. As you can see, this has a Cartesian join result, which is not at all what I want for this query. So I'm going to add a where clause. And you'll notice that as I'm typing, I'm getting real-time autocompletion of the fields in the tables. So I can say where albums, and I can get autocompletion there too, artist ID equals artists.artistID. And there we go. So now I've got a quick little SQL report that tells me all of the albums that I sell and all the artists who made the albums. Again, fairly straightforward, but really easy and very fluid for me to do this without opening 12 windows to look at my schema and my database and my results.
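As a rough illustration of the Cartesian join and the where clause that fixes it, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are assumptions made up for the example; they are not the demo database's actual schema.

```python
import sqlite3

# In-memory database with two small, hypothetical tables (not the demo's data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO albums VALUES
        (1, 'First Album', 1), (2, 'Second Album', 1), (3, 'Third Album', 2);
    """
)

# No where clause: every album is paired with every artist (3 * 2 rows).
cartesian = conn.execute(
    "SELECT albums.Title, artists.Name FROM albums, artists"
).fetchall()

# Where clause on the shared key: each album is paired only with its own artist.
joined = conn.execute(
    "SELECT albums.Title, artists.Name FROM albums, artists "
    "WHERE albums.ArtistId = artists.ArtistId "
    "ORDER BY albums.AlbumId"
).fetchall()

print(len(cartesian), len(joined))
for title, name in joined:
    print(title, "by", name)
```

The only difference between the two queries is the condition on the shared artist key, which is what turns the all-pairs result into the albums-and-artists report.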

Python and reticulate in R Markdown

The next thing that we're going to do is we are going to look at a little notebook that I've created. So this is a notebook that does a little bit more in-depth analysis. It's going to help me figure out who the top customers are at my failing record shop so I can figure out how I can get more money out of them.

To start off with, we're just going to connect to the SQL database, and I'm going to use the reticulate package. Now, a lot of the things you're about to see are a direct result of the reticulate package. This package doesn't require RStudio to use. It was put together by JJ and Kevin, and it powers a lot of the Python interoperability stuff you're about to see.

But before we get to the Python stuff, I wanted to note that all of the really nice things that I just showed you are not only in the SQL window. They are also available to you inside of R Markdown documents. Here I have a SQL query, and what this query does is it simply goes out and gets me a list of every customer and every invoice that that customer has ever had. So all that nice stuff that I just showed you is still available, right? So I can get the same kind of auto-completion that I'm used to.

And I actually want to save this query into an output variable so I can use it later. So now I'm going to use Pandas. I'm going to use Pandas to summarize the data that I just created. So I made a list of invoices, now I'm going to use Pandas to group together and summarize that data to figure out who has spent the most total money on invoices. Let's go ahead and run that. And you can see here are my top five customers.
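The group-and-summarize step might look roughly like the sketch below. It assumes pandas is installed, and the column names and invoice rows are invented for illustration; in the notebook the data comes from the SQL query's output variable rather than being built inline.

```python
import pandas as pd

# Hypothetical invoice rows standing in for the SQL query result.
invoices = pd.DataFrame(
    {
        "CustomerId": [1, 1, 2, 3, 3, 3, 4, 5, 6],
        "Total": [5.0, 7.5, 20.0, 1.0, 2.0, 3.0, 9.0, 8.0, 0.5],
    }
)

# Sum invoice totals per customer, then keep the five largest spenders.
top_customers = (
    invoices.groupby("CustomerId", as_index=False)["Total"]
    .sum()
    .sort_values("Total", ascending=False)
    .head(5)
)

print(top_customers)
```

Sorting after the sum and taking the first five rows is one simple way to get a top-five list; the query actually used in the demo may differ.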

Now I wanted to show you a couple of things that you can do while you're authoring these chunks. First of all, you get the same kind of auto-completion you do in R. So just like you can type library in R and get a list of your R packages, you can type import in a Python chunk and you'll get a list of your Python packages. You also get the same kind of method completion that you get inside of R. So for instance, we've got variables, we can see the names of the functions here. If we use a function, we can actually get help for that function right in the help pane. You can jump to the definition of that function right in its Python file. So we think that these capabilities will make it really easy for you to very fluidly and naturally author Python chunks inside of R Markdown or other bits and pieces of your R project.

The other thing I'd like to point out here, and some of you who are a little bit more astute might have already noticed it, is that I didn't have to do anything to get that data from SQL into my Python chunk. And the reason is this. Notice that when I created an output variable here in SQL, it went into my global environment right here. See, here's that data from SQL. And the only thing I needed to do to reference that data in Python is this: r.spinning. That just takes the variable from R called spinning and allows me to use it right inside my Python code.

In previous versions of RStudio and R Markdown, Python chunks basically ran in a vacuum. You had to import the data at the beginning of the chunk from wherever and save it at the end. And that's no longer necessary. Reticulate actually embeds a Python session inside of your R session. So it's very easy and seamless to get data in and out of Python.


Speaking of getting data in and out of Python, it's also quite easy to do the opposite of what I just showed you. Just like you can say r dot in Python, you can say py dollar sign in R. Notice here I'm getting a nice real-time autocompletion that not only tells me what data from Python is available but also gives me some information about that data. So here I can run that chunk, and now I've got the summarized data from Python.

Now let's do a little bit of visualization. This is something you also kind of saw hinted at in the keynote. I'm going to use Matplotlib, again right inside of RStudio, to visualize these top five spenders at my fancy record shop.

Now another thing I want you to notice here is that each one of these Python chunks is running in the same Python session. In previous versions you basically had to run each Python chunk independently. Every one of these chunks would start a new Python session, run all of the code, and then quit. So now that we have an embedded session, these things can build on each other. Just like you can with R chunks, you can build seamlessly chunk to chunk inside of your Python chunks.
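To make the difference between per-chunk sessions and one shared session concrete, here is a small conceptual sketch in plain Python. It is not how reticulate is implemented; the helper functions `run_isolated` and `run_shared` and the chunk contents are hypothetical, and a fresh namespace stands in for a fresh interpreter.

```python
# Two hypothetical "chunks": the second depends on a variable from the first.
chunks = [
    "totals = {'alice': 12.5, 'bob': 20.0}",
    "best = max(totals, key=totals.get)",
]


def run_isolated(sources):
    """Run each chunk in a fresh namespace, like a new session per chunk."""
    errors = []
    for source in sources:
        namespace = {}
        try:
            exec(source, namespace)
        except NameError as exc:
            # The later chunk cannot see variables defined by the earlier one.
            errors.append(str(exc))
    return errors


def run_shared(sources):
    """Run all chunks in one namespace, like one long-lived session."""
    namespace = {}
    for source in sources:
        exec(source, namespace)
    return namespace


isolated_errors = run_isolated(chunks)
shared = run_shared(chunks)

print(isolated_errors)
print(shared["best"])
```

With isolated namespaces the second chunk fails because `totals` no longer exists, while the shared namespace lets it build on the first chunk's result.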

