GitHub Copilot in RStudio, it's finally here!

Transcript

This transcript was generated automatically and may contain errors.

And I'll paste it again. So I just pasted the short link to these slides in the chat for folks joining after the fact. It is pos.it/rstudio-copilot, and that will get you to a copy of these slides that will be up in the future. So yeah, thank you so much for having me, excited to be here and talk about new things in RStudio and integrations such as GitHub Copilot.

I joke that this could really be a one-page presentation in terms of GitHub Copilot is in RStudio now, it's finally here, mic drop, walk away, that's the end of it. But of course, like any tool, you need to learn a little bit about how to use it best and be productive with it as opposed to just use the tool and kind of not know how to use it the best way.

So this is a talk that officially closes one of our most highly requested features ever, GitHub Copilot integration with RStudio, which is issue number 10,148. It had over 500 upvotes and was, again, the most popular feature request we've ever had for the RStudio IDE. So really excited that we're able to deliver this in the most recent release as a preview feature in RStudio 2023.09, which just came out about a week and a half ago.

What is generative AI?

But before we kind of get into Copilot and just saying it's available, you know, that's great, but we need to figure out what Copilot is, and what it means in the broader sense of generative AI or large language models. So let's first talk about what generative AI is. Generative AI is a category of models and tools designed to create new content, such as text, and that's what we're using for Copilot. But it could also be things like images or code. And generative AI is ultimately using a variety of techniques to identify patterns and then generate novel or new outcomes based on them.

While I was preparing for this talk, I used a different AI tool called Midjourney, which is used for creating images or graphics. So my dog, my Boston Terrier, Howard, is my little Copilot that's always with me as I'm coding or doing development or managing products here at Posit. And so I used this prompt, this set of text, sent it over to Midjourney and said, hey, give me back an image that is described by this. And if we look at it, you know, what we're really asking for is a seated robotic android Boston Terrier wearing pilot goggles. So this is my little Copilot buddy. And so in the span of, you know, 10 or 12 words or so, I can get a remarkably complex output out of that. And this is kind of the promise of generative AI: taking something small, plus all the context that it was trained on, and creating something new for you to use.

What we're here to talk about, though, is more about generating text or, in this case, code as a specific type of text. So generative AI for text, what it wants to do here is really just predict the next word or token or string. So you might have, say, an iPhone, and as you're typing you see at the top of your keyboard that it's predicting the next word. You can, at a basic level, think of generative AI for text as a super-powered version of that. So I might go and ask something like ChatGPT, hey, complete the sentence "every good." And so it's like, OK, well, what comes after "every good"?

And it comes together and says, oh, well, the next word after "every good" is "thing." And then it's "must come to an end." So as it builds up this context, it starts with just complete the sentence "every good." And it has a lot of variation about what the next word might be. But then as it gets more context from what it has already predicted, it gets to these very high probabilities that it is predicting something down this very specific path.

So this general idea of take something small, keep adding to it and predicting this next token or this next word or this next snippet of code, and then building upon the context of what's come before it is largely how some of these tools can work. And Copilot uses a similar approach to generate code and not just text.
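
To make the "predict the next word from what came before" idea concrete, here is a minimal toy sketch in R. It is not how Copilot or ChatGPT work internally; it only counts which word follows which in a tiny made-up corpus and then greedily appends the most likely next word. The corpus and the helper names `next_word_probs` and `complete_sentence` are assumptions made up for this illustration.

```r
# Toy next-word predictor built from a tiny, made-up corpus.
# Illustration only: real large language models are far more sophisticated.
corpus <- c(
  "every good thing must come to an end",
  "every good thing takes time",
  "every good boy deserves fudge"
)

# Split each sentence into words, then pair each word with its successor.
tokens <- strsplit(corpus, " ", fixed = TRUE)
pairs <- do.call(rbind, lapply(tokens, function(w) {
  data.frame(current = head(w, -1), nxt = tail(w, -1),
             stringsAsFactors = FALSE)
}))

# Hypothetical helper: probability of each candidate next word.
next_word_probs <- function(word) {
  counts <- table(pairs$nxt[pairs$current == word])
  setNames(as.vector(counts) / sum(counts), names(counts))
}

# Hypothetical helper: repeatedly append the most likely next word.
# Ties are broken by taking the first candidate in alphabetical order.
complete_sentence <- function(start, max_words = 10) {
  words <- start
  for (i in seq_len(max_words)) {
    probs <- next_word_probs(words[length(words)])
    if (length(probs) == 0) break  # no known continuation
    words <- c(words, names(probs)[which.max(probs)])
  }
  paste(words, collapse = " ")
}

probs_good <- next_word_probs("good")
print(probs_good)

completion <- complete_sentence(c("every", "good"))
print(completion)
```

Each step only looks at the previous word here, which is a big simplification. The point is just the loop: predict, append, and predict again with the longer context.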

Copilot vs. autocomplete

As far as what Copilot is, if we ask the Copilot developers and their documentation, GitHub Copilot is an AI pair programmer that offers autocomplete style suggestions as ghost text. And this is based upon the context of the surrounding code that is in your script or that it's been trained on. Now, I want to differentiate a little bit, because if you think about RStudio, it's had autocomplete for forever. It's got rich autocomplete and helps you do things and be more productive as you're typing out your scripts. Ghost text is a new type of context or a new type of way to create these predictions.

So if we think about just autocomplete, it's parsing the code in the environment. It'll take that code, exactly where you're typing, and supply a list of possible completions. They have to be possible based upon the characters you've typed so far. And this is a static set of completions that shows up in a little pop-up. And it's provided from your IDE. It's literally provided from the computer you're working on, or on disk. If we compare and contrast that to Copilot, Copilot will still parse the code in the environment, but it has billions of examples of code that it has been trained on that can be incorporated into the prediction, that next token or that next script that it's trying to spit out. And importantly, it's supplying a list of likely completions. They don't even have to be possible. It could be something completely outside of your script, but it could be helpful for your problem.

And it's not a static set of completions, but rather a dynamic set that's kind of non-deterministic and is prepared and delivered via ghost text as opposed to a pop-up. And ultimately, Copilot is calling out to API endpoints. It's a generative AI tool provided via that API endpoint.
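
As a rough illustration of the static, prefix-based side of that comparison, the sketch below filters a fixed pool of names by the characters typed so far. This is not RStudio's actual completion engine; `complete_prefix` is a hypothetical helper, and using the names exported by the base package as the candidate pool is an assumption for the example.

```r
# Candidate pool: every name exported by the base package.
candidates <- ls("package:base")

# Hypothetical helper: keep only the names that start with the typed prefix.
# The same prefix always returns the same list, which is what makes this
# kind of completion static and deterministic.
complete_prefix <- function(prefix, pool = candidates) {
  sort(pool[startsWith(pool, prefix)])
}

mea_matches <- complete_prefix("mea")
print(mea_matches)
```

Every result has to begin with the typed characters, which is the "possible completions" constraint described above. Copilot's suggestions have no such constraint.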

So if we think about autocomplete versus Copilot, autocomplete might look something like this. I have an R script that's in RStudio. I start typing mean. And so I type M-E-A to take the mean of this value. And it will show this autocomplete pop-up with what's possible. If I compare this to Copilot, it'll not only complete what's possible, but also use the context within the script. So if we look here on line one, I have a comment: take the mean of the mpg column in the mtcars dataset. So here, with only typing one letter, M, it automatically completes not only the function I'm trying to call, but also the data that I've indicated I want to use.
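
The slide itself is not part of this transcript, so the following is only a plausible reconstruction of that script: the comment is the prompt, and the line under it is the kind of completion Copilot might offer as ghost text. The variable name `mpg_mean` is added here for illustration, and actual suggestions can differ from run to run.

```r
# take the mean of the mpg column in the mtcars dataset
mpg_mean <- mean(mtcars$mpg)
print(mpg_mean)
```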

So this is a possible solution, an example of what it might give. And it's doing more than just autocompleting the text that's available. And if you look at it, this is ghost text. That's the little light gray differentiation here. If we look at this, the first letter is what I've typed. And then the remainder of this line is the ghost text that pops up.

So this ghost text, if we look at a bit longer example, might look something like this, where in the same script, I've said, oh, well, I don't want to just use mean and a dollar sign to get my column. I want to group by another column and use dplyr to do that. So in this case, I'm providing the context of a comment about what I'm trying to accomplish. I've also written one line of code, which is just loading the dplyr package. And then Copilot will generate ghost text trying to solve the problem that I've prompted it with, or this context that I've provided. So from line five on, this is all autocompleted from ghost text via Copilot all at once. So it's a multi-line output that it's providing. It's not just limited to small, short snippets, but can do something a little bit more complex. And importantly, this ghost text doesn't actually exist in the document until you accept it. So if you don't like the output that it predicted, you can back out and start typing something else and get another prediction out. And you can keep working or write your own script and ignore what Copilot is giving you.
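
The longer example is also only on the slide, so here is a hedged sketch of what such a script could look like. The comment and the `library(dplyr)` line stand in for what the speaker typed, and the pipeline is the kind of multi-line completion Copilot might propose. The grouping column `cyl` and the names `mpg_by_cyl` and `mean_mpg` are assumptions, since the transcript only says "another column".

```r
# calculate the mean mpg for each number of cylinders using dplyr
library(dplyr)

# Everything from here down is the sort of multi-line ghost text
# Copilot might suggest in one go (assumed, not taken from the talk).
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(mpg_by_cyl)
```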

So ultimately, when you send a question through chattr to one of these backend models, yeah, you have the question, which is the actual thing you're trying to solve. But in addition to your question, it's enriched with the paths to data files, the data frames you have loaded in the environment, some additional prompts like use these packages or use these books, as well as the chat history.
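
A minimal sketch of that enrichment idea is below. It is not chattr's implementation or API: `build_enriched_prompt` is a hypothetical helper, and the section titles and example values are made up. It only shows how a question could be bundled with the kinds of context listed above before being sent to a model.

```r
# Hypothetical helper, NOT part of chattr or any other package.
# It bundles a question with optional context into one prompt string.
build_enriched_prompt <- function(question,
                                  data_files = character(),
                                  data_frames = character(),
                                  extra_prompts = character(),
                                  history = character()) {
  # Format one titled bullet list; skip the section if it has no items.
  section <- function(title, items) {
    if (length(items) == 0) return(character())
    c(paste0(title, ":"), paste0("- ", items))
  }
  paste(
    c(
      section("Additional instructions", extra_prompts),
      section("Data files", data_files),
      section("Data frames in the environment", data_frames),
      section("Chat history", history),
      paste0("Question: ", question)
    ),
    collapse = "\n"
  )
}

# Example with made-up context values.
prompt <- build_enriched_prompt(
  question = "How do I plot mpg against weight?",
  data_files = "data/cars.csv",
  data_frames = "mtcars",
  extra_prompts = "Use the tidyverse packages",
  history = "user: load the mtcars data"
)
cat(prompt, "\n")
```

The takeaway is that the model sees much more than the one-line question, which is why the answers can refer to the data you actually have loaded.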

So that's kind of a summary of the couple of different tools you can use in RStudio. You have ChatGPT-style chat interfaces with chattr. You have GitHub Copilot available inside RStudio for ghost text. And again, if we want to be successful with these tools, we want to follow our rule of simple, specific, and use comments, or S2C. Importantly, GitHub Copilot is an optional integration. So if you don't want to use it, you don't have to use it. And it's available as a preview feature, essentially like a public beta, in the 2023.09 release of RStudio and Workbench.

So this is available. You can enable it if you want to, and you will need a subscription to activate it. If you have feedback or you've run into bugs using Copilot, you can open up a GitHub issue on the RStudio repo. And don't forget about the chattr package if you want to do more of the chat-style interface with either remote APIs or even locally hosted models such as Llama. And if you have other backends, you can always open up an issue to ask how to interact with them.