Resources

GitHub Copilot in RStudio, it's finally here!

Thomas Mock, PhD, Workbench Product Manager at Posit PBC.

In this webinar, part of a new quarterly R/Med seminar series, Thomas demonstrates how to set up Copilot in RStudio and then provides examples of using it to generate code by providing context and comments.

Some key points covered include:
- How Copilot works by predicting the next token based on context.
- Tips for using Copilot effectively, like breaking problems down into simple, specific steps and using comments.
- Examples of Copilot generating functions, tests, and repeating tasks.
- Using other tools like {chattr} to ask questions when stuck.

Main Sections
00:00 Intro
01:35 What is generative AI?
04:27 What is Copilot?
08:50 Copilot in RStudio
10:15 Get started
13:01 Getting the most out of the generative loop
15:14 Simple and specific
24:29 Getting stuck?
28:53 {chattr} package
31:17 Generative AI tools with Posit Workbench and RStudio
32:43 Examples using Copilot inside RStudio
51:42 Q&A and RStudio User Guide

More Resources
R Medicine Virtual Conference 2023: https://www.youtube.com/playlist?list=PL4IzsxWztPdlpR3NqGzUI01M4_jqzIWqo
R Consortium: https://www.r-consortium.org/
Blog: https://www.r-consortium.org/news/blog
Join: https://www.r-consortium.org/about/join
Twitter: https://twitter.com/Rconsortium
LinkedIn: https://www.linkedin.com/company/r-consortium
Mastodon: https://fosstodon.org/@RConsortium

Oct 5, 2023
54 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

and I'll paste it again. So I just pasted the short link to these slides in the chat for folks joining after the fact. It is pos.it/rstudio-copilot, and that will get you to a copy of these slides that will stay up in the future. So yeah, thank you so much for having me, excited to be here and talk about new things in RStudio and integrations such as GitHub Copilot.

I joke that this could really be a one-page presentation in terms of GitHub Copilot is in RStudio now, it's finally here, mic drop, walk away, that's the end of it. But of course, like any tool, you need to learn a little bit about how to use it best and be productive with it as opposed to just use the tool and kind of not know how to use it the best way.

So this is a talk that officially closes one of our most highly requested features ever, GitHub Copilot integration with RStudio, which is issue number 10,148. It had over 500 upvotes and was, again, the most popular feature request we've ever had for the RStudio IDE. So really excited that we're able to deliver this in the most recent release as a preview feature in RStudio 2023.09, which just came out about a week and a half ago.

What is generative AI?

But before we kind of get into Copilot and just saying it's available, you know, that's great, but we need to kind of figure out what is Copilot and what does it mean in the broader sense of generative AI or large language models and what does it all mean together? So let's first talk about what's generative AI. Generative AI is a category of models and tools designed to create new content, such as text, and that's what we're using for Copilot. But it could also be things like images or code. And generative AI is ultimately using a variety of techniques to identify patterns and then generate novel or new outcomes based on them.

While I was preparing for this talk, I used a different AI tool called Midjourney, which is used for creating images or graphics. So my dog, my Boston Terrier, Howard, is my little Copilot that's always with me as I'm coding or doing development or managing products here at Posit. And so I used this prompt, or this kind of set of text, sent it over to Midjourney and said, hey, give me back an image that is described by this. And if we look at it, you know, what we're really asking for is a seated robotic android Boston Terrier wearing pilot goggles. So this is my little Copilot buddy. And so in the span of, you know, 10 or 12 words or so, I can get a remarkably complex output out of that. And this is kind of the promise of generative AI: taking something small and all this context that it was trained on and creating something new for you to use.

What we're here to talk about, though, is more about generating text or, in this case, code as a specific type of text. So generative AI for text, what it wants to do here is really just predict the next word or the token or a string. So you might have, say, an iPhone and you're typing out and you see at the top of your keyboard that it's predicting the next word. You can kind of, at a basic level, think of generative AI for text as a super powered version of that. So I might go to ask something like ChatGPT, hey, complete the sentence "every good." And so it's like, OK, well, what does "every good" mean?

And it comes together and says, oh, well, the next word after "every good" is "thing." And then it's "must come to an end." So as it builds up this context, it starts with just complete the sentence "every good." And it has kind of a lot of variation about what the next word might be. But then as it gets more context about what it's predicted, it gets to these high, high probabilities that it is predicting something down this very specific path.

So this general idea of take something small, keep adding to it and predicting this next token or this next word or this next snippet of code, and then building upon the context of what's come before it is largely how some of these tools can work. And Copilot uses a similar approach to generate code and not just text.
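To make the "predict the next word from context" idea concrete, here is a minimal sketch in base R. It is only a toy that counts which word follows a given context in a tiny made-up corpus; it is not how Copilot or ChatGPT work internally, and the corpus and the function name `next_word_probs` are assumptions for illustration.

```r
# Toy corpus (made up for illustration).
corpus <- c(
  "every good thing must come to an end",
  "every good boy deserves fudge",
  "every good thing takes time"
)

# Estimate the probability of each next word given a context,
# by counting what follows that context in the corpus.
next_word_probs <- function(corpus, context) {
  context_words <- strsplit(context, " ", fixed = TRUE)[[1]]
  n <- length(context_words)
  followers <- character(0)
  for (sentence in corpus) {
    words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
    if (length(words) <= n) next
    for (i in seq_len(length(words) - n)) {
      if (identical(words[i:(i + n - 1)], context_words)) {
        followers <- c(followers, words[i + n])
      }
    }
  }
  if (length(followers) == 0) return(numeric(0))
  counts <- table(followers)
  probs <- setNames(as.vector(counts) / length(followers), names(counts))
  sort(probs, decreasing = TRUE)
}

# With a short context there are several plausible next words...
probs <- next_word_probs(corpus, "every good")
probs

# ...and a longer context narrows down the set of continuations.
longer <- next_word_probs(corpus, "every good thing must")
longer
```

In this toy corpus "thing" is the most likely word after "every good", and after "every good thing must" only one continuation remains, which mirrors how added context pushes the prediction down a specific path.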

Copilot vs. autocomplete

As far as what Copilot is, if we ask the Copilot developers and their documentation, GitHub Copilot is an AI pair programmer that offers autocomplete style suggestions as ghost text. And this is based upon the context of the surrounding code that is in your script or that it's been trained on. Now, I want to differentiate a little bit, because if you think about RStudio, it's had autocomplete for forever. It's got rich autocomplete and helps you do things and be more productive as you're typing out your scripts. Ghost text is a new type of context or a new type of way to create these predictions.

So if we think about just autocomplete, it's parsing the code in the environment. It'll take that code of where you're typing, exactly where you are, and supply a list of possible completions. They have to be possible based upon the characters you've typed so far. And this is a static set of completions that has a little pop-up. And it's provided from your IDE. It's literally provided from the computer you're working on or on disk. If we compare and contrast that to Copilot, Copilot will still parse the code in the environment, but it has billions of examples of training data and billions of examples of code that it has been trained on that can be incorporated into the prediction, that next token or that next script that it's trying to spit out. And importantly, it's supplying a list of likely completions. They don't even have to be possible. It could be something completely outside of your script, but it could be helpful for your problem.

And it's not a static set of completions, but rather a dynamic set that's kind of non-deterministic and is prepared and delivered via ghost text as opposed to a pop-up. And ultimately, Copilot is calling out to API endpoints. It's a generative AI tool provided via that API endpoint.

So if we think about autocomplete versus Copilot, autocomplete might look something like this. I have an R script that's in RStudio. I start typing mean. And so I type M-E-A to take the mean of this value. And it will do this autocomplete pop-up showing me what's possible. If I compare this to Copilot, it'll not only complete what's possible, but also use the context within the script. So if we look here on line one, I have a comment: take the mean of the mpg column in the mtcars dataset. So here, with only typing one letter, M, it automatically not only completes the function I'm trying to call, but also the data that I've indicated I want to use.

So this is a possible, or kind of an example, solution it might give. And it's doing more than just autocompleting the text that's available. And if you look at it, this is ghost text. That's a little light gray differentiation here. If we look at this, the first letter is what I've typed. And then the remainder of this word is the ghost text that pops up.

So this ghost text, if we look at a bit longer example, might look something like this, where in the same script, I've said, oh, well, I don't want to just use mean and a dollar sign to get my column. I want to group by another column and use dplyr to do that. So in this case, I'm providing the context of a comment about what I'm trying to accomplish. I've also written one line of code, which is just loading the dplyr package. And then Copilot will generate ghost text trying to solve the problem that I've prompted it with, or this context that I provided. So from line five on, this is all autocompleted from ghost text via Copilot all at once. So it's a multi-line output that it's providing. So it's not just limited to small, short snippets, but can do something a little bit more complex. And importantly, this ghost text doesn't actually exist in the document until you accept it. So if you don't like the output that it predicted, you can back out and start typing something else and get another prediction out. And you can keep working or write your own script and ignore what Copilot is giving you.
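The slides themselves are not reproduced in this transcript, so the following is only a sketch of the kind of script being described: a comment as the prompt, followed by the sort of code Copilot might suggest as ghost text. The grouping column `cyl` and the result name `mpg_by_cyl` are assumptions, since the talk does not say which column was used.

```r
# Take the mean of the mpg column in the mtcars dataset
overall_mean <- mean(mtcars$mpg)

library(dplyr)

# Calculate the mean mpg for each number of cylinders
# (cyl is an assumed grouping column, chosen for illustration)
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

mpg_by_cyl
```

The first line is what single-function completion looks like; the pipeline is the multi-line style of suggestion described for the longer example.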


Getting started with Copilot

So if you want to try out Copilot, and we'll go into some deeper examples throughout this, you can get a Copilot subscription, either personal or for business. Importantly, Copilot is a paid third-party service, so you do have to have a subscription to use it. And once you acquire that subscription, you can activate it within RStudio by going to Tools, Global Options, and the Copilot tab, and it will walk you through downloading and installing the Copilot agent, and then signing in with your Copilot account or your GitHub account. So go in there, you'll click on Sign In here in the Copilot tab, and then it'll take you through an auth flow where you drop in a verification code, you authorize Copilot to access your account, and then it will sign you in as your username. I'm jthomasmock on GitHub, and I'm now logged in as myself to work with it.

The generative loop: context, intent, output

So, Copilot is another generative AI tool, and it can predict output text, or in this case, more specifically, code. But importantly, generative AI doesn't understand anything. It's just a prediction engine. It's just trying to get to that next word and continue finishing out a sentence or a script. So, in this case, to get the most out of the generative loop, we can think about it in three different parts. We have the context, which is what prompts or what code or what comments have been provided. Basically, everything inside the script that it can use. The intent, which is what I actually think in my head of what I want to do, but I have to codify that or write it into my scripts to get the most out of it. And then I have the output, which is what Copilot will actually return in this example.

So, these three components in my mind are the generative loop. I have the intent initially about what I'm trying to do, the context about what prompts I then provide to Copilot, and the output of, again, what it's actually returning. So, to get the most out of this, I can try and supply better context, which means it gets closer to my intent of what I want to do and leads to a better output or a better prediction to help solve my problem.

So, this is really what I meant by, like, scoping the problem as a prompt. The prompt is really anything in your script, like code snippets, names of files, names of data frames, little comments throughout, and other things that have been loaded in the environment. To simplify this even further, I have my little buzzword for the day, which is S2C: make your problems simple and specific, and use comments throughout your document. Using this approach, you can generally be productive with tools like Copilot and, you know, allow it to get closer to what you're actually intending to do and get better outputs.

Solving the Keyword game with Copilot

So, in this case, again, our toy task that we're trying to solve today is how can we solve the Keyword game with R, RStudio, and Copilot. So, we'll take this complex task and break it down into simple and specific problems. So, I might provide, like, a high-level description of the project goal at the top level and then build off that with more specific tasks. For example, like, I know how to play Keyword, and I know you can solve similar word games with R. I've seen other people solve games like Wordle with R. So, I might write out a long set of comments like this. And this is my first prompt or my first context that I'm providing to Copilot. And it's also useful for me. It's telling me, hey, this is what I'm trying to do.

So, I'm creating, ultimately, a function to solve the Keyword game. This is a six-letter horizontal word, and it's got the intersection of those other vertical words. And we're trying to find each of the missing letters from those vertical words to spell the horizontal word. And then I say how to play: guess six letters across, and the letters must spell real words up and down. So, with this context, this is still a pretty open-ended problem, but this is just me setting the initial stakes of what I want to do with my overall script.

So, the simplest way to solve this problem is to be clever. And in this case, we'll cheat really briefly and then get away from it. So, I know that Keyword is run through a JavaScript front end on the Washington Post. And I bet that there's some data there that I can look at. So, I can get the URL for a specific date, grab the JSON data from behind the scenes, and then print that out. And then I can get the answer. So, the answer for this day was staple. And then I solved it immediately. But that's cheating. Like, I don't want to be that simple or that clever. Like, I actually want to solve the problem with R.

So, while the answer is available in that JSON data from the website, I can also find something that's more useful for actually solving the problem, which is the words with the missing letters. So, here we have an underscore indicating the missing letter and then the rest of the word spelled out. So, if I use staple, it would be sea, chant, bear, spur, really, scale. So, those are the words I'm trying to solve.

And if I have, you know, two or three or four of the letters, I can probably try and guess the rest of them with R and try and figure out which words are possible. So, this is my first kind of break down the larger problem into simple steps, which is solve each of the individual words. So, in this case, simpler is not cheating, but just get the hint words. So, let's break it down into the component parts and work with the hint words.

So, here I have a long kind of set of comments. So, in this case, six lines of comments. I'm telling myself and Copilot, hey, this is what we're trying to do. I'm creating a function called json_url that takes a date and then responds back with a string structured to match this URL to get the JSON data. And as I start typing out json_url, Copilot will provide this prediction of saying, oh, hey, here's the function we think you're trying to write. And it provides a useful function. By providing all of this context in what we would call a prompt, or kind of the context we're supplying to Copilot, I get a very useful output that pastes the date into this URL so I can really solve it for any day, not just one specific day.
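A rough sketch of what a date-to-URL helper like this could look like is below. The real endpoint is not given in the transcript, so `base_url` and the `YYYY/MM/DD.json` path layout are placeholders chosen for illustration, not the actual Washington Post address.

```r
# Hypothetical base URL and path layout: the talk does not give the real one.
base_url <- "https://example.com/keyword/"

# Given a date, return the URL where that day's puzzle JSON would live.
json_url <- function(date = Sys.Date()) {
  date <- as.Date(date)
  paste0(base_url, format(date, "%Y/%m/%d"), ".json")
}

url_for_day <- json_url("2023-10-05")
url_for_day  # "https://example.com/keyword/2023/10/05.json"
```

Taking the date as an argument, rather than hard-coding it, is what lets the same function work for any day's puzzle.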

So, we've solved step one, which is getting the URL, and then we can use that to get some of the additional words out of this. So, in this case, we're going to use expressive names and comments alongside variables, functions, and other objects to basically use as another type of prompt. So, here I have a shorter set of comments: given a word like _ack, return a regex that will match the known letters and replace the underscore with any lowercase letter. So, as I start typing out regex, it'll give me regex_from_word. And so, if I supply a word, it will replace that underscore with any lowercase letter and then spit it out as a regex that can be applied to a database of words. I'm using a Mac, and I know that Macs actually have a database of words built into them. So, I can supply a whole bunch of words and filter it down into the ones that actually match this problem.
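As a sketch of the idea, a hint-to-regex helper can be written in a couple of lines of base R. This is one plausible version, not necessarily the code Copilot produced in the talk; anchoring the pattern with `^` and `$` is an added assumption so that only whole words match.

```r
# Given a hint such as "_ack", return a regex where the underscore
# matches any single lowercase letter and the known letters match literally.
regex_from_word <- function(word) {
  pattern <- gsub("_", "[a-z]", tolower(word), fixed = TRUE)
  paste0("^", pattern, "$")
}

pattern <- regex_from_word("_ack")
pattern  # "^[a-z]ack$"

# Apply it to a few candidate words.
hits <- grepl(pattern, c("back", "lack", "black", "acks"))
hits  # TRUE TRUE FALSE FALSE
```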

So, now I not only have the hints, but I have a regex I can use to filter out my words and try and predict. Now, where I can go further is after defining regex_from_word, I can use it. And here, without using any comments, but typing out matched words, Copilot can then take that and say, oh, this is an expressive name. I bet he wants the matched words. And so, it uses the stringr package to subset, uses another function I've defined called limit_words, and then applies the regex against that. We can basically read this as: subset the entire database of words to only words that are four characters long and are possible matches for _ack. So, you might think of, like, back, lack, whack, all these different things that match that. And so, this gives me my subset of matched words that we can continue working with.
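The filtering step can be sketched as follows. The talk used the stringr package and the word list that ships with macOS; this version swaps in base R's `grep()` and a tiny stand-in word vector so it runs anywhere, and the body of `limit_words` is a guess at what such a helper might do.

```r
# Stand-in dictionary (the talk used the system word list on a Mac).
words <- c("back", "lack", "rack", "tack", "acks", "track", "bake")

# Same idea as the regex helper described above.
regex_from_word <- function(word) {
  paste0("^", gsub("_", "[a-z]", word, fixed = TRUE), "$")
}

# Hypothetical helper: keep only words with the given number of characters.
limit_words <- function(words, n_chars) {
  words[nchar(words) == n_chars]
}

hint <- "_ack"
matched_words <- grep(
  regex_from_word(hint),
  limit_words(words, nchar(hint)),
  value = TRUE
)
matched_words  # "back" "lack" "rack" "tack"
```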

But importantly, while we've been writing lots of comments above, we've got more context built up. And again, Copilot's reading the entire script from top to bottom. It's not just relying on what you have here.

To use the expressive names a bit more, I can write a little bit, you know, being specific in the comments about what I'm trying to do to get the top 50 most likely words. And then I start typing top_words. And it gives me a function that applies that matched words step that I just defined. It basically rolls that into a function and then does a whole bunch of different steps to basically apply the splitting of these characters, scoring them around which ones are most common in the English language, setting names, sorting them, finding unique words, and then returning the head, or the top 50, which is the default value for this function it's created.
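The scoring step might look roughly like the sketch below. The exact scoring Copilot generated is not shown in the transcript; here each word is scored by how common its letters are within the candidate set itself, which is a simplifying assumption standing in for English letter frequencies.

```r
# Rank candidate words by how common their letters are, and return the top n.
# Letter frequencies are computed from the candidates themselves (an assumption).
top_words <- function(candidates, n = 50) {
  candidates <- unique(candidates)
  letters_list <- strsplit(candidates, "", fixed = TRUE)
  letter_freq <- table(unlist(letters_list))
  scores <- vapply(
    letters_list,
    function(chars) as.numeric(sum(letter_freq[chars])),
    numeric(1)
  )
  names(scores) <- candidates
  head(names(sort(scores, decreasing = TRUE)), n)
}

# "aaa" uses the most frequent letter three times, so it ranks first.
ranked <- top_words(c("abc", "aaa", "xyz", "aaa"), n = 2)
ranked
```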

And so, overall, I'll wrap this into a function called guess_keyword and then use it for a specific date. And it says, hey, the keyword is one of the following words, recipe or repipe. And as much as I love pipes, I'm a tidyverse fan, I love that base R has its own pipe now, I don't know if repipe is a real word, but it's in my database. So I bet the word is recipe. And that actually was the word of the day. And so it took all of these words, guessed, you know, the possible letters that were there and then limited the guesses to ones that actually spelled a horizontal word.
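Putting the pieces together, the overall approach (find the possible missing letter for each vertical hint, then keep only combinations that spell a real word) can be sketched like this. The dictionary is a tiny stand-in, the hints are written for the "staple" example mentioned earlier, and the function bodies are one possible implementation rather than the code from the talk.

```r
# Stand-in dictionary containing the vertical words and the answer.
words <- c("sea", "tea", "pea", "chant", "bear", "beer", "spur", "slur",
           "really", "scale", "scalp", "staple")

# For one hint, return every letter that could fill the underscore.
candidate_letters <- function(hint, words) {
  pattern <- paste0("^", gsub("_", "([a-z])", hint, fixed = TRUE), "$")
  hits <- grep(pattern, words, value = TRUE)
  unique(sub(pattern, "\\1", hits))
}

# Combine one candidate letter per hint and keep combinations that are real words.
guess_keyword <- function(hints, words) {
  letters_by_position <- lapply(hints, candidate_letters, words = words)
  combos <- expand.grid(letters_by_position, stringsAsFactors = FALSE)
  guesses <- do.call(paste0, combos)
  intersect(guesses, words)
}

hints <- c("_ea", "chan_", "be_r", "s_ur", "rea_ly", "scal_")
answer <- guess_keyword(hints, words)
answer  # "staple"
```

With a full dictionary the same filter can return more than one word, which is how a result like "recipe or repipe" comes about.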

So this was kind of a fun problem. I played around. It was low stakes. If I messed up, fine. But I was really just trying to see how far I could get with just trying to prompt and minimal use of my own code. You can kind of think of these comments as, you know, setting the intent of what you're trying to do, providing this context, and almost like describing pseudocode that you want to write out.

Tips for when you get stuck

So while Keyword is fun and solving problems with Copilot is fun, ultimately, you know, there were times that I got a little bit stuck, in that the problem was very broad and I needed to kind of dive into the problem a little bit differently. Again, the best way to solve these problems if you're trying to get enhancements or help from Copilot is to add a bit more context, do more comments, do more code, do more in your script. So add more context, and follow that protocol of simple, specific, and use comments, S2C. Break down the problem into simpler problems, solving a very specific task, and use comments to help describe what you're trying to do or get.

Another way to work with this is to prompt again or in a different way. Or if I'm trying to add a function into, like, a dplyr pipeline, I might write an inline comment to help scope the problem or prompt it in a specific way. And adding these top-level comments, meaning at the farthest left, or inline comments to your other code is a great way of, again, codifying or writing down what your intent is for your script or for the problem that you're working on.

And ultimately, you need to build off your own momentum, right? Like you're going to write some of your own code. Copilot's not replacing who you are as a coder or as a developer or whoever you are. It's just helping you write some of your code. It's helping you be a little bit faster or solve problems that you don't necessarily have all the answers to, and it can kind of guide you in a direction.

And maybe sometimes you turn off Copilot for a bit, right? Like, there's sometimes I'm like, I'm in a flow state. I know exactly what I'm trying to do. I don't want Copilot for right now. But then when I get stuck, I can turn Copilot back on. So in RStudio, there's a command palette, which you open with Command Shift P on a Mac or Control Shift P on Windows, and from there you can turn Copilot on or off really, really quickly.

Using chattr for chat-style AI in RStudio

Now, ultimately, again, while this is cool, we've got a little more context, I do want to at least show you one example. There's more than one way to generate text, right? There's ghost text, which is really cool. You provide like a little comment, you start typing out the name of a function, and it gives you a nice clean function. So calculate the circumference of a circle. And you can say, well, if you supply me the radius, you know, I can give you two times pi times the radius. And here's your function for calculating circumference.
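For reference, the comment-then-function pattern described here would look something like the following; the function name `circumference` is an assumption, since the transcript only describes the result.

```r
# Calculate the circumference of a circle
circumference <- function(radius) {
  2 * pi * radius
}

unit_circle <- circumference(1)
unit_circle
```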

But, you know, maybe you want to ask a question. You know, Copilot's really good at generating code, not great for answering questions, because it's not really intended to be used that way. So maybe you want to ask a question of something like ChatGPT. Or maybe you want to say, oh, I got stuck with an error, explain this error to me.

So chattr, or the chattr package, is an R package that acts as an interface to a bunch of chat-style APIs or models. So you can think of, like, ChatGPT from OpenAI, which is a very common tool for chatbots or chat-style large language models or generative AI tools. But there are many different types of these. What chattr provides is a way to call them from R code and interact with them, or to actually have a chat-style interface in RStudio via the viewer pane. So here I might say, how do you calculate the circumference of a circle? And rather than just giving me a two-line function, it says, hey, to calculate the circumference of a circle, you can use this formula. And then it shows me how to do this with R code. So you can approach the problem in a couple different ways. And both types of tools are helpful in their own right.

So the chattr R package, again, I like using it as the chattr app, chattr_app, which allows me to call it and display it in the viewer pane of RStudio. And so not only can you use Copilot for generating code, but you can ask questions or get answers back to these problems you're trying to solve. And you can use that to, again, help you be a better coder, even if you don't want the predictive text inside RStudio.

chattr does something really cool in that it's doing what we call enriched requests. So I don't want to go too deep into all of this. But you can, you know, load the chattr library, you can attach some data sets, maybe mtcars or the iris data set. And then you can say, hey, send a request out through chattr, or in this case send it to ChatGPT with the GPT-3.5 Turbo model. So this is what it actually sends across as a prompt or as an ask to the model. It'll say, hey, you're a helpful coding assistant. Use these books, so Tidy Modeling with R and R for Data Science, use some tidyverse packages or the tidymodels package, and a couple other things to limit the response to make it a little bit more efficient to use. And then it would inject your actual question here, but enriched with the rest of this context that is always making your questions better for doing R-related tasks.

So ultimately, when you send a question through chattr to one of these back end models, yeah, you have the question, which is the actual thing you're trying to solve. But in addition to your question, it's enriched with the path to data files, the data frames you have loaded in the environment, some additional prompts like use these packages or use these books, as well as the chat history, if you're using ChatGPT, of what you've asked before. So it has all this context that it can use to submit to the large language model and then give you a nice response back in your IDE to actually solve problems for you.
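To illustrate the general idea of an enriched request, here is a conceptual sketch in base R. It is not chattr's actual implementation or API; `build_enriched_prompt` and the instruction text are hypothetical, and the sketch only shows how a question could be wrapped with instructions, the names of loaded data frames, and prior questions before being sent to a model.

```r
# Hypothetical helper: wrap a question with extra context before sending it
# to a chat model. This is NOT how chattr is implemented.
build_enriched_prompt <- function(question, env = globalenv(),
                                  history = character(0)) {
  objects <- ls(env)
  is_df <- vapply(
    objects,
    function(nm) is.data.frame(get(nm, envir = env)),
    logical(1)
  )
  data_frames <- objects[is_df]
  paste(c(
    "You are a helpful coding assistant.",
    "Prefer tidyverse packages and keep the answer short.",
    if (length(data_frames) > 0) {
      paste("Data frames available:", paste(data_frames, collapse = ", "))
    },
    if (length(history) > 0) {
      paste("Previous questions:", paste(history, collapse = " | "))
    },
    paste("Question:", question)
  ), collapse = "\n")
}

# Example: an environment holding one data frame and one non-data-frame object.
demo_env <- new.env()
assign("cars_data", mtcars, envir = demo_env)
assign("threshold", 6, envir = demo_env)

prompt <- build_enriched_prompt(
  "How do I plot mpg against wt?",
  env = demo_env,
  history = "How do I load a CSV?"
)
cat(prompt)
```

The point of the sketch is that the model sees more than the bare question, which is why the answers tend to be better targeted at the R session in front of you.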


So that's kind of a summary of the couple different models you can use in RStudio. You have ChatGPT-style chat interfaces with chattr. You have GitHub Copilot available inside RStudio for ghost text. And again, if we want to be successful with these tools, we want to follow our simple, specific, and use comments, or S2C. Importantly, GitHub Copilot is an optional integration. So if you don't want to use it, you don't have to use it. And it's available as a preview feature, essentially like a public beta, in the 2023.09 release of RStudio and Workbench.

So this is available. You can enable it if you want to, and you will need a subscription to activate it. If you have feedback or you've run into bugs using Copilot, you can open up a GitHub issue on the RStudio repo. And don't forget about the chattr package if you want to do more of the chat-style interface with either remote APIs or even locally hosted models such as Llama. And if you have other backends, you can always open up an issue to ask how to interact with them.

Live demo

So to kind of close out, and then we'll get into some live examples and I'll show you a little bit about how it works inside RStudio. You know, I've got a couple images of different Copilots you might have. So maybe you have a cat that you're really close with and that's your little Copilot, or a dog. Or maybe you're a fan of Totoro or other kind of mystical, mythical creatures that you want to work with. And these are all generated with an AI tool called Midjourney.

So that's the end of the slides. I do have some other slides we can get into if people have questions. But just briefly, I do want to show a couple examples of actually using Copilot inside RStudio.

So importantly, I'm going to go in here. I've got Copilot loaded and I've got RStudio loaded here. I'm going to start a background chattr app, which is again that R package for running a large language model interface inside of RStudio. So now here in the viewer pane, I have a place to ask questions. The chattr app will run as a background job if you have that set. So I have it set to run in the background so I can still use my console to do other things without interrupting the model or the model interrupting my console.

So here I have a couple of kind of open-ended tasks that I want to work with. And I'm going to use Copilot to initially solve them and then ask maybe a question or two of chattr. So here I have kind of a base example of how I can repeat common tasks with Copilot. So maybe I'm subsetting a bunch of data or vectors. I'm trying to grab them one by one. So I have this mtcars six cylinder, which takes the mtcars data frame and subsets it to say, oh, only find the rows where cylinders equal six. But then maybe I want to go to the next one and say mtcars eight and see what does it give me? And it says, oh, okay, well, you probably want the same thing but for eight cylinders. And then let's see what else it gives me. I say mtcars four cylinders, and it gives me the four cylinder ones.

So in this case, this is probably not necessarily best practice, but it's helped me get multiple repeats or kind of speed up my ability to do some of this repeat code. But if you look at this, this is really 99% the same code over and over. Really, the only thing that's changing is the rows I want to look at and the name of the data frame. So I might say, okay, well, thanks for giving me that, but convert it to a function, right? So let's take that same thing. And let's say mtcars cylinder. And now I start typing that and it says, okay, function, data frame, cylinder. So let's see what it does for that. I'll execute it. And I'll do mtcars cylinder, we'll pass mtcars, and it autocompletes that I probably want to use the six cylinder. If I do that and execute it, it gives me just the rows that match the six cylinders. So every single one of these rows is six cylinder. So you can imagine that, yeah, sure, it can help you write boilerplate code. But more importantly, it can take existing examples and help convert them to a function.
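The demo code is not reproduced in the transcript, so the following is a reconstruction of the pattern being described, with assumed object names: three near-identical subsets, then the same logic rolled into a function.

```r
# Repeated subsetting, one object per cylinder count (names are assumed).
mtcars_6cyl <- mtcars[mtcars$cyl == 6, ]
mtcars_8cyl <- mtcars[mtcars$cyl == 8, ]
mtcars_4cyl <- mtcars[mtcars$cyl == 4, ]

# The same logic as a function: only the data frame and cylinder count vary.
mtcars_cyl <- function(df, cyl) {
  df[df$cyl == cyl, ]
}

six <- mtcars_cyl(mtcars, 6)
nrow(six)  # 7
```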

Now the next thing I do whenever I write functions is maybe I want to write some tests. So I might do something like library(testthat). I'll load the testthat library. And this will help me write some of these tests. So I'll say, okay, write a test for that function with testthat. And it gives me this prompt or this output of test_that, mtcars cylinder works. This is the function I have. So let's see what it gives me. It's giving me an expect equal, it's saying the number of rows of mtcars cylinder should be seven. And it actually guessed right. Importantly, it's not doing math here. But you can imagine there's probably lots of examples of mtcars in the training data. So it's actually really good at solving those problems. So let's just execute that. And it gave me a passing test pretty much immediately.
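A test along the lines described might look like the sketch below. It assumes the testthat package is installed and re-defines the hypothetical `mtcars_cyl` helper so the block stands on its own.

```r
library(testthat)

# Hypothetical helper from the demo: subset a data frame by cylinder count.
mtcars_cyl <- function(df, cyl) {
  df[df$cyl == cyl, ]
}

# write a test for that function with testthat
test_that("mtcars_cyl works", {
  expect_equal(nrow(mtcars_cyl(mtcars, 6)), 7)
})
```

If the suggested number had been wrong, the test would fail and the expected value could simply be edited, which is the "useful structure" point made in the talk.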

Well, this is all working really nicely. And I've kind of chosen simple problems. I hope you're seeing some of the possibility here of, again, like, oh, I started with this script or someone handed me off this script. It's like 30 lines of repeat code. Let's roll that into a function. I also want to make sure, because I'm following best practices, that I can test it. And let's test that. And even if the number was wrong here, I could go in and edit it, and it's given me a useful test or a useful structure of a test.

So I might want to write a more complex function as opposed to these little mtcars examples. So here, let's see: create a my summary function that takes a data frame and column name as arguments, as well as a logical argument for whether to include NA values, and a summarizing function as an argument, and use the embracing operator to pass the summarizing column. Now, embrace, or the embracing operator, these double curly braces here, is a concept in the tidyverse or rlang for referencing bare column names. So I might do something like my summary, start typing that out. And it gives me this. Let's see if this works. So my summary.

My summary MT cars MPG. And then it completes the rest of it and it says, Oh, empty MPG was not found. So it did not give me a working example in this case, right, we ran into our first problem of it. Hey, here's a response, but it's not actually a valid response. So let's try that again. You know, we can delete it. It doesn't matter. I can try again.

Now I don't have any functions defined in my environment. And I can say let's try one more time, and then we'll keep going if we can't get it. Ah, now this is looking better. So here we're actually using the embracing operator. So we're going to accept this, and it says take a data frame, summarize it. We're using across, which I don't think I really need, but it does have the embracing operator, allowing me to pass a column in this example. So my_summary, mtcars, mpg, and the function we're going to use is mean.

And, ah, it'd probably be useful if I loaded the dplyr package. And let's see what it says. The dot-dot-dot argument of across is deprecated. So let's just drop across and see what it gives us now. So this is the process of where things can go a little bit awry. Okay, so now what I've done is deleted part of the response and said, well, most of this is right, but I actually want you to give me this a different way. Like, I've tried to prompt it or guide it in a different direction by deleting part of the code and saying, well, most of this is right, but give me the rest of it, so I can accept that. So now let's see. I need a closing parenthesis there. Let's try running that again. And now it's working, right? So now I've got a tidy eval column that gives me back the mtcars data set summarized for the mean of miles per gallon. I can try this with median instead of mean and I get a median instead, or a min and I get a min, or a max and I get a max.
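The final function is not reproduced in the transcript, so the following is only a minimal sketch of what a working version might look like once `across()` is dropped. It assumes dplyr is installed; the argument names (`col`, `fn`, `na_rm`) and the output column name `value` are illustrative choices, not the code from the demo.

```r
library(dplyr)

# Summarise one bare column of `df` with a user-supplied function.
# {{ col }} (the embracing operator) forwards the bare column name
# into summarise()'s data-masked context.
my_summary <- function(df, col, fn = mean, na_rm = TRUE) {
  summarise(df, value = fn({{ col }}, na.rm = na_rm))
}

my_summary(mtcars, mpg, mean)    # one-row data frame with the mean of mpg
my_summary(mtcars, mpg, median)  # same shape, median instead
my_summary(mtcars, mpg, max)
```

Note that the result is a one-row data frame rather than a bare number, which matters when writing a test against it.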

So ultimately, like, this is part of the live process, right? Like, it's non-deterministic. I can't always get it to give me what I want, but it can help guide me down a path, and then I can help guide it down the path I want it to go.

So ultimately, we spent a bit more time on that example because it kind of went a little awry, but let's just for fun see if we can test that. So we've already got testthat loaded, but just in case we'll load library(testthat), and test. Let's see if it generates a response. It's thinking, and now it gives me test_that, my_summary works. So here, let's see. First off, we'll just run it, and it says stop if needed. Names for target but not for current. Okay, let's just run one example and see what it gives us. List. Okay, so the short of the answer is, if I look at this, it gave me a useful output, but it's returning it as a data frame versus a vector or a single value. So again, this test is actually pretty good, but we would need to pull the data out. And now let's try that. Pull, and one more pull. So now we're getting a vector back instead of a data frame. And now I actually have a passing test that was created very quickly.
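The fix described here, pulling the value out of the data frame before comparing, can be sketched as below. It assumes dplyr and testthat are installed and uses the same hypothetical `my_summary` definition as an illustration; the generated test itself is not shown in the transcript.

```r
library(dplyr)
library(testthat)

# Hypothetical summary helper: returns a one-row data frame.
my_summary <- function(df, col, fn = mean, na_rm = TRUE) {
  summarise(df, value = fn({{ col }}, na.rm = na_rm))
}

test_that("my_summary works", {
  # pull() extracts the last column as a plain vector, so the
  # comparison is number against number, not data frame against number.
  result <- pull(my_summary(mtcars, mpg, mean))
  expect_equal(result, mean(mtcars$mpg))
})
```

Comparing the data frame directly to a number is what produces the mismatch between a list and a numeric value.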

Overall, again, you can see I'm injecting myself a little bit. I'm accepting what it gives me, looking at the errors, prompting, and saying where can we go. So Copilot's pretty fun. It was nice to kind of go through some examples, but I can also say, you know, what is the embracing operator? So maybe I say, what even is the embracing operator in the tidyverse, and see what it says. The embracing operator in the tidyverse is used for non-standard evaluation; it allows you to pass a variable or expression as an argument without evaluating it immediately. And then it gives me an example of how I could use this. And if we look back at my function, this looks really close: it allows me to do this filter, column equal to value.
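The chat's example is only described, not shown, so this is a guess at its general shape: a small wrapper (the name `filter_by` is hypothetical) that embraces both the column and the value inside `dplyr::filter()`. It assumes dplyr is installed.

```r
library(dplyr)

# Keep rows of `df` where `column` equals `value`.
# Both arguments are forwarded with the embracing operator.
filter_by <- function(df, column, value) {
  filter(df, {{ column }} == {{ value }})
}

six_cyl <- filter_by(mtcars, cyl, 6)
six_cyl
```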

Now, importantly, this has multiple uses of the embracing operator, as opposed to one. And so it's a little bit different than my example that just repeated column twice, but same kind of idea: it explains it, gives me an example, and I can work with it. And maybe this is useful. Or maybe I could ask it, like, what do these errors mean? Like, it's saying modes: list, numeric. What does that actually mean? I could ask it to explain this error. So, explain this error. And let's ask it and see what it does with that.

It's giving me the two parts of what the error actually is. So it tells me (a) that it's failing and (b) what modes: list, numeric means. So it's saying that the results are different: it's actually expecting a numeric value, but it's returning a list. So, you know, it can help guide you and interpret some of these errors you might be seeing.

So, yeah, so these are kind of the different ways you might use it. And again, this was very quick, just kind of going through some examples, but you could see how Copilot is useful for kind of quickly iterating on your code. But then if you hit a problem, you might need to re-engineer it or re-ask the question or the problem you're trying to solve. And then with something like a chat interface, you can use kind of basic syntax and the real questions you're asking and get responses back to them.

Q&A

With that, we're at about the 50-minute mark. So I'm going to pause and see if there are any questions. I see a couple of questions, I think here from Peter, and then one question from Lubav that I'll answer. And if you do have other questions, you can ask them in the Q&A section here on Zoom or in the chat.

So there was a question from earlier about what is an effectively scoped prompt. So in this slide, there's this idea of saying, hey, you know, how can I provide a scoped and specific prompt? In this case, that's this S2C: simple, specific, scoped, and comments. So scoping it, or specific, meaning this is what I'm trying to solve right here. So sure, there's this other context of, I have a probabilities vector, I have a vector of numbers that I'm going to eventually use. But what I'm trying to do in line nine is return a function to calculate the mode of a vector, or create a function called mode. And so this scoped and specific prompt, or basically this very specific task to solve, was solved in line nine. Even though this additional context is helpful, we're just trying to do this right here. So that's where I really like using comments to help break down the problem so I don't go in a weird direction.
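To make the scoped-comment idea concrete, here is a sketch of what such a script might look like. The vector contents are made up, and the function is named `calc_mode` rather than `mode` purely to avoid masking base R's `mode()`; neither detail comes from the slide.

```r
# Surrounding context (hypothetical values standing in for the slide's vectors).
probabilities <- c(0.1, 0.2, 0.3, 0.4)
numbers <- c(1, 2, 2, 3, 3, 3, 4)

# create a function called calc_mode that returns the mode of a vector
calc_mode <- function(x) {
  ux <- unique(x)                          # distinct values
  counts <- tabulate(match(x, ux))         # how often each one occurs
  ux[which.max(counts)]                    # first value with the highest count
}

calc_mode(numbers)
```

The comment directly above the function is the scoped prompt: it names one task, while the vectors above it are only supporting context.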

There's another question about, is it helpful to think of working with Copilot as a bit like writing pseudocode? Absolutely. And if you look at some of the prompts I gave or the context I provided, I did provide pseudocode. So I said here in line five, using Glue, create a URL that matches this string. And because I provided this string, that's what was used in the actual code it spit out. So again, I don't have to put this just into my script, I could put it into a comment. And then that kind of pseudocode along with this description of the problem can be built out into the function I'm writing. It's also helpful for me, like even if I didn't use Copilot, just writing down and getting it out of my head. So if I pause, go away and come back to solve the problem, I can actually keep working with it.
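The string from the slide is not in the transcript, so the URL below is a placeholder, and `build_url` with its arguments is a hypothetical name. With that caveat, the pseudocode-as-comment pattern might look roughly like this, assuming the glue package is installed:

```r
library(glue)

# Using glue, create a URL that matches this string:
#   https://example.com/data/2023/report.csv   (placeholder, not the slide's URL)
build_url <- function(year, name) {
  glue("https://example.com/data/{year}/{name}.csv")
}

build_url(2023, "report")
```

The example string lives in a comment, so it guides the completion without becoming part of the script's executable code.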

And then there's a question about, if I get stuck, how does Copilot do as a debugger? I hope you saw a couple of ways of how I did get stuck and could change a few things, try and restart the task. Or in some cases, I had context up above that was really bad. Like, I didn't want to write code like this. So maybe once I've converted it over, I can just delete that section, right? Like, I don't have to keep bad code if I don't want to use it. And you can also use a chat-style interface to actually ask a question like, explain this error to me.

The question in the chat was what training data, or type of training data, is Copilot trained on? So that's kind of up to the Copilot group. So again, Copilot is a third-party integration to RStudio. We're not doing anything to actually train it or fine-tune it. It's just kind of whatever you're getting back from Copilot. But it was trained on billions of examples of code and some text. So again, the primary purpose of Copilot is to generate code, but it will actually generate text or comments or other things. And you can even use it inside something like a Quarto document to generate code or text.

So I might say, today we are talking about Copilot. Copilot is, and I wait for a second. I wait for another second. And of course, now I'm stressing it out, telling it to do something. And let's delete that. Copilot is a tool that uses machine learning to help you write code. So it gave me back a lot of text. So this is all one line, but that's why it's taking a little while. It's just generating dozens and dozens of words. So this explains what Copilot is and kind of how it was trained and what the OpenAI Codex is.

But then just as easily, I can say, use ggplot to create a scatterplot. So here, run that. And then let's see what it gives me. It gives me ggplot, mtcars. And now I have a graphic. But maybe in the same script, I'm like, well, I'm learning about Python too. So I want to use the plotnine library in Python to create a similar scatterplot. So I might do something like from plotnine import ggplot. plotnine is an implementation of the grammar of graphics in Python that adheres to ggplot standards. So I can execute this. Inside RStudio, it will use reticulate to run Python code from R. And then it gives me this similar syntax you might be used to from ggplot.
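The generated plot code is not shown in the transcript, so the chosen variables (`wt` against `mpg`) are an assumption. A comment-driven scatterplot of this kind might look like the following, assuming ggplot2 is installed:

```r
library(ggplot2)

# use ggplot to create a scatterplot of weight against miles per gallon
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

p  # printing the object draws the plot
```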

So in this case, we have code that looks like it did before. But I have to read in mtcars, because this is Python code. So, import pandas as pd, mtcars equals. All right, so there is an mtcars. There we go. And now I have a giant graphic in RStudio of a Python library actually printing out some Python code for a graphic.

So altogether, I played around with that for a little bit. And that's an example of using it inside a Quarto document, and it can do more than just generate the text; it can also generate some of the outputs, you know, inline text as well as code.

While I'm waiting around, I will just call out that if you go to docs.posit.co there is an RStudio user guide. I'll drop this into the chat. There's the link to the RStudio user guide, or you can just search for RStudio IDE user guide. So if you do want to learn more about how to use tools like this, you can go to the Tools section underneath Guide, click on GitHub Copilot, and it talks about the process of using GitHub Copilot in RStudio as well as interfaces like chattr. And it walks through some examples of how to be productive and how to use it, and a lot of different things about what you can do with Copilot in RStudio.

Can these tools be used in Posit Cloud? So they can in the future.