AI-Powered Data Science in Positron (Ryan Johnson, Posit) | posit::conf(2025)

Ryan Johnson introduces Positron, an AI-ready multilingual IDE designed specifically for data science. posit::conf(2025) Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Transcript

This transcript was generated automatically and may contain errors.

Good morning, everybody. We're going to go ahead and get started. Before I dive into my stuff, I'm going to hand the mic over to Rachel real quick.

Thanks, everybody. I just wanted to let everyone know, if you want to ask questions today in the lounge room, if you go to the Positron Lounge channel on the Discord, I'm going to put the slide open over there, and so you can ask questions there either with your name or anonymously. If it's with your name, you can also just raise your hand here in the room, and we can answer questions that way too, but only in the Positron Lounge channel on the Discord.

Welcome, everybody, to our very first Learning Center Lounge area. It's good to see everyone here so early in the morning. It's great to see all the virtual attendees as well. My name is Ryan Johnson, and today we're going to talk about Positron and doing some really cool AI-powered stuff inside of it.

Now, if I've not met you already, and I see a lot of familiar faces already, but again, my name's Ryan. I'm a data science advisor here at Posit. Part of my role is just to make sure that you all know about all the cool tools that we're creating here at Posit, our open source tools, our professional tools, as well as the IDEs that we're creating. That's going to be the real focus for today.

What is Positron?

As a quick show of hands, who here has used Positron already? Maybe I can probably figure this out, but maybe entertain me. Who here has never used Positron before? We're actually like 50-50, so that's pretty cool.

If you've never used Positron before, or maybe you have and you're just a little curious about the history of Positron, what is Positron? You can give it a lot of definitions, but what I'm going to call it is an AI-ready, multilingual IDE specifically for data science. For those that have never used Positron before, I'm going to maybe take a guess that you've probably used RStudio before. This is an IDE that's been around for about 15 years, probably more than that. It's our baby. It's the first tool that we ever created here at Posit, and we love it. The reason why our users tend to love it is that it really is tailored to R, and it's very much optimized for data science.

There's other great IDEs out there. Who here has used VS Code before? VS Code is a great all-purpose editor. If you're using R, if you're using Python, if you're using Julia, HTML, JavaScript, you can probably find a home in VS Code. It is multilingual right out of the box. It's also extremely customizable, so you can do whatever you want. There's that rich ecosystem of extensions where you can just make VS Code your own personal flavor. In more recent versions of VS Code, it's also AI-equipped, so things like GitHub Copilot come with it right out of the box.

When designing Positron, we essentially wanted to take the best of both worlds. We looked at RStudio, and we're like, what do data scientists really like about RStudio? They really like that variables pane in the top right, so they can see the variables they're creating. They can interact with the data they have stored in memory. They also really like that plots pane in the bottom right, by default, so you can actually see your plots. You can see your tables and interact with the insets that you're creating. With VS Code, people just love that it's extremely customizable. Again, multilingual, AI-equipped. We took the best of both worlds, and we made Positron.

If you've never seen Positron before, hello, Positron. This is what it looks like right out of the box. If you're a VS Code user, you're going to immediately see some stuff that looks pretty familiar. If you're an RStudio user, you'll also see some other stuff that's familiar. If this is the first time you're seeing Positron, there's a really easy way to just break it down into four different components.

On the left-hand side over here, we're going to call this your manage area. This is where you can do things like read in files. You can have your extensions, and manage them over here on the left-hand side. You can do version control. All that stuff is over here on the left-hand side. As you begin writing code, maybe you're creating things like R scripts, Python scripts, Shiny applications, Quarto documents. That's all done right up here in the top middle. This is your writing zone where you can actually edit and write your source code.

As you're writing your code, you're obviously going to want to run it at some point. That's all done down here at the bottom. This is a console right here, very similar to RStudio. What's also really cool about Positron, again, it is multilingual. At the very top, you can see in this screenshot, we're running R version 4.4. If I wanted to, I could click this little button in the top right and easily switch over to Python. That will switch over my console to Python as well. It's extremely easy to switch between R and Python. You can have an R and Python session running at the exact same time.

Then the last part of Positron, which is what we pretty much adopted from RStudio, is your understand portion of the IDE. This is when you start creating variables, storing data sets, interacting with data sets, plots, tables, Shiny apps, rendered Quarto documents. You can literally interact with them and see them directly in the Positron IDE off on the right side here.

Positron Assistant

That's Positron, but today's focus is really about highlighting some of the really cool AI LLM integrations with Positron. The first one that we're going to introduce you to is something called Positron Assistant. This is an AI client that's actually built directly into Positron. It's nothing you have to add additionally. It's right there. You can start using it.

I also want to give a small note here that this is still a feature that's in preview. You can certainly test it out, but it's not quite ready for production just yet. Within Positron, you'll see on the left-hand sidebar, there's a little robot face. That's how you can access Positron Assistant.

What can you do with Positron Assistant? You can do a lot. You'll see there's a little text box on the bottom. That's where you can just literally interact with Positron Assistant. Ask it questions about your data, your analyses, and it will get a response. You can chat that way. You can also leverage in-line code edits, which I'll demonstrate here in a little bit.

Something that I'll talk a little bit more about in a second is that Positron Assistant and pretty much all the AI features we have in Positron are model-agnostic, meaning we're not going to force you to say, hey, you need to use ChatGPT or you have to use Gemini. Chances are your team may only approve certain tools. We want to make sure that you can bring those tools to Positron and use them. Positron is not in the business of creating our own LLMs. We just want to make sure that the LLMs you're able to use can be used in Positron right out of the box.

Lastly, Positron and Positron Assistant are designed for data science. The way that Positron Assistant does this is pretty cool, and it's also a little creepy. Here I have the Positron IDE, and I purposely put some very creepy eyes on it. When you interact with Positron Assistant, it can actually see your Positron environment. It can know that, hey, you have this Quarto document open, or, hey, you have some variables stored in memory. I can see the plots that you've created. I also know what code you ran in your console and what error messages or output you got.

That way, when you make requests or ask questions of Positron Assistant, typically the suggestions and responses you get just tend to work immediately, because it automatically knows everything it needs to know about your environment.
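To make the idea of environment-aware context concrete, here is a minimal Python sketch of how session details could be bundled with a user's message before it is sent to a model. This is an illustration only, not how Positron Assistant is implemented: the `SessionContext` class, the `build_prompt` function, the file name `analysis.qmd`, and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Hypothetical snapshot of what an assistant could see in the IDE."""
    runtime: str                                        # e.g. "R 4.4"
    open_document: str                                  # file open in the editor
    variables: dict = field(default_factory=dict)       # name -> short summary
    console_tail: list = field(default_factory=list)    # recent console lines


def build_prompt(context: SessionContext, user_message: str) -> str:
    """Prepend the session context to the user's message."""
    lines = [
        f"Runtime: {context.runtime}",
        f"Open document: {context.open_document}",
        "Variables:",
    ]
    for name, summary in context.variables.items():
        lines.append(f"  {name}: {summary}")
    lines.append("Recent console output:")
    for entry in context.console_tail:
        lines.append(f"  {entry}")
    lines.append("")
    lines.append(f"User: {user_message}")
    return "\n".join(lines)


# Example values are made up for illustration.
context = SessionContext(
    runtime="R 4.4",
    open_document="analysis.qmd",
    variables={"gas_prices": "NULL"},
    console_tail=["Error: no applicable method for 'filter'"],
)
prompt = build_prompt(context, "Help me fix the error.")
print(prompt)
```

Because the context travels with the message, a short request such as "help me fix the error" can be enough; nothing needs to be copied and pasted by hand.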

There is another way. This is known as inline code suggestions. If you have a Shiny application open or a Quarto document or a Python script or a FastAPI, you can typically hit Command-I if you're on Mac or Control-I if you're on a Windows machine. You can open up this little chat assistant window where you can make edits directly to your source code that way.

Currently right now, Anthropic is really just the main model that we support, and we are actively testing out other models. As Positron Assistant continues to evolve, we hope to support additional models in the future.

DataBot

Positron Assistant is one tool that we're going to talk about in this short demo. There's another one, and this is called DataBot. I'm curious. Anybody here try out DataBot? Okay, we've got a few hands. That's cool. The vast majority have not. That's expected. It's a really new tool. It's a very exciting tool. Honestly, it's probably one of the more fun tools that I've ever interacted with.

Unlike Positron Assistant that's built into Positron, DataBot is actually an extension. If you're a VS Code user, you'll know that there's a very rich library of extensions you can import into your Positron session to customize your development space. DataBot is, in fact, an extension.

The way that DataBot works is pretty cool. We've coined a term for this internally, called the WEAR loop. It's really designed for exploratory data analysis. You get a brand new data set thrown at you, and typically the first thing you want to do is just understand what that data is, what it's telling you, maybe extract some insights. That's where DataBot comes into play.

If you have a new data set, you can open up DataBot, and you can just say, hey, DataBot, tell me about this data. What it's going to do is kick off this WEAR loop. The first thing it'll do is write R or Python code on your behalf, again, with your approval. This is also something that, again, the way we interact with these LLMs is all code-based, because we know that code is the most auditable way to interact with LLMs. DataBot's going to start writing some R or Python code, and it will execute it immediately. It'll actually analyze those results, and then regroup and offer some additional suggestions. Then you can give it the next instruction, and this WEAR loop will just go around and around and around in circles until you are confident you know everything you need to know about this data set.
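The write, execute, analyze, regroup cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DataBot's implementation: `wear_loop`, `fake_model`, and the canned snippets are hypothetical, and a real tool would call an LLM where `fake_model` returns fixed code.

```python
from typing import Callable


def wear_loop(
    instructions: list,
    write_code: Callable[[str], str],
    approve: Callable[[str], bool],
) -> list:
    """Run one write/execute/analyze/regroup pass per instruction."""
    namespace = {}   # shared state, like the variables in a session
    history = []
    for instruction in instructions:
        code = write_code(instruction)           # Write
        if not approve(code):                    # the user stays in control
            history.append((instruction, "skipped"))
            continue
        exec(code, namespace)                    # Execute
        result = namespace.get("result")         # Analyze
        history.append((instruction, result))    # Regroup: record, await next step
    return history


def fake_model(instruction: str) -> str:
    """Stand-in for the model: maps an instruction to canned code."""
    canned = {
        "count rows": "result = len(data)",
        "average time": "result = sum(data) / len(data)",
    }
    return "data = [10.0, 12.0, 14.0]\n" + canned[instruction]


log = wear_loop(["count rows", "average time"], fake_model, approve=lambda code: True)
print(log)
```

Every step passes through generated code and an approval check, which is what makes the session auditable: the history is a record of exactly what ran.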

Live demo

For the rest of the session today, we're going to do a demo, and I'm going to have a lot of audience participation. Don't be shy, because I'm going to ask a few questions here. This is Positron. Just to orient you all, again, we have our little manage sidebar here on the left-hand side. I have open in the main center, this is a Quarto document with some R code inside of it. I have my console down here, ready to run some code. You'll see, again, the variables pane, the plots pane, the things that we love about RStudio right here within Positron.

Again, you can see I am running R in the very top right corner, but if I wanted to, I could easily switch over to a different version of R, or even a different version of Python, which is a really powerful feature of Positron. Then over here on the left-hand side, there's a little robot face. That is Positron Assistant. We can open it up, and we can always drag, just like in RStudio and VS Code, open it up a little bit so we can see it in all of its glory.

What we're going to do first is I'm just going to run through this Quarto document. If you've never seen Quarto before, they typically have these QMD extensions, very similar to R Markdown in a lot of ways, but Quarto is also designed to support R, Python, and other languages, like Julia, Observable, so it's a bit more of a language-agnostic tool. This Quarto document is not particularly complex. It's only about 40 or so lines of code, and these gray boxes are code cells, or code chunks, with some actual R code inside of it. We're going to step through this, and we're actually going to hit some error messages, and we're going to use Positron Assistant to help us debug these errors.

The first thing that we're going to do, we have the tidyverse package, we have the scales package, we're just going to load some packages into our environment, and then we're going to import a dataset from TidyTuesday. Who here is familiar with TidyTuesday? All right, about half or so. TidyTuesday is this GitHub repository, and every Tuesday, we release, well, the TidyTuesday folks release a new dataset that you can just play around with. It goes all the way back to 2018, so it's a pretty rich repository of data, and the data is extremely diverse, going from palm trees to Pixar films. We've got Pokemon in there, which is pretty cool. It's a great, if you want to do some exploratory data analysis, if you're teaching and you would need to leverage some data, this can be a really helpful resource as well.

So what we're going to do is, I'm going to specifically use this TidyTuesday R package to import a dataset from the week of July 1st of this year. And then I'm going to extract, so this dataset is actually looking at gas prices over time, and I'm going to extract a specific dataset called gas prices. So let's go ahead and run this code cell. I click Run Cell at the very top here. So I do that, you'll see it runs all the code down here in the console, and then in the top right corner, you can see the environment variables that have been created, and again, I can literally come up here and interact with them, which is a really great feature of Positron.

Now we have some data loaded from TidyTuesday. Now what we're going to do next is just create a plot. We're going to use the very popular ggplot2 package to create a really nice visualization. So let's go ahead and run it. And I told you we're getting an error message, and here it is. It's at the very, very bottom of your screen, but we get an error message which isn't particularly helpful. It's telling us that there's no applicable method for filter. I guess that's kind of pointing us in the right direction, but if you're new to R, or Python, or Positron, you see this error message, you're going to be like, I don't know what to do now.

So what we're going to do is use Positron Assistant to help us debug what's going on here. Now again, Positron Assistant is aware of my environment, and if you look down here in the very, very bottom left, you're going to notice that it already has some context about my environment, like, what R version am I running? 4.4. And you'll also see it automatically picked up the Quarto document that's in my source pane here in the top right. So that is some of the context that's going to be used when I interact with Positron Assistant.

I don't have to do anything particularly fancy here. I can literally just say, help me fix the error. That's it. Now if you were using something like ChatGPT, or Gemini, or Anthropic, and you're using the web chat interface, you would probably have to, like, okay, let me grab all of my code, and I'm going to copy and paste it over, which is a huge pain in the butt. With Positron Assistant, you don't have to do that, because it already is aware of your context. So let's go ahead and run this, and let's see what Positron Assistant does for us.

You can see it's used one reference at the very top here. So it's letting you know it's using the Quarto document to help inform its response. So the error occurs because gas prices is null. Okay, that was unexpected. So what it actually did here, which is pretty cool, is that it knows about my Quarto document, but at the same time, it was looking at my environment variables. It sees this Tuesday data set over here, my environment, but it also sees this gas prices, but it's set to null. That's something I didn't actually pick up. And you'll notice that Positron Assistant, right on this line right here, is like, hey, you actually extracted this incorrectly. So instead of gas prices here on line 15, it really needs to be weekly gas prices. So again, this is how Positron Assistant knows about your environment and can start making really helpful suggestions. And then, as a little bonus, it actually said, hey, I actually found some other errors in your code.

So it's saying, okay, you used labels, it should be labs, and then date, you're using capital D here for date, but it should be a lowercase d. And then at the very bottom here, you have an entire code snippet that if I wanted to, I could run it directly within Positron Assistant. So you can copy it, you can copy it into a new script, or what I like to do is the second icon over, you can apply in editor. So I'm going to go ahead and click on it, and I'm going to add it to my QMD document that I currently have open.

And it'll take just a couple seconds here, but it's going to give us a diff, something where we can actually see the changes it's suggesting, we can approve them, we can reject them, it gives you a lot of control. So anything you see in red here are going to be suggestions that Positron Assistant's like, okay, we no longer need this line, and the green stuff are going to be the suggestions that it suggested here. So you can go one by one and either approve them or reject them, but if it looks good to you, and I think Positron Assistant's doing a good job here, we can just keep it all. So let me go ahead and keep it, and now we have a revised script here, and let's go ahead and run it.
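For readers who have not worked with diffs, the following Python sketch uses the standard library's `difflib` to produce one. The R lines inside the strings are a hypothetical reconstruction of the three fixes mentioned in the talk (the extracted data set name, `labels` to `labs`, and `Date` to `date`); the actual script is not shown in the transcript, so the variable names `tuesdata`, `gas`, and `price` are assumptions.

```python
import difflib

# Hypothetical reconstruction; the real script is not shown in the talk.
before = [
    "gas <- tuesdata$gas_prices",
    "ggplot(gas, aes(x = Date, y = price)) +",
    "  geom_line() +",
    '  labels(title = "Gas prices over time")',
]
after = [
    "gas <- tuesdata$weekly_gas_prices",
    "ggplot(gas, aes(x = date, y = price)) +",
    "  geom_line() +",
    '  labs(title = "Gas prices over time")',
]

# Lines starting with "-" would be removed; lines starting with "+" would be added.
diff = list(
    difflib.unified_diff(before, after, fromfile="before", tofile="after", lineterm="")
)
print("\n".join(diff))
```

In an editor's diff view, the lines marked `-` correspond to the red removals and the lines marked `+` to the green additions, and unchanged lines are shown for context.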

And there we go. So now if we look down here, we can see our beautiful plot down here in the bottom right. And just like RStudio, you can drag these dividers here. You can also pop it out, so if you want to, just click here. And we can see our beautiful plot. This is looking at gas prices over time on the x-axis, which I think looks really good so far.

And then again, at the top right here, we can see the environment variables. Gas prices is no longer null, and we can actually interact with this data set. So if you click this little icon off to the right, we get the really, really handy data viewer within Positron. So if you've never seen this tool before, it's similar to what you'll see in RStudio, but it is just insanely feature-rich. So you can see what columns are in your data set. You can learn about all the various metrics, like a quick little glimpse into the various columns here. There's also, you'll see a little icon here to let you know what percent of the data is actually missing. So this is a really, really awesome tool for exploring your data.
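The percent-missing indicator is simple to reason about. As a rough sketch of the calculation (not the data viewer's actual code), the Python below computes the share of missing values per column; the `percent_missing` function and the sample rows are made up for illustration.

```python
def percent_missing(rows: list) -> dict:
    """Return the percentage of missing (None) values for each column."""
    if not rows:
        return {}
    columns = rows[0].keys()
    total = len(rows)
    return {
        col: 100.0 * sum(1 for row in rows if row.get(col) is None) / total
        for col in columns
    }


# Made-up sample data, not the TidyTuesday data set from the demo.
rows = [
    {"date": "2025-01-06", "price": 3.10},
    {"date": "2025-01-13", "price": None},
    {"date": "2025-01-20", "price": 3.05},
    {"date": "2025-01-27", "price": None},
]
summary = percent_missing(rows)
print(summary)
```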

Okay, so I've created a cool plot here. I think it looks pretty nice, but I think we can make it look even better. So what I'm going to do here is I'm going to put my cursor on the lines of code where we generate this plot, and I'm going to hit Command-I on my keyboard. And this opens up this window here where we can interact with Positron Assistant in line with the rest of our code. So I'm going to go ahead and ask you all, for some brave volunteer, look at that plot. What's something you would like to do to this plot to make it better or worse?

Change the X-axis labels so they are vertical. All right, so there it goes. And right there, it just automatically went really fast. But you'll notice down here at the bottom in green, that is what it's suggesting. We could accept it. We could close it and reject it. What you can also do here is we can just rerun this code cell with that inline still there and see what the results are giving us. All right, there we go. Again, I'll make my plot a little bit bigger. You can now see the labels are straight up and down. All right, and that's how easy. So I can accept it, and we are good to go.

So, I'll come in here. Again, I'll do Command-I, and I'll say remove the background grid. I'm going to say remove the background grid and add a trend line. Make the trend line... purple. Positron Assistant is pretty smart, and it'll know where it needs to make those edits. I think it might be helpful to put it in the general area of where you're trying to make the edit, but for the most part, it's pretty smart, and it'll know where to add these edits. All right, so let's go ahead and try this one out and see what we get.

Uh-oh, got an error message. That's okay, because how can we debug error messages? We can use Positron Assistant. So, let me go ahead and accept this, and I'm going to say, back down here in the bottom left, hey, I got an error. Can you fix this?

Well, now it doesn't want to behave for me. Curses of the live demo, but that's okay. But hopefully you all can understand now how Positron Assistant, with just some natural language, you can really dramatically improve the data science, your plotting. You can do a lot more with Positron Assistant, and I encourage folks to play around with it, because it's really cool.

DataBot demo

But for the sake of time, I do want to demonstrate DataBot, because this tool is insanely cool. So, I'm going to go ahead and just restart my R session, which you can do down here in the bottom right. Just restart R. It's going to clear your plots, clear your environment variables. I'm going to close out of this specific script, and open up another one. And this script is extremely complex. It's got two lines of code.

So, what we're going to do here, is I'm going to, again, ask for another volunteer. I'm going to load this package, and on this line, on line three, we're essentially going to import a data set from TidyTuesday that I've never seen, you all have probably never seen. So, I want someone to just give me a random year from 2018 to now. 2020. And someone give me a week. 23. Perfect. Alright, let's go ahead and run this code, and let's see what we get.

Hey, we're going to be analyzing marbles. Alright, how exciting is that? So, we have this marbles data set, you can see marbles.csv at the very bottom. It's actually living in this Tuesday, Tuesday data variable in my environment. And, I don't know anything about this data. And so, we're going to use DataBot to help us out. So, I'm going to open up DataBot, which is an extension in Positron. So, I'll do open DataBot, and it looks something like this.

We have this Tuesday data in our environment, and this is DataBot, which again, is perfect for exploratory data analysis. I'm going to go ahead and kick off this analysis by just saying, help me explore the Tuesday data. Hi, I'm DataBot. And there we go. The first thing that's going to happen is DataBot always wants to make sure that you stay in control. So, if you're okay with DataBot running R or Python code on your behalf, you can allow once, or you can always allow. So, I'm going to go ahead and click always allow, and we're going to let DataBot work its magic. So, again, it's running all this R code for me. Alright, DataBot is extremely positive. It'll always give you really happy messages, which I enjoy.

Alright, excellent. You have a good understanding of your Tuesday data marble racing data set. So, again, I have never seen this data set before, but we have 256 marble racing records from Jelle's Marble Runs. I don't know if anyone's familiar with that, but I'm certainly not. We have 16 different teams, races, some key findings. So, letting you know there's some missing values. But then also what DataBot does is it provides some suggestions. Like, hey, where do you want to go next? So, we could show me the fastest and slowest marble times overall. Which teams perform best?

Anybody feel strongly about one suggestion or something else? Visualize. Not surprising. That one's always very fun. We're going to create a visualization showing team performance over time. So, I can click on this and just tell DataBot to do it. Alright, so it's going to go ahead and write some text, and then it's going to start running some R code for us.

Key insights from the visualization. So, time-based chart shows dramatic differences between qualifiers. There's fast, and then there's long endurance races. I am so confused about this marbles dataset and what's going on here. But, again, DataBot is trying its best to help us out here. Season one rankings. So, the Savage Speeders. They seem to have won season one for these marble races, which is pretty cool.

Just for the sake of time, because we only have about four minutes left here, again, the whole idea here is that you can keep going. Keep learning about your data. You can create visualizations. You can have it run some statistics on your data. It can do a lot of stuff. But, when you're done with your analyses, you might want to just document what you did. And, DataBot has a really cool feature where you can do forward slash and say report. And, it will create a Quarto document summarizing your analysis. So, it's going to go ahead and say, like, I'll be happy to create one for you. It's going to give you a proposed outline. And, if you're happy with it, it'll actually create it for you.

For this one, I am using Anthropic Claude on it. You can see in the very bottom down here. And, if you look right here, it's actively creating my Quarto document, summarizing the results that we did.

Wrap-up and Q&A

Alright, this may take a little bit, because we did a lot of cool things. But, just for the sake of time here, just to do a quick recap, again, Positron, this is just an awesome tool that we're insanely excited about. Optimized for data science, as we talked about. Multilingual, supports R, Python today. It's designed to support other languages in the future. So, we expect Positron to be around for a very, very long time. It's LLM flexible. Again, we're not going to force you to use one model versus another. We want to make sure you can bring your own model. With Positron Assistant and DataBot, it was designed to make just working in your projects easier, creating files super easy, like we did with that Quarto document. And then, faster, exploratory data analysis with DataBot.

So, with that, I am happy to take any questions.

It's a really great question. So the question's about data security and using LLMs. If you work on sensitive data like is it okay to use Positron Assistant? Is it okay to use DataBot? Again, we are not building any models at Posit. We are not consuming your data. It is a bring your own model mentality. So we would hope that your team internally who's done the homework to know like hey, we can approve Gemini, we can approve ChatGPT because that's what we know we can do internally and then you can just bring that to Positron. But again, it is a large language model. You are making requests to an API so it is something to be aware of.

There's also, I should say, there are ways where you can interact with models that run all on-prem. So you can do something like Ollama and interact with Positron that way so you're never actually reaching to an external API. Everything is staying in-house.
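To show what "never reaching an external API" means in practice, here is a hedged Python sketch that talks to an Ollama server on the local machine through its `/api/generate` HTTP endpoint. It does not show how Positron itself is configured to use a local model. The model name `llama3.2` and the prompt are assumptions, and `ask_local_model` only works if an Ollama server is running locally with that model pulled.

```python
import json
import urllib.request

# Ollama's default local address; the request never leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for a single, non-streaming completion."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as response:
        return json.loads(response.read())["response"]


# "llama3.2" is an assumed model name; use whichever model is installed locally.
request = build_request("llama3.2", "Summarize this dataset in one sentence.")
print(request.full_url)
```

The point of the sketch is the host name: every request goes to `localhost`, so prompts and any data included in them stay in-house.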

That's a great question. For DataBot, given some sort of like background setting or context for exploring data, right? Sometimes, you need help brainstorming ideas of what questions to ask and what to explore. Can you, not train the model, but can you give it that context of like, oh, you're a healthcare data master guru or something like that? Yeah, great question. So the question is, you can use DataBot right out of the box and you can see it works pretty well but you may need to customize it a little bit more based on your data, based on what you're doing. I believe there is a file. I think we call it like llms.txt, which is something that DataBot and Positron Assistant will actually consume when it first boots up. And it's a way to add additional prompt messaging to tailor how DataBot and how Positron Assistant work out of the box.

So to summarize the question, DataBot is an extremely exploratory data analysis tool and there is a high likelihood that you may have one dataset and if two people analyze it, you might, I wouldn't say get different results, but you may just go down different avenues. Maybe someone's going to look at correlations in the data and other people are going to do some more visualizations. So it is very much who's driving the analysis and where you go from there and what you're hoping to learn. I would hope that if people set out with the same goal and are using DataBot that they're going to get to the same results. But, you know, it is an LLM. These are inherently chaos machines so there may be a little bit of just fluctuations there.

The question was, when will it be possible to use custom OpenAI APIs or GitHub Copilot for the Positron Assistant? Is that something that will be announced during the conference? It's a good question. I certainly don't want to speak on behalf of our Positron project manager. If you all see Tom Mock, I'm sure many of you know Tom Mock around. He's the product manager for Positron. We have a bunch, an entire engineering team of Positron and those that work on Positron Assistant are here as well. They may be able to provide more of like the roadmap for Positron Assistant and what's coming to it. All I can say for right now it's in preview support so it's there. You can use it and try it out and have fun with it but, you know, we'd probably hold off on doing any production work in Positron Assistant for right now.

The question is, is there MCP support? I want to say yes but I don't know confidently. I think all of that is pretty good. Like, you can use tools today in Positron Assistant and I'm pretty confident that they're going to come to Positron in the future.

But also, I'd encourage you to ask a lot of these questions in Slido for the keynote that's about to happen. It's going to land on Jonathan McPherson and you'll get much better answers within the next hour. And I'll also say that later on here in the Learning Center we're going to have Winston Chang who's the creator of Positron and we're going to have the creator of DataBot who can probably do a much better job answering these questions than I can. But my goal is I'll just hopefully get you excited about these tools and using them for your AI stuff and data science stuff.