Resources

Introducing Positron, a new data science IDE - posit conf 2024

Positron is a next-generation data science IDE that is newly available to the community for early beta testing. This new IDE is an extensible tool built to facilitate exploratory data analysis, reproducible authoring, and publishing data artifacts. Positron currently supports these data workflows in Python, R, or both, and is designed with a forward-looking architecture that can support other data science languages in the future. In this session, learn from the team building Positron about how and why it is designed the way it is, what will feel familiar or new coming from other IDEs, and whether it might be a good fit for your own work.

Talk by Julia Silge, Isabel Zimmerman, Tom Mock, Jonathan McPherson, Lionel Henry, Davis Vaughan, and Jenny Bryan

Slide deck 1: https://speakerdeck.com/juliasilge/introducing-positron
Slide deck 6: https://speakerdeck.com/jennybc/positron-for-r-and-rstudio-users

Transcript

This transcript was generated automatically and may contain errors.

Welcome everyone! I am so excited to welcome you here to this session where we get to introduce you to Positron. This is a very early stage project that was just made public about six weeks ago and I am the first of seven speakers in this session. We're each going to speak for roughly ten minutes and at the end we will have time for questions. So as we're going through these talks feel free to add questions to Slido. You should be able to find the Slido code either in the schedule or on Discord and you can direct any question to a specific one of us or just we as a team can answer them there at the end together.

So my name is Julia Silge. I'm an engineering manager here at Posit and I've been working on the Positron team for about the last year. So Positron is a new next-generation data science IDE. IDE stands for Integrated Development Environment and it is a piece of software that allows you as a user to yourself develop software. So here in this example screenshot somebody is writing a report using Quarto and R and there are a ton of IDEs out there in the world and what we're going to do during this session is tell you a little bit about what's new and or different and or special about Positron so that you can know why we're building this and also maybe more importantly whether it might be a good fit for you.

Positron as a data science IDE

The first thing I want to tell you is that Positron is an IDE for data science. We at Posit have just extensive domain knowledge when it comes to data science tooling and we know that someone who is writing code to analyze data is different in some pretty fundamental ways from someone who is writing code for general software engineering purposes like if they're building a website or building a mobile app. So we are huge proponents of code first data science as opposed to a GUI tool or like a no code or a low code solution but it is a fact that tools that are built for a typical software engineer who is writing code they're often not a good fit for someone who is just writing code to analyze data so all across our org pretty much every single thing we build or make or do is informed by how deeply we know this and we believe in this that we can make you as a data practitioner more productive by building tools that are specific for the kinds of tasks that you need to do.

So this is exactly why we're building a new IDE. There isn't anything quite like Positron out there and we hope that it's going to become one platform where you can do all of your data science. The second thing I want to tell you is that Positron is a multilingual or polyglot IDE. Currently as of today it has support for both R and for Python. A lot of environments that are built for you to do data analysis are specifically built just to use one language for scientific computing. So tools that you may have heard of or you know you yourself have used that fall into this category include RStudio, things like MATLAB, IDEs like Spyder. There's a lot of these and there are real limits to these kinds of tools and IDEs especially for the high and growing proportion of you who use more than one language for your data science.

You can use more than one language literally on the same project, maybe R and Rust. You can use more than one language over the course of a week where you switch between projects. Maybe some projects you use with Python, some you use with R. Almost certainly this is going to happen to all of us over the span of years or our career. And so by contrast with the IDEs that are built just for one language, Positron is built with front-end user-facing features that are about the tasks you need to do. So here in this screenshot you see that interacting with your plot, seeing the variables that you have defined. But then there are back-end language packs that are the engines for providing these kinds of features and tasks.

So it's a tool for writing code and exploring data that's going to work well for you no matter what language you use for data science. As of today in its early stages it ships with support for Python and R. But it is built in such a way that it can be extensible. Other data science languages can be added to it whether they exist now or don't even exist yet today. This is really exciting and I know a lot of you in here are wondering should I switch right now? And you're already probably starting to get an idea of what might push you to a yes answer. Like you are one of those people who uses more than one language for data science.

I do want to emphasize that Positron is a very early stage project and there is no need to switch today if you are happy with your code writing editing kind of experience. But in the talks and the rest of this session we specifically chose them and designed them to help you who are sitting here or who are listening to understand if it's a good fit for you today and if you should switch.

Extensibility and familiarity

The third thing I want to highlight about Positron is that it is both extensible and familiar. It's familiar to many of you who are sitting in this room with us right now because its design is inspired by and informed by our organization's experience building RStudio. Here you see a very RStudio-like interface. There's a console, a source editor, help pane, but it is being used with Python instead of with R. For some of you we know this is great news. This is what you have been wanting. This is what you have been looking for. But for some of you this may make you nervous because you may be wondering what is happening to RStudio.

So RStudio is incredibly stable and solid software. It has over a decade plus of real-world use on it and for many of you it is probably the best choice today for your work and that may be true for quite a while yet. So we expect people to use RStudio for their real-world use for many many years to come and we're committed to maintenance for RStudio for the long haul. Positron also is going to feel familiar to those of you who are in this room who have used Visual Studio Code and that's because it is built on the open source components that make up VS Code. You're going to hear more about this a bit later but I just want to highlight one thing about this is that this makes Positron very extensible. Building on top of Code OSS opens up the wide world of Visual Studio Code compatible extensions for users who do want that data science specific IDE kind of experience.

One of the big investments that we needed to make in order to build Positron was new, modern language tooling for R. This already existed for Python and we only had to incorporate it and integrate with it. When it comes to R we had to start from scratch. Positron has a brand new Jupyter kernel for R called Ark, which stands for An R Kernel, and you'll hear more about Ark later but it is software that allows R to talk back and forth with a front-end such as Positron. It gets you completions, diagnostics, debugging which is what this screenshot shows. Someone is running their R code in a debugger. Just a ton of different features that improve the experience of writing R code in some exciting ways. With that I've run through our introduction. I want to thank you so much for walking through this introduction with me.

The Python data science experience in Positron

Hello everyone I am Isabel and I first learned about RStudio and perhaps data science in a course called Intro to Data Science where my professor Dr. Sanchez had us all download RStudio and that first day in that class was kind of magical. You know I got to explore data like in this like data explorer. I could see my variables in a viewer pane. If I was confused I just put a question mark next to what I was confused about and I made my first plot that was not in Excel. I can explore data pretty quickly and this RStudio IDE made this data science experience feel so magical and so accessible and so joyful.

I went on to do more schooling, more learning, more classes and RStudio was a tool that grew with me. I finished my degrees. I took a job doing data sciencey things in Python and I realized that like hey this data science life cycle doesn't change even if the tools I use or even if the language that I use changes. Something that did change was the tools I used and it felt like there wasn't quite an IDE that supported code first data science the way that RStudio did. Some tasks sort of felt like I was stubbing my toe on the corner of a table and okay like stubbing your toe is painful but it doesn't really impact your day until your day involves you stubbing your toe over and over again for about eight hours. I wasn't able to find this IDE experience that not only minimized my pain but maximized my data science productivity.

A few years later I've now hobbled my bruised feet over to helping out with building the Python experience on the Positron IDE. I hope we can convince you that we've built a tool that saves you a bit of pain and maybe brings you a bit of joy.

So how does Positron support data scientists? As a whole Positron is segmented into kind of these different panes. Up top you have where you are running code and kind of how you're running code. There's a file that shows you the code that you are running, a sandbox for development, and over on the right hand side there's a bunch of different panes to help you explore the code that you're writing. It sets you up for success when you open up Positron to look something like this. You can see my Python script surrounded in pink, you can see a console that's in blue, and you can see these two green squares that have panes that help you understand the variables and the plots that you're creating. We'll call this four-pane data science.

These panes are not intended to replace your code first data science experience but rather give you this UI plus code to make you a little bit more productive and feel a little bit like we're doing some interior decorating in your IDE. If you've written Python code before, perhaps the first way you have stubbed your toe is trying to figure out how to run the Python you want. Specifically what virtual environment are you using? How are you selecting it? How are you activating it? How do you make sure you're running it in the right place? And you can see in Positron there's something called an interpreter selector. You might recognize here a few different ways to run your favorite Python, whether using Conda, pyenv, venv, poetry, virtualenv, or something else. When you click that power button Positron will activate and turn on this Python interpreter for you all across the IDE.

If you want to swap between them you can just turn it off, select a different one, and it just kind of works. Positron discovers all of the virtual environments you have on your computer, you're able to browse between them, and it will also remember per project which virtual environment you have been using and automatically start it up for you each time. And don't fear for the R folks in the crowd, I'm talking about Positron with Python here today, but everything I'm talking about applies to R as well if you have multiple versions installed. The drop-down is only going to populate with the languages you use, so if you only have R installed on your computer you're not going to see Python at all, we're only going to show you what you want.
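
As a rough illustration of the "which Python am I running?" question, here is a minimal sketch using only the standard library. It is not how Positron discovers environments; `describe_interpreter` is a hypothetical helper that simply reports on the interpreter executing it, and the `sys.prefix` comparison only detects venv and virtualenv style environments, not Conda ones.

```python
import sys


def describe_interpreter() -> dict:
    """Report which Python is running and whether it looks like a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix points at the interpreter it was created from.
    in_virtual_env = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
        "in_virtual_env": in_virtual_env,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this in the console after switching interpreters is a quick way to confirm that the session picked up the environment you intended.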

And a pretty common workflow for data scientists is, you know, creating a Jupyter Notebook, doing all your exploration, and maybe copying this out into a Python script as some sort of artifact for a pipeline. Or perhaps you're running IPython in a terminal and again copying and pasting things over. It's all to say that data science is very experimental, iterative work, and it's convenient to have a tool that understands this is the data science development lifecycle. Positron does support running Jupyter Notebooks, but if you're someone who works in Python scripts, we have an interactive built-in console. You can use Command+Enter to iterate through a script, you can move back and forth between this console and your script, it kind of gives you a sandbox to play in. You're able to explore your data more deeply and bring the final results back into some sort of reproducible script, perhaps part of a pipeline. And the console has all the affordances you would expect, you're able to use things like autocomplete.

And with the right Python running, we're already off to a good success story. But the first thing you're going to want to do in your data science lifecycle is import some data. And one thing you might be wondering as you're building this database connection is like, what tables do I have on hand? And in this four-pane data science, one of the panes we have, P-A-N-E-S, not like stubbing-your-toe pain, is a connections pane that shows up when you connect to a database. So when you create this connection object in something like SQLite, you're able to see all of the tables, all of the rows, all the columns, and their types at a really high-level glance very quickly. It makes databases just a little bit easier to navigate.
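
For comparison, this is roughly the information you would otherwise pull out by hand from a SQLite connection. It is a minimal sketch with the standard library `sqlite3` module; `describe_tables` is a hypothetical helper and the two tables are invented for illustration, not the ones shown in the talk.

```python
import sqlite3


def describe_tables(conn: sqlite3.Connection) -> dict:
    """Map each table name to a list of (column name, declared type) pairs."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Illustrative in-memory database with made-up tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ferris_wheels (name TEXT, height REAL, opened TEXT)")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT)")

schema = describe_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```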

With some data in hand, you're probably going to want to tidy it up. And the first question you're probably asking yourself is, what does this data even look like? There's a few different ways that Positron allows you to explore your data. If you're viewing data from a database, there's this little eye that pops up if you hover, or there's a table that shows up in the variables pane, and you can even just type view data in your console. All three of those locations will bring you to the Data Explorer. When you open up your data, whether it's from a Polars or a pandas data frame or a database, you can see the percentage missing per column. You can see the name in this data is never missing, whereas the Ferris wheels that we're looking at are almost never closed. If you want to do some really quick looking at your data, you're able to filter. You can sort by ascending or descending, or maybe you just want to look at the Ferris wheels that are between 150 and 200 feet tall.
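
The two operations mentioned here, percent missing per column and a "between" filter, are simple to express in code. The sketch below uses plain Python lists of dictionaries rather than a data frame so that it runs anywhere; the helper names and the four rows are made up for illustration and are not the dataset from the talk.

```python
def percent_missing(rows: list, column: str) -> float:
    """Percentage of rows in which the column is None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * missing / len(rows)


def rows_between(rows: list, column: str, low: float, high: float) -> list:
    """Keep rows whose value is present and lies in the closed range [low, high]."""
    return [
        row for row in rows
        if row.get(column) is not None and low <= row[column] <= high
    ]


# Invented example rows: heights in feet, None marks a missing value.
wheels = [
    {"name": "Wheel A", "height": 120.0, "closed": None},
    {"name": "Wheel B", "height": 165.0, "closed": None},
    {"name": "Wheel C", "height": 190.0, "closed": "1999-01-01"},
    {"name": "Wheel D", "height": None, "closed": None},
]

for column in ("name", "height", "closed"):
    print(column, percent_missing(wheels, column))

print([row["name"] for row in rows_between(wheels, "height", 150, 200)])
```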

If you know what data you've collected, you're going to want to understand it a little bit better. And sometimes tidying is messy business. It involves creating lots and lots of variables, and while for a while you can keep them all in your brain, at some point it just is too much information to keep up here. There's something called a variables pane. As you're creating each variable, it shows up and it gives you a little bit of information about it. You can see kind of on this most left-hand side, it says the name of the variable. In the middle, it'll give you either a preview of the data, the actual value of the variable itself, or some sort of other helpful information. And furthest to the right, you either get an icon that you can click on and interact with, like opening this up in the data viewer, or it'll give you the type of variable it is.

We also allow you to expand variables. So let's say we have this dictionary and we want to see maybe what all of the names are, and we can quickly look at this without having to type out in the console, and allows you to just, you know, search through your information a little bit more fluently. We're also able to see functions with a function signature, and if it's something that's custom, like perhaps a new Ferris wheel class, there's reasonable defaults that allow you to understand what you have in your environment.

And so countless times, as you're making all of these variables, I've run into a time where there's a function or method that I'm not quite sure what to do with, and I'm a little confused, like what does this parameter mean, and what method should I use? And normally my workflow involves going to the web, finding the documentation, finding the function I want inside the documentation, scrolling through documentation, perhaps getting distracted partway through, ending up watching cat videos, you know, who knows? And that was something I loved in RStudio and I couldn't quite find in the Python world. So Positron allows you to get help for functions, classes, or modules in Python just by adding a question mark next to it. It'll send you directly to the documentation in the help pane, and it gives you kind of that quick and easy information on hand directly for what you want. It might feel familiar for people coming from RStudio, you guys already had a fantastic help pane, but this isn't something that Python users are used to, so when I tell you this was mind-blowing to me, this is cool stuff here.
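
Outside of a help pane, the closest plain-Python equivalents are the built-in `help()` function and the `inspect` module. The sketch below is only an approximation of what a question-mark lookup surfaces; `quick_help` and `wheel_area` are hypothetical names made up for this example.

```python
import inspect


def quick_help(obj) -> str:
    """Return a signature line (when available) plus the first docstring line."""
    name = getattr(obj, "__name__", type(obj).__name__)
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some built-ins and non-callables do not expose a signature.
        signature = ""
    doc = inspect.getdoc(obj) or "(no documentation found)"
    return f"{name}{signature}\n{doc.splitlines()[0]}"


def wheel_area(diameter: float) -> float:
    """Area of the circle swept by a wheel of the given diameter."""
    return 3.141592653589793 * (diameter / 2) ** 2


print(quick_help(wheel_area))
```

Calling `help(wheel_area)` prints the full documentation to the console instead of a two-line summary.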

And after, you know, you've built all your variables, it's time to do some plotting, and nobody makes beautiful plots on their first try. If you do, maybe you've transcended IDEs, I'm not quite sure, it looks something like this in Positron. There's a plots pane that, as you iterate, as you generate these codes in the console, they show up in the plots pane, and you get this filmstrip across the bottom. Between each plot, you can see, you know, where you've come from, what works, what doesn't, as you're trying to build the beautiful plot to explain your data to others. Or, if you're someone who wants to continue on the same plot object, you can see it automatically update in the plots pane. You can see here, like we're adding different colors, we're adding different labels, and you can also resize and export your plot from the plots pane into a variety of formats. So, you can see we can make it a square, and we can export it as a PNG, JPEG, SVG, or PDF.

And finally, it's time to communicate this with others. So, how can we share the beautiful data science artifacts that we're creating with others? We have something called a viewer pane. This is also something that might be familiar if you're coming from RStudio, and this plays nicely with the VS Code extensions that you might have already seen and loved, like the Quarto extension. This is interactive HTML that automatically gets populated in the viewer pane. This is a Quarto extension, but there's also Shiny extensions. So, when you run your Shiny app, it'll pop up. You can interact with this directly in the viewer pane. You can also see other things like APIs, like Great Tables tables, and a variety of other HTML documents. So, Positron is doing a little bit of interior decorating in your IDE space. It's adding in new furniture and taking out that table that you keep stubbing your toe on. Hopefully, this allows you to walk through your data a little bit more fluently and avoid a little bit of pain. We make these tools to help you because we know data science can be messy business. It comes with its own pain.

The data explorer

My name is Tom Mock. I'm a product manager for Positron, and I'm excited to talk a little bit more deeply about one little area of Positron and how much focus we put on the little design principles within it. So, when you first start opening your data, you might want something like a data viewer, right? In RStudio, there's a data viewer. You can see your grid, or you might be opening things in tools like Excel because you want to kind of walk through your data or feel the data or explore individual cells as opposed to doing group by operations or summarization. So, of course, we built this into Positron and made it available for both Python and R. But we didn't want to stop with just a data viewer. So, we've actually built what we call a next-generation data explorer.

Of course, it still has that data viewer grid that people are looking for in terms of individual cells and the intersection of an index and a column, and details about them. But we also have things like a summary panel showing you summary statistics that are appropriate for that data, you know, measures of central tendency for numeric data or unique values for strings or categorical data. There's also missing data and column names there as well as their type. And lastly, we also have a persistent filter bar that allows you to create, delete, and see the filters you've applied to your data as you're exploring it.

Now, of course, this is not intended to replace your code-first data science, but be used with your in-memory R or Python data frames. So, let's talk a little bit about that. For the grid design, it's, again, intended to be this polyglot tool. So, it's just as good in R as it is for Python. You can expose these data frames into the data explorer by calling view in R or Python, and that will then populate it into the data explorer or using the workflow that Isabel showed with clicking the table button. What this means is that the code that you write in R or Python also updates the view in the data explorer. So, here, I overwrite a column and change the year from 2013 to 2014, which affects the variable column as well as the summary statistics. So, this is reactive to the changes you make in code as you move back and forth between them.

Beyond that, though, we've tried to make it something that's highly scalable. So, again, in this example, I have 31 million rows, 24 columns, and in real time, I can scroll horizontally or vertically across anything in the data frame, and it will kind of instantly load that data. We do a lot of clever things with caching, fetching different windows of data to try and make it efficient in this way. But we've also sweated a little bit of the small stuff. So, things like sorting. It's not just one column. It's multi-sorting. So, you can sort multiple columns at a time and show which order you've sorted them in and remove different sorting as needed so you can modify your data ephemerally in this way. We've used monospace fonts in the grid to show you things like nice decimal alignment, problematic things like white space, or differentiating simply between 1, I, and l of various capitalization. And then, lastly, for the grid, we've worked really hard on automatic column width so you can fit as much data horizontally and vertically as possible as opposed to these really, really wide columns when you've got a short variable name and short values.
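
To make the multi-sorting idea concrete, here is a small sketch of sorting by several columns in priority order. It says nothing about how the Data Explorer implements sorting; it relies on Python's stable sort, and `multi_sort` and the flight rows are invented for illustration.

```python
def multi_sort(rows: list, sort_keys: list) -> list:
    """Sort rows by several columns.

    sort_keys is a list of (column, descending) pairs, highest priority first.
    """
    result = list(rows)  # copy, so the original data is left untouched
    # Python's sort is stable, so sorting by the lowest-priority key first
    # and the highest-priority key last produces the combined ordering.
    for column, descending in reversed(sort_keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


# Invented example rows.
flights = [
    {"carrier": "AA", "delay": 5},
    {"carrier": "UA", "delay": 12},
    {"carrier": "AA", "delay": 30},
    {"carrier": "UA", "delay": 3},
]

# Carrier ascending first, then delay descending within each carrier.
ordered = multi_sort(flights, [("carrier", False), ("delay", True)])
print([(row["carrier"], row["delay"]) for row in ordered])
```

Because the function returns a sorted copy, removing a sort is just a matter of calling it again with a shorter key list, which mirrors the ephemeral nature of sorting described above.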

So, the grid's great, but what about this filter bar? We've intentionally decided to add a filter bar that's outside of the data viewer because it allows you to see the filters you've created no matter where you are in your data. You can imagine I might have 100 columns or 1,000 columns, and I might forget I made a filter on column 1. So, we want you to be able to see them and delete them as needed. But you can always quick add a filter, you know, at one specific column and have it pre-populate that column name. This also gives us the ability to have more rich filtering. So, not just I'm filtering on a single value, but for something like strings it contains or starts with, or even a regex that says, like, okay, this match. And for numbers, we have typical Boolean numeric operations, less than, greater than, or even between two values. So, a lot of power here in how we built the filters.
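
One way to think about a persistent filter bar is as a list of named predicates that are all applied together and can be added or removed independently. The sketch below is an assumption-laden illustration of that idea, not Positron's implementation; `ColumnFilter`, the helper constructors, and the sample rows are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnFilter:
    """One entry in a filter bar: a column, a readable label, and a test."""
    column: str
    label: str
    predicate: Callable[[Any], bool]


def contains(column: str, text: str) -> ColumnFilter:
    return ColumnFilter(column, f"{column} contains {text!r}",
                        lambda v: isinstance(v, str) and text in v)


def starts_with(column: str, prefix: str) -> ColumnFilter:
    return ColumnFilter(column, f"{column} starts with {prefix!r}",
                        lambda v: isinstance(v, str) and v.startswith(prefix))


def matches_regex(column: str, pattern: str) -> ColumnFilter:
    compiled = re.compile(pattern)
    return ColumnFilter(column, f"{column} matches /{pattern}/",
                        lambda v: isinstance(v, str) and compiled.search(v) is not None)


def between(column: str, low: float, high: float) -> ColumnFilter:
    return ColumnFilter(column, f"{column} between {low} and {high}",
                        lambda v: v is not None and low <= v <= high)


def apply_filters(rows: list, filters: list) -> list:
    """Keep only the rows that pass every active filter."""
    return [row for row in rows
            if all(f.predicate(row.get(f.column)) for f in filters)]


# Invented example rows.
wheels = [
    {"name": "Big Wheel", "height": 160},
    {"name": "Big Eye", "height": 250},
    {"name": "Small Wheel", "height": 170},
]

active = [starts_with("name", "Big"), between("height", 150, 200)]
print([f.label for f in active])  # the labels are what a filter bar would display
print([row["name"] for row in apply_filters(wheels, active)])
```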

And lastly, to close out, we have the summary panel, which again has a lot of different component parts. We have the type. So, as a symbol, we can show you the type and the column name. If you double click on the column name, it'll instantly move you to that column within the Data Explorer. We show you missing data. So, even without having to do an operation, you can see, oh, this column is 100% missing, or it's got a few missing values that I can work on. And then there's summary statistics here that show you appropriate things for strings or numbers or the various different data types. So, this has been a quick run through, again, trying to sweat some of the small stuff in terms of what the Data Explorer can offer you. We're also working on sparkline histograms and things within the summary panel, and in the future, more direct connections with remote databases. And thank you so much for the time.

How Positron works under the covers

Hello. My name is Jonathan. I'm a software architect here at Posit. And over the last few talks, we have looked into a lot of Positron's features. Today, I'm going to be talking to you a little bit about how Positron works under the covers. So, by the end of this talk, you will understand the following. You'll understand why we chose to build something new. You'll understand what the big pieces of Positron are. And finally, you will understand how they work together.

So, let me start out with the elephant in the room, which is, why didn't you just build something on top of RStudio? So, to answer that question, we need to kind of go back to our goal from Positron, which Julia talked about a bit. We wanted to make an IDE that was truly multilingual, polyglot IDE, that's focused on data science. And I want to focus in a little bit on that phrase, polyglot IDE. So, to understand why it's very difficult to build a truly polyglot IDE on the foundation of RStudio, you need to understand a little bit about how RStudio itself works. So, in RStudio, it's basically a two-process system. Everything that is inside the RStudio window comes from one process that runs R, all the computations that you do. It also takes care of saving files, serving up the HTML UI, and doing everything else.

This, by the way, is why a lot of RStudio features are slow or don't work well while R is busy. The only other piece of RStudio is Electron, which runs as the Chrome outside the window. It draws the window frame as well as the UI. Here's another view of the same architecture in boxes this time. And this can kind of give you an intuition for why it's very difficult to add another language subsystem here. There's no place to put it. You can run Python in RStudio, but when you do, it has to run inside of R. So, you can see why this is a really challenging environment on which to build a truly polyglot data science tool.

Let's look at the other part of that phrase. We want a polyglot IDE that is focused on data science. You know, people love RStudio, but we get a lot of comments from people saying things like, you know, I love RStudio. RStudio is really great, but. It's really great, but can I please, please pop out individual panes? RStudio is really great, but can I please have special level control over my themes? You know, could I please, please find files in hidden directories? Could I please save and load custom pane layouts? Could I please have shortcuts? Could I please have git stash? Could I please have custom Vim key bindings? Could I please have a tree view?

My goodness, we love all these suggestions, but there's a lot of them, right? And these things all have something in common, which is that these are not data science features. These are IDE features, right? And so, with every release of RStudio, we've had to make tough choices about whether we invest in those IDE features or whether we bring data science specific features.

And that's why we made the difficult decision not to build on the RStudio platform when we started the Positron project.

Positron as a Code OSS fork

Positron is a fork of another project called Code OSS. And you can be forgiven for not knowing what Code OSS is, but I promise you, you do know what it is, because Code OSS is the foundation for a proprietary IDE that almost all of you have used, which is called Visual Studio Code.

We are not the only people to fork Code OSS. Many other companies have forked Code OSS to make IDEs that are specific for other purposes. For example, there's a company called Cursor that makes an IDE called the Cursor IDE with a lot of AI features. So, we are doing something very similar with Positron. We are taking the Code OSS foundation and adding data science tools to it.

And we've already covered this a little bit in the session, but what that looks like is taking that powerful IDE foundation and adding tools like environment selections, plots and visualization pane, a really first-class console, an amazing data explorer, and so forth.

So, this brings me to the second elephant in the room, which is, why isn't Positron a VS Code extension? So, to answer that question, I've got my good friend Twilight Sparkle here. So, Microsoft learned a lot from their experience with other IDEs. IDEs like Atom, for example, and Emacs, they allow you to load extensions and they become kind of part of the IDE. This can lead to a lot of clutter and slowness and instability. And so, when Microsoft built Code OSS, they learned to put the extensions in a sandbox and run them separately from the IDE.

The other thing Microsoft learned is that when you give people the ability to do anything you want with extensions, they just can't help themselves. It is a mess, right? And so, Microsoft learned their lesson and Visual Studio Code extensions are fairly limited. They can't contribute custom UI to the Workbench service. They run on that separate sandbox process. They're ephemeral. They can be restarted at any time. They stop running when the browser is closed. And finally, probably most importantly for us, they don't make sense as a foundation for other extensions, right? R and Python and these other language packs are extensions to Positron. And it's very difficult to build a horizontal layer that provides that extensibility when you are yourself running in the sandbox.

So, finally, let's talk a little bit about how R and Python talk to these Positron features. So, R and Python are extensions. We call them language packs for Positron. And they talk to the Code OSS features in the same way that most other extensions do. So, VS Code has an API, which is what allows extensions to contribute things like auto-completion and diagnostics. So, you've seen a lot of Positron's specific features and we've added a separate parallel API that allows the extensions to not only contribute code completion and diagnostics, but also plug into our console, our plots pane, and the rest of our tools.

It's important to note, then, that those R and Python sessions live in extensions. And this is important not only for extensibility, but also for this reason. How many of you have seen this screen before? So, this is what happens when R crashes in RStudio. And the reason that it crashes the whole window is because, as we just established, that window is itself R. Everything is coming from R. So, this is what an R crash looks like in RStudio. Here is what an R crash looks like in Positron. It's nothing. It is horribly obliterated. You will lose anything you've got in memory because your R process crashed, but the rest of the IDE will continue to run very smoothly.
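
The underlying idea is ordinary process isolation: when the language runtime lives in its own process, its crash is just a nonzero exit code to whoever launched it. Here is a minimal sketch of that principle using Python's `subprocess` module. It stands in for the general technique only and is not Positron's actual supervision code; `run_in_child` is a hypothetical helper.

```python
import subprocess
import sys


def run_in_child(code: str) -> int:
    """Run a snippet in a separate Python process and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return completed.returncode


# os._exit skips all cleanup, which simulates an abrupt interpreter death.
crashed = run_in_child("import os; os._exit(3)")
healthy = run_in_child("print('still fine')")

print("crashed child exit code:", crashed)
print("parent is still running; healthy child exit code:", healthy)
```

Anything the crashed child held in memory is gone, but the parent process carries on and can start a fresh child, which is the same trade-off described above.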

Now, the final reason that it's really important to have all these things in extensions is that this allows Positron to be extended to other languages in the future. And this is something Julia alluded to really well, which is that we want Positron to be a foundation upon which more data science tooling can be built. And all of the R and Python code that we have written plugs into a public API that anybody can write for. So, it's not tightly integrated into the core. Everything in the Positron core can be added to by anybody via a public API.

So, we have Positron and we have these language packs. I want to talk for just a minute about how those two things talk to each other. So, when we built Positron, we wanted to use standard protocols for standard functionality. And what that means is that when those R and Python language subsystems talk to Positron, they use protocols that have already been established in the data science community. When we run and execute code, we are using the Jupyter protocol. When we do completion and diagnostics, we're using the language server protocol. We've only added our own protocols when we have Positron-specific features that we need to power.

So, quick recap. It is a polyglot IDE. It's focused on data science. It's a Code OSS fork like VS Code is. The language features live in extensions. We use existing protocols wherever we can to make it easy for other people to plug in. And we have a public API that lets anyone add new languages or features.

Ark: the R kernel for Positron

Hello, my name is Lionel, and I'm here with my colleague Davis to talk to you about Ark, which is a component in Positron. So, Davis and I are from the tidyverse group, and we've been working for a number of years on packages like dplyr. And for the last year, a bit more than that, we've been working on Positron with the mission to make it easier for other people to use Positron, and to help make R support the best it can be in Positron. So, we've seen with Jonathan that Positron is a multilingual IDE. And we've seen with Isabel that we've broadened our scope to include Python. And so, the question that we want to answer in this talk is, where does that leave us with R?

And the answer is that we wanted to make the R support the most modern it could be, and the result is Ark. So, when you use R in Positron, Ark is behind most of what's happening. So, if you type R code and get some completions, Ark is providing the completions. If you need to execute a piece of R code and get some output from R, that goes through Ark as well.

So, what does it take to make a great development experience for R? There are three main things that Ark is concerned with. First, code execution, so sourcing a script or executing a piece of code. Second, code assistance, so that ranges anywhere from completions, to diagnostics that let you know that there might be a problem in your code even before you execute it, to contextual help. And the final piece is the debugger. We want to make the debugging experience in Positron the least frustrating it can be.

And each of these areas is backed by a standard protocol, and we'll see in this talk why this is important and why it matters. And we'll start with execution.

So, here you have a classic Positron setup. You have an open R file with a bunch of code. You have a console and a plot pane. And the user here has created a data frame, got some output in the console, and created a plot. So, where does Ark fit in this picture? So, you can think of code that you execute in Positron as a question that Positron asks Ark. Ark does a bit of magic and then provides an answer back to Positron, which is able to display the output in the console. The same thing happens when you create a plot. Ark does a bit of magic after getting the question from Positron and provides the answer back to Positron.

So, the question is, how does this work? What is the language that Ark and Positron use to communicate with each other? And to answer this, we'll take a step back and look at what's going on in the Jupyter world. So, here we have a Jupyter notebook. If you're not familiar with this way of doing data science, you have a document made up of cells. You can write code in the cells and you execute these cells and you get the output included in the document. So, this is very popular in the Python community. But the thing about Jupyter is that it works with multiple languages. It's multilingual, just like Positron. And it's even in the name. So, if you did not know, Jupyter stands for Julia, Python, and R.

And so, the way they pull this off is not by implementing support for all these languages. They have independent backends called kernels. And so, when a Jupyter notebook needs to execute a piece of code, it's like a question. The kernel gets the question and provides an answer back to the notebook. And Jupyter came up with a standard set of questions and answers. And in this case, it's an execute request that is asked of the Jupyter kernel, and the kernel provides an execute response back. And that's exactly what we needed to communicate between Positron and Ark. And so, Ark is a Jupyter kernel. And actually, in this notebook, this is Ark running in the background, running the R code for the notebook.
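To make the question-and-answer idea concrete, here is a minimal sketch of what such a pair of messages can look like, following the general layout of the Jupyter messaging specification (where the answer is named `execute_reply`). The helper names, the user name, and the R snippet are made up for illustration. Real messages are also signed and sent over ZeroMQ sockets, which this sketch leaves out.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code: str, session: str) -> dict:
    """Build a message shaped like a Jupyter execute_request (the question)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # hypothetical user name
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


def make_execute_reply(request: dict, execution_count: int) -> dict:
    """Build the matching execute_reply (the answer) a kernel would send back."""
    header = dict(
        request["header"],
        msg_id=uuid.uuid4().hex,
        msg_type="execute_reply",
    )
    return {
        "header": header,
        # The parent header ties the answer back to the question.
        "parent_header": request["header"],
        "metadata": {},
        "content": {"status": "ok", "execution_count": execution_count},
    }


# Hypothetical R code that a front end might ask the kernel to run.
request = make_execute_request("df <- data.frame(x = 1:3)", session=uuid.uuid4().hex)
reply = make_execute_reply(request, execution_count=1)
print(json.dumps(request["content"], indent=2))
print(json.dumps(reply["content"], indent=2))
```

Because the message layout is the same whatever sits on either end, any front end that can produce the request and read the reply can drive any kernel that speaks the protocol.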

And so, in Positron, it's the same thing. You run a piece of R code. This is an execute request from the Jupyter protocol. And Ark provides back an execute response. And the nice thing about this is that it doesn't have to be a Jupyter notebook. It doesn't have to be Positron. It can be any project that implements the Jupyter protocol. So, a few weeks ago, we were super excited by what we saw from the Zed people. Zed is this new IDE written in Rust that implements many interesting ideas. And a few weeks ago, they implemented support for Jupyter notebooks and they made sure that Ark would work great in Zed. And so, here you have R code running in Zed, using Ark. And we hope that many more projects will be able to do the same in the future.

Code assistance and the debugger

While code execution is certainly a very important part of this, it's not the only piece of the puzzle. We also think that it's really important to have fantastic code assistance inside of Positron. Now, code assistance just means anything that makes it easier for you to write R code. That could look like diagnostics, where we analyze all the files in your project and can pop up a little tooltip that says, hey, we think something's wrong here. You might want to take a look. Code assistance could also look like a function signature, where you're working with slice_max. You don't necessarily need to remember all of the arguments. We can pop up a signature and say, hey, here's what they are.

Assistance could also look like jump to definition. Jump to definition looks like this, where you can click on slice_max, and that takes you straight through to the definition of the function inside of your project. Now, these are features that you might have been familiar with in RStudio, but Positron has a number of code assistance features that are unique to it as well. One of those is help on hover, which I really like. It looks like this, where you hover over slice_max, and you get this really great inline documentation immediately. This is great whenever you're actually just kind of in the flow state of writing R code. You don't necessarily need a whole help pane. You just kind of need to quickly remember how something works. You hover over it, get what you need to know, and you go back to work.

Now, these features in Positron work in a very similar way to executing code. Code execution worked through the Jupyter protocol. It allowed us to ask questions and send back answers, and code assistance works in the same way. With the Jupyter protocol, that really allowed us to standardize these questions and answers. With code assistance, we used the language server protocol, which standardizes the questions and answers in a very similar way. We ask a hover request of Ark, and Ark can send back a hover response in return. The reason that, again, that is so cool is because Ark doesn't care who's on the other side. It doesn't care who's asking the question. It could be Positron, it could be a Jupyter notebook, it could be some other IDE. As long as it implements the other half of the language server protocol, you're in business.
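As a rough illustration of a hover question and answer, the sketch below builds a `textDocument/hover` request and a possible response in the JSON-RPC shape that the language server protocol uses, and frames the request with the `Content-Length` header that the protocol puts in front of each message. The file URI, cursor position, and documentation text are placeholders, and `frame` and `parse` are hypothetical helper names, not part of any real client.

```python
import json


def frame(payload: dict) -> bytes:
    """Serialize a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def parse(message: bytes) -> dict:
    """Split a framed message into header and body, then decode the JSON body."""
    header, body = message.split(b"\r\n\r\n", 1)
    fields = dict(
        line.split(": ", 1) for line in header.decode("ascii").split("\r\n")
    )
    return json.loads(body[: int(fields["Content-Length"])])


# The question: what should be shown when hovering at this position?
hover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///project/analysis.R"},  # hypothetical file
        "position": {"line": 4, "character": 10},  # zero-based line and column
    },
}

# The answer: same id as the request, with the documentation to display.
hover_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "contents": {
            "kind": "markdown",
            "value": "Placeholder documentation for `slice_max()`.",
        }
    },
}

wire = frame(hover_request)
print(wire.decode("utf-8"))
print(parse(frame(hover_response))["result"]["contents"]["value"])
```

The matching `id` is what lets the client pair each answer with the question it asked, regardless of which editor sent it.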

Over on the right-hand side here, our magic 8-ball, Ark, is doing quite a bit of work to actually send back these responses. Part of the smarts of Ark is wrapped up in a technology called Tree-sitter. The only thing you need to know about Tree-sitter in this talk is that it takes in all of the R code in your project. We run it through Tree-sitter, and it spits out a structured representation of that code, which honestly doesn't even look like R at this point. However, we can use that to power all of the assistance features that we've been talking about today. Now, we think Tree-sitter is really cool as a project because it works with any language. As long as you have a grammar that teaches Tree-sitter how to work with that language, then you're good to go. The problem was we didn't have an R grammar for Tree-sitter, so we had to build one.

Now, we've built it to work with Ark and Positron, but as it turns out, there are actually a lot of community benefits to having an R grammar for Tree-sitter. For example, if you were to bring that R grammar over into Zed or Neovim or other editors that support Tree-sitter, it can power things like syntax highlighting or indentation immediately over there as well. And there's also one other big player that uses Tree-sitter. It's GitHub. If you've been on GitHub in the past couple weeks, and you've done a search, say, for case_when on dplyr, you may have noticed that the search results for R have gotten really, really good. In this case, we found a function called case_when. If you click on that function, it will jump you straight to the definition in the project. This is just like jump to definition in Positron itself. It's a great way to actually explore a new project without having to pull it down completely. The reason this works is because behind the scenes, GitHub code search uses Tree-sitter. Now, up until a few weeks ago, this did not work for R because there was no R grammar for Tree-sitter. We built one. So we were able to work with the team at GitHub and get them to incorporate our R grammar into their code search engine. And as of a couple weeks ago, it doesn't just work for dplyr. It works for all R projects across GitHub.

I have one more thing about Ark that I want to talk about today. That's the debugger. Now, if you've never used a debugger before, you probably have done something like print debugging. That's the idea of sticking a whole bunch of print statements inside of your functions and rerunning them again. Hopefully, you get some more information out of it and can figure out where the problem is. A debugger is one more step beyond that. You actually get to step through that code one line at a time, stop at any point like we've done here inside of mutate, and then kind of investigate the world around you. In this case, you could use the debug variables pane and see what variables exist at this point in time and what their values are.

It probably won't surprise you to hear at this point that we've implemented the debugger through just another set of standard questions and answers. For example, Positron might ask Ark, hey, what variables exist at this point in time? Ark sends back a standard answer and that populates this variables pane you see here. Now, with code execution, we use the Jupyter protocol. With code assistance, we use the language server protocol. With the debugger, we use a third called the debug adapter protocol. Now, I have one demo of the debugger that I'd like to show because I think it's super cool. This is the most mind-blowing feature.
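The "what variables exist at this point in time?" exchange can be sketched as a `variables` request and response in the shape the debug adapter protocol defines. This is only an illustration: the helper functions, the reference number 1001, and the variable names and values are all invented, and a real adapter would look them up in the paused session rather than in a Python dictionary.

```python
import itertools

# Every debug adapter protocol message carries an increasing sequence number.
_seq = itertools.count(1)


def variables_request(variables_reference: int) -> dict:
    """The question: list the variables behind a given reference."""
    return {
        "seq": next(_seq),
        "type": "request",
        "command": "variables",
        "arguments": {"variablesReference": variables_reference},
    }


def variables_response(request: dict, scope: dict) -> dict:
    """The answer: one entry per variable, linked to the request by request_seq."""
    return {
        "seq": next(_seq),
        "type": "response",
        "request_seq": request["seq"],
        "success": True,
        "command": request["command"],
        "body": {
            "variables": [
                # variablesReference 0 means the variable has no children to expand.
                {"name": name, "value": repr(value), "variablesReference": 0}
                for name, value in scope.items()
            ]
        },
    }


# Hypothetical variables visible at the breakpoint.
scope = {"n": 3, "by": "species"}
request = variables_request(variables_reference=1001)
response = variables_response(request, scope)

for variable in response["body"]["variables"]:
    print(variable["name"], "=", variable["value"])
```

A front end only needs to render the `variables` list, which is why the same pane can be filled by debuggers for different languages.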

So, I'm a tidyverse developer, so I work on dplyr quite a bit, and that means I debug mutate quite a bit. So, for better or worse. So, I have gone ahead and set a breakpoint deep inside of mutate here. We can trigger that breakpoint by running the code. That drops us directly inside of mutate. We can do a couple of steps here to step through the code. You'll see the debug variables pane populates and you get your very important variables here and their values. But I actually want to talk about where we've stopped. We've stopped on a very important place. It's the boundary between the R code of dplyr and the C++ code of dplyr.

As tidyverse developers, we write a lot of C and C++ because we want the tidyverse to feel as fast as possible for you all. We know we're not alone. We know that you write a lot of packages with Rcpp and cpp11, because you want your packages to feel fast too. But that typically comes with trade-offs. While you get performance, you end up losing a lot in terms of debuggability. In RStudio, you can't step through that code nearly as easily as you can with R, and you kind of end up resorting back to print debugging like we talked about before.

However, in Positron, you can do this. One more step, and we've stepped directly from the R code of dplyr into the C++ code of dplyr. This is absolutely insane. It's something you've never had before, and you have all the same tools available. The debug variables pane. You can step through the code. You can look at the call stack. You have a debug console. Everything is just there. The reason this works is that on the R side, we implemented the debug adapter protocol. Someone else has implemented the C side of the debug adapter protocol. As they use the same set of standard questions and answers, they just work together. You can step from R to C and even back, and everything just works. It's pretty magic. That is the power of using modern and standard tooling inside of R.

So Ark knows a lot about R, and it does a lot to support the workflow of R users in Positron. It does this with standard protocols and technologies. We've seen three ways in which using standards like this allowed us to contribute to the community and reach a larger audience. So we've seen how R can be used not only in Positron, not only in Jupyter Notebooks, but in any front end that implements these protocols, like Zed. We've seen how by providing an R grammar for Tree-sitter, we were able to implement tons of useful features for Ark and Positron, but also contribute to other projects like GitHub and all of their users. And finally, we've seen how as tidyverse developers working with Positron, we were able to benefit from a C and C++ debugger implemented by another group in the same way that our R debugger will benefit other projects.

And so the great thing about using these standards is that the functionality is not locked in. So you can be a Jupyter Notebook user and use Ark and benefit from all this functionality in the same way that Positron users do. And so on a personal note, when I learned that Positron would implement R support in this way, I was super excited. Because despite having worked for RStudio for a long time now, now Posit, I've never actually been an RStudio user. So when I was a student, I learned to use Emacs. And when I joined RStudio, this was a tool I was familiar with. So I kept using it to develop tidyverse packages.

But I was a bit jealous of my colleagues that were using RStudio because obviously the R support is very good in RStudio. And completions, for instance, were much better than what I had in Emacs. So with Ark, I immediately saw that I could write a bit of glue to have Emacs talk to Ark and finally enjoy the same level of functionality as my colleagues. But that's not what happened in the end. I switched from Emacs to Positron. And with the ecosystem of VS Code extensions, I don't miss much from my previous setup.