
Positron: An IDE Specialized For Data Science
Dr. Julia Silge, Engineering Manager at Posit, joins @JonKrohnLearns to introduce Positron, a fresh open-source IDE that’s perfect for exploratory data analysis and visualization. She also lays out her top picks for LLMs that boost coding efficiency and discusses when traditional NLP methods might be the smarter choice over LLMs. Plus, Julia highlights some must-know open-source libraries that make managing MLOps easier than ever. Tune in for insights that every data scientist, ML engineer, and developer will find useful. Watch the full interview “817: The Positron IDE, Tidy NLP and MLOps — with Dr. Julia Silge” here: https://www.superdatascience.com/817
Transcript
This transcript was generated automatically and may contain errors.
The most exciting thing that you're working on right now is at Posit, which was formerly known as RStudio and is the maker of RStudio. You're working there as an engineering manager, and the project whose development you're leading is something called Positron, which is described as a next-generation IDE, integrated development environment, for data science. So that is what RStudio was many years ago.
I mean, I've been using RStudio since 2007, or at least since then, as far as I can recall. And it was definitely my go-to IDE when I was primarily an R developer back then, an R data scientist, although I guess I wouldn't have used the term data scientist in 2007.
And so with Positron, what are the gaps or limitations that you're addressing that aren't covered by things like RStudio, VS Code, or Jupyter Notebooks, which might be the go-to IDEs for data scientists or software developers today?
What Positron is trying to solve
If I were going to sum up the one gap that I feel Positron is working to address, it's that there isn't something out there right now that can be the one place you go to do all your data science. So Positron is not a general-purpose IDE. It is specifically an IDE built to do data science.
And I come from a science background, and I've always been someone who wrote code for my data analysis. But I've always really felt that my needs were a little different than someone who is writing general purpose code, like to build a website or to make a mobile app. People who write code to analyze data are different in some real ways.
It's not that they're worse coders. I really don't think that people who write code to analyze data do a worse job writing code. It's that their needs are different and that they're writing code in a different way.
So folks who have been using VS Code as a data science IDE, for example, have really felt that tension, where they're like, this is really general purpose, and I'm trying to kind of customize it using extensions to fit my needs. So Positron is meant to specifically be a data science IDE.
A polyglot IDE
Another real driving reason why we've built Positron the way it is, is that it is a multilingual or polyglot IDE. A lot of the environments you might download to do scientific computing or data science or data analysis are built specifically for one language. I know all of us have used these. RStudio is an example of one of these, as are MATLAB and Spyder. There are a lot of environments in which you would do data analysis that are just built for one language.
And increasingly, I just think that's not how many people work. Many, many people use multiple languages, whether it's one project that literally uses multiple languages, or over the course of a week they pick up different projects that use different languages, or, almost certainly, over the span of years or your career, you use different languages because things change in our ecosystem.
Like you said, you started with R and now you use other languages. There are so many people who use combinations of R and Rust, or who work on projects that are Python plus front-end technologies like JavaScript, HTML, et cetera, or almost any data science language plus SQL. For very few people is an IDE that is built for one language really going to fit all of the needs that they have over the course of a week, a month, or multiple years.
So Positron is built with a design such that the front-end, user-facing features are about the tasks you need to do, whether that is interactively writing code, dealing with your plots, or exploring your data in a visual way. And then there are backend language packs that provide the engines for those front-end features.
It's very early days for Positron. We only made it public about six weeks ago, as of the day we're recording this. So it is currently shipping with support for Python and R, but it is designed in such a way that other data science languages can be added, because there's a separation between the front-end features and what is driving them. So we look forward to adding support for other languages as we collaborate with other data science communities, or as new, exciting ways of doing data science come up.
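As an editorial illustration of that separation between front-end features and the backends driving them, here is a minimal Python sketch. It is not Positron's actual code or API: the `LanguagePack` interface, the two toy packs, and the `variables_pane` function are all hypothetical names invented for this example, and the second "language" is a stand-in that only understands lines of the form `name <- number`.

```python
from typing import Protocol


class LanguagePack(Protocol):
    """Hypothetical backend interface; not Positron's real API."""

    name: str

    def execute(self, code: str) -> None: ...

    def variables(self) -> dict[str, str]: ...


class PythonPack:
    """Toy backend that runs Python code in a private namespace."""

    name = "python"

    def __init__(self) -> None:
        self._namespace: dict[str, object] = {}

    def execute(self, code: str) -> None:
        exec(code, self._namespace)

    def variables(self) -> dict[str, str]:
        # Skip dunder entries such as __builtins__ that exec() adds.
        return {
            key: type(value).__name__
            for key, value in self._namespace.items()
            if not key.startswith("__")
        }


class ToyArrowPack:
    """Toy stand-in for a second language: only `name <- number` lines."""

    name = "toy-arrow"

    def __init__(self) -> None:
        self._values: dict[str, float] = {}

    def execute(self, code: str) -> None:
        for line in code.splitlines():
            if not line.strip():
                continue
            target, _, literal = line.partition("<-")
            self._values[target.strip()] = float(literal)

    def variables(self) -> dict[str, str]:
        return {key: "numeric" for key in self._values}


def variables_pane(pack: LanguagePack) -> list[str]:
    """Front-end feature: builds rows without knowing the language underneath."""
    return [f"{name}: {kind}" for name, kind in sorted(pack.variables().items())]


py = PythonPack()
py.execute("rows = [1, 2, 3]\ntotal = sum(rows)")
toy = ToyArrowPack()
toy.execute("threshold <- 0.5")

print(variables_pane(py))   # ['rows: list', 'total: int']
print(variables_pane(toy))  # ['threshold: numeric']
```

The point of the sketch is only that `variables_pane` is written once against the interface, so supporting another language means adding another pack rather than changing the front-end feature.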
How data scientists work differently
So the polyglot IDE part to me, that makes a huge amount of sense. I get that especially as a contrast to RStudio. For people who are writing code as a data analyst or data scientist, people who are working with data, what is different that we need specifically relative to another software developer?
I think one piece that is very different is that the process of writing code is more exploratory, is more interactive. And that's not wrong or bad. That is actually just a fact: getting a spec from a product manager and building a product is not what data scientists and data analysts do. You start with data and you often don't know what you can or should do in detail until you start that process.
And if you have a code writing process that is more exploratory, you need more supports for writing in that interactive exploratory mode. Some things that support that are things like a truly, truly fully featured interactive console. Of course that does exist in various ways. People get at that in various ways, like when they use notebooks or using say a Python REPL.
But if you get to a truly, fully featured interactive console where what happens in the console is then reflected in the rest of where you're working, like say in Positron we have what we call a variables pane. If you come from RStudio, you may be familiar with something called an environment pane where you see all the things you've... And it updates as you change things or the plots that you see. You have them all right there. You can scroll through them. If you change and make a new plot, you see it pop up there. And you have that really interactive way of working.
Some of the other things that I know really make a difference for people include help inside of the IDE where you are working. So you don't... You know, you're working along, you're like, ah, wait, what is the function signature? Or maybe I want to look at the docs for this. So instead of having to get out of a flow state and go somewhere else and read docs, like on the website, you can open up help right there and copy and paste, go right back and forth, and stay in that kind of flow state.
Another thing is if you're building interactive apps, you need a way to have that right there, and to have it update as you change your code, versus having some sort of build process, going somewhere, looking at a browser. There's really quite a lot that, if we put it together, can make people more productive.
The company that I work for, Posit, the company formerly known as RStudio, is a really fun place to work as someone who likes thinking about the process that people bring to their tasks. Because I do think, like, we are huge believers in code-first data science. Not no-code solutions, not GUI-based tools. People who do data work should be writing code. And at the same time, their needs are different.
And so pretty much every single thing my company does is deeply informed by this belief, or by how deeply we know, that data practitioners are different, and that's good and fine, and we can make them more productive by building tools that are specifically for the kinds of tasks they need to do.
Open source and licensing
And yes, so not only do you have Code OSS as a kind of a backbone that's providing building blocks for Positron and offering that kind of extensibility through all the VS Code extensions, like you mentioned Databricks there, any number of extensions that people might want to be able to import. Rainbow tabs, whatever you want, it's out there.
In addition to all those things, the Positron project itself is open source. So if people are listening and they want to be contributing, right now at the time of recording there are 27 GitHub contributors, and I can see your face as one of them. So anyone listening can go and contribute to this developing and very exciting project.
So Positron is licensed such that it is source available. Anyone can come and look at the source, change it, contribute to it, and it is also licensed such that it is free to use, including for commercial purposes. You can use it, of course, in academia and for personal projects, but you can also use it at work. So it's licensed in such a way that it is free to use in your work as a data scientist or data analyst. It's free to get, and you can read the code. There's real benefit to that kind of model for building and making software.


