Resources

Keynote: Julia Silge - The Right Tool for the Job | SciPy 2024

There are many programming languages that we might choose for scientific computing, and we each bring a complex set of preferences and experiences to such a decision. There are significant barriers to learning about other programming languages outside our comfort zone, and seeing another person or community make a different choice can be baffling. In this talk, hear about the costs that arise from exploring or using multiple programming languages, what we can gain by being open to different languages, and how curiosity and interest in other programming languages supports sharing across communities. We’ll explore these three points with practical examples from software built for flexible storage and model deployment, as well as a brand new project for scientific computing. Julia Silge is a data scientist and engineering manager at Posit PBC, where she leads a team of developers building fluent, cohesive open source software for data science in Python and R. She is a tool builder, author, international keynote speaker, and real-world data science practitioner. She holds a PhD in astrophysics and serves on the technical advisory committee of the US Bureau of Labor Statistics. You can find her online at her blog and on YouTube.

Aug 20, 2024
46 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Julia is a data scientist and engineering manager at Posit PBC, where she leads a team of developers building fluent, cohesive, open-source software for data science in Python and R. She's a tool builder, author, international keynote speaker, and real-world data science practitioner. She holds a PhD in astrophysics and serves on the Technical Advisory Committee of the U.S. Bureau of Labor Statistics. You can find her online on her blog, which is quite excellent, and YouTube.

I am super happy to be here at what is my first SciPy. I've had multiple people that I have worked with tell me it is their favorite conference, so I'm really happy to be able to be here and join you today and to speak specifically about the process of using multiple programming languages for scientific computing and how we can explore or start to identify what is the right tool for the job.

Before I get a little deeper into it, I want to tell you a little bit about the path that brought me here to be standing in front of all of you talking about this. My academic background is in physics and astronomy, but I came up through those fields before the dominance of Python in these fields and in scientific computing in general. If I were maybe 5 to 10 years younger than I am, the tooling that we are all here to talk about would be my tooling, and you would be my people. But I started in these fields before that era, and so I and the people around me in academic departments doing physics and astronomy, we wrote a lot of C. We wrote a lot of FORTRAN. We wrote a lot of bespoke code that, in hindsight, really makes a lot more sense to have in a well-maintained community resource like AstroPy and that whole ecosystem. There were people around me who used IRAF and IDL, closed-source tools like that.

This was like my first scientific computing experience, my first exposure to how to use computing for scientific purposes. So I was in academia for a while, left, did a few pretty random things in tech companies for a little while, and about 10 years ago made a career transition, made a change in my professional identity to the sort of at the time newly burgeoning field of data science. So I became a data scientist.

As I made this transition, I brought with me the excellent quantitative training that came from studying physics and astronomy. I brought a lot of real-world experience dealing with messy data that I collected myself at telescopes, a lot of experience understanding what kinds of questions I could answer with data, and how to communicate about data with data visualization, speaking, and writing. However, I did not at the time know modern data science languages, and so as I was making this transition, it was my first time really wrestling with: is it worth it? Do I need to do this? Is it important for me to add additional languages?

So I first learned Python, and then I learned R, and it turns out R has proven to be one of the great loves of my professional life. My exposure to R was such a good fit for how I approach data analysis that it had a huge impact on how my career went after that. The work I'm probably most strongly associated with is my work in the R world: open source, books, learning materials, and that kind of thing.

So I moved into data science. I was a data science practitioner, and I started being involved with tool building on the side. I bet many of you are the same: you have one sort of day job, and then you contribute to open source on the side in your own time. About four and a half years ago, I made a bit of a pivot again to becoming a full-time tool builder, spending all my time building tools for data science. This pivot meant changing my job from working in tech as a data scientist to working at a company that was then called RStudio. So I started working at RStudio, on open source software for machine learning and for text analysis.

About Posit and multi-language tools

Now, about the company that I work for: it's the same company, but it's no longer called that. The company I work for is now called Posit. The main motivation for this rebrand is that for a long time, the company has built software and tools for scientific computing and data science that are tools for Python, tools for R, of course, and tools for Julia. Honestly, people don't believe you when you say that, when your name is so strongly associated with one particular programming language. So we rebranded, we chose this new name, to try to be more clear about who we are.

And now when I think about what I spend my time doing, and what the people I work with spend their time doing, I notice that we now have, in aggregate, years of experience building tools using multiple programming languages. We use multiple programming languages in two ways. First, we ourselves internally use multiple languages as we build these tools. My company is probably best known for the RStudio IDE, and if you go to GitHub and look at the repo for the RStudio IDE, you can see the pretty long list of languages used to build it. So we internally as an org have experience needing to use multiple languages together as we collaborate on the tools we're building. Second, we have years of experience building tools for multiple programming languages, because the tools we build are meant to be used from multiple programming languages.

So I'm going to share a couple of examples that I'll use to illustrate these points as we go along. The first one I want to talk about is Quarto. Quarto is a tool for authoring scientific documents that can be used from Python, R, and Julia, and it works with Observable. It's a tool you can use to write a paper, make a website, build package documentation, or make slides. The slides you're looking at are, in fact, created with Quarto, and they're available at the URL down at the bottom if you'd like to check them out afterwards. If you were here on Monday, you may have had the opportunity to go to a tutorial on using Quarto for scientific communication in Python.

Another one is Shiny, a tool for making interactive apps for data; Shiny has an implementation in Python and an implementation in R. At Posit, there are also folks who work on tools for making publication-ready tables, for when you need a table in a paper you're writing or on a website you're making; Great Tables is such a tool for Python and for R. And later in the conference, tomorrow, I believe, there's a talk here about how to use the Great Tables Python package to create the beautiful tables you need.

So these are all examples of things of which I am a user, but not directly a developer. This next set are ones that I myself have been directly involved in working on, and they really inform how I think about these issues of using multiple languages. The first is Vetiver: software in Python and R for MLOps. Pins is software in Python and R for versioning, publishing, and sharing data. And then I'm excited to also share some new work that I've been doing recently with folks here.

So this is the context, right? The context for the experiences that have brought me to think about what's hard and what's easy, and what comes up when we use multiple programming languages. As I was thinking about standing here in front of you all, about what I want to talk about with the people at SciPy and what I feel I've noticed or learned, there are three things I want us to talk about here together. The first is: what does it cost when we use multiple programming languages in a scientific computing context? The second is: what do we gain when we start using these different programming languages? And third: when we have curiosity and openness to different ways of doing things, what does that allow us to share or give to the people and communities around us?

The cost of using multiple languages

So what if, hypothetically, I stood in front of you and said: you really need to learn another programming language? To be clear, I am not saying that. But, hypothetically, imagine how you would feel. I think many of you would feel a reaction, maybe a defensiveness. You might feel a certain sense of "I'm absolutely not going to do that" or "I don't have the time to do that." That's because there are huge costs associated with learning new programming languages. It's really expensive, in terms of time, energy, concentration, and effort. It takes a lot to learn, especially if we're talking about becoming professionally competent in an additional programming language. This is something I myself have done several times through my career, so I'm very familiar with how challenging and how costly it can be.

I do think these costs may be changing a little in the era of LLM-based coding assistants. I've actually seen these tools be quite helpful to people, for example when they know one programming language and want to translate something to another, or when they're learning a new language and asking: hey, how do I do this, but in this other language? But nothing I've seen from these LLM-based tools really changes the fundamental fact of how expensive it is for a person to gain competency in an entire new programming language.

At the same time, there are real benefits to specialization. You yourself have probably experienced this in your career: good things have happened to you because of the way you have specialized in something. Or when you build a partnership or collaboration with someone, often you don't want your skills and the other person's skills to overlap exactly; you want them to be complementary. So these things are real: the costs of learning and the benefits of specialization are really real.

So this is what we experience at the individual level, and when we think about the organization level, these costs get aggregated up. If everyone in an organization starts to experience these kinds of costs, you start to observe tension and problems around consistency and complexity. Consistency gets pushed down, complexity gets pushed up, and we start to have questions like: who can do code review for this person if no one else knows that programming language? Or, if we have multiple ways of building a model, how are we going to go about deploying these different kinds of models?

And these experiences, which are real, which I bet many of us have observed or experienced the way I have, result in people holding a belief or a value that you might express like this: there should be one, and preferably only one, obvious way to do it. This comes from the Zen of Python, and it is a statement saying that it's really important for us to be consistent, and really important for us to not have unneeded complexity.

I have experienced these kinds of costs firsthand. An example that comes to mind is my work on the pins package. Pins is software in Python and R for versioning, publishing, and sharing data. This slide shows how you might write a pin to a board. The metaphor is that there's a pin board somewhere and you're pinning things on it. There's nice support for getting versions of different datasets, and you can specify, for example, that we're going to write something as a Parquet file. So this is what writing looks like, and this is what reading looks like. I'm using a toy dummy board here, but in real life you would use a board that is an S3 bucket, or a network drive on a high-performance computing cluster, or Google Cloud Storage, or whatever. Pins is a friendly user interface for a data science or data analyst user persona, so that they can switch out back ends in a pretty flexible way and not get dragged down in the specifics of how things have to be versioned on an S3 bucket versus some other way of storing data.
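To make the board metaphor concrete, here is a minimal, stdlib-only sketch of the pattern pins provides: write named, versioned objects to a board, then read the latest version back. This is an illustration of the idea, not the real pins API; the `ToyBoard` class, its method names, and the JSON storage are all invented for this sketch (the real packages add backends like S3, rich metadata, and formats like Parquet).

```python
import json
import tempfile
import time
from pathlib import Path


class ToyBoard:
    """A toy 'pin board': named datasets, each stored under a version folder."""

    def __init__(self, root):
        self.root = Path(root)

    def pin_write(self, obj, name):
        # Version stamp: timestamp plus a counter so versions sort correctly.
        version = time.strftime("%Y%m%dT%H%M%S") + f"-{len(self._versions(name))}"
        path = self.root / name / version
        path.mkdir(parents=True)
        (path / "data.json").write_text(json.dumps(obj))
        return version

    def pin_read(self, name, version=None):
        # Default to the most recent version of the named pin.
        version = version or sorted(self._versions(name))[-1]
        return json.loads((self.root / name / version / "data.json").read_text())

    def _versions(self, name):
        d = self.root / name
        return [p.name for p in d.iterdir()] if d.exists() else []


board = ToyBoard(tempfile.mkdtemp())
board.pin_write({"x": [1, 2, 3]}, "mydata")
board.pin_write({"x": [1, 2, 3, 4]}, "mydata")  # writes a second version
print(board.pin_read("mydata"))  # reads the latest version back
```

The design point pins makes, which this sketch gestures at, is that the board's storage backend can be swapped out without the analyst's read/write code changing.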

Okay, so reading and writing data. It is explicitly a goal of pins to support interoperability, but we observe that there is a cost for individuals when they need to do this. In the slide, I showed writing using Parquet, and Parquet has really great features around interoperability. You can read and write Parquet files from Python, and R, and JavaScript, and C, all these different ways of reading and writing, and get consistent results. That is a whole point of Parquet and Arrow. If you know that already, that's fantastic, and you are set up for success. However, there are stumbling blocks that people hit when they have not yet spent the cost to learn what they'll need to do to really collaborate. Honestly, this even comes up with things like trying to store things as CSV, which you may think is just plain text, but you can run into problems trying to read and write CSVs between Python and other languages, like R. And then there's a whole universe of binary file formats that are quite difficult to open in one language versus another.
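One concrete example of the CSV stumbling block: CSV carries no type information, so every value round-trips as text and each language applies its own parsing rules on the way back in. A small sketch using only Python's standard library:

```python
import csv
import io

# Write a tiny "dataset" with an integer and a float column to CSV.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "measured"])
writer.writerow([1, 2.5])

# Read it back: the numbers come back as strings, because CSV has no types.
buf.seek(0)
rows = list(csv.reader(buf))
print(rows[1])  # ['1', '2.5']
```

Every consumer (pandas, R's readr, a JavaScript parser) has to guess or be told the types, and their guesses can disagree; formats like Parquet avoid this by storing the schema alongside the data.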

So if an individual wants to have this kind of collaboration with someone who uses a different programming language, they have to face this. They have to spend this energy and time to get it right. We have also observed costs to our organization. The pins Python package and R package are explicitly built to work together, but what that means is that the steps we take to write tests, and to run those tests on a CI system, scale up the complexity of those projects, because interoperability is an explicit goal of them. So if someone is only an expert in, say, Python packaging, there's a cost in that person having to think: okay, I need to also learn about R packaging. I need to also learn how GitHub Actions works for R, in addition to how GitHub Actions works for Python. So these costs are real.

What we gain from multiple languages

I am now going to move on and start talking about what we gain when we use multiple programming languages. And I'm going to start back at the organization level, because I know that's top of mind when people think about questions like: will we support using XYZ language in our research group or company or department? At any given time, the people in your organization are better at some task using one tool versus another tool. And when a person can make the choice of what tool they're going to use, they are going to be more productive. So when we use multiple programming languages, this makes everybody more productive.

But this dynamic is in tension with the consistency and complexity problems I described. Everyone actually being better at the work they're trying to do ends up balancing against those problems around consistency and complexity. And in my experience, there is no a priori way of knowing how that balance is going to play out. There's no hard and fast rule about which one will win in the end. It depends on the scale of your organization, and on the specifics of what the infrastructure is like in your organization.

This leads people to say: okay, let's figure it out. Let's take a pragmatic attitude to how we choose tools. This leads to a belief or a value that you might express like this: practicality beats purity. This is also from the Zen of Python, and it reflects the idea that, in our specific circumstances, we can decide what works best at this point and not have an overly purist view about tooling or how people should be doing things.

This is really related to the idea of the right tool for the job. And what I observe is that it is very difficult, maybe plain impossible, to make overarching general statements about what the right tool for a job is. Instead, the right tool for the job is always specific to a circumstance, to a person, even to a time. So the right question isn't what is the right tool for the job, but what is the right tool for the job for this person at this time in this organization?


The project where I've seen this play out a lot is Vetiver. Vetiver is software for Python and R for MLOps, that is, for the process of deploying and operationalizing models. When you say MLOps, people mean a lot of different things, so I'm going to get a little specific here by looking at this diagram of a model lifecycle. We start by collecting data. The first thing we'll need to do when we have that data is to understand and clean it, and there are lots of great open source tools for understanding and cleaning your data. Next, it's time to train and evaluate a machine learning model, and again, there are lots of really fantastic open source packages across different languages to train and evaluate the model.

At that point, the situation becomes much less clear. That's both because there's a lot less community understanding of what the right next thing to do is once you have a machine learning model trained, and also because there are fewer open source tools for what needs to happen next. Especially fewer open source tools built for a scientist, data analyst, or data scientist type user, as opposed to what I might call a generalist software engineer type user. And this is where Vetiver sits. Vetiver has an opinionated idea of the next things you need to do: version your model, deploy your model, and then monitor your model. And it provides functions and infrastructure for taking those next steps.

It's really about giving people the opportunity to choose the right tool for what they need to do. Person A might decide: I'm going to train a random forest model using scikit-learn in Python. They can version, deploy, and monitor their model with Vetiver. Person B might say: I need to create a survival analysis model, and I think the best way to do that is to use tidymodels in R. They also can use Vetiver to version, deploy, and monitor their model.

One thing I really have observed from our users of Vetiver, from people telling us how they're using it, is that it allows people the autonomy to make the best decisions given their own domain knowledge, what they know about their data and their situation, and still actually get back some of that consistency. Tools themselves can be designed in such a way that we give people the flexibility that increases their productivity and allows them to make the best decision in the context in which they are working, while also pushing back against the problems around consistency and complexity that arise when people do things in different ways.

Now, that's at the organization level, right? That's what we gain when people become more productive. Let's go back to the individual. Again, hypothetically, what if I stood up here and said you need to learn another programming language? Which, again, to be clear, I absolutely am not. But I want to reflect on what happens when people do take opportunities as they come and add other languages to their toolkit. What are the benefits that individuals gain and observe?

The first one is that people scale their impact. If you are someone who can solve a certain problem using a certain toolkit, that's great. But if you are someone who can see a problem and understand different ways of solving it, weigh pros and cons that are specific to your situation, and understand how certain stacks connect into other infrastructure, that kind of systems-level thinking about a problem means that second person has much more impact in their field, in their organization, in the areas in which they work.

I also think it's important to take the long-term view when it comes to these individual-level questions. I do know some people who have spent their whole career using one programming language and are really happy and fulfilled and successful. But in my experience, that's fairly rare. A lot of us at least also use SQL, right, in addition to the other languages that we use. And I observe some, I'm going to call them archetypal career arcs, that people often take as they consider what they want to do with their career in the long term.

A common arc I observe is people starting from high-level scripting languages like Python and R and moving to the front end, really becoming experts in JavaScript. Often these are people who are really interested in data visualization or making interactive apps and dashboards. I observe people going the other way, too: they start with a high-level scripting language like Python or R and then move lower level, to something like Rust or C or Go. Often these are people working on mathematical methods or machine learning methods, or people building developer tooling, where you have to work at a lower level because maybe the high-level scripting language needs to talk to something else. My own arc, I perceive to be a little weird: I started a long time ago with C, went way to the front end, was actually paid to write Flash for a little bit, believe it or not, and then landed back at the high-level scripting language layer, building tools for this layer.

But when we can take a long-term view of what our careers are like, we can more accurately understand what the individual benefits are and whether it's worth it at any given time to learn something else.

And the last thing I want to say here about what individuals gain is about increasing your vocabulary. The metaphor here is from natural human language. If you yourself are bilingual, or if you've dabbled in other languages, or you hear about people learning languages, you have probably come up against some situation where there's a word that so perfectly expresses some concept in one language but does not exist in another. We might say it's very difficult to translate that word, because of how the word is connected to some concept, and people might use that word even when speaking other languages, because it so perfectly encapsulates the concept.

This happens in computing. Computing languages are really different from each other, built with different priorities and really different characteristics. Sometimes a concept in computing is really well expressed in one language, and maybe cannot be translated, or cannot be translated perfectly, to some other language. When we have curiosity and openness to things happening a little bit outside our own communities, that allows us to increase the vocabulary of what we understand we can do using scientific computing.

What we can share across communities

Okay, so we talked about cost. We talked about what we gain. And now I want to talk about what using multiple programming languages allows us to share or give, both to individuals and to different communities. I approach this question mostly as a tool builder, because it is in the process of building tools that I have most observed this phenomenon of someone learning from one community and then bringing that thing they've learned to another community. So let me be a little concrete about what this might look like.

The main R package documentation generation tool is called Roxygen, and it is pretty directly inspired by the Doxygen tool for creating docs that I bet many of you have run into or seen or used. R as a programming language does not have interpolated string literals, and people inside the R community observed how great that feature is in languages like Python or TypeScript, and said: let's bring that great idea of interpolated string literals to R. So they built a tool called Glue that gives you that kind of behavior for strings. Those are both examples of things coming to R, but of course, this moves around in all kinds of directions all the time.
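For anyone who hasn't met the feature: interpolated string literals let you embed variable values directly inside a string. Python's built-in f-strings are one example of the behavior that Glue brought to R (the R line shown in the comment is for comparison only and is not executed here):

```python
# Interpolated string literals: values are substituted inside the braces.
language = "R"
feature = "interpolated string literals"
message = f"{language} gained {feature} via the glue package."
print(message)

# The rough equivalent in R with Glue would be:
#   glue::glue("{language} gained {feature} via the glue package.")
```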

I'm going to highlight Quarto here again, because Quarto is a next-generation implementation of a way of working that did come from the R community. The original implementation was called R Markdown. Some of its main characteristics: it's a plain text format, in contrast to a Jupyter Notebook, with interspersed plain text and executable code chunks. In the R community, R Markdown was so life-changing, so impactful for people as they worked, that it motivated building this tool in a way that's not specific to R, but can actually be applied to different computing languages and used for all kinds of different purposes. I use Quarto to write my blog. I use Quarto to make my slides. And you can do this from Python, R, Julia, all these different ways of using it.
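To show what "plain text with interspersed executable code chunks" means, here is a minimal sketch of a Quarto document with a Python chunk. The specific YAML options and the chunk contents are illustrative, but this is the general shape: a YAML header, Markdown prose, and fenced code chunks that are executed when the document is rendered.

````markdown
---
title: "A minimal Quarto document"
format: html
jupyter: python3
---

Some narrative text written in plain Markdown, followed by a code chunk
that runs when the document is rendered:

```{python}
import statistics
statistics.mean([1, 2, 3])
```
````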

Introducing Positron

So there are all these examples of things moving around all the time. Looking at the company I work for, and thinking about this idea of something being great in one community and how we can take it somewhere else, I will say there's one thing that's top of mind for people. People will say things like: gosh, is there anything like RStudio but for Python? Or: hey, I'm loving Python, Python is really fun, but I still prefer RStudio as an IDE. Or: there are certain features in the RStudio IDE, and I really want those features when I do my Python work. Or people will say to us: all I want is PyStudio.

So I am really excited to announce the project that I have been spending a lot of time on over the last year, and that is a brand new data science IDE. This data science IDE is called Positron. If you have ever used RStudio or seen somebody use RStudio, a lot of this will look familiar. There is a pane where you write your source code. There is a truly interactive, fully featured console. There are UI affordances for seeing the variables you have defined, for dealing with the plots you've created, and for getting help right in your IDE, so that you don't have to get out of the flow state when you need to quickly look up a function signature or how a certain method works.

Our design of Positron is directly informed by the years of experience at my company building these kinds of tools for a data analyst user persona, the kind of person who deals with data on a regular basis. If you have ever used VS Code, you probably also think this looks pretty familiar, and that's because Positron is built on top of the open source components that are used to build Visual Studio Code. There are two main reasons we've done this. The first is that it allows us to concentrate on what we're good at. Internally at my company, we have a lot of experience around data science tooling. By using the general purpose components from the open source parts of VS Code, the support around general source code editing, saving files, and interacting with version control, we can, with a fairly smallish team, focus on the pieces of a data science IDE that we think are table stakes but don't exist in a general purpose software engineering IDE.

A second big reason is that it allows us to connect into the vast ecosystem of VS Code compatible extensions. RStudio as an IDE was never very extensible or customizable. By building something that works with these extensions, we really open up people's ability to customize their IDE for the kinds of tasks they have and the kinds of things they need to do. And in fact, we ourselves develop some of these very extensions. This is what developing a Quarto document looks like in Positron, and it uses the same VS Code compatible extension, the Quarto extension, that you would use in the official, Microsoft-branded VS Code. So the extensions can be used in both places, and we're able to modularize code in ways that have really big benefits for the kind of people who do this kind of work.

So I am really excited about this. Like I said, a big reason why is that it is the project where I have most observed this back and forth, this learning and then sharing. And I don't only mean from RStudio to Python. I mean learning from general software engineering: the way debuggers are built, the way information about plots is shared in different kinds of tools. I've observed this in a way that's really exciting for me.

I want to emphasize that Positron is a very early stage project. Today is Wednesday, and it has not even been a full two weeks that this project has been public. So it is, like, a brand new baby. And, boy, I love babies, but babies can be a little bit of a mess, a little bit of a challenge. So I certainly think it's probably not the right fit for everyone sitting in this room; I'm not encouraging everyone to switch right now. But if you consider yourself a bit of an early adopter, and if something that I said about this intrigues you, I do invite you to go to our GitHub, download an installer, and give it a try. We'd be really interested in the feedback you have as you try out our new data science IDE.

Wrapping up: cost as investment

All right. So as I wrap up here, I started out talking about the cost of using multiple programming languages. And it's probably no shock to you, as I get here to the end, that I do want to reframe that. Because when I think about my own career, and when I observe what happens in the careers and the organizations around me, I think it's more accurate to think about it as an investment. You have to decide for your situation what kind of investment fits, and then, community-wide, we can start to accumulate some of these gains that we can realize.

And if I were to leave you with one takeaway, it would not be, just to be clear, that you had better go learn another language. No. If I were to leave you with one takeaway, it would be this: if you can approach the communities around you that are adjacent, or maybe even a little further away, with curiosity and openness, that allows you to learn new concepts and new skills that will make both your own work and the work in your organization more robust and more fulfilling. So thank you very much.

Q&A

Thank you, Julia, for this excellent talk. A quick note for everyone. We're going to be asking questions through Slack, and so please use the keynotes channel to ask your questions.

Students who are interested in going into data science are always asking what language they should use to be, quote unquote, good or competitive as a job candidate. Do you think being a baddie at Python and okay at R is good, or the other way around? Alternatively, what have you seen as a powerful distribution of programming skills? This is from Anna.

Great. That's a great question. Thank you for that. I definitely think, especially for people who are junior, what's most important is becoming truly, professionally competent in your first programming language. Being, like, a baddie in one thing is the most important thing. And depending on the kind of work you want to do, it's good to observe what is most dominant in that work. What I observe is that, for example, like I said at the beginning, in physics and astronomy today, Python is super dominant. If you're working in tech-centric fields, it's very Python dominant. If you're interested in something like the life sciences, or insurance and financial services, we see much more prominence for R. So I would say, at a high level, the most important thing is to be good at one thing, versus trying to be a bit of a jack of all trades early in your career and being spread too thin, and to observe, depending on what you're interested in, what makes the most sense for you to get good at.

We have a lot of questions about Positron. Here's an easy one: does Positron have dark mode? It does, yes. Positron currently has built-in light and dark modes and a high-contrast, high-accessibility mode. I will say there's a bug open about using VS Code extensions that provide color themes; it works mostly. But yes, it does have a built-in dark mode.

From Pierre: are you planning a full web version of Positron, which could be enabled thanks to the Code OSS technology? That's right. It's currently in what we're calling a public beta, and as of today it's only a desktop app. But before the end of the year, it will have support for two kinds of remote ways of working. One is SSH tunneling, like you may have used with VS Code; that will work with Positron. Right now it doesn't quite, but it's highly prioritized over the next couple of months. The other thing is that Positron will always be free to use, with supported use in research and teaching, and that's an example where we would want people to be able to install it into a JupyterHub so they can use it there. It does use the same code-server infrastructure that VS Code does.

So, for example, we're telling people: do not teach with Positron this fall. If you're someone who's a teacher, this fall is too early, but if you're interested, reach out and we could talk about the spring. Certainly by next year, we would expect it to be a good fit for teaching Python and data science.

Could you talk about distributed computing in relation to these languages? In the past it was so much work. Are there some good ways to utilize distributed computing? That's a great question. I may have to punt that question to someone who is more of an expert than me on that particular case. Most of the ways I have used distributed computing have been from inside these high-level scripting languages, where the interface goes, say, from Python down to the lower-level thing, then out to workers, and then back in. I still think there are some real usability challenges around these kinds of workflows: in which cases do they work, and in which do they not? The process of setting up these kinds of environments that bring everything you need out and then back in is something that I think is not a solved problem.

The area where I worked on this the most was when I was working on machine learning software for R for a couple of years. When you want to do something like tune hyperparameters for some model, you take your data, you try a whole bunch of different hyperparameters, so you send the work out to a bunch of workers and you bring the results back in. We experienced some real tensions around how this works on different operating systems, and even on different versions of those operating systems. So that's an area that I think is not quite solved and could use some real investment.
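The fan-out/fan-in workflow described here, sending each hyperparameter combination out to a worker and gathering the scores back in, can be sketched with Python's standard library alone. The model and scoring function below are hypothetical stand-ins for a real fit-and-evaluate step, not anything from the tools mentioned in the talk:

```python
from concurrent.futures import ProcessPoolExecutor

def fit_and_score(params):
    # Hypothetical stand-in for fitting a model with these
    # hyperparameters and returning a validation score.
    alpha, depth = params
    return {"alpha": alpha, "depth": depth,
            "score": 1.0 / (1.0 + alpha) + 0.01 * depth}

# A small grid of hyperparameter combinations to try.
grid = [(a, d) for a in (0.1, 1.0, 10.0) for d in (2, 4, 8)]

if __name__ == "__main__":
    # Fan out: each combination is evaluated in a worker process;
    # fan in: results come back to the main process in grid order.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(fit_and_score, grid))
    best = max(results, key=lambda r: r["score"])
    print(best["alpha"], best["depth"])
```

The cross-platform tensions she mentions show up exactly here: process pools behave differently across operating systems (fork vs. spawn start methods), which is one reason this still isn't a solved problem.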

This maybe leads me to my last question, and a nod to the sprints we will have this weekend at a location to be determined. Are there any issues open that are easily approachable during this weekend's sprints? I don't know, I haven't looked at them. Oh, do you mean for Positron? That's a really interesting idea. We've actually had some of our first contributions from people outside of our team, even in the last ten days, which has been so exciting to see. One thing people were able to get going right away: there was a user who really loves the code cells feature. I don't know if you have used this in VS Code; it's not a full Jupyter notebook, but in a .py file, you write these little code cells and it gives you a kind of notebook-lite experience. Anyway, that's somewhere people have been able to contribute right away.
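For context, the code cells feature being described follows the `# %%` convention: a plain `.py` file is divided into notebook-like cells by comment markers, and compatible editors let you run each cell interactively while keeping an ordinary script on disk. A minimal sketch:

```python
# %% [markdown]
# Cells in a plain .py file are delimited by "# %%" comment lines;
# compatible editors run each cell on its own, like a lightweight notebook.

# %% Define some data in one cell
import statistics

values = [3, 1, 4, 1, 5, 9, 2, 6]

# %% Compute a summary in a separate cell
mean = statistics.mean(values)
print(mean)  # → 3.875
```

Because the file is still valid Python, it also runs top to bottom as a normal script, which is much friendlier to version control than a notebook's JSON format.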

So if you are someone who has built a VS Code extension, or you have ever worked in VS Code itself, which I know is not a lot of people, that's where a lot of the work comes from. If you've ever built a VS Code extension and put it up on the marketplace or on Open VSX, there's a lot of work in Positron that happens in these kinds of built-in extensions. So I think that would be the best place to look: look for issues that are tagged for the extensions, because that's the easiest way to have that iterative development workflow where you can make a change and then see right away what it's doing. So I would say, look for something that's for one of the built-in extensions.

Well, thank you again, Julia, for your talk and thank you for the questions.