
779: The Tidyverse of Essential R Libraries and their Python Analogues — with Dr. Hadley Wickham
#Tidyverse #RProgramming #RLibraries

Tidyverse, ggplot2, and the secret to a tech company's longevity: Hadley Wickham talks to @JonKrohnLearns about Posit's rebrand, the Tidyverse and why it needs to be in every data scientist's toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career.

This episode is brought to you by Intel and HPE Ezmeral Software (https://bit.ly/hpeintel). Interested in sponsoring a SuperDataScience Podcast episode? Visit https://passionfroot.me/superdatascience for sponsorship information.

In this episode you will learn:
• [00:00:00] Introduction
• [00:02:55] All about the Tidyverse
• [00:15:19] Hadley's favorite R libraries
• [00:28:39] The goal of Posit
• [00:34:12] On bringing multiple programming languages together
• [00:50:19] The principles for a long-lasting tech company
• [00:53:34] How Hadley developed ggplot2
• [01:03:52] How to contribute to the open-source community

Additional materials: https://www.superdatascience.com/779
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Do you personally encourage teams to be multilingual? Or do you often write in multiple different languages, maybe even within the same project?
I think most people tend to be like, you know, like 90% R or 90% Python. But in general, it's just better to be pragmatic. And I think that's one area where I think generative AI is like really interesting. It's really easy to translate between them. Maybe this means the barriers between them are going to erode a little bit more.
Dr. Hadley, welcome to the Super Data Science Podcast. It is a surreal experience for me to have you here. We have, I have seen you in person. In fact, we have actually seen each other in person, I'm sure of this. And so let me tell you these stories. We didn't, I didn't tell you this before we started recording. So you're just getting this on air.
Circa 2014, in New York at an O'Reilly Strata + Hadoop World conference, you did some kind of hands-on training, it might have been a half-day training, and you used the airline flight-times dataset. Does that all track? Yeah. So I was in the audience for that. And then a couple of years later, in 2016, at the Joint Statistical Meetings in Chicago,
there was an announcement from RStudio that Hadley Wickham would be at the RStudio booth during a certain window of time. And I walked by a couple of times. And we made eye contact and you kind of smiled in a friendly way. But I was too nervous to talk to you. I didn't know what to say.
You were at that time and still today, one of the most iconic people in data science to me. And I was just like, well, you know, there's no what do I do? What do I say? How do I introduce this? And so now I finally know what to say. I've got a question to ask you. Hadley, welcome to the super data science podcast. Where in the world are you calling in from? I'm calling in from Houston, Texas. Nice. It is truly such an honor to have you on the show. You were on the show in the past.
So four years ago, you were on the program. That was specifically episode number 337, but at that time our host was Kirill Eremenko. And the timestamp on this is pretty interesting, because that episode was published in February 2020. So it was a pre-pandemic world.
A very different, different time. Exactly. So all of that has passed and we're almost back to normal, except that so many data scientists are working from home. All right. So straight into the technical content. Hadley, if there's one word that is most associated with you, it's got to be tidy. In 2014, you wrote a highly cited paper called Tidy Data. And you're also an author of the popular tidyverse, a collection of packages that share a high level design philosophy.
Last but not least, you have been writing a book called Tidy Data Principles, which is to be completed next year, in 2025. What does tidy mean in the context of data programming? And yeah, what's the guiding principle? That's definitely not a question that ChatGPT would have asked you.
What "tidy" means
Tidy to me is about having things that are kind of like well organized and like well broken down into kind of like little pieces that you can then like reassemble, like Legos. I think that's been a motivation for a lot of my work is like, how do you take some like big, maybe kind of vaguely ill-defined problem and then break it down into like concrete pieces that you can actually get stuck into?
And experiment with and play around with and iterate towards a final solution. All right. And yeah, so in your tidy data paper, you draw parallels between tidy data and the principles of relational databases, specifically Codd's relational algebra. What is Codd's relational algebra? Could you elaborate on how database design can benefit statisticians and data analysts in their work?
Yeah, so you can go and look up on Wikipedia or somewhere what Codd's relational algebra actually is, but I can never remember it myself.
It's one of those kind of very precise definitions where every word makes sense individually, but when you string them together in a sentence, it's very hard to understand what it means.
And so I think the ideas of Codd's relational algebra are really important. You want to make sure you don't have inconsistencies in your data. But it's really difficult for folks who are not trained in databases and computer science to get the idea of the algebra.
And so a lot of the idea of tidy data was, how can I frame this in a way that makes more sense to statisticians and data scientists and people working with data? And to me, it's just: you've got a rectangle, and that's all tidy data really is.
You put the variables in the columns and you put the observations in the rows. And you kind of wonder, like, that legitimately took me eight years to figure out. It seems so simple in hindsight, but it's just one of those things that once you figure it out and explain it to other people, it makes a lot of sense, but it takes a while to get there.
It takes a while, even for me. My first time using the tidy data principles must have been a decade or more ago now. But that first time wrapping your head around shaping the data in this tidy way, it is so different from the way that we're typically taught in university.
It's so wasteful, especially if that repeated value is a string. It's incredibly inefficient. And so popping over to the tidy principles, where that goes away, is transformative. But I remember the first time trying to melt data.
And I'm like, what? Even when I saw it the first time, it took me a while to figure out, and I was kind of like, this isn't right. I felt like I needed to change it back into the shape I was used to, to even work with it.
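For readers who want to see what "melting" actually does, here is a dependency-free Python sketch. The real tools are tidyr's `pivot_longer()` in R or `DataFrame.melt()` in pandas; the `melt` helper and the data below are invented purely for illustration:

```python
# A wide table: one row per country, one column per year.
# The years are really data, but they're stored as column names,
# so adding a new year means adding a new column, not a new row.
wide = [
    {"country": "NZ", "2023": 5.2, "2024": 5.3},
    {"country": "US", "2023": 334.9, "2024": 336.8},
]

def melt(rows, id_col, var_name, value_name):
    """Reshape wide records into tidy long form:
    one variable per column, one observation per row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col],
                                  var_name: key,
                                  value_name: value})
    return long_rows

tidy = melt(wide, id_col="country", var_name="year", value_name="population")
# Each row is now one observation: (country, year, population).
```

In the tidy form, grouping, filtering, and plotting by year become ordinary column operations instead of gymnastics over column names.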
Yeah, it's really interesting at the moment. I'm on the program committee for posit::conf. So we decide on what talks we're going to have and how we're going to arrange them. And the program committee is mostly data scientists.
And what that typically means is you can't really understand the shape of the whole program without joining three different things together.
And it's interesting, like when I share it with my colleagues in like marketing who have to then turn this into the website, like the way they want to put that in a Google sheet is just like so totally different. There's like merge cells, there's like colors. It's just like, I get it, but just like the, you could not like compute on data in that form, but it's so much easier to look at as a human.
It's impossible. I mean, I don't think I could dictate it. I could certainly do more to try and persuade people or, you know, help them do their jobs better, but it's a lot of work, and you're fighting the fact that none of their tools think about data in this way. And I don't really want to have to go and create the tidyverse for marketing and the tidyverse for finance and every other field.
We do, I think, have more penetration of Quarto and those sorts of document-generation tools, but there's still quite a high bar: if you're not used to using Git and GitHub, collaborating in this way is pretty tough.
Even though the final product can be pretty nice. And that's one of the things I hope for the future of Quarto: tools for scientific documents that let you mingle text and code, but that also work like Google Docs, where multiple people can be contributing to them.
You can comment on them. You can share them with non-technical folks.
Yeah. Quarto is something that we also have as a whole topic area later for discussion. It is a great tool that I think anybody within data science can be using, data analysts particularly.
Have there been persistent challenges that you faced, like technical hurdles, adoption resistance, or conceptual misunderstandings? Has there been any of that over the years?
With tidy data in particular, sorry.
Tidy data? Not too much. I think there's definitely some areas where there are just such strong conventions in the field for having things that I would think want to be in a single column spread across multiple columns.
And there's some types of data where arranging things in non-tidy forms is just much, much more memory-efficient.
100%. And for people out there who haven't had the tidy experience, it is absolutely worth wrapping your brain around it. Because once you do get used to it, everything becomes so much easier.
And all of the tools in your tidyverse work so seamlessly together. It's like, I don't know, when I was a kid, you know, you'd have some computer video games that were just garbage and buggy.
And you'd constantly, you know, be able to walk through walls or whatever. And then, for me, it was like having a Super Nintendo for the first time. Nothing ever crashes and everything just works.
And the tidyverse is kind of like that: you get the data formatted in that way, and it's just smooth sailing.
Flow states and the tidyverse goal
And it all just works. And, you know, we're certainly still some distance from that perfect utopia, but over time it really feels like the amount of time you can stay in that flow state keeps growing: asking questions of the data you actually care about, not, how do I get this thing into this other function so I can actually just do what I want to do?
So hopefully this is ringing a bell.
Yeah, I think so. And framing that even a little more broadly, one of the things that me and my teams think and talk about quite a lot is how much we want to force you to learn some new concept that might be a better mental model, one that's going to help you in the long run.
But until you learn what that thing is, the code is going to be a bit of a mystery for you. And there's a balance: sometimes we want you to learn some new idea, like this idea of tidy data.
There's pretty clearly a big payoff to getting that concept into your head. Versus other times, are we just teaching you some kind of technical jargon that's really useful for us, but maybe is just more junk to fill your brain up with?
So that's one of the things we think about a lot: how much do we want to accommodate your existing mental model, versus how much do we want to give you a new and better mental model, possibly kind of against your will?
Favorite libraries
Amongst all of the libraries that you've developed in the tidyverse, so there's things like reshape and plyr for shaping the data and being able to have pipelines of data within this tidy framework.
Like they're not equal to you, right?
Yeah, I mean, I have to say one of my favorites is dbplyr, which allows you to write R code, dplyr code, and then translates that automatically to SQL.
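To make the dbplyr idea concrete, here is a hedged Python sketch of the same concept: dataframe-style verbs are recorded lazily and compiled into a single SQL string, then run against SQLite. The `LazyTable` class, its methods, and the sample data are invented for illustration and are not dbplyr's actual API:

```python
import sqlite3

class LazyTable:
    """Toy sketch of the dbplyr idea: record dataframe-style verbs,
    then compile them to one SQL query instead of running in Python."""
    def __init__(self, table):
        self.table = table
        self.wheres = []
        self.columns = ["*"]

    def filter(self, condition):       # rough analogue of dplyr::filter()
        self.wheres.append(condition)
        return self

    def select(self, *columns):        # rough analogue of dplyr::select()
        self.columns = list(columns)
        return self

    def to_sql(self):                  # rough analogue of dbplyr::show_query()
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.wheres:
            sql += " WHERE " + " AND ".join(self.wheres)
        return sql

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (carrier TEXT, delay REAL)")
con.executemany("INSERT INTO flights VALUES (?, ?)",
                [("AA", 12.0), ("UA", -3.0), ("AA", 30.0)])

query = LazyTable("flights").filter("delay > 0").select("carrier", "delay")
rows = con.execute(query.to_sql()).fetchall()
```

The payoff in the real dbplyr is that the database, not R, does the heavy lifting: the query runs where the data lives.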
And one of the things I worked on a couple of years ago was this package called Waldo, which is all about like concisely describing the difference between two objects.
And that's kind of a similar problem. You've got this deep technical understanding of the language and all the objects, and you're writing C code to iterate through them.
So that kind of tension, again, between programming and human psychology, I just find that really interesting and fun to explore.
Yeah, it must be interesting to think yourself. Well, there's clearly this need to be addressed and it's a it's an impossible problem to solve perfectly.
So how can we get most of the way there in a way that will satisfy most people?
And those are two libraries that I definitely need to spend more time with. I don't think I've used dbplyr or testthat, actually.
It's basically the same code in R or Python or SQL or JavaScript. You can express pretty much the same things in every single language, in a way that is surprising and interesting.
Shiny and reactive programming
That's a good segue. Do you want to talk a bit about Shiny? And actually, I think something notable about it, and we're going to talk about Posit soon and the change from RStudio, is that it now works across programming languages.
Yeah, exactly. So there's now Shiny for R and Shiny for Python. And they're completely separate code bases, but the idea that really unifies them is this idea of reactive programming.
And the idea of reactive programming, I think, at its heart is pretty simple. You've got like a bunch of inputs to your app, things that people can change. And you've got a bunch of outputs.
And again, that's one of these ideas, like tidy data, that takes a little while to get your head around. It's quite possibly an idea you've never encountered before in programming.
It works a little bit differently to things you might have encountered. But once you get that idea, it just gives you this incredible toolset to create apps where things just work.
Without it, you've got these mysterious bugs in your app where things don't change when you expect them to, which is one of the most frustrating things to try and debug: when something doesn't happen.
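As a rough illustration of the reactive idea Hadley describes (inputs that automatically invalidate the outputs that read them), here is a minimal Python sketch. This is an invented toy, not Shiny's actual API:

```python
class Input:
    """A reactive input: changing it invalidates every output that read it."""
    def __init__(self, value):
        self._value = value
        self._dependents = set()

    def get(self, output=None):
        if output is not None:
            self._dependents.add(output)   # record who depends on us
        return self._value

    def set(self, value):
        self._value = value
        for output in self._dependents:    # push invalidation downstream
            output.invalidate()

class Output:
    """A reactive output: recomputes lazily when an input it read changes."""
    def __init__(self, compute):
        self._compute = compute
        self._stale = True
        self._cache = None

    def invalidate(self):
        self._stale = True

    def get(self):
        if self._stale:
            self._cache = self._compute(self)
            self._stale = False
        return self._cache

n = Input(10)
doubled = Output(lambda out: n.get(out) * 2)

first = doubled.get()   # computes: 20
n.set(21)               # invalidates `doubled` automatically
second = doubled.get()  # recomputes: 42
```

The point of the pattern is that the app author never writes "when X changes, update Y"; the dependency graph is discovered automatically as outputs read inputs, which is why reactive apps avoid the stale-output bugs mentioned above.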
That is fun. Yeah. So, yeah, Shiny. Really, really cool.
It allows you to spin up basically that Super Nintendo game that I was just describing. It just kind of works like you think it should. People don't walk through walls accidentally as they're using the dashboard that you developed in literally minutes.
Yeah, it's funny. Like, I remember talking to Joe Chang, who wrote shiny like very early on.
And I was like, Joe, you think data scientists are going to make websites? You can use Ruby for that, use PHP for that. Why on earth would a data scientist want to make a website?
And now it's like so obvious because you don't want to give decision makers in the organization just like a PDF. You want to give them like a little interactive app. And there's just been so many examples of people just like really impressing their bosses with shiny.
Because you can whip up something in a couple of hours that looks like a polished app and does exactly what you want. I remember a very early phone call from a Shiny user saying it had saved him a quarter of a million dollars.
And not only is that a cost and time benefit, but also, if you as a data scientist can do it yourself, you don't have to try and communicate to someone else exactly what you want.
It is. Well, and this also allows you to make changes yourself. You know, if you notice an issue or a user complains to you, you can just go in and fix it, as opposed to needing a middleman, a middle person, I guess.
And think about how often executives think they want a dashboard, relative to how often they actually use it. That is another strong point for using Shiny, because you're not wasting weeks or months developing a dashboard.
You're wasting hours or days. I mean, just in general, the more you can do to increase your iteration speed, the more effective it makes you.
Because, again, like it's so hard to predict in advance, like what's the thing that's going to be valuable? There's definitely a lot to be said to just like trying out a ton of things and seeing what sticks rather than like doing a bunch of upfront planning and just hoping desperately that you've got a really good mental model of the world and your idea works.
Why use R? R vs Python
So we are going to talk about, as I already mentioned, the Posit name change, and we'll end up talking about Python a bit. For our listeners who don't already use R, why should they be using it?
For me, I can actually give one example, which is data visualization. I still find I can do things way more quickly, have much more fun making visualizations in R, and get exactly what I want.
There had been in the past attempts to create a ggplot2 style Python library, but the one that I had been using became deprecated and harder and harder to use.
It never had all the functionality of your ggplot2 anyway. Anyway, so that's like my big example. I don't know if you have big examples of why people might want to use R still today.
On the topic of ggplot2 specifically, I think the best Python equivalent is plotnine. That's actually by a developer, Hassan Kibirige, that we've been sponsoring at RStudio.
I think that's the best possible realization of ggplot2 you can get in Python.
I think that comes down to at the heart of it, R is more of a special purpose programming language. It's designed from the ground up to support statistics and data science.
I think that has a lot of benefits, particularly if you've never programmed before. I think you can get up and running in R using R to do data science. You can do that without learning a ton of programming. You can get up and running pretty quickly.
That obviously leads to maybe a little bit of weakness when you think, now I've got this thing, and I just want to do the same thing again and again and again.
R tends to be a little bit magical. It tries to guess a little bit more of what you want, and that's great when you're working interactively and it guesses correctly. It's not so great when you're working on a server somewhere else and it guesses the wrong thing.
Posit's rebrand and the R/Python ecosystem
Speaking of differences between R and Python, I seem to remember, and you can correct me if I'm wrong about this, but I feel like you have a famous tweet from years ago where you say, somebody says something like,
and it must have been a famous poster themself that you responded to, and I can't remember, it might have been like Wes McKinney or somebody like that saying that one of the advantages of Python is that it's faster than R.
And then you have this super famous reply of, what is that, and I will make it faster. Do you know what I'm talking about?
I don't, but I know I've heard things like that in the past.
Yeah, it's kind of a misperception because Python isn't actually that fast itself. I mean, languages like Julia have come up to be faster than Python.
Yeah, I think one of the reasons is that you often have the worst arguments with your family, not with strangers. With people who are so similar to you, you tend to have more friction than with people who are really different.
I think because R and Python are actually really close together in the spectrum of programming languages. It's so easy to see all of the little things that look weird to you as opposed to looking at some programming language that's miles away, and it just looks totally different.
I just think there's something to that. It's because we're close that you can see all these little differences.
Certainly, when I see things in Python that people are like, wow, that's really cool. I'm like, challenge accepted. I will make that better in R.
Yeah, exactly. Let's dig into that a bit now. For 11 years, you've been the chief scientist at Posit, makers of open-source software for data science, scientific research, and technical communication.
Many R users will know Posit as the makers of RStudio, a full-featured integrated development environment (IDE) for R, which I myself have been using for as long as I can remember.
Basically, as long as I have been typing, I have been using RStudio.
RStudio, as you actually kind of let slip earlier in this episode when you were talking about Joe Chang. Oh, no, no, no. You were talking about plotnine and how you said RStudio is supporting it. Wait, no, Posit. Two years ago, the company name changed to Posit.
From a distance, I mean, I don't even think it's from a distance. I think this is explicitly related to how Posit is now supporting more than just R. Is that right?
Yeah, yeah. I mean, the goal of RStudio and now Posit has always been to be this kind of like a company with a long-term vision. Like we talk internally about this sort of idea of a hundred-year company.
And when you think about a company like that, obviously no programming language is going to be around in a hundred years' time. We started with R, and that's something that's near and dear to many of our hearts, and it always will be.
And I kind of think about this as like the Burlington Coat Factory problem. I don't know if you know Burlington Coat Factory, but we have a lot of ads for them on television. But for a long time, they were like, no, it's like Burlington Coat Factory. It's not just coats.
And for us to go into customers and say, buy RStudio, it's not just R. It's hard to tell that story.
So I really, really wanted to say like, hey, for a long time now, like our products have supported not just R but Python and Julia and other tools. We don't want to lock ourselves into this. We're going to be R forever regardless of what happens with the rest of the world.
So renaming to Posit, rebranding to Posit, was really about saying we're in this for the long haul, and we care about data scientists regardless of what tool they're using.
Piping in R and Python
One of my favorite things, which you can do really well thanks to the dplyr library that you led development of, is piping: you can extremely easily chain functions together. If people are familiar with Unix pipes, it's just like that, where the output from one function becomes the input to the next function.
And prior to me discovering dplyr, which was probably around 2014. Does that make sense? Prior to that, I would have so many variables in my workspace. It was just such a pain to keep them all straight.
And you just end up in these weird situations where like should I be investing time thinking about the name of this intermediate variable? Am I going to use this later?
Or should I just name it like intermediate variable 15 and have really ugly code?
So piping gets rid of all that, and you can read the flow like a sentence. You're like, okay, this preprocessing step happens, then this next one, and you can see it so easily. It makes the code so elegant to read.
Do you think we'll get to a point where... I have used some kinds of piping attempts in Python, but my experience of those has never been as smooth or as easy as with R. And I guess it's been a few years since I've tried. But maybe that's related to what you were talking about earlier with data visualization.
Yeah. So the native equivalent of piping in Python is method chaining. You know, like if you're using pandas, you do something dot something dot something.
But the big difference between like method chaining and the pipe is in method chaining, like all of those methods have to come from the same class. They have to live in the same library, the same package. Whereas with piping, they can come from any package.
Whereas with R, because you can combine things from different packages, the equivalent of pandas is kind of like dplyr and tidyr and readr and a bunch of other things.
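The distinction Hadley draws can be sketched in a few lines of Python: a `pipe` helper (invented here for illustration, loosely mirroring R's `|>` or magrittr's `%>%`) threads a value through free functions from anywhere, whereas method chaining is limited to methods defined on the object's own class:

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread a value through a sequence of functions, like R's pipe.
    Unlike method chaining, the functions can come from any module;
    they don't have to be methods defined on the value's class."""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Plain functions, conceptually from "different packages",
# composed into one readable pipeline:
def drop_missing(xs):
    return [x for x in xs if x is not None]

def square(xs):
    return [x * x for x in xs]

result = pipe([1, None, 2, 3],
              drop_missing,
              square,
              sum)        # 1 + 4 + 9
```

With method chaining, `drop_missing` and `square` would have to be methods on the list class (or a wrapper class) to be chainable; with a pipe, any function that accepts the value can participate.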
Yeah, it makes perfect sense. And your explanation of how that happened is so simple. It had kind of escaped my attention as to why it works so well in R.
Marrying R and Python: Arrow, DuckDB, and beyond
When you were last on this podcast four years ago, you said that you wanted to marry the Python and R languages. Four years on, how do you assess the progress made in achieving this dream, especially through projects like Apache Arrow?
Yeah, I think we've come a long way. And Arrow has made a big difference in just being able to seamlessly move data from one platform to another, one programming language to another.
And coupled with that, I think the other technology that's really, really interesting is DuckDB. You can use DuckDB from R, you can use it from Python, and you don't even have to have a database file: you can have a directory full of Parquet files.
Another, you know, sort of a similar thing is like Keras and a lot of the machine learning toolkits in Python. Like the reason that they are fast is not because like Python is fast. It's because you express those high level ideas in Python and then they get compiled down to some low level machine code.
And that's why packages like the Keras package for R, which is maintained by one of my colleagues at Posit, Tomasz Kalinowski, do the same thing. You express these ideas in R rather than Python, but then they get compiled down to machine code using exactly the same toolkit.
So I think we're just going to continue to see more and more of that. R is not fast. Python is not particularly fast. What is fast is code that people really care about, written in Rust and C. And then you write a more user-friendly interface on top of that, in the programming languages that data scientists use every day.
And those libraries that you mentioned there are super cool, in addition to the Arrow that I mentioned, speaking of Wes McKinney. We actually have a whole episode about that.
Back in episode number 523, we had Wes McKinney on, and he talks about the Apache Arrow project at length. Really cool one. And the other projects you mentioned there, DuckDB as well as Keras for R: super cool, invaluable packages that people should be trying out, for sure.
And it's really kind of nice to see that in practice: the dream of Arrow, that you've got data over here and you want to get it over there, so let's make it as easy as possible.
Multilingual programming and generative AI
How do you think about that in your own work?
I'm 100% R, and as far as I can tell, I probably always will be. That's my job. If there's something that I could do better in Python, I will write an R package for it instead.
But, you know, but that's not the reality of like most people's lives and most people's jobs. I think most people tend to be like, you know, like 90% R or 90% Python.
And I've been using it quite a bit to generate JavaScript, because I do the occasional web thing. And I really like it, because I know enough about JavaScript that I can look at what it produces and say, that looks right. But for me to manually figure it out would just take so much longer.
And so I think that's really, I think it's really interesting to kind of think about how that's going to affect kind of programming languages. If it's really easy to translate between them, maybe this means the barriers between them are going to erode a little bit more.
Yeah, I think that's right. It's amazing that we've gotten this far in the episode without talking about generative AI yet. That's kind of refreshing, actually. And looking at all the topics we have lined up, I don't think any of them touch on gen AI, which is kind of crazy today.
But as you were talking earlier about how Posit is aiming to be a hundred-year company, and we think about what programming languages will exist a hundred years from now, that was batting around in my head, bouncing around in my head, as you were speaking: I wonder if, a hundred years from now, anybody will be programming at all.
Because I wonder if just natural language expression of things will be so powerful. I wonder if we'll be working at all. It's just going to be like a Mad Max hellscape.
I think the other thing that's really interesting to me, though, is like if people are really going to be using like generative AI for programming a lot. Like what does that mean for new programming languages, which are not going to have any training data available for them? Like that seems like that's going to kind of raise the barrier to new languages even further.
Yeah, and also, what happens to Stack Overflow? I always think about this idea of poisoning the well. People are stopping using Stack Overflow, which is kind of fine in the short term, but where's all the training data going to come from in the future?
It's exciting and scary. Yeah, it is exciting and scary.
I have this perhaps completely unfounded intuition, and there are people way smarter than me who have spent a lot of time thinking about this who could easily crush what I'm about to say. You might even do it right now.
But somehow I have this feeling, based on how quickly issues like hallucinations have been stamped out, the jump between GPT-3.5 and GPT-4 in how much less it hallucinates, this completely unscientific, uneducated intuition that somehow we're going to be fine on this front, that we're not going to end up with a complete, there's a specific term for this.
Yeah, I think the thing that makes me less optimistic is my Tesla and, you know, this promise of self-driving cars, which just doesn't seem to be getting any closer. Like, I don't know, 50% of the time that I pull into our garage, it thinks the random collection of tools on the wall is a semi.
So I'm just like, meh. And that's clearly something that's been very much hyped and had a bunch of money invested in it. It's just going to be interesting to see. We're clearly in this explosive growth, and is it just going to flatten off, or is it going to keep going, or is it going to get steeper? Who knows?
S7 and object-oriented programming in R
So maybe we won't be going down the Excel route. But another big innovation for R that has actually happened recently is R7.
Which I heard about for the first time doing the research for this episode, reading our researcher's research. Yeah. Do you want to tell us about R7 and the problems it's aimed to address?
So we actually renamed it to S7 relatively recently. Oh, really? It's called S7.
So it's called S7 because there are two... This is a lot of historical minutiae. But the language that came before R was called S. And S kind of introduced object-oriented programming, in S version 3 and S version 4.
So in R there are two chief types of object-oriented programming: S3 and S4.
So S3 is really just a very lightweight set of conventions. It's not like object-oriented programming in any other language; it's very, very lightweight. S4 kind of swings too far in the other direction: it's very formal, there's a lot of boilerplate, it's quite complicated, and things can go wrong in weird ways.
So the idea of S7 was really to try and find the sweet spot in between them: to take the nice features that S4 had and add them on top of S3 in a backward-compatible way, so that we can hopefully switch.
Hopefully, we're not just adding another object-oriented programming style to R, but we're actually supplanting S3 and S4 over time. Because everything you can do in those two, you can do in S7 and you can do it more easily. And there's better documentation and tooling and that kind of stuff.
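[Editor's note: the S7 package Hadley describes is on CRAN. As a rough illustration of the "sweet spot" he mentions — S4-style formal properties and validation layered over S3-style generic-function dispatch — a minimal sketch might look like the following. The class, property, and generic names here are invented for illustration, not taken from the episode.]

```r
library(S7)  # install.packages("S7")

# A formal class with typed properties and a validator:
# the rigor that S4 offered...
Range <- new_class("Range",
  properties = list(
    start = class_numeric,
    end   = class_numeric
  ),
  validator = function(self) {
    if (self@end < self@start) "@end must be greater than or equal to @start"
  }
)

# ...while dispatch still flows through plain generic
# functions, in the lightweight spirit of S3.
inside <- new_generic("inside", "x")
method(inside, Range) <- function(x, y) {
  y >= x@start & y <= x@end
}

r <- Range(start = 1, end = 10)
inside(r, c(0, 5, 11))  # FALSE TRUE FALSE
```

Note that `Range(start = 10, end = 1)` would error at construction time because of the validator — one of the "go wrong in weird ways" failure modes of informal S3 code that S7 is designed to catch early.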
For our listeners who don't have a computer science background, what does it mean for a language to be object-oriented? And that it can have these kinds of gradations, from lightweight, like you were describing with S3?
Yeah, I don't know. I don't know. You have objects and you program with them. I don't know.
And it's especially weird in R because when you're using R, you're not really aware that you're using object-oriented programming. Unlike in Python, where I think you're much more aware that you have objects and you call methods on those objects.
Object-oriented programming is much, much less important in R for a data scientist. I think you benefit from it because packages use it.
So I think the main benefit to you as a data scientist is not that you're going to be writing S7 code. But the packages that you use are going to and they're going to be able to write code faster and more correctly from the get-go.
So hopefully more of a general uplift of developer productivity in R. Probably not going to affect data scientists day-to-day that much.
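[Editor's note: to make the Python-versus-R contrast Hadley draws concrete, here is a short Python sketch (the names are illustrative, not from the episode). The `Dog` class shows the explicit style Python users see every day — hold an object, call methods on it — while `functools.singledispatch` approximates R's S3 style, where you call an ordinary function and dispatch on the argument's class happens invisibly behind the scenes.]

```python
from functools import singledispatch

# Explicit OOP: you are clearly "using objects" —
# you call a method on an instance.
class Dog:
    def speak(self):
        return "woof"

# Generic-function dispatch, closer in spirit to R's S3:
# the caller just invokes a plain function, and the right
# implementation is picked based on the argument's type.
@singledispatch
def speak(x):
    return "..."          # fallback, like an S3 default method

@speak.register
def _(x: Dog):
    return "woof, via generic"

print(Dog().speak())   # explicit method call
print(speak(Dog()))    # generic-function dispatch
print(speak(42))       # falls back to the default
```

In R, nearly everything a data scientist calls — `print()`, `summary()`, `plot()` — works in the second style, which is why, as Hadley says, you can use R heavily without ever noticing the object system underneath.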
Posit's mission and the hundred-year company
You've already mentioned on the show how Posit has an ambition to build a company, a suite of tools that could last 100 years. What kinds of principles or philosophies do you think are critical to creating a legacy that lasts that long in technology?
Yeah, that's a good question. I don't think we know for sure.
And I think that is pretty special because we can legitimately say we don't want to make products that lock you in. We want to make products that, you know, help you, and we're going to sell you products that help you do your job.
And you're going to hopefully pay us money for those because they save you time and allow you to do things that you couldn't otherwise do. But we're not just about the money. We really care about your life as a data scientist. We want to build tools for you.
We want to build open-source tools for people that don't have a bunch of money. We want to improve academia. So I think that's part of the mission: we're not optimizing for short-term profit. We can say we're going to take a longer view.
And, you know, part of that is also that we're not a VC-driven company. We don't have to explode, in either a good way or a bad way, in three years' time when our investors want to get money, so that we can

