JJ Allaire & Hadley Wickham | RStudio: 2022 and Beyond

Transcript

This transcript was generated automatically and may contain errors.

Thank you, Hadley. The title of this talk is RStudio 2022, but the actual subject of the talk is to consider the question of what it would take for us to give this same talk in 2122. That's what we're going to try to consider here today. And to do that, we want to start by reflecting back a little bit on where we've been. I want to point out now that if you go to a tech conference and the founders are on stage and they say "we have some exciting news" and the first slide says "our journey so far," there's a 95% chance the company's getting acquired. So that is not at all what's happening today. And that will never happen. That might have occurred to some of you.

Anyway, we feel incredibly blessed to be part of this community. And you know, the company started 13 years ago, but many of us have actually been working on and with R far longer than that. And we built a set of tools and software that helps people ask and answer the hardest questions with code in a reproducible fashion, all the while creating an incredible community, which I think, if you're here today, you feel. If you've been at past confs, meetups, all the interactions we have online, I think the community we've built together is really remarkable. And I also wanted to reflect back on what the most difficult problem we've solved so far as a company is. One very obvious answer to this question, and it is a difficult problem, is that we've made R usable and productive and approachable for many, many, many, many people, many more people than probably any of us thought would be possible. And that is a difficult problem. It's a very important problem.

Sustainable funding for open-source scientific software

It kind of enables all the things I talked about on the first slide. But when me and Hadley and Joe and others all got together about ten years ago and said, hey, let's work on this together for a long time, I actually was pretty confident that we would be able to solve this problem. But there's another problem that I think we have solved, which I wasn't 100% confident we'd be able to solve. I hoped, but I wasn't 100% confident, which is to create a sustainable model for funding open-source scientific software. Scientific software is traditionally created by large proprietary software vendors, kind of necessarily so in a way, because there's a lot of software to be written. A lot of people depend on it. But we really felt strongly that we'd like to find a way to make open-source software for science sustainable.

And reflecting a little bit on why this problem matters, I think actually everyone here has a pretty good intuition about why this problem matters. And that was a subject of the talk I gave a couple years ago. The scientific process and the scientific method depends on replication and reproducibility. And fundamentally, if I can't run your software, either over space, you know, a colleague, another company, or another country, or another university, or over time, if I can't run your software, I can't reproduce your work, and I can't build on your work. So having the fundamental software be free and open-source kind of ensures, or at least creates the conditions for widespread reproducibility and replicability.

And also resiliency. Software does die. Companies die. Versions of products die. Things get out of fashion. Open-source software has that fundamental right to fork, which means that software can persist even beyond the time period over which its main sponsors are working on it. And then finally, participation. You've seen with CRAN tens of thousands of packages addressing all different domains, methodological approaches, and having the user community be part of building the tools and deciding what methods are supported is fundamentally better than the idea of a single vendor or a couple of vendors being a choke point, sort of deciding what analytic methods people are going to be able to use with software. So I think this problem matters a great deal to science, data science, the future of science.

And so when we think about solving this problem of sustainable funding for open-source scientific software, there's a few dimensions of the problem. Some of it has to do with just the mission of the entities that are working on it. Some of it actually has to do with the way incentives are aligned around the work. And then even if you have those things right, getting to a certain scale of effort is actually another threshold condition. Companies like Wolfram and MathWorks and SAS, you know, employ hundreds of people to work on scientific software. And I think something along those lines is probably necessary to do it really well with open source.

So first, kind of the corporate charter. And again, we talked about this a couple years ago at Conf. And it's important to note that when we became a public benefit corporation, we actually encoded our mission into the corporate charter. And it actually is the case that our directors and officers have a fiduciary duty to pursue this mission. And that mission, when we wrote it, I think about three years ago, is actually written, you'll notice here, pretty broadly. We talk about free and open source software for data science, scientific research, and technical communication. And partly, even though at the time we were exclusively focused on data science, we sensed that the kind of model we were building might have broader application, and we wanted the mission statement to reflect that.

But then there's incentive structures. I would say that many or most software companies and technology companies have kind of the seeds of their own demise planted in them from the earliest days. Startup companies often really exist to be acquired. And once they're acquired, all kinds of things happen to their software and their mission, their people. Even for public companies who are not acquired, who are nominally independent, there's really a pretty narrow lane they have to exist in, which is to always grow, always grow their revenue, always grow their earnings, kind of at the expense of everything else. And that certainly also causes the missions to drift.

And so the thing that I'm really pleased about with RStudio is that we actually are an independent company, and we're committed to always being an independent company. And the control of the company lies inside the company. People who work at RStudio fundamentally control the company. It's not controlled from the outside. So when we say we're going to continue to be independent, we can follow through on that. And for us, our imperative, we certainly would like to grow. But it's not growth at all costs. And it's never at the expense of our core mission. And so our hope, maybe it's a little optimistic, is that we'd like to be here fulfilling that mission in 100 years. We're certainly going to think as if we are, or try to, and make our work and company be organically sustainable.

Scale of open-source investment

And then finally, there's a question of scale of effort. And I'm really happy about the fact that about 40% of our engineers work full time on open source software. This is a list of some of our open source projects and the number of full time engineers that work on them as of now. That's 43 engineers, but there's more than that. We actually have dozens of people at the company who are not full time open source software engineers who also write packages and contribute to open source projects. So there's many more than the 43. But there's 43 who are paid to do nothing but work on open source software. 43 is great. I hope in five years, that's going to be 143. And I hope in 10 years, that's going to be 200 or 300. And I think that represents an opportunity to create a scale of effort that is really remarkable and actually presents an opportunity.

Talking a little bit about how we kind of got to this scale, what the basic mechanic is, and I think I talked about this a couple years ago, so to briefly recap it: there's a virtuous cycle, a synergistic relationship, between our open source and our commercial software. We write open source tools that are accessible to everyone. They're accessible to anyone, independent of their economic means. And they're really useful as they are, without purchasing anything. But what happens is that inevitably, when lots and lots of people adopt open source software, some of that software is adopted in complex environments, larger organizations that have security or scalability or integration requirements. People want to deploy the software and use the software on the web. And that creates an opportunity for us to create commercial products, which then help grow the company and allow us to invest back in our open source tools.

Now, I want to tie this back a little bit to the stuff about mission and incentives. You'll notice there's a line here, a line between open source and enterprise products. And we've seen historically that companies whose principal imperative is growth at any cost actually have an incentive to move that line over time. And what I think is great about what we've done, because we have independence, is that we actually have an operative principle for what's open source and what is not. And that is really that the core intellectual property that you create, the reports, applications, analyses, code, is all based on open source packages, protocols, and file formats, and even the core productivity tools like RStudio are open source, so anyone can use them. And then when we get into these scenarios of more complex environments, or deployment on the web, that's when commercial software comes into play. But we will always respect that line, and never compromise that line for the sake of, you know, goosing our growth for any given year or period of time.

And I apologize, actually, I don't think the type here is going to be super legible to people past the 10th row. But this is just kind of a picture of the company over time, kind of how this has played out. In 2009, it was one person. Four years later, in 2013, there were six people just working on open source software. And then about five years into the company's life, we decided that we'd like to try to grow this group and do more. And as you can see, we started introducing some commercial products that actually worked, and we became profitable very quickly. And we've managed to grow really, really substantially since then and create a lot. This is only some of the open source software that we've created. And as you can see, we've also managed to fill in additional commercial products for all the different scenarios that people have. And these things, again, are sort of feeding each other, and allow us to get to this scale of, you know, roughly now about 250 people in the company and dozens and dozens of people working full time on open source software.

Looking ahead: science beyond data science

So where does this leave us? That's kind of RStudio 2022, and now we want to talk about where we go in the future. And I think what's remarkable to me is that we have actually built this company that can have open source software for science at its core. And it's able to fulfill that mission as an independent company. And we're able to employ lots and lots of people working on open source software, with the potential to employ many, many more. But when we wrote that mission, we included scientific research and technical communication. And as you've probably already seen with Quarto, which you'll hear more about tomorrow, we're starting to work in that realm of scientific communication. It certainly overlaps with data science, but it goes significantly beyond it.

So we've been asking ourselves the question, really since we worked on that mission and since we worked on Quarto: do we have the potential? Does the company have the potential to impact the practice of science more broadly than just the practice of data science with R? And I think, given the fact that we wrote the mission that way, and a lot of the things you've seen us working on over the last couple years, to me, the answer to that is pretty decisively that we can and we should do that.

So to sum up, we think we've got a really good foundation towards a sustainable model for funding open source scientific software. And we want to do that by creating this long term, thriving, trustworthy company that's hopefully going to be around 100 years from now. And 100 years is a long time. Like, programming has only been around for 70 years. Data science, charitably, has been around for 30 years. The languages of today, like R and Julia and Python, are they going to be popular even in 30 years or 50 years, let alone 100 years? So it's pretty clear that as much as we individually love specific programming languages, to thrive in the long term, the mission of the company has to be broader than that.

And I think you can kind of see that today already. Like, one language is not enough for most data scientists, because most data scientists have to use SQL and something else. And many teams use a combination of R and Python. And as soon as those teams want to interoperate with the rest of the company, even more languages come onto the table. And we believe this is a fundamental property of software. Like, there's never going to be one language to rule them all. And so the best tools have to take this into account.

And I think we're already seeing this in open source tools like Torch and TensorFlow, which are tools for doing deep learning, machine learning, where you express your model in a human friendly language, a language designed to interact with humans like R or Python. And behind the scenes, your model is converted into something very high performance. Or Stan, very similar to Torch and TensorFlow, but more statistical, a statistical computing environment that really helps fit many Bayesian models. You define your model in a high level language like R or Python or Stata or MATLAB. And then behind the scenes, that's converted to something very performant. Or Arrow, by our friends at Voltron Data. Arrow is a data format designed so that different languages can collaborate on, like, literally the same data in memory. So there's none of this kind of expensive copying or transforming, and the different languages can collaborate on exactly the same data set.

And we're very much inspired by R. R is not a language, like, driven by purity of its philosophy. R is a language designed to get shit done. And it's been like this from day one. S, the precursor to R, was designed by statisticians at Bell Labs who had all these little Fortran programs that each tackled a little bit of the data science, of the statistical analysis problem, but they needed some toolkit to join them together. So R from day one has always been about building bridges, combining tools to get things done.

And even our pro products are kind of already multilingual. Like RStudio Workbench: of course, it supports the RStudio IDE, but it also supports Jupyter Notebooks, JupyterLab and Visual Studio Code. RStudio Connect, of course, can deploy Shiny apps and R Markdown documents, but it can also deploy Jupyter Notebooks and Flask APIs and Dash and Streamlit apps. RStudio Package Manager is not just about R packages, it's also Python packages. But there's something about those names. Like, RStudio was a great name for the company when our one product was literally a studio for writing R code. But over the last few years, that name has started to feel increasingly constraining.

Featured software

rstudio