JJ Allaire & Hadley Wickham | RStudio: 2022 and Beyond

RStudio's CEO & founder JJ Allaire and Chief Scientist Hadley Wickham walk through RStudio's journey so far, the most difficult problem RStudio has solved, and why that problem matters. JJ discusses the company's virtuous cycle (and core operative principle): Commercial software enables investment into open source. The line between those two things will always remain where it is, and your intellectual property will always be yours. Can RStudio impact the practice of science more broadly and not just the practice of data science with R? JJ says "we can and we should." Hadley continues, saying one language is already not enough, which has caused RStudio to contemplate a new name... Session: Keynote

Oct 24, 2022
24 min

Transcript

This transcript was generated automatically and may contain errors.

Thank you, Hadley. The title of this talk is RStudio 2022, but the actual subject of the talk is to consider the question of what it would take for us to give this same talk in 2122. That's what we're going to try to consider here today. And to do that, we want to start by reflecting back a little bit on where we've been. I want to point out now that if you go to a tech conference and the founders are on stage and they say we have some exciting news and the first slide says "our journey so far," there's a 95% chance the company's getting acquired. So that is not at all what's happening today. And that will never happen. That might have occurred to some of you.

Anyway, we feel incredibly blessed to be part of this community. And you know, the company started 13 years ago, but many of us have actually been working on and with R far longer than that. And we built a set of tools and software that helps people ask and answer the hardest questions with code in a reproducible fashion, all the while creating an incredible community, which I think you feel if you're here today, or if you've been at past confs, meetups, all the interactions we have online. I think the community we've built together is really remarkable. And I also wanted to reflect back on what the most difficult problem we've solved so far as a company is. One very obvious answer to this question, and it is a difficult problem, is that we've made R usable and productive and approachable for many, many, many, many people, many more people than probably any of us thought would be possible. And that is a difficult problem. It's a very important problem.

Sustainable funding for open-source scientific software

It kind of enables all the things I talked about on the first slide. But when me and Hadley and Joe and others all got together about ten years ago and said, hey, let's work on this together for a long time, I actually was pretty confident that we would be able to solve this problem. But there's another problem that I think we have solved, which I wasn't 100% confident we'd be able to solve. I hoped, but I wasn't 100% confident, which is to create a sustainable model for funding open-source scientific software. Scientific software is traditionally created by large proprietary software vendors, kind of necessarily so in a way, because there's a lot of software to be written. A lot of people depend on it. But we really felt strongly that we'd like to find a way to make open-source software for science sustainable.

And reflecting a little bit on why this problem matters, I think actually everyone here has a pretty good intuition about why this problem matters. And that was a subject of the talk I gave a couple years ago. The scientific process and the scientific method depend on replication and reproducibility. And fundamentally, if I can't run your software, either over space, you know, a colleague, another company, or another country, or another university, or over time, if I can't run your software, I can't reproduce your work, and I can't build on your work. So having the fundamental software be free and open-source kind of ensures, or at least creates the conditions for, widespread reproducibility and replicability.

And also resiliency. Software does die. Companies die. Versions of products die. Things get out of fashion. Open-source software has that fundamental right to fork, which means that software can persist even beyond the time period over which its main sponsors are working on it. And then finally, participation. You've seen with CRAN tens of thousands of packages addressing all different domains and methodological approaches, and having the user community be part of building the tools and deciding what methods are supported is fundamentally better than the idea of a single vendor or a couple of vendors being a choke point, sort of deciding what analytic methods people are going to be able to use with software. So I think this problem matters a great deal to science, data science, and the future of science.

And so when we think about solving this problem of sustainable funding for open-source scientific software, there's a few dimensions of the problem. Some of it has to do with just the mission of the entities that are working on it. Some of it actually has to do with the way incentives are aligned around the work. And then even if you have those things right, getting to a certain scale of effort is actually another threshold condition. Companies like Wolfram and MathWorks and SAS, you know, employ hundreds of people to work on scientific software. And I think something along those lines is probably necessary to do it really well with open source.

So first, kind of the corporate charter. And again, we talked about this a couple years ago at Conf. And it's important to note that when we became a public benefit corporation, we actually encoded our mission into the corporate charter. And it actually is the case that our directors and officers have a fiduciary duty to pursue this mission. And that mission, when we wrote it, I think about three years ago, is actually written, you'll notice here, pretty broadly. We talk about free and open source software for data science, scientific research, and technical communication. And partly, even though at the time we were exclusively focused on data science, we sensed that the kind of model we were building might have broader application, and we wanted the mission statement to reflect that.

But then there's incentive structures. I would say that many or most software companies and technology companies have kind of the seeds of their own demise planted in them from the earliest days. Startup companies often really exist to be acquired. And once they're acquired, all kinds of things happen to their software and their mission, their people. Even public companies that are not acquired, that are nominally independent, really have a pretty narrow lane they have to exist in, which is to always grow, always grow their revenue, always grow their earnings, kind of at the expense of everything else. And that certainly also causes their missions to drift.

And so the thing that I'm really pleased about with RStudio is that we actually are an independent company, and we're committed to always being an independent company. And the control of the company lies inside the company: people who work at RStudio fundamentally control the company. It's not controlled from the outside. So when we say we're going to continue to be independent, we can follow through on that. And for us, our imperative, we certainly would like to grow. But it's not growth at all costs. And it's never at the expense of our core mission. And so our hope, maybe it's a little optimistic, but we'd like to be here fulfilling that mission in 100 years. We're certainly going to think as if we are, or try to, and make our work and company be organically sustainable.

Scale of open-source investment

And then finally, there's a question of scale of effort. And I'm really happy about the fact that about 40% of our engineers work full time on open source software. This is a list of some of our open source projects and the number of full time engineers that work on them as of now. That's 43 engineers, but there's more than that. We actually have dozens of people at the company who are not full time open source software engineers who also write packages and contribute to open source projects. So there's many more than the 43. But there's 43 who are paid to do nothing but work on open source software. 43 is great. I hope in five years, that's going to be 143. And I hope in 10 years, that's going to be 200 or 300. And I think that represents a scale of effort that is really remarkable and actually presents an opportunity.

Talking a little bit about how we kind of got to this scale, what the basic mechanic is, and I think I talked about this a couple years ago, so to briefly recap it: there's a virtuous cycle, a synergistic relationship, between our open source and our commercial software. We write open source tools that are accessible to everyone, independent of their economic means. And they're really useful as they are, without purchasing anything. But what happens is that inevitably, when lots and lots of people adopt open source software, some of that software is adopted in complex environments, larger organizations that have security or scalability or integration requirements, where people want to deploy the software and use the software on the web. And that creates an opportunity for us to create commercial products, which then help grow the company and allow us to invest back in our open source tools.

Now, I want to tie this back a little bit to the stuff about mission and incentives. You'll notice there's a line here, a line between open source and enterprise products. And we've seen historically that companies whose principal imperative is growth at any cost actually have an incentive to move that line over time. And what I think is great about what we've done, because we have the independence, is that we actually have an operative principle for what's open source and what is not. And that is really that the core intellectual property that you create, the reports, applications, analyses, code, is all based on open source packages, protocols, and file formats, and even the core productivity tools like RStudio are open source, so that anyone can use them. And then when we get into these scenarios of more complex environments, or deployment on the web, that's when commercial software comes into play. But we will always respect that line, and never compromise that line for the sake of, you know, goosing our growth for any given year or period of time.

And I apologize, actually, I don't think the type here is going to be super legible to people past the 10th row. But this is just kind of a picture of the company over time, kind of how this has played out. In 2009, it was one person. Four years later, in 2013, there were six people just working on open source software. And then about five years into the company's life, we decided that we'd like to try to grow this group and do more. And as you can see, we started introducing some commercial products that actually worked, and we became profitable very quickly. And we've managed to grow really, really substantially since then and create a lot; this is only some of the open source software that we've created. And as you can see, also, we've managed to fill in additional commercial products for all the different scenarios that people have. And these things, again, are sort of feeding each other, and allow us to get to this scale of, you know, roughly now about 250 people in the company and dozens and dozens of people working full time on open source software.

Looking ahead: science beyond data science

So where does this leave us? We talked about that's kind of RStudio 2022. And we talked about where do we go in the future. And I think what's remarkable to me is that we have actually built this company that can have open source software for science at its core. And it's able to fulfill that mission as an independent company. And we're able to employ lots and lots of people working on open source software, with the potential to employ many, many more. But when we wrote that mission, we included scientific research and technical communication. And as you've probably already seen with Quarto, which you'll hear more about tomorrow, we're starting to work in that realm of scientific communication. It certainly overlaps with data science, but it goes significantly beyond it.

So we've been asking ourselves the question, really, since we started, as we worked on that mission and as we worked on Quarto: does the company have the potential to impact the practice of science more broadly than just the practice of data science with R? And I think, given the fact that we wrote the mission that way, and a lot of the things you've seen us working on over the last couple years, to me, the answer to that is pretty decisively that we can and we should do that.

So to sum up, we think we've got a really good foundation towards a sustainable model for funding open source scientific software. And we want to do that by creating this long term, thriving, trustworthy company that's hopefully going to be around 100 years from now. And 100 years is a long time. Like, programming has only been around for 70 years. Data science, charitably, has been around for 30 years. The languages of today, like R and Julia and Python, are they going to be popular even in 30 years or 50 years, let alone 100 years? So it's pretty clear that as much as we individually love specific programming languages, to thrive in the long term, the mission of the company has to be broader than that.

And I think you can kind of see that today already, like one language is not enough for most data scientists, because most data scientists have to use SQL and something else. And many teams use a combination of R and Python. And as soon as those teams want to interoperate with the rest of the company, even more languages come onto the table. And we believe this is a fundamental property of software, like there's never going to be one language to rule them all. And so the best tools have to take this into account.

And I think we're already seeing this in open source tools like Torch and TensorFlow, which are tools for doing deep learning, machine learning, where you express your model in a human friendly language, a language designed to interact with humans, like R or Python. And behind the scenes, your model is converted into something very high performance. Or Stan, very similar to Torch and TensorFlow, but more statistical: a statistical computing environment that really helps fit many Bayesian models. You define your model in a high level language like R or Python or Stata or MATLAB. And then behind the scenes, that's converted to something very performant. Or Arrow, by our friends at Voltron Data. Arrow is a data format designed so that different languages can collaborate on, like, literally the same data in memory. So there's none of this kind of expensive copying or transforming; the different languages can collaborate on exactly the same data set.

And we're very much inspired by R. R is not a language, like, driven by purity of its philosophy. R is a language designed to get shit done. And it's been like this from day one. S, the precursor to R, was designed by statisticians at Bell Labs who had all these little Fortran programs that each tackled a little bit of the data science, of the statistical analysis problem, but they needed some toolkit to join them together. So R from day one has always been about building bridges, combining tools to get things done.

And even our pro products are kind of already multilingual. Like RStudio Workbench: of course, it supports the RStudio IDE, but it also supports Jupyter Notebooks, JupyterLab and Visual Studio Code. RStudio Connect, of course, can deploy Shiny apps and R Markdown documents, but it can also deploy Jupyter Notebooks and Flask APIs and Dash and Streamlit apps. RStudio Package Manager is not just about R packages, it's also Python packages. But there's something about those names. Like, RStudio was a great name for the company when our one product was literally a studio for writing R code. But over the last few years, that name has started to feel increasingly constraining.

Announcing Posit

So today, we're very excited to announce that RStudio is becoming Posit.

So Posit is a real word. Admittedly, it's a bit of an obscure word. But it means to put forth an idea, to pretend that something is true, so that you can argue about it. And this is like literally what data scientists do: they posit hypotheses, so that they can test or evaluate them with data. So why do we like this name? Well, it's a real word. And it's authentic, like it's literally what you all do. It's specific. It's not some, like, gobbledygook series of letters that could mean anything. But it also feels kind of generic enough to last us for a long time to come. And if you know anything about my personal package naming philosophy, you'll also know it makes me very happy that it's exactly five letters long.

So as well as the new name, Posit, we have a new logo, which evokes the greater than or less than symbols of statistics and mathematics, the angle brackets of HTML, that greater than sign that you see in the console all the time. But also this idea that different tools, different languages, different environments, different skills woven together are stronger than they are alone.

So how did we get this name? It was a lot of work. We partnered with a branding agency who did a lot of research, talking to RStudio employees, our customers, community members, trying to figure out, like, who we are as a company, what's our DNA. Then that went on to a bunch of brainstorming. I think they brainstormed over 500 names that got weeded down to about 80, which we looked at. We shortlisted about five and kind of sat on it for about a week, but Posit very, very quickly kind of rose to the top.

What the rename means for products and open source

So what does this mean? Well, for our pro products, not much. As I said, they already support multiple languages. They're not really going to change. But hopefully now it's a little bit easier to tell that multilingual story. If you're an R user embedded in a Python heavy team or org, hopefully it's a little bit easier to get your colleagues to take a look at Posit than it would be to get them to take a look at RStudio. And over time, in the coming months, the names of our products will change to Posit Connect, Posit Workbench, Posit Package Manager, and so on.

What about for open source? Well, the IDE is already multilingual, not just R and Python, but it supports C and C++ and JavaScript and HTML and a bunch of other stuff. We'll keep doing that. But we're also going to increase our investments in Visual Studio Code. As for R Markdown, it is evolving to Quarto. You're going to hear a lot about Quarto at this conference. I highly recommend you come back tomorrow for Julie and Mine's keynote, which is going to be all about Quarto. But basically, very, very briefly, Quarto is like R Markdown, but designed from the foundations to be multilingual. Not just R or Python, but any language that might come along in the future. The tidymodels, mlverse and MLOps work is already fundamentally about connecting different systems and languages and paradigms. That will keep going; it'll just be more of the same.

And Shiny? Well, that is a very good question. And if you want to find out the answer, you'll have to wait for Joe's keynote this afternoon.

What about the tidyverse? Well, I still believe that R is the best language for interactive data science. We're not going to stop investing in it. I will learn a little bit about Python, but I'm not going to stop writing R code. But of course, we're also going to broaden our focus a little bit, start thinking, start doing little experiments about how we can help folks in other language communities. And you can expect us to hire more people who are multilingual.

Now, most importantly, the question you've all been asking yourselves: well, what does this mean for stickers? Well, most excitingly, if you look underneath your seat, and it's really wedged under there, so you might have to do this after the talk, you'll find not only a new Posit sticker, but also kind of a commemorative RStudio sticker to recognize the name that's stood us in good stead for so long.

One lucky audience member will also find a golden ticket, which is a free registration to the conference next year.

Long-term vision

So Hadley talked a little bit about the current projects we're working on and where those stand and what you can expect to see over the next few years. But I wanted to address the very, very big picture: where are we going long term? And I think you can just look at our mission and see that we've written it with a much broader scope than data science. And as we grow, we would like to apply this solution to funding open source scientific software as broadly as we can.

That said, we are very decisively oriented toward data science today, and that will not change any time soon. You can see it in Quarto, which overlaps with data science, but also kind of gets into a broader focus on scientific communication. We'll be investing in that quite a bit, but I would say for the next five or ten years, I think these are the areas of investment you're going to see us work within. Now, over a longer period of time, we'll certainly be looking for opportunities to do more and contribute more. We don't know what that looks like right now, but we'll definitely be on the lookout for it.

And so, kind of going back to our journey, and revising the language a little bit to be phrased a little more generally, to talk about science: for me, for my colleagues at RStudio, and I hope for all of you and the broader R community, I am very excited about the opportunity to take all the things that we love about R and RStudio and bring them to everyone. That's what I'm really excited about, so I hope we can all do that together. We don't have a website yet for the new company, but there's a little bit of a placeholder website you'll see there.

Thanks, JJ.