
J.J. Allaire | Open Source Software for Data Science | RStudio (2020)
Open Source Software for Data Science
J.J. Allaire | February 1, 2020
Open-source software is fundamentally necessary to ensure that the tools of data science are broadly accessible, and to provide a reliable and trustworthy foundation for reproducible research. This talk will delve into why open source software is so important and discuss the role of corporations as stewards of open source software. I'll also talk about how RStudio is structured and organized to pursue its mission of creating open source software for data science.
About the speaker
J.J. Allaire - JJ Allaire is a software engineer and entrepreneur who has created a wide variety of products, including ColdFusion, Open Live Writer, Lose It!, and RStudio.
Transcript
This transcript was generated automatically and may contain errors.
Thank you very much, Hadley. Today I'm going to talk about free and open source software for data science. The lens I'm going to use to talk about it is kind of the how and why of RStudio. A little bit about how the company got started and where our mission of creating free and open source software sort of derives from. But then I want to get into the ways that tools, both open source and proprietary, for scientific and technical computing are developed. How are they financially supported? How are they sustained? How can you come to trust them? And get into a little bit about the nature of corporations as stewards of this sort of software. And I'll ask the question, are corporations kind of inherently sketchy as stewards of this sort of software? Spoiler, they are. So then I want to talk a little bit about what we can do about that.
Origins of the company
So I'll start with a little background about how the company got started, so you can get a flavor for where I was coming from in starting the company. The seeds of the company go all the way back to 1983. How many people here have heard of Bill James? People have heard of Bill James? Okay. So Bill James was a math teacher in Kansas City. And back in 1983, I was an avid follower of baseball. And I absorbed all of the, you know, analysis about baseball from writers and sportscasters and players and coaches and experts. And I came to believe a lot of the things I was hearing about how baseball teams win games, how you assess the value of players and strategies.
So what Bill James did, interestingly, you can see back in 1977, his first baseball abstract was this little pamphlet. He went from that pamphlet over ten years to a New York Times bestselling book where he used data, empirical data analysis, to systematically debunk many of the assumptions that people had about kind of how baseball works. And for me, as a 14-year-old, that was really striking that all these people who were, you know, who had spent their lives in the game of baseball, who had developed kind of conclusions and intuitions about how it worked, that all of that could be debunked by systematically using data. It was shocking to me. And that stuck with me.
When I went to college, I didn't go to college for baseball studies. I went to college for political science. And when you talk about political science, you get into public policy. And public policy decisions are made that affect the well-being of many, many hundreds of millions of people. And so the same thing occurred to me that we've got lots of experts and we've got lots of people who've been absorbed in different fields and observing phenomena for years making these decisions and possibly, probably as it turned out, not actually using data to inform those decisions. And as I've become part of the R community, I've realized that the same phenomena repeats in fields like medicine and business, that people are making highly consequential decisions and they're not availing themselves of the tools they have to really understand how the world works.
So this, through college, struck me as the fundamental problem to solve. And it seemed to me that software was an important part of the answer. This is some of the software that I used in college and graduate school. Say what you might about some of this software, I had the experience of software providing leverage. My mind, and what I could absorb, was amplified by the fact that I could use the software to better understand things.
So from there, I went to graduate school at the University of Wisconsin-Madison studying political science. And this is going to be an aside that this group, I think, will find interesting. It's not really in the main thread of the talk. But I worked on a study there that was studying the effectiveness of vouchers and public school performance in Milwaukee. Milwaukee was one of the first, if not the first, cities to implement a school voucher program. And there was a study going on at Madison that was going to assess how effective school vouchers were. And I worked on that study.
So at the same time, there was a group at Harvard that was very convinced that the conclusions that would be drawn by the folks at Madison were going to be wrong. So they were very keen to reproduce and criticize the results of the study.
So they asked for the data. They said, can we please have the data so we can do our own analysis? So we had the data in Paradox databases that were machine-readable, super easy to send to them. But what we did instead, true story, was we printed out all the data and shipped them crates of paper so that they could reenter the data on their own.
So that's kind of where we were in reproducibility and academia back then. So that's a true story.
So when I got to Madison, I was really excited about data analysis and computation. And I was kind of like, could I kind of specialize in software for data analysis? And unfortunately, in political science at that time, software was definitely not something you could specialize in. And at Madison, even data analysis was kind of barely something you could specialize in.
So I was kind of in the wrong place. When I was supposed to be doing my graduate school work, I was teaching myself how to program in C++ and how to program on the Mac. And that inevitably led shortly to me dropping out of the program and saying, I want to be a software engineer.
And what I alluded to just a little bit before, what fascinated me so much about being a software engineer, is summed up well by this quote from Steve Jobs, where he says, what a computer is to me is the most remarkable tool that we've ever come up with. It's the equivalent of a bicycle for our minds. The other thing he cited when he recounted that quote was a Scientific American study that looked at the efficiency of different modalities of motion, the cost of transport, measured in calories per gram per kilometer as it relates to body weight.
So they looked at, you know, organisms, salmon were super efficient, horses were super efficient. They looked at different modalities of human transportation and they found that a person on a bicycle was by far the most efficient. And that's, I think, what computers are and what software is.
So that became my fascination, and I decided I wanted to become a software engineer and build software tools. I did that for quite a number of years. I built programming tools, I built research tools, I built some writing tools. The software I worked on had a couple of characteristics. One was that it was proprietary software. Two was that I was working in startup companies. And proprietary software developed in startup companies has the seeds of its demise built in from the beginning, because startup companies are built to be sold, and usually when they're sold they're in some form or fashion destroyed or warped. Proprietary software is often very bound up in the fortunes and fate of the companies that sponsor its development.
So I really enjoyed working on software tools, but after that I found that proprietary software in startups was not something I wanted to do anymore. So I was searching for what I'd like to do next. I came to the conclusion that I wanted to build tools that were durable, that could outlast a given company. And I wanted to build tools that were accessible to everyone, that anyone could use irrespective of cost. That led me to the idea of working on open source software. So I knew I wanted to work on open source software, and I knew that I didn't want to do software startups. At that time I found out about R, which took me back to the work that I had done in data analysis as an undergrad and in graduate school. I don't know exactly how long it took me, but it was definitely less than 24 hours to conclude this is what I want to work on. At the time I felt like maybe for the next 10 years; now I feel like for the rest of my career.
So I was very lucky to discover R. It also felt to me like I could offer something to the community, because I had worked on programming tools and tools to make people more productive with complex software. I also didn't want to do a startup, and I thought, well, that's fine, because one or two people could actually make a significant contribution. We wouldn't need to have a startup; we could just make a contribution. And so that's when I started working on the RStudio IDE. I worked on it initially by myself, and then Joe Cheng, who I had worked with at a couple of previous companies, joined me a few months later. Together we set out to build the RStudio IDE, and the general mission was to try to make a contribution to open source software for statistical computing.
Why free and open source software matters for science
So I want to now say, you know, why is free and open source software for science and data science so important? At the time I was focused on this idea of durability and accessibility. But as I've joined the R community, I've come to realize there are lots of other good reasons to prefer open source software for data science. So I want to get into a little bit of that. And I first want to make a distinction between different senses of the word free. There's free as in no cost, gratis. And there's free as in with little or no restriction, libre. Both are relevant, obviously, with open source software. Famously, Richard Stallman summarized the difference and the nature of libre as "think free as in free speech, not free beer." We sometimes with free software focus too much on the fact that the software has no cost. But the most important thing is that it comes without restrictions.
And the GNU project summarizes the four essential freedoms of free and open source software. They mostly have to do with being able to do with the program what you wish, including inspecting it, modifying it, and creating your own derivative works from it, so that you are not dependent upon the original purveyor of the software to continue using or evolving it. For science especially, that is what matters most. The fact that it comes without cost is important, but the freedom is even more important.
So what are some of the reasons why we want to prefer free and open source software? Well, this I don't need to speak at any length about this with this group. But it's worth reflecting on the fact that if I use proprietary software to do science or data science and someone else wants to reproduce my results, at the best that person needs to buy a license for the software that I used. But at worst, and this often happens, I can't even reproduce my own work in the future because maybe the vendor who created the software has gone out of business or they've made older versions of their products inaccessible. So long-term reproducibility is really only assured by using free and open source software.
I want to point out briefly just how important the R community has been in this movement toward reproducibility. There was an article in Nature in 2012 making this case, and it cited two systems then known to enable the packaging of code, data, and text. One of those two was Sweave, which the R community came up with in 2002. So we've been at this for 18 years, certainly longer than any other programming language community.
There's another consideration, which is resiliency. As I said before, software products and companies come and go. We don't want our research, our ability to reproduce the research, tied to the fate of a specific product or vendor. Now, a variation on this theme is that software doesn't come and go. It actually stays and becomes really, really important to customers, and then the vendor decides, oh, this is an opportunity for us to dramatically raise prices and extract more value from our customers. So notably, the four essential freedoms that I talked about ensure that this cannot happen with free software.
There's a great example of this from the database world, where MySQL, which is a GPL open source database, was acquired by Sun, and then subsequently Sun was acquired by Oracle. So Oracle, a proprietary database vendor, now owned all the copyrights to MySQL, and they were gearing up to try to do all kinds of things to try to exploit that position. And even though Oracle actually owned all the copyrights for MySQL, the community took the code from MySQL, forked it, and created another product called MariaDB and continued on with development. So only the fact that that software was free and open source kind of ultimately protected the community from a vendor that was going to be abusive.
Think about the R community: RStudio and many other vendors have provided offerings around R, and the commitments of vendors can vary over time. Companies can get acquired; they can shift strategies. If, as a user, your primary investment is in open source R code that will run irrespective of any vendor's products, then you're protected from that. You have that resiliency.
There's another piece, which is participation. Maybe 10 or 15 years ago, a lot of people believed that a proprietary software vendor could enumerate and account for all the methods that are important in a field and then provide them. Now, with the explosion of innovation in statistical methodology, nobody believes that a single vendor could be the filter who decides what methods are supported, available, and easy to use. The fact that we have an open source ecosystem around R enables what you've seen with CRAN, where there's this huge long tail of innovation. There can be many, many different approaches to analysis, there can be innovation in methods, and it's all supported by the software. So participation is another fundamentally important thing.
And finally, what I talked about at the beginning: accessibility. Data literacy is becoming fundamentally important for organizations, and it's also fundamentally important for individuals. Open source software allows everyone to participate and use these tools, again, without regard for cost.
How scientific computing tools get built and funded
So let's talk then a little more generally about how these tools for, you know, not just data science, but tools for scientific and technical computing get built. Both proprietary tools and open source tools. How they're built, how they're funded, what the underlying kind of structural incentives are.
Looking at the history of scientific and technical computing companies, these are leaders, if you don't recognize them by company name, you certainly recognize their products, SAS, MATLAB, Mathematica. And they actually have some significant shared characteristics. All of them started in academia by one or two people, and over the first few years of the project, it was only one or two people. And then eventually they sort of organically spread within academia and then grew into becoming companies.
All of these companies, I would say, if you look at their about page or read the writings of or talks by their founders, identify their principal mission as supporting research and science, not profit. And consequently, these are all private companies. None of them is publicly traded; they're all closely held by the original group that founded them. And of course, notably, they all make proprietary software, not open source software.
So what's problematic about this mirrors what I said about what's good about open source software. We have reproducibility problems, accessibility problems, and centralized decisions about which methods are supported. There's also the risk that companies that want to perpetuate themselves are incentivized to hold their customers hostage. So those are some problems. One really good thing about these companies, though, is that they have an economic engine they can use to fund development. Without some kind of economic engine, sufficient progress often isn't made, and the tools don't reach the bar where people can use them to solve all their problems.
So now let's consider open source tools for scientific and technical computing. The ones people here are probably most familiar with hands-on are the R and Python data science ecosystems. But there have been lots of tools before those, notably SageMath and GNU Octave. These all have roots quite similar to those of the proprietary software vendors. They started in academia. They grew very slowly and organically at the beginning. And the founders of these projects were also concerned principally with supporting research and science. They wanted to protect the software, and their means of protecting it was to make it open source, because by the time a lot of these projects emerged, the idea and the viability of open source had gained more currency. Linux had achieved some success. So this is the approach these projects have taken.
So what's problematic here is: do these projects have enough funding to sustain momentum and get where users want them to be? Honestly, over half of software engineering is solving boring problems, and sometimes you just need a team of sufficient size and commitment to go and solve those difficult and boring problems. So getting enough resources for these projects can be a challenge. There's also an issue of project organization: are these projects cohesive enough to deliver the software that users need? And significantly, are organizations that are making decisions about taking big, long-term dependencies on software projects comfortable adopting the software without visibility into the long-term health of the project? Will the project be around?
So there are different ways people have come up with to fund open source development. There are grants, and Jupyter has actually done quite a bit with grants. There's probably a natural limit to how much funding you can get from grants, but Jupyter's done quite well with that. Open source software can be funded by companies that have an interest in the software, and Linux is the best success story here: lots of really large companies that have an interest in having a free, robust Unix-like operating system all contributed to it. That's the initial model for Ursa Labs working on Apache Arrow, where a number of companies, including RStudio, have invested in building the software because we think it's going to benefit our users. There's also the traditional method of venture capital, but if you look at those companies I talked about, MathWorks, SAS, Wolfram, the evolution and adoption of their products took 30 or 40 years. They've been around for a long time. The venture capital time horizon is much, much shorter, so I don't think it fits particularly well with building open source tools for science.
So the question is, do any of these models actually work? This question was answered rather pointedly by a gentleman from Wolfram, because apparently Wolfram gets asked all the time why Mathematica isn't open source. So he wrote a blog post, 12 Reasons Why It's Not Open Source. I'm not going to read these reasons out loud to you; if you take a quick scan of them, I think you can get a sense of what they are. Generally, I agree with this analysis to the extent that I believe you have to assemble a group that works together to achieve a set of shared goals over a sustained period of time. You have to have strong technical leadership to solve hard problems, and you need a financial engine that can compensate talented people to work on these problems. That's the main gist of his case. I agree that those things are required, but I think it's possible to do it with open source software, and I think the history of RStudio demonstrates that.
RStudio's model: open source sustained by commercial products
So if you look a little at the history of the company: in 2008, late 2008 I believe, when we started, it was, as I said before, just one or two or three people working on open source software. Then about seven years ago, we decided that while having three people working on open source software was fine, there was potential to do a lot more. So we made a decision at that time to build a company around the open source work we had done, with the notion that we could fund much more open source development if we had a company producing revenue to fund it. We did that, and over the last seven years or so, we've produced a huge amount of open source software. We have, I think, over 250 open source projects that we are active developers on. We have a set of commercial products that we sell, and we've grown from that original one or two or three, then six, people to over 150 people now. Our company is profitable, so we're able to take the revenue from our commercial products, feed it back into open source development, and, as the company grows, grow that commitment over time.
So I think we've figured out a model by which we can still produce open source software, but do it in this kind of sustainable, well-funded way. We sometimes refer to this as a virtuous cycle where we create open source software, and lots and lots of people, because it's accessible, lots and lots of people use it. And when lots and lots of people use software, what happens is large organizations start to adopt the software. And larger organizations typically have deployment and management and scalability requirements that are different than individuals or small groups, and that creates an opportunity for us to build products, to solve those problems, which then gives us revenue to invest back in open source tools.
Now you'll notice here, and it's pretty subtle on this slide, there's a line between these things. And you might ask, well, where's that line, and where's the assurance that a company like RStudio (as I said, corporations are inherently sketchy) isn't going to move that line at some point? Let me give you the operative principle behind that line for us. It has to do with preserving those four freedoms. The core libraries, packages, protocols, file formats, and even productivity tools like the RStudio IDE need to be open source so that users who adopt the software experience the benefits of those four freedoms, and the work is not locked in to the products of a given software vendor. The tools that we create to facilitate adoption of R in large, complex environments are commercial. Those are tools where, candidly, if we're not providing good value to customers, they can continue using R, the Tidyverse, R Markdown, Shiny, everything, without them. So we have to continue to offer a good value proposition, or our customers can walk away. And that's because the core software preserves the four freedoms. So that's where we draw the line and how we think about open source versus commercial.
So I think we've managed to create a new kind of scientific and technical computing company. I think we have the attributes that the gentleman from Wolfram cited: we have a financial engine, so we can work on this in a sustained way for many years and apply adequate resources. Like those companies, we also think it's critical that we remain independent to pursue our mission. But we've managed to do this in a way where the core software is open source and there's no lock-in the way there is with those companies' software, and that is by design.
Corporations as stewards: the problem of shareholder primacy
So we're pleased with this, but it raises the question, again, of what I said about corporations: are we trustworthy? I would say we're not trustworthy at face value, because in today's world corporations (and I'm going to get into this in a little more detail in a minute) are pure profit maximizers. So by default, you shouldn't trust corporations. I think our community needs more than "we're a corporation that has acted well in the past" in order to place its trust in the work that we're doing. And I think customers need to trust that we're looking for a relationship of mutual benefit and that we're not going to exploit our position as a software vendor in the future, as has happened with other proprietary software companies in the past. So we need to make our motivation as clear and transparent as possible, and we need to build more real long-term trust. I think some of that has to do with the nature of corporations, how corporations work, and how corporations may actually need to evolve.
So I'm going to go back, this is a little bit elemental, but I think it's not something that I had thought really carefully and critically about until a few years ago. What actually is a corporation? Why do they even exist? And what a corporation is, is actually, it's a virtual person. So it's acting as a person, legally a person, but it's actually, it's constituted by a group of people. So why do we even have corporations? Well, what did we do before corporations? Before corporations, any business that existed was either undertaken by an individual or maybe a partnership of individuals. And so the individuals were actually personally liable for everything the enterprise did. And when a contract was made between a business and another party, it was actually a contract with the individual or the group of individuals. So when someone left a company, the contracts actually had to be renegotiated. Business was really fundamentally person to person. And that meant that anything that a business could accomplish was sort of bounded by what one person's assets could accomplish and what liability one person was willing to accrue.
So as the industrial age progressed, this model didn't seem adequate, and the first corporations were actually formed by royal charter, the East India Company being one of the most significant examples. Then governments realized that for things like bridges, railways, banks, and utilities, we needed to operate at a different scale than individuals had been able to. So they created this instrument called a corporation, which provided the protection from personal liability and the additional capital they believed those undertakings needed.
So if you think about the essential nature of a corporation, it's an institution actually created by government in order to benefit the society it governs. That was the original purpose of corporations. This worked very well and served the public benefit in many regards. But if you look at that second bullet, about acting without fear of personal liability, that is actually a recipe for bad behavior, and we've seen that play out as well. So what we have today with corporations has worked well, but I think we're seeing it crack a little bit in the contemporary world.
So if you think about the legal theories of what the primary purpose of a corporation is, there are actually two competing theories. One is the stakeholder theory. This is the original idea, and the one I've been advocating for: the corporation is created by the government and therefore has a social function. The directors and officers of the corporation should consider all the stakeholders affected by the corporation (employees, the environment, the community, shareholders), looking at everybody's interests when they make decisions.
There's another theory, which is shareholder primacy, and it's a pretty raw notion, which is the purpose of a corporation is to maximize value for shareholders within the bounds of law. Really narrow. And sadly, that is actually the legal theory that has won in Anglo-American legal systems. Shareholder primacy is the law that we live under currently.
And there have been many cases that have built that up. One of the more significant was in 1919. Ford decided that they wanted to produce less expensive products and pay their employees more, so they stopped paying special dividends to shareholders, again balancing the needs of different constituencies. And the Michigan Supreme Court said, that's not a thing. You're not allowed to do that. Your purpose is to create profit for shareholders. Full stop.
Another, more contemporary case: Revlon was faced with an acquisition proposal that appealed to the shareholders; it was a good deal for them. But the board felt that it was not going to be a good deal for the employees or the bondholders, the people who held the company's debt. So they tried to block the acquisition, and in this case the Delaware Supreme Court rejected the idea that they had the duty or even the option to consider the interests of stakeholders other than shareholders. There have been a number of cases like that, and that's where we stand with corporate law.
Many people find this lacking. While they're pursuing profit, companies can create lots and lots of public health and environmental problems. They can create systemic risks, as we saw with recent financial crises. It leads back to the question: we've given these corporations a special legal status, so does that carry any reciprocal obligation to the public good? And shouldn't companies be able to consider the welfare of their own employees and their community when they make decisions? A lot of the bad behavior you see from corporations is precisely because they legally can't consider these things.
So in response to this, a number of states have adopted legislation that permits (note: permits, not requires) directors to consider things other than shareholder value. And just this last year, a group called the Business Roundtable, composed of CEOs of mostly big public companies, said in a public statement, for the first time since 1997, that corporations shouldn't exist solely to serve shareholders. So I think people are sensing that this regime is wrong. This isn't about fundamentally changing the system, though; it's more a statement of dissatisfaction with the system.
The benefit corporation model
So what can be done to change the system? I don't know how many of you have heard of the company AND1. Have people heard of AND1? It's a basketball shoe company. And I think, as it will turn out, this company will be very significant in the history of corporate law and perhaps of capitalism. So let me tell you a little bit about AND1. It's a basketball shoe company founded in 1993. They were a socially responsible business. They treated their employees really well and had great benefits. They allocated 5% of their profits to local charities. And significantly, since they were a shoe company with overseas factories, they worked to implement a supplier code of conduct to make sure that workers in those factories were treated well, worked safely, had good wages, et cetera. So they were a socially responsible business before that became more widely practiced.
So what happened to AND1? Well, they were pretty small in the mid-90s. They took on external investors in 1999, and they grew quite a bit over the span from '95 to 2001. But ultimately they faced competition from Nike and others, their sales dropped, and they were forced to sell the company. The founders had created this company with the idea: we created the company, we control the company, we want to build a socially responsible business. So what was really surprising and disturbing to them was that when they went to sell the company, the sale was done exclusively to maximize shareholder value. After the sale, all those commitments to employees, overseas workers, and the local community were just stripped away, and they could do nothing about it. They were shocked and disappointed, and it made them think something needs to change.
So they got together with a friend of theirs to start a nonprofit called B Lab, and the idea behind B Lab was to create a new form of corporate governance. Along with the nonprofit, they created a new corporate structure called a benefit corporation. This is a reaction to the shareholder primacy regime: the directors of the company are legally required to account for all of the stakeholders, such as the community and employees, in their decisions. It's a legal requirement, not an option. Benefit corporations also name a public beneficial purpose as part of their charter that, again, they are legally accountable to pursue.
I won't read this in detail, but you can see a little bit of the actual legislation behind Delaware public benefit corporations, and it puts a stringent legal requirement on the directors to act in a different way, to not follow the shareholder primacy regime. This nonprofit has actually been successful in getting 34 states to pass legislation that permits benefit corporations, and there are currently over 7,000 of them. These are some examples. You've probably heard of some of these companies, and you might not have known that they were benefit corporations. In addition, the companies in the top row are public companies. They're not benefit corporations themselves, but they have wholly owned subsidiaries that are, in fact, benefit corporations. So I think this benefit corporation idea has the seeds of what it takes to transform what corporations are and how they relate to their world, their communities, and all of their stakeholders.
RStudio becomes a public benefit corporation
So most of you or many of you can probably guess what the next slide of this presentation is going to be. I'm really happy to announce today that we are now a certified Delaware benefit corporation.
So we actually have a new name. We're no longer RStudio Inc.; we're RStudio PBC. We've always tried to run the company this way, but now it's actually baked into our charter. It's part of our corporate DNA. It's a requirement, not something that we do at our discretion. As part of this, we name a public benefit, and that goes into our charter. This is our public benefit, and you'll note that we cite the creation of free and open source software for data science, scientific research, and technical communication. It's a little bit broader than data science.
Today we focus on data science, but I'm optimistic that we have created a model that is actually a better model for scientific and technical software, so someday I think it would be nice to do more within scientific computing. That's why we wrote the public benefit a little more broadly. As a public benefit corporation, we will release an annual report describing how we've served our public beneficial purpose. We've posted the first of those reports on our website. A few highlights: we provide some metrics around our investment in open source projects, and as I said before, there are over 250 of them. We dedicate over half of the company's engineering resources to open source. Right now we have 36 full-time engineers who work on open source software, and the report breaks that down by project. And there have been hundreds of millions of downloads of our open source products and packages. These are metrics we're going to keep reporting on every year.
So in addition to the benefit corporation structure, B Lab has created a certification program that looks both at whether you've changed your charter and at your impact on your workers, customers, community, and environment, and rates you across a number of categories. So we're happy to also share that we've been certified as a B Corp by B Lab, and our impact report, which has all those ratings, is also available now.
This does, though, raise a question. We've become a B Corp, and that reflects, I think, who we are, what we're about, and what we aim to be, but it doesn't necessarily speak to the future. And I want to tell everyone here that our plan is to remain an independent company and never sell the company. So that is what we're going to do. But how can we actually make that happen? How can we provide assurance that it's going to happen? We do have outside investors as minority shareholders, and prior to this conversion, we had written into the financing documents special rights that let me block what I'd call undesirable outcomes. I could individually block the sale of the company. But just saying that I can block the sale of the company isn't fully reassuring, because what if I die? What if I change my mind? It's a little over-reliant on one individual. So along with the transition, we also made some changes to how those shares are held and how those rights are exercised. Now there's a group of people, all inside the company, who exercise those rights. So if I die or change my mind or anything like that, we still have those protections in place.
And before I conclude, I also want to talk about who ultimately benefits from RStudio's success. As I said earlier in the talk, we've tried to build a company with lots of beneficiaries; all of our stakeholders are beneficiaries. We do have shareholders, and the traditional way shareholders get remunerated is either by selling the company or by going public. That is not our plan. What we're going to do instead is use our profits to purchase stock back from our shareholders over time. Once we've met that commitment, we are going to dedicate a substantial portion of our profits to philanthropic causes that relate to our mission of open source software and open science. We've documented the donations we've already made in this year's annual report, and as we buy back stock from our shareholders and dedicate more of our profits to these donations, we'll report specifically on that in our annual public benefit report.
So thank you all very much for helping us to build this company and build this community. It's been an incredible experience, far exceeding anything I could have ever hoped for, and I'm excited for what the future holds. Thank you.

