Resources

Tareef Kawaf | Welcome and the Posit Vision | Posit (2019)

From Posit president Tareef Kawaf. About Tareef: Tareef Kawaf is a software startup executive and current president of RStudio, Inc., a Massachusetts-based company that develops both open-source and commercial software for the R statistical programming language. Prior to joining RStudio, Mr. Kawaf served as senior vice president of engineering and operations at Brightcove, Inc. Over eight years he helped Brightcove build and operate the second-largest online video platform, helping it grow from $0 to $92M in revenue and complete its initial public offering (IPO). Mr. Kawaf jointly holds a patent for the “Method and System for Dynamic Pricing,” issued in 2001, which is a core component of Oracle’s ATG Commerce solutions and helps retailers define sophisticated rules for couponing, discounting, and personalized commerce. Mr. Kawaf received his B.S. degree in Computer Science with a minor in Mathematics from the University of Massachusetts Amherst in 1994. He and his family currently reside outside of Boston, MA.


Transcript

This transcript was generated automatically and may contain errors.

I'm usually the shiny-headed guy you see in some people's photos, like when they take photos with Hadley and you see there's a bald, shiny-headed guy next to him, that's usually me, and they're like, why does this guy keep showing up in these photos?

If you attended my talk last year at RStudio Conf, you know that I am regrettably not a data scientist, contrary to what Hadley said. Yet. One of these days I will get there. My background is in software engineering and mathematics, but I do enjoy playing with R, and I love to explore how you can use it to understand the world more deeply and, hopefully, make better data-driven decisions.

And that is precisely the question: understanding the world and making better data-driven decisions. This is going to be so easy for me, I can tell.

You know, it's funny, I was talking to my wife and my seven-year-old said, you know, would dad get fired if he screws up on this presentation? I said, I don't think I will be fired for this, but all right.

So six years ago I got into R and eventually I met JJ, and at the time I didn't fully understand or appreciate just how incredible the R community was, and continues to be, and the positive impact you all have on this world. I also hadn't gotten a full picture of what was possible with this remarkable tool chain.

You'll hear a lot from speakers this week about the power of the open-source packages that are created in R, and I don't need to tell you guys about all the incredible things that are coming down the pipe. My goal in this talk is to give you a deeper look at the RStudio story, what we believe in, and how organizations have adopted R and the R ecosystem to solve real-world problems.

So given the rate of innovation and change, it is often difficult to stay on top of what is possible and how the puzzle pieces fit together. I happen to know this because last year I was lucky enough to visit 30 different customers all over the world.

And my goal was to sort of talk to them about the new products we're building, where we're going, and to see whether the problems that we're solving are the same problems that they are trying to solve for themselves. And it became really obvious that most people don't have the full story, and it's really hard. And it occurred to me that the reason for that is because we've never told that story publicly.

So let's make sure that you guys walk out of here with three things today. What is the way of RStudio? What do we believe in? Why do we do the work that we do? How can you adopt R in production? And finally, how does R sustain itself? You'd be surprised how many people still ask me that question.

The printing press and the prime directive

To answer that, I'm going to take you back to 1439. Does anybody know what was created in 1439, what was invented? The printing press. I knew somebody in the audience would know the answer to that question, so it wasn't so hard.

At about that time, there was a goldsmith by the name of Johannes Gutenberg, and he invented the movable type-based printing press. Prior to its arrival, the process for creating content was one where people would write manuscripts, usually monks, because they were the only ones that could read and write, and the process was really, really slow and very expensive, which meant that knowledge got centralized into the hands of the few, and being able to validate the authenticity and correctness of the work was really, really difficult.

Developing a press that could accelerate the creation of perfect-fidelity copies of a book, most notably the Good Book at the time, radically changed who could have access to knowledge and increased the confidence that everybody was seeing the same information. Many believe, including Wikipedia, that the arrival of the printing press helped start and fuel the scientific revolution in the 16th century.

I'm particularly attracted to the few concepts in here about ushering in the era of mass communication, permanently altering the structure of society.

Now 500 years later, we have the prime directive. John Chambers wrote about this in his book, and I think it's so important and it captures who we are and what we do so well that it bears reading out loud.

Science, business, and many other areas of society continually rely on understanding data, and that understanding frequently involves large and complicated data processes. Those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis and by extension the software that produced it. This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the prime directive.

Why RStudio believes in code

So for us, code is reproducibility. Reproducibility is good science. If you believe in code and reproducibility, you can get reuse, you can get automation, you can get scheduling, you can get parameterization. Good science for us is good business.

So I don't want to rush over this too quickly. If you can reproduce your analysis, you can recreate it and repeat it, in the same context or a new one. And you can always reproduce your analysis if you record it in code.

So last November, I was at a conference in Barcelona, the Gartner conference, and there was this big movement that was starting up called the no code movement. I don't know how many of you guys are familiar with it. But I sat there and I listened to what folks were saying and I saw the CIOs sort of like oh, this is really exciting, this is going to be the new world, I don't have to hire data scientists, I don't have to code, you know, and it occurred to me that we are basically the opposite. I'm like, why did I show up to this show?

So I figured it would be good to just remind people of what we believe. There are four things, four reasons that we love code, right? The repeatability I suspect everybody here would agree with, right? It's important for your analysis to be repeatable and reproducible down the road. It's one of the key tenets, if you will, of the scientific process.

But there are other elements that are also important. Inspectable analysis. So inspectable analysis speaks to the ability to review the work, point out flaws in the assumptions or suggest improvements. It ultimately helps one understand how the results were achieved in a transparent manner.

Reusable analysis addresses the ability to leverage that same work again for future work, or to build on and extend it. If you want to solve something new, you might Google for it and discover that someone has already solved the same or a similar problem. It serves as a foundation for sharing.

And then finally, diffable analysis. I tried this phrase out for a couple of days, and most people said, I don't know what diffable analysis is. This may seem like a variant of inspectability, but it's important in its own right. What we're talking about with diffable analysis is being able to compare the changes made to your analysis, or somebody else's analysis, over time. It makes it easier for you to understand why decisions were made and, if there were mistakes, where the errors were introduced along the way.

So ultimately, we believe complex problems will require code. Code is communication, and communication is critical. Code is also what gives you leverage: a single data scientist creates something that a whole bunch of other people in the organization can use to solve real problems. With code, you can inspect how a problem is solved and either adopt it or figure out how to improve it yourself. And finally, with code, the answer is always yes.

Open source packages and commercial products

We maintain and support 174 packages for the R community. We're excited that over 50% of our engineering team works on free and open-source software, and our company is more than 50% engineering.

So just because you have really fantastic open-source tools doesn't mean that you can get them into your production environments or get your company to adopt them. It turns out that there are multiple constituencies that you have to deal with. Data scientists are one of them. Business users are another. And then there are IT admins, DevOps, and data engineers, a whole class of folks who have a vested interest in how you operate and how you use your data in production.

Everybody has their own set of activities and their own set of concerns that they need to worry about. So what we've been trying to do is figure out how to make that easier, how to get that adoption to happen in the organization: how do you get people to move from a world of point-and-click, drag-and-drop solutions to one that's grounded in code? To that end, we've added our commercial products.

And if you take a look, our commercial products are meant to address all of those concerns on the right-hand side.

RStudio Server Pro and the launcher

So I mentioned that I'm going to show you some of the things that we talked about with customers over the last year, and I'm going to go back to the IDE. If you can go back in time six or seven years, most people were using the IDE on a desktop. Some organizations started using RStudio Server to help centralize the workloads and better manage capacity and access to the data.

About six years ago, we started hearing feedback from the IT teams within organizations that were being asked to support the growing use of R. They asked for features: better control over access to resources, usage auditing, high availability, multiple versions of R, the ability to start multiple sessions, collaborative editing, et cetera. So we added all these capabilities into our premium product line, and that has helped us get to over 1,000 enterprise customers today.

In the past two years, we've seen IT organizations experiment with the use of Docker, and some have even started playing with distributed computing platforms. The pattern they use is to bake RStudio Server and Shiny Server directly into a Docker container so that each session runs a full server process. We've been working on a refactoring of our professional products to change the execution model, which we are calling the Launcher. In this new model, we have separated the execution of the R process from the server that is responding to the user, which will allow customers to start an R process on a remote machine.

So let's actually take a look and see what this looks like.

The heart of all of our work is our R packages, right? For us, everything starts from R and open source and expands up. So let's start with the IDE. I'm going to start a new session, and this one happens to be using Kubernetes. I'll name the session, and I can control the CPU resources that are used, the memory that is used, and the base image that I'm leveraging.

And once I start the session, I should get an IDE that looks identical to what you see on your desktops today. Not all that impressive, but if you run the plot, you should be able to see the work. Now I can go to the Kubernetes dashboard and see the jobs that are running. This is an R process running inside of a Docker container on Kubernetes, and you can see that it has the same metadata that I entered earlier.

Now going back to the IDE, I can take a look, and, oh, sorry, this session has started with that user's context and the right level of permissions. If I go to the packages pane, I can see that this particular base image has a very limited set of packages, and if I search for Shiny, you'll notice that Shiny's not in there.

If I want to start a new session, this time maybe in a validated environment, the organization says, okay, here's the set of packages that we're okay with you using. They're preconfigured and baked into another base image. I start that project, and here, if I search for Shiny, you'll see that the Shiny package is there.

RStudio Package Manager

Now this comes from Package Manager, the new product that we introduced in October called RStudio Package Manager, which was created to help the IT team manage what versions of packages you have in production. You can create different repos, each of which can contain a subset of packages that have been approved by somebody if you want, and then any R process pointing to that particular repo will get exactly the same versions of the R packages. This speaks, again, to getting a reproducible version of the R code that you're using.
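The repo-pinning idea described above can be sketched in a couple of lines of R. The hostname and repository path here are hypothetical placeholders, not a real Package Manager endpoint:

```r
# Point this R session at a curated Package Manager repository
# instead of a public CRAN mirror. The URL below is a placeholder.
options(repos = c(VALIDATED = "https://packages.example.com/validated/latest"))

# Subsequent installs now resolve only against the approved set,
# so every process pointed at this repo gets identical versions.
install.packages("shiny")
```

Setting this in a site-wide `Rprofile.site` is one common way an IT team makes the curated repo the default for everyone.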

RStudio Package Manager was started because one of the key problems we've seen in these IT organizations is that there's a big spectrum of R usage. Some have a Wild West model: you can use any package that you want anywhere, and obviously there's a concern there about reproducibility, unless you're using something like Packrat. Others lock down their R packages so that you can only use what has been pre-approved and blessed, and that's a very draconian process; you have to get special approvals to get through.

What we've done with RStudio Package Manager is made it easier for you to cater to either end of that spectrum. On one side, you may say, I want my R&D team to have access to the latest and greatest version of anything on CRAN or any of our internal packages. On the other side, you could say, you know what, when we get closer to production environments, or if we want a validated environment, here is a pre-selected set of packages that you can use. Hopefully that will make it radically easier for organizations to manage this.

Shiny, R Markdown, and RStudio Connect

So enough about package management. Let's switch to something that everybody here is more comfortable with, Shiny. Here's a Shiny application that happens to show you a stock portfolio over time, and as you can see, this stock portfolio is indicating that the market's not doing so well. Hopefully most of you guys have noticed that.

But what's powerful about Shiny is that you can give people the ability to interact, explore questions, and get the answers for themselves. So for the sake of the example, let's say I show up in a meeting with a bunch of stakeholders. I pull up this application and we see something that is interesting to us, right? And people say, oh, that looks fantastic. How do I get a report of this on a weekly basis, right?

And again, what's nice here is that this Shiny application has all the code that it took to run it, so in theory, it's completely reproducible. I can hand it to somebody else, and when they run it, they are running exactly the same code that I created.

So they say, okay, I want this on a weekly basis. Or some other stakeholder says, you know, that's nice, but I want a slightly different set of variables, and I want it on a monthly or quarterly basis. What do you do when that happens? What we're proposing is that you take a look at R Markdown.

So here's an R Markdown doc that has the same logic and visualizations that you saw in the Shiny application. R Markdown, if you haven't seen it, is a text-based literate-programming content format that allows you to mix prose, code, and output in a single document. You can even use LaTeX equations. Similar to notebooks, you can have the outputs inline, but one of its key powers is that you can reproducibly knit the document into a variety of output formats.
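A minimal sketch of what such a document looks like; the title, file name, and chunk contents here are invented for illustration:

````markdown
---
title: "Portfolio Report"
output: pdf_document
---

Prose, code, and output live side by side, so the rendered
report is always regenerated from the code that produced it.

```{r portfolio-plot}
returns <- cumsum(rnorm(100))  # stand-in for real portfolio data
plot(returns, type = "l")
```
````

Knitting it to a PDF like the one in the demo is then a single call: `rmarkdown::render("portfolio.Rmd")`.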

For those of you who heard my talk last year, you know that I use R Markdown with IOSlides to create our quarterly board deck. In this example, we'll use PDF. So if you take a look at this, this is exactly the same set of visualizations and data that we saw in the Shiny application.

Once I'm happy with this, I can save my work. And how do I save it? I check it in. When we check it in, this is the diffable story we were talking about, right? I can actually see the exact changes that were introduced.

That transparency makes this much easier to review than any spreadsheet or any point-and-click solution. Once I have what I want, I publish it and share it with other people. And what I'm proposing here is that you publish it to Connect, because Connect is the easiest way for you to publish your data products to your organization. Connect will detect all the dependencies and push the full code up to the server.

And the server can now recreate that analysis in a secure, sandboxed environment, producing the same PDF report we were looking at earlier. What's nice about Connect is that, because all the data is up there, I can rerun that analysis at any moment in time. I can also control who has access to the analysis, right?

I can also email the report out. And this particular report allows you, through code, to customize the subject line and the body of the email, including the various graphs that we had, as well as the attachments that are included. The attachments here are not only the PDF doc that we looked at earlier, but any content that you created programmatically in the R Markdown doc. In this case, we have both a CSV file and a PowerPoint presentation, so if somebody looks at it and says, okay, this is really great, they can go ahead and send it to their boss, right?
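On Connect, the kind of email customization described above is typically done by setting output metadata from inside the document itself. This is a hedged sketch, the subject and file names are invented, and the `rsc_email_*` field names follow the Connect conventions of the time:

```r
# Run inside a chunk of the scheduled R Markdown report on Connect.
# Connect reads these metadata fields when it sends the email.
rmarkdown::output_metadata$set(
  rsc_email_subject     = "Weekly portfolio summary",  # custom subject line
  rsc_email_attachments = c("portfolio.csv",           # files the report
                            "portfolio.pptx")          # generated itself
)
```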

The other element I did not get a chance to show you is that, because we did all of this with R Markdown, you have complete reproducibility of the work that you've done. And one more thing we haven't shown is the parameterization of R Markdown. You can take that same report I showed you earlier, expose the same kind of input parameters that you saw in Shiny, and allow users to interact with them. And for each of these parameterized versions, you can send that out to your constituency.
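One way those parameterized versions might be produced, assuming a hypothetical `portfolio.Rmd` that declares `strategy` and `period` params in its YAML header:

```r
# portfolio.Rmd would declare, in its YAML header:
#   params:
#     strategy: "aggressive"
#     period: "weekly"
# and its chunks would refer to params$strategy / params$period.
# Each stakeholder's variant is then just a render call with
# different parameter values.
rmarkdown::render(
  "portfolio.Rmd",
  params      = list(strategy = "conservative", period = "quarterly"),
  output_file = "portfolio-conservative-quarterly.pdf"
)
```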

So one member of your constituency can look at it and say, okay, I really like the aggressive version of the portfolio, and I want to receive that on a weekly basis, because that really matters to me. Another member of the organization can say, I want this on a quarterly basis; I have a longer-term view of how I think about portfolios.

So this is one of the key tenets that we see. If you think about the printing press, this is a way for you to recreate the printing press, but all through code.

R as a glue language

The next thing we talk a lot about is that Connect, RStudio Server Pro, and Package Manager make it easier for you to put R into production. Now, most organizations don't have just R users, right? They have a variety of tools and services and connections that they need to interoperate with. So the next thing I want to spend time on is showing you the next superpower of R: R as a glue language. Many of you may have experienced that firsthand.

We'll start by showing how you can connect R to Spark, since Spark is still the new hotness, I think. In the IDE, and this is available in the open-source version, you can connect to a Spark cluster. For those of you who have never seen this, I can create a notebook from it, pull it back into the R Markdown doc, and start writing the dplyr code that I'm familiar with. What's really powerful is that even though I'm writing what looks like regular dplyr code that runs in R, when this code runs, it's actually running in Spark. So let's take a look and see what that looks like.

Here's the Spark SQL that this would actually run at that moment in time, and if I change the code, I can run it and get the summary data back. So this could be working on a dataset that is very, very large, and similarly, I can leverage any of the machine learning algorithms that are in Spark. This is all built on the sparklyr open-source package, which you can play with if you're interested.
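The dplyr-to-Spark translation described above looks roughly like this. A sketch only, assuming a local Spark installation and the nycflights13 data package as a stand-in dataset:

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance; a cluster master URL works the same way
sc <- spark_connect(master = "local")

# Copy a demo table up; in practice the data would already live in Spark
flights_tbl <- copy_to(sc, nycflights13::flights, "flights")

# Ordinary-looking dplyr, but it is translated to Spark SQL
# and executed in Spark, not in R
delays <- flights_tbl %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE))

show_query(delays)  # print the Spark SQL this pipeline generates
collect(delays)     # pull only the summarised result back into R

spark_disconnect(sc)
```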

The next thing we've invested more time in is connecting to databases. Some of you may have seen this: you can connect through an ODBC driver in the connections pane and explore the schema of a database, a SQL Server database in this particular case. What's new in 1.2 is that you can edit the SQL, manage it, and run it locally, and then, once you're happy with it, extract it and pull it into R Markdown.

Now, most people don't know that R Markdown can include different kinds of code chunks. We've shown you R, and you're now seeing SQL. We've also added more support for Python in the IDE: you've always been able to have Python code, but now the output of the plots comes inline just as it would with R, and we've improved the cross-chunk data pass-through.
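A sketch of what mixed chunks in one R Markdown document can look like, with cross-chunk data pass-through via the reticulate package's `py` proxy object; the chunk contents are invented for illustration:

````markdown
```{python}
# A Python chunk: its plots and values render inline like R output
scores = {"great beer": 0.88, "debugging": 0.149}
```

```{r}
# An R chunk reading the Python object through reticulate's `py` proxy
py$scores[["great beer"]]
```
````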

Another language we added support for is D3, through the r2d3 package. So if you're really into D3, it's now much easier for you to create these types of JavaScript visualizations and include them.

Finally, some of you may know about Plumber, an open-source package that you can use to expose REST-based API endpoints from your code. In this particular case, we have a sentiment analysis tool that is exposed through an annotation on the predict function. In the 1.2 release of the IDE, it's easier for you to run those APIs locally, test them out, and make sure they're working the way you expect. Once you're happy, you can publish them to Connect, just as you published the R Markdown doc I showed you earlier, or as you publish Shiny applications.
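A minimal Plumber sketch of the kind of annotated endpoint described above; the scoring logic is a placeholder, not the actual sentiment model from the demo:

```r
# plumber.R -- the #* annotations define the API surface
library(plumber)

#* Return a sentiment score for the supplied text
#* @param text The text to score
#* @post /predict
function(text = "") {
  # Placeholder standing in for the real sentiment model
  list(text = text, score = runif(1))
}
```

Running it locally for testing is then `plumber::plumb("plumber.R")$run(port = 8000)`.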

And again, we detect all the dependencies. Some of you may not have caught it, but this code is actually using Python, so not only are we detecting the R dependencies, we're also detecting the Python dependencies, and we send those up to Connect. Connect recreates all of that in its sandboxed environment, and then I can go ahead and play with it if I want.

So again, this is a real model, so let's try out some things. If we type in debugging, we see what it comes back with, and, not entirely surprisingly, it comes in at 0.149, which is not a particularly high score. Now, you know, we're in Austin, so some people will get a chance to do this: if we type in great beer, it comes back with a 0.88 score, which is much, much better.

Finally, let's just for yucks and giggles, type in R users. Amazingly enough, it comes in at 0.90. I swear to you, we did not play with the model. This is really what it comes back with for the Python-based sentiment analysis model.

The RStudio virtuous cycle

All right, so just to recap: RStudio's goal is to create an organization that can make a durable contribution to this world by creating very high-quality open-source software that is accessible to anyone, anywhere, regardless of their economic means. In order to make sure the business would be sustainable longer-term, we decided to build a software-only business that sells commercial versions of our open-source server products.

These products help enterprises adopt R and open science tools in production environments. The differentiation between our open-source products and our commercial products lies in capabilities around security, authentication, monitoring, tuning, scaling, metrics, collaboration, and sharing, and you get a commercial license and premium support.

Our hope is that we can sustain this virtuous cycle, which allows us to then invest in the areas of the greatest needs for the R community. So far, things have been going really well. The open-source packages make R more and more accessible, which in turn drives the adoption of R in enterprises, which hopefully creates more demand for our commercial products, which then allows us to fund even more open-source work.

Put another way, our open-source products create value for everyone, and our commercial products help businesses leverage that value. We believe this benefits the open-source community, because you have talented, paid open-source developers creating a growing number of free, high-quality R packages that can be consistently maintained.

Commercial customers benefit from a financially viable and sustainable ecosystem of these open-source tools and packages, along with products that meet their enterprise IT standards for deployment and production. A thriving ecosystem of education, training, and services can bring energy, new ideas, and talent to create and share data products at a fraction of the cost and hopefully improve our societies and our world.

Thank you so much for your attention and for taking the time to attend the conference. We are very excited to have the opportunity to spend a few days together. I picked out a few of the talks that will dive into greater detail on the topics that I could not talk very deeply about. If you are interested in hearing more, feel free to take a picture of the slide. If you want to see any live demos or ask any deeper technical questions, please stop by our professionals lounge if you go out the back and towards the left there. You can ask our engineers, and I will also hang out there and answer any questions. Thank you, everyone, and have a wonderful, fun-filled day.