Tareef Kawaf | Posit useR! 2017 Presentation, Brussels

Transcript

This transcript was generated automatically and may contain errors.

My name is Tareef Kawaf, I'm president of RStudio, and I do this probably once a year at useR! just to give people an update on what we've been up to, and then, frankly, hopefully get some questions and have some dialogue. There's a lot to cover, so I'm going to try and run through things pretty quickly. This particular conference was interesting because I ended up with a whole bunch of conversations that highlighted that maybe we need to spend a little more time talking about who we are and what we do, which is not something that I was planning on, but anyway, so this is our mission. It's pretty straightforward.

It's aspirational, obviously, and as you can see by the name of the company, we're very much dedicated to R. We think that R is a fantastic ecosystem, and we want to make sure that that continues to be a free and open source solution that's available to people regardless of their economic means.

So when you look at how we think about the world, we believe that all the work that you do should be reproducible, and that invariably ends up meaning that we believe that it should be in code.

So we believe in APIs, designing good APIs, trying to figure out what are the ways that make it really easy for the data scientists to get their work done.

you can go in and you can get yourself comfortable with R connecting to Spark and leveraging the power of Spark without having to give up on R.

New packages and Shiny

And then here are the packages. The packages I listed here are all packages that were updated or created in the last year. Among the ones that are highlighted, blogdown is in this color because it's not on CRAN yet.

So on the Shiny side, we got to 1.0. I should mention we were on 1.0 on the IDE. We're on 1.0 on Shiny. Obviously upgrades to Shiny Server itself. There's a couple of new packages that we're working on that are not on CRAN yet. One is shinytest. And so one of the big things is people create Shiny applications and then the organization says, hey, you broke it when you did X or Y or Z.

We recognize that because the background of our community is data scientists, right? They're not software engineers by training. And the tooling is not that great. We're trying to create tooling that makes life easier for folks. And my colleague Barbara talked about pool, so I won't cover that too much.

R Markdown and Bookdown

R Markdown, there's a new format called Bookdown. How many people have heard of Bookdown? Okay, so Bookdown is a format that makes it easy for you to write books in it. We'll show you the website that you guys can go to and take a look at it and play with it.

And how many people are familiar with parameterized reports? Okay. Okay, that's what I expected. So parameterized reports are, you know, you guys are familiar with R Markdown docs. You can write R Markdown docs. You can actually pass in parameters. If you write your R Markdown doc in a certain way, you can have something in the YAML header that specifies what the parameters are. And then you could render the doc by passing in the arguments and use those arguments within the report. That allows you to sort of essentially personalize a report if you want to or reuse the core logic for, you know, something like, okay, I might run the same report but for this different date range or run the same report but for North America versus South America, whatever, right?
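As a rough illustration of the idea described above, here is a minimal sketch in R. It assumes the rmarkdown package and pandoc are installed. The parameter names (`region`, `start_date`), the report text, and the file names are hypothetical and were not part of the talk.

```r
# Minimal sketch of a parameterized R Markdown report.
# Assumptions: rmarkdown and pandoc are installed; the parameter names
# (region, start_date) and all file names are made up for illustration.
library(rmarkdown)

# The report source: the YAML header declares the parameters and their
# default values; the body reads them through the `params` list.
report_source <- c(
  "---",
  "title: \"Sales report\"",
  "output: html_document",
  "params:",
  "  region: \"North America\"",
  "  start_date: \"2017-01-01\"",
  "---",
  "",
  "This report covers **`r params$region`** starting `r params$start_date`."
)

rmd_path <- file.path(tempdir(), "sales_report.Rmd")
writeLines(report_source, rmd_path)

# Render once with the defaults declared in the YAML header.
default_out <- render(rmd_path,
                      output_file = "report_default.html",
                      quiet = TRUE)

# Render the same document again, overriding only the region.
south_out <- render(rmd_path,
                    params = list(region = "South America"),
                    output_file = "report_south_america.html",
                    quiet = TRUE)
```

Both calls reuse the same report logic. Only the values in `params` change, which is what makes it possible to produce one report per region or per date range from a single document.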

Books, webinars, and the RStudio conference

All right, we published a new book, R for Data Science. And when I say we, I don't mean me. I mean Garrett and Hadley. And you can, if you don't have, are you guys familiar with it? All right, so there's a, you can read it online for free. You can obviously buy it too. We also have a book on Bookdown. If you go to bookdown.org, you'll actually see that not only is there a book on Bookdown, but there are, you know, the R for Data Science book is also on there as well as other people who have published books on there.

We continue to have webinars. We have cheat sheets. I think I know that people are seeing the cheat sheets. And we had our first RStudio conference in January of this year. The next one is going to be at the end of January, early February. We had a wonderful time. About 400 people managed to show up. And so if you guys are available at the end of January, early February, you're obviously more than welcome to come along.

RStudio Connect

All right, so how we think about data science. You guys may have seen this before. You know, you import data, you tidy the data, you transform the data, you visualize the data, you model the data, and you keep going. And then eventually the key thing behind doing all data analysis is to communicate, right?

All right, so we decided to build a product based on feedback that we got from customers. And just to be clear, this product is a professional product. And right now it's only available on premise, but ultimately in time we imagine that we will have a cloud-based solution that everybody in the community can sort of leverage if they want to, right? And what this product is aimed at is making it really easy to go from an analysis to being able to sort of share that with the rest of the world. So it's push-button deploy of all the artifacts that you sort of create out of R. It's an on-premise solution because most of our company customers will not run any of their stuff outside of their firewalls.

And so, again, if you want to think about it, you've got, you know, you guys are the creators of analyses. You may create notebooks, you have static docs, you have presentations, you may have parameterized R Markdown docs, as I said earlier, you can have Shiny applications. And ultimately your goal is to communicate this to somebody, right? Somebody in the organization needs to be able to see this, and so we're trying to also build in things like, you know, scheduling. So if you wanted a report to run every Monday and email people out or whatever, you can sort of do things like that without having to sort of write your own crontabs or, you know, have a separate system call into it.

Now if you want to, you can absolutely do the same thing. The foundation of everything that you see here is all open source, right? People can recreate this themselves at any point in time, and I expect that they will.

In terms of, so again, the creators, there's artifacts, there's consumers, and you guys are all familiar with the work of the creator, right? So you're doing that whole tidy, transform, visualize, model, but you also care about reproducibility, you care about collaboration, you care about publishing, you care about protecting, right? So who can see this content, when can they see it, and so on and so forth, right? And then on the consumer side, they want to be able to get a hold of the data, they want to be able to explore the data, they may want to personalize it, say, I'm only interested in reports about North America, or I'm only interested in Southern Italy, right?

Or they may want to say, I know I saw, you know, Joe showed me a report or a dashboard, I want to see what that looks like today, can I go back and rerun that myself, right? So the idea is if you've already built this logic in R, we're going to try and make it more and more accessible to the business user.

All right, so that's what Connect does. And, you know, a lot of our work ends up being providing features to answer questions for the IT organization on why R, right? The data scientists are bought in, they use it, they're building really great things, and then they run into these obstacles within the IT organization. So, you know, we spend a lot of time trying to say, okay, you know what, here's how you tune this, here's how you monitor this, here's how you scale it.

For Connect, for example, I just spent a pile of cash on an external security vendor firm to come in and validate and make sure that, hey, you know what, here are the issues that we found, that there's an independent person taking a look at that. And so we have a higher bar to cross over because, you know, for IT, they're very familiar with a Python solution, a Python stack, and in many ways they ask the data scientists on the R side to hold themselves to a higher standard than what, you know, what even the rest of the organization is held to.

But our job is to hopefully make it easier and easier for that sell to happen. So deployment is a push-button deployment. We identify dependencies, you know, we use packrat underneath the hood. We match up R versions to the best of our abilities, and so the idea is get more of these artifacts to be published quickly.

The reports, all the reporting is based on R Markdown, so it's completely reproducible. You can schedule it, you can email it, you know, ultimately in time, you can imagine that you can also have that drop into a location on disk or drop into Confluence, or potentially, like, send a Slack message or whatever, right? And you can customize, you can give the end user the opportunity to sort of customize the report that they see based on the parameters that they're interested in. And obviously, it can do everything around hosting Shiny applications.

So generally, architecturally, this is what it looks like. You know, you've got the data scientist publishing on one side, you've got the end user sort of interacting with it as a web browser. They don't know that they're talking to anything that is R-related.

Useful websites and resources

So useful websites, shiny.rstudio.com, I'm guessing everybody here knows about it. rmarkdown.rstudio.com, everybody? Yes? No? Kind of? All right. You guys are falling asleep on me. db.rstudio.com is a new website. It's talking about, like, we're trying to capture all the things that are related to sort of connecting from R to the database. spark.rstudio.com is the same kind of thing, but for Spark. Bookdown, we've talked about a little bit. And if you wanted to play with Connect, you can, without downloading it, there's an eval that you can download and run it, but if you wanted to just play with it and see what the end user experience is like, there's all the features, because it's a common shared infrastructure, but it'll give you a feel for, like, what you can publish, how you can schedule, et cetera. So you can go to beta.rstudio.com and just log in with your Google Auth.

All right, so with that, how many minutes do I have? Two and a half minutes. Questions? Concerns? Nothing? I can't actually see if there are any hands up. All right. That's it. That's all I had. Thanks, guys.
