Resources

RStudio Team Deep Dive | In A Hosted Environment

You probably know that RStudio makes a free, open-source development environment for data scientists. It’s made with love and used by millions of people around the world. What you might not know is that we also make a professional platform called RStudio Team. In this live session, Tom will walk you through our RStudio Team trial, where you can learn how best to test drive:

- Scaling your data science work
- Seamlessly managing open-source data science environments
- Automating repetitive tasks
- Rapidly and securely sharing key insights and data science products with your entire organization
- And, optionally, integrating some of your favorite open-source packages into the trial experience

Leading organizations like NASA, Janssen Pharmaceuticals, the World Health Organization, financial institutions, government agencies, and insurance organizations around the globe use RStudio’s professional products to tackle world-changing problems, and we’re inviting you to learn how. You’ll learn how RStudio Team gives professional data science teams superpowers, with all of the bells and whistles that enterprises need.

If you don't have your own trial instance of RStudio Team to follow along (not required), feel free to request yours here: https://www.rstudio.com/products/team/evaluation3/

Additional resources here: https://docs.google.com/document/d/1HGt7LSohhyxpCvETvVEFHugrdaSnTcZaXbI0jV5g9ok/edit?usp=sharing


Transcript

This transcript was generated automatically and may contain errors.

All righty, happy Thursday, y'all. Thanks for joining me today. We're just getting started. It's right at 10 a.m. Central Standard Time for me here in Texas, but we'll wait a few minutes to let some folks roll in, and then we'll get started on our RStudio Team eval demo. We're going to have a bit of a live code, a little bit of slides, and definitely going to be taking questions as they come up in the chat. If you do have any questions, feel free to drop them into the YouTube chat here on the live stream that you're watching.

A lot of the examples I'm using, you can find the public source code here on my GitHub, so that's going to have some of the details for the slides as well as some of the example applications we're showing in R and Python. I also have my colleagues from RStudio who will be answering some questions, so if you see me talking and you see people answering questions in the chat, I'm lucky enough to have some great team members behind me. We also have Kelly O'Brien, who's the RStudio Connect product manager, so she might answer some additional questions there in the chat.

What we've been doing for a lot of these is: if you'd like to say hello, feel free to say hi to each other in the chat. Usually it's somewhere between 60 and 600 people in the group, so feel free to talk amongst yourselves or ask questions as well.

Overview of RStudio Team

So for today, again, like I mentioned, there are going to be a few slides we'll walk through, but the bulk of today is going to be walking through the RStudio Team evaluation. The team evaluation covers all three of our professional products: RStudio Workbench, RStudio Connect, and RStudio Package Manager. The evaluation is time-gated, in terms of it's just for trying out the products, and we provide the hosted environment for that.

If you did want to try it out, you just fill out this form and you can click start in the evaluation and then I'll walk through what the evaluation actually looks like, how you can use the products, and then really what today is about is understanding how these things fit into your workflow in an enterprise, as well as how you make some of these arguments for using open source. Even if you're not using our professional products, we'd love for you to use them, but even if you're not going to use them, how do you motivate folks to use R and Python and all the wonderful packages that you use in both languages within your organization?

So again, I'll jump back into the slides and we'll talk a little bit about RStudio Team, and then we'll hop over into the evaluation proper and have quite a bit of live code, kind of walking through the whole process. Again, for RStudio Team, we think of this as really a single home for R and Python for all data science teams. So regardless of whether you're 90% R and 10% Python, or 100% Python and no R, or some other mix in the middle, we've tried to build these products in a way that they serve both constituencies, so that both kinds of people are very happy and able to be productive with these products.

As far as the three products, if you weren't familiar with them and kind of the different people who would be interested, you know, maybe you're a data scientist or data analyst, maybe you're a decision maker, a business user, someone else kind of trying to make decisions with data, with data science, or maybe you're even an R admin or Python admin or IT admin that's more concerned about all this open source work that's being done and people are trying to be productive, but you're trying to support them and make things secure and scalable and have proper operations around it. So again, we kind of think of this as these are the three core personas that we're trying to solve problems for.

In terms of data scientists really want to use R and Python, they want to develop their applications and share them with the rest of the organization, whether that's for other data scientists or for business users trying to make decisions with the data science work they're doing. And then the IT team, whether you're an R admin, meaning that you're kind of in both worlds or a Python admin or a traditional IT admin, you're just trying to support all this and make things operate smoothly.

Again, the three products are RStudio Workbench, RStudio Package Manager, and RStudio Connect. RStudio Workbench is where you would do all of your data analysis and kind of writing your code. So you can write there in R, you can write there in Python, you have the RStudio IDE, as well as VS Code and Jupyter. So a nice mix of different environments you can write code in. RStudio Connect is where you'll publish all your results. So that could be R Markdown reports or Jupyter notebooks. It could also be interactive applications like Shiny or Dash or Flask or Streamlit in Python. And then lastly, RStudio Package Manager is supporting all those open source libraries. You can actually store copies of them on-premise or in your cloud, behind your firewall, wherever you need them, as well as your own internally developed packages.

So as far as what we're focusing on here at RStudio, we're focused on open source first, in terms of we're really trying to build an open-source model where we give away the vast majority of the software we create. So things like tidymodels, the tidyverse, and R Markdown are open-source packages that we're giving away, freely available to anyone who wants to use them. The professional products build on top of those to provide some of the things that organizations are more interested in: security, scalability, access control, as well as logging and authentication and all that.

We're also focused on being code first in terms of we really believe that to do, you know, really serious data science and do, you know, productive work, that code is going to make you better. And then learning even a little bit of code or learning a lot of code can help you be a more productive data scientist and have more control over what you're building. And lastly, centralized or even cloud-based. So this means that our products run on a server, whether that's on-premise, bare metal, you know, existing servers you have, or in a virtual private cloud like you might find on, say, AWS, the Amazon Marketplace, or Azure, or Google Cloud, or the other providers.

The problem RStudio Team solves

So we'll talk about one more slide, then jump into the live coding section, but really the problem we're trying to solve here is that there's this chasm, this gap, between what data science teams are doing and trying to do, and actually impacting the business, creating business value or affecting decisions in a positive way.

So data science teams create things, but it's stuck on their laptop, or they're trying to share something with their boss or their colleague, and they have to email it to each other, or they, you know, have to put into a shared drive, the person downloads it and moves it over.

So how do we cross this chasm and get the data science work that you're doing into the hands of either live decision makers, or into the automated decisions that people are doing inside of the software? So as a data scientist, you're either creating insights, or trying to share insights, or creating models that are going to be used downstream, and that's what you're doing today, but you're still trying to get them closer to either decision makers for automated decisions, or actual human interaction.

So in our world, you'll be creating these insights, or writing code in R and Python in a data science workbench that supports both languages. So log in one time, you have your full environment for R, you have your full environment for Python. So with R and Python, you can build amazing things. You can build applications, reports, APIs, you can send, you know, programmatic emails, as opposed to manual emails. There's a lot of power here in terms of things you can build.

Once these things are finished, you can now get them onto a deployment server, basically a centralized location, where if people want to access the things you're creating, they know to go here. They go to this location, they log in, it's access controlled, it automatically scales up to meet the user needs, all the things you want there. And this will deliver the actual applications, as well as, you know, scheduling emails, and other assets, or automations that go to live decision makers, or hosting things like APIs, so Plumber in R, or Flask, or FastAPI in Python, which integrate into other software, whether it's, you know, Java, or JavaScript, or your website, any other type of things there.

Walking through the hosted evaluation

So once you actually get an evaluation environment, this is a hosted environment, meaning that there is nothing for you to install for the evaluation. You just go to the URL, and you get access to the entire suite of our products for this temporary environment. So, you have access to RStudio Workbench, access to RStudio Connect, and RStudio Package Manager, which make up RStudio Team.

It's got some additional links, in terms of if you need some help, or if you want to learn a bit more about Team, or you have some questions, as well as access to a built-in mail server, so you can send emails from Connect, or through some of the automations there, and see how those work. So, really, part of the power here is that if you wanted to evaluate how you could advocate for using these open-source tools in your organization, rather than having to download the software, install it, and go through an IT process, you could actually evaluate it here in your browser. You can show some of the work you're doing to your colleagues, or to your decision makers, or to the stakeholders who own the budget, or your IT colleagues, and actually say: here are the things I could be doing. Can we move forward and try to make this a reality?

Something else to note: once you've got to this page, you're going to be logging into everything with the default username and password. I've changed the username for this environment, because I don't want anyone to accidentally log into my environment with this information, but for you, if you were to open up this environment, you could go to, say, RStudio Workbench, and it's going to have the default username and password, rstudio and rstudio. So, easy to remember and log in, and you can use the same credentials inside RStudio Connect.

As far as, the next question will probably be like, okay, cool, I actually want to see this in action, so let's open up RStudio Workbench, I'll accept this, and you'll note that it automatically logs me in, because I've already logged in here. If I wanted to, it will come with some basic examples of code, but I could always upload or write my own code here. It's a full RStudio environment, so let's go ahead and open an RStudio session.

I'm inside RStudio Workbench now, so it might look a little bit different from your traditional RStudio environment, but the core idea is I can open up a new RStudio session by clicking New Session. We'll say "my first session", I want to use RStudio, and I'll start that up, and it'll set me over into the happy environment that I know and love: RStudio, the IDE.

At this point, maybe I have a script I want to use. I could always bring it in from version control, so if I wanted to clone something from, say, a GitHub repository, I could do that, or you can actually upload files from your desktop. If you had code you wanted to upload, please note that this is not a production environment, in terms of you shouldn't try to use this evaluation as the environment where you do all your data science work today. This is an evaluation of how you could do some of that work, so you can try it out here and see if you want to move forward with evaluating it, say, on-premise, or actually moving forward with the professional products.

Popping back out here: obviously we've seen RStudio, and I'll go into a little bit more about what you can do there, but it's more than just RStudio, in terms of I can open up a new session and actually change the editor. So if I were a Python user, I might be more comfortable in, say, JupyterLab or Jupyter Notebook, where I could use the notebook environment to write Python code and publish those notebooks to Connect. If I were writing applications like Streamlit, Dash, or Flask, I might be more comfortable in VS Code, which is a great editor for Python, as well as things like JavaScript and some other languages. So those are also available to be opened up through RStudio Workbench.

There was a question in the chat, in terms of using additional clusters, so this environment is run, this hosted evaluation is run in AWS, in a single environment, so it only has access to that local cluster. If you did have something like an elastic cluster externally to that, something like Kubernetes, or Slurm, you could also launch sessions that are running in that elastic environment, so that is an option if you were running it on-premise, or in your own private cloud. For the hosted evaluation, you are limited to this local environment, or local cluster.

So, let's start up the Jupyter Notebook, and that hops us right into Jupyter Notebook. There's a "Welcome to Jupyter" notebook that we can open up, and now we're directly into a Jupyter Notebook. It's pre-filled with some code; you can play around here, run the different lines, and create some plots and things, but this is just showing you that all the different plots and assets you're working with work inside RStudio Workbench.

So, we've shown RStudio, we've shown Jupyter Notebooks, and let's, just for fun, we'll show a VS Code session as well, because I know a lot of folks that I worked with previously, that were using Python, really liked VS Code. So here, hopping in directly into VS Code, again, it's kind of just VS Code as it is, you can install some of the extensions that you want, or customize it to how you see fit.

All right, so we've shown RStudio, we've shown Jupyter, we've shown a lot of different things going on, so let's go back to the home page. You can see I still have all these different sessions open, and that's part of the power: I've logged in one time, to the one environment I have to log into, I've been authenticated, and now I have access to all these different coding environments, where I can do Python natively in Jupyter or VS Code, and I can do all my R work inside RStudio, the IDE.

Publishing R Markdown to RStudio Connect

So we can start off with, say, an R Markdown. That's one of my favorite packages in R, and it's very powerful. You can see I've uploaded a file here that actually hasn't been expanded, so let's go ahead and untar this bundle.

I'm going to look over at the chat real quick: can you disable certain types of environments where, say, Jupyter Notebooks are already available via other systems? Yeah, so there's a good question about whether all of these are mandatory. If you just wanted to use RStudio Workbench for RStudio, that's more than fine. Say you have a JupyterHub instance or something else, and you're not using Jupyter here; that's great. Part of what we're trying to solve is having an entire workbench, and that's why it was rebranded from RStudio Server Pro to RStudio Workbench: we're trying to serve different personas here, not only R, but R and Python together. For a lot of the IT teams, they wanted to manage one environment as opposed to managing three or four different custom environments.

So as far as these files: I've opened up the R Markdown file I'm working with here. It's a pretty traditional one, and it says I've got some packages I need to install, and that's more than fine. You can install the packages you'd like in here.

So let's say I wanted to install flexdashboard. I can install the flexdashboard package, and even though I'm in this hosted evaluation, I can still install packages as I see fit. Part of the benefit of working with Package Manager in here is we get really fast package installations. So let's install just three packages that aren't already available. If you've only been working on your desktop, you might be used to really, really fast package installs, because CRAN provides binary packages which install very quickly.

For a Linux environment like I'm running here in RStudio Workbench, packages normally have to be built from source ahead of time, and that's what Package Manager is providing: pre-built Linux binaries, so they install very quickly. So I was able to install those packages while I'm talking, and faster than I'm talking, because I'm using the binaries from RStudio Package Manager.
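As a rough sketch, pointing an R session at a Package Manager binary repository looks like this. The URL below is the public Package Manager instance and is an assumption here; the hosted evaluation, or a private install, would come preconfigured with its own URL:

```r
# Use a Package Manager repo that serves pre-built Linux binaries.
# The URL here is the public instance; your own server's URL would differ.
options(repos = c(
  CRAN = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
))

# These now install from binaries rather than compiling from source.
install.packages(c("flexdashboard", "plotly", "DT"))
```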

So I've written this R Markdown document. It's got a lot of dplyr code, some parameters, and different settings I can adjust. And say I want to share this with a colleague. Sure, I can take the R Markdown document and email it, or I can put it in a shared drive or something else, and they could render it or open up the HTML. But for a business user, they may not want to receive it that way. They're like: I don't know what R is, I just want to see the final report. And that's what Connect is providing.

So once you finalize your report, you can actually take this document and, with the blue publish button up here, take the R Markdown report and its supporting files in this directory and create a new asset on RStudio Connect. So I'll click publish, and this will actually take that document and all the dependencies, build everything, and install it on Connect for me. It's basically taking a snapshot of all the different packages in the environment I'm working in, and it's recreating that environment for me on RStudio Connect.

So now, again, in the time it took me to tell you about it, I now have a URL that I can share with my colleagues and I can limit access or grant access based on how secure I want it to be. For today's example, let's say that I want to show this to y'all. So I can actually copy this link that I just created, and we can paste it here, and you could actually go to this asset that I just created.

Now, I open up the sharing settings, so it kind of opens it up inside Connect, and this is our first kind of foray into Connect. We've published something, so it pulls it up here, and we can look at it, and it defaults to making it where only I, or the specific user who published it, can see it. As the publisher, I have the ability to change these sharing settings. So maybe I want to make it available to anyone within my organization. As long as they can log in, they can see this asset. Or what I just did is actually make it public, so anyone with this URL can open it.

If I were to limit it to specific users or groups, this is pulling in from my authentication, so I don't have to go around creating groups in one thing and creating groups in another thing. Whatever your IT team is using for authentication can most likely integrate with RStudio Connect, and then you can use that existing authentication to bring in the users or groups from there.

The other component is this is a parameterized R Markdown. So while it's showing one example for this code, it's got a nice interactive graphic and some interactive graphics down here, it also, Connect can handle those parameters inside it natively. So for your business users, they can come in here and change parameters and rerun the report. So they don't have to go ask you, hey, I saw you generate this report, can you generate a new version for me, and then they wait, and then you send them a new one, and you have this back and forth. They can actually answer some of their own questions and generate a new report or a new asset very quickly here.
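As a sketch, a parameterized R Markdown document declares its parameters in the YAML header, and Connect renders those as input controls for viewers. The parameter names below are hypothetical:

```yaml
---
title: "Regional Sales Report"
output: html_document
params:
  region: "North"
  year: 2021
---
```

Inside the document's R chunks, the current values are available as `params$region` and `params$year`, so re-running the report with different parameters needs no code changes.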

Something else that's really exciting in terms of what this can provide is I can schedule this document. Because I've published it with the source code and because Connect is a data science platform, it can actually re-execute R and Python code. So this R Markdown, maybe I want to run this every day. So again, rather than me having to manually generate a report, manually send an email, manually change parameters, I can just have this report update in place every day at a specific time.

I want it to run really early because my boss comes in at 630, and I want to make sure that it's available for her so she can see it whenever she comes in first thing in the morning. And I want it to send an email every time it sends so that she gets the kind of quick notes in her email, and they can come to Connect to see the details as well. So now I've saved it as a schedule in terms of this will now be executed every single day at 5 AM and will execute all these different things and regenerate this in place for me. So very, very powerful.

Multiple R versions and background jobs

So for RStudio Workbench, kind of going back, leaving Connect and hopping back to RStudio Workbench real quick, there's a question about using multiple versions of R. Yes, one of the benefits of using RStudio Workbench is using multiple versions of R in the same environment. So while this is using R 4.1, which is one of the latest versions of R, maybe I have a project from six months or a year ago that was actually using R 3.5, and I want to make sure that I can go back to that environment and work with it.

I still have project-specific libraries, in terms of because I've moved from R 4.1 to R 3.5, these packages need to be installed, but I can do that very quickly because, again, I'm using Package Manager to supply the packages to this environment. So I can install these dependencies; RStudio Workbench tells me, hey, these are missing, so let's install them really quickly.

Yeah, so another good question was about, you know, maybe I want to generate something that's, you know, outside of Connect. Like, it's great to have, you know, HTML content, but my boss or my colleague wants to see something in PowerPoint or see something in Word. So, yeah, absolutely. So we can actually open up, let's go back and do a new session while that's loading because I have my server, I can open up a new session.

So as a data scientist, I can be more productive because I have multiple sessions running on a bigger server, and there's more compute available, there's more RAM, there's more CPU, as opposed to my desktop, which is limited to whatever is there.

So the question was about maybe using PowerPoint. So, yes, you can generate PowerPoint documents with R Markdown. This one actually can create one based off of a template, and I can publish that to RStudio Connect. So let's publish it really quickly. So I'm going to publish that document to Connect so that, you know, people can go to Connect from their laptop and they can go look at it, and they can access the beautiful HTML content that I've created or the R Markdown report I've created, but it can also generate and attach a PowerPoint as an email.
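For reference, an R Markdown document can target PowerPoint directly through its output format; a minimal header might look like this (the template file name is hypothetical):

```yaml
---
title: "Quarterly Update"
output:
  powerpoint_presentation:
    reference_doc: template.pptx
---
```

Here `reference_doc` points at a branded .pptx whose master slides and theme are applied to the generated deck.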

So now I have a PDF report that's hosted on RStudio Connect. It's got some static graphics and other things, and if I email a copy of it, so let's just email this version to myself, it will not only attach the PDF, but it will also attach a PowerPoint file. So if we go back to the hosted evaluation, I'm in the home environment here, I can scroll down to web mail and open up that link, and now I'm in my inbox here, and you can see I've got a couple different emails, but this is the one that was just sent over.

And again, this is, you know, similar information. So not only is it able to embed a ggplot directly in the graphics inside the body of my email, but if you look at the attachments, it's also got a CSV file, so in terms of what was the output from my script, it's got that PDF report, and then it's generated this PowerPoint file. I can download that, and I can open it natively in PowerPoint because it's a PowerPoint file, and now I have this kind of branded, specific PowerPoint file that I've generated automatically with, you know, ggplot, with tables, with text, that I'm able to generate quickly into PowerPoint. So if I did need to meet someone in the spot where they want to be, where that's Microsoft Office, or if they're comfortable receiving HTML as an R Markdown document, you can do it multiple different ways.
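As a sketch of how a rendered report can shape the email Connect sends, RStudio Connect reads `rsc_email_*` fields set through `rmarkdown::output_metadata` during rendering. The subject line and file names below are hypothetical:

```r
# In a chunk of the R Markdown document, after the files are written:
rmarkdown::output_metadata$set(
  rsc_email_subject     = "Daily report",
  rsc_email_attachments = c("report.pptx", "output.csv")
)
```

Outside of Connect these fields are simply ignored, so the same document still renders normally on the desktop.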

Package versions and tidymodels

As far as using specific package versions: let's say we're inside RStudio. If I run install.packages("dplyr"), it's going to install whatever is the latest available on CRAN or on Package Manager; right now that's dplyr 1.0.7. If I wanted a very specific package version, I could use the renv package, which allows me to set up very specific, recreatable environments. You don't have to use this; it's optional, but it can allow you to not only install very specific packages, but capture all those packages at specific versions if, say, you were collaborating on a more complex document.
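A minimal sketch of that renv workflow, assuming a fresh project:

```r
# Create a project-local library and lockfile for this project.
renv::init()

# Install a specific version of a package (pkg@version syntax).
renv::install("dplyr@1.0.7")

# Record the exact versions in use into renv.lock.
renv::snapshot()

# A collaborator can later recreate the same library with:
# renv::restore()
```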

Another benefit of the Workbench environment is that not only do I have, like, all these different sessions I can open, but they are larger than my desktop in terms of my laptop may have, like, eight gigabytes of RAM and two cores. My server could have 256 gigabytes of RAM and 60 cores. You know, it can go as big as your budget is in terms of you can make a server much larger.

Per that example, let's say I have a background tuning script that I'm running. This is some tidymodels code: it's fitting an SVM model and doing a grid search, so it's going across a bunch of resamples and a bunch of different parameters and doing this grid tuning. Of course, I could just run this, and it'll take a few minutes to execute. Or, within RStudio, I can source this as a local job or as a launcher job.

A local job will execute in the same server environment that my, you know, session is running, but as a background. A launcher job would launch in a remote session if you had that attached. So, say, like a Kubernetes environment where it'll scale up to meet that need and scale back down.

Let's start it as a local job first. So when I start this up, it's going to do this background tuning, and it's going to spit out some information as it goes along. Importantly, this is happening in the background. So I can still, you know, I can still interact with my console. It's not locked up, but it's just telling me, hey, this background tuning, this long-running script I'm running interactively, is happening in the background.

So if I were to go here, it'll say succeeded at 10:37 a.m. (that's my local time), and you'll notice that there's nothing in my environment. What I did was have the script save its output out as a file, so when it was done, it's not cluttering up my environment; it saved everything in the background, along with a graphic looking at the different parameters and how they all fit. So just to show you: not only can you run it as a background job, but its side effects can be saved. You can save the model out, you can save graphics, you can even render an R Markdown document in the background.
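As a sketch, a background job like this can also be started programmatically. The script name "tune_svm.R" and its outputs are hypothetical; the point is that the script persists its results as side effects rather than relying on the session environment:

```r
# Launch a script as an RStudio background (local) job.
rstudioapi::jobRunScript(
  path       = "tune_svm.R",
  name       = "background tuning",
  workingDir = "."
)

# Inside tune_svm.R, results are saved to disk as side effects:
#   saveRDS(tune_results, "tune_results.rds")
#   ggsave("tuning_parameters.png", autoplot(tune_results))

# Later, in the interactive session, pull the results back in:
results <- readRDS("tune_results.rds")
```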

Publishing Python notebooks and Streamlit apps

So again, let's go to Jupyter real quick. I am far from an expert Python developer, but I do dabble in Python every now and then. Just as you'd expect, you know, if I have a Jupyter notebook and I want to get it onto Connect and I want to execute it on Connect, I have the same blue publish button. So I can publish this to RStudio Connect. It gives me a similar thing saying, do you want to publish with source code? So you can, you know, schedule it or rebuild it, or you just want to publish the HTML content from the notebook itself.

So let's go to content. Here's the document I just published, which again is just a fairly basic Python notebook, IPython notebook with matplotlib, some pandas graphics, and that nice seaborn swirl here. And just like I did with our markdown, I could schedule this to execute on a schedule. So again, for both R and Python, I can execute things as needed and evaluate them whenever I want them to be evaluated, whether that's daily, down to the minute, or even as infrequently as a year. So all that's able to be customized with the schedule.

For VS Code, again, some people are like, oh yeah, I really love VS Code, I want to learn more about it. Other people may not have used it before. It's a general purpose development environment, so not necessarily specific to data science, but can do things like JavaScript or Java or, you know, general web development, as well as using something like Python inside of it.

Now, more traditionally, within a VS Code environment, I could actually go into and edit, say, like, a Dash application or a Streamlit application. And here, again, as opposed to a Jupyter notebook where you're doing, you know, very specific cells and text and cells and text to go through things, this is a more traditional .py file, which is often more preferred if you're writing, like, an application like Dash or Streamlit, which are similar to Shiny applications but in Python.

So this is just kind of a bare-bones example taken from the Streamlit server, or from their examples. Let's go ahead and create a new terminal, and I'll also open up the README real quick. So the README has some information about the getting started, as well as some information about how I could publish this application.

So, number one, you do have to form a connection with the server: you'll generate an API key on Connect and then register that server locally. Publishing Streamlit applications is done through the rsconnect-python package. So if I were to run rsconnect --help, that tells me a little bit about the package and the different things it can do. And if I wanted to deploy this application, I can quickly copy the command over and say I want to publish this Streamlit application to Connect.
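As a rough sketch of that workflow, here's the shape of the rsconnect-python invocation built up as an argument list. The server URL, API key, and app directory below are placeholders, not values from the demo, and the exact flags may differ by version of rsconnect-python:

```python
import shlex

def rsconnect_deploy_cmd(server_url, api_key, app_dir):
    """Build the argv for deploying a Streamlit app via rsconnect-python.

    All three arguments are placeholders; in practice you'd generate
    the API key in Connect's UI first.
    """
    return [
        "rsconnect", "deploy", "streamlit",
        "--server", server_url,
        "--api-key", api_key,
        app_dir,
    ]

cmd = rsconnect_deploy_cmd("https://connect.example.com", "XXXX", "./streamlit-app")
print(shlex.join(cmd))
```

Building the command as a list like this also makes it easy to hand off to `subprocess.run` from an automation script, rather than typing it in a terminal each time.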

It goes very quickly because the packages it needs are already available. And I can open up that application I just published on RStudio Connect, and we'll let that load for a second. This is a Streamlit application that went from VS Code to Connect very, very quickly. And again, we can interact with it and get the graphic to change. Maybe I want to add Canada here. And now I have Canada added to my graphic. So again, you have the ability to do things in both R and Python.

Pins, environment variables, and Package Manager

As far as storing keys and other secure things, that's a great call. So if I go to a different asset I've created, a pin, this is an R Markdown document that has a very specific, beneficial side effect. Within Connect, I have this thing being executed, and I want to pin the code or the data to a file here on RStudio Connect. Within the Vars pane, you see I have a secret called RStudio Connect Key, which is essentially the API key that allows me to interact with RStudio Connect programmatically. This could be something else: database credentials, a username and password, whatever you want it to be.
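In Python content the same pattern applies: read the secret from the environment at run time rather than hardcoding it in the source. A minimal stdlib sketch; the variable name CONNECT_API_KEY is illustrative, not something Connect mandates:

```python
import os

def get_connect_api_key():
    """Fetch the Connect API key from the environment.

    On RStudio Connect you'd set this in the content's Vars pane;
    CONNECT_API_KEY is an illustrative name, not a required one.
    """
    key = os.environ.get("CONNECT_API_KEY")
    if not key:
        raise RuntimeError("CONNECT_API_KEY is not set")
    return key

# Stand-in value for demonstration only; never commit a real key.
os.environ["CONNECT_API_KEY"] = "demo-key"
print(get_connect_api_key())  # demo-key
```

Failing loudly when the variable is missing beats silently running with an empty credential, which tends to surface much later as a confusing authorization error.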

You can think of it as very similar to the .Renviron file that you use interactively inside RStudio Workbench. I'm using the RStudio Connect key so I can publish pins. This document that I have scheduled generates a nice table, which is great; it looks at some NFL data and creates a beautiful table. But the more interesting thing it's doing is taking that data and saving it out as a pin to RStudio Connect. A pin is just any file or data set that you want to store on RStudio Connect. So rather than emailing CSVs back and forth or putting them on a shared drive, you can use the pins package, which allows you to pull in data sets and share them among files or among colleagues.

You can also pin things like trained models. So let's say we ran that long model-training script with tidymodels; I could save the fitted model as an RDS file, upload it as a pin to RStudio Connect, and then it can be used downstream in other applications.
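In Python the analog of that RDS round trip is serializing the fitted object before pinning it. A stdlib-only sketch, with a plain dict standing in for a real trained model:

```python
import pickle

# Stand-in for a fitted model object (e.g., a scikit-learn estimator).
model = {"intercept": 1.5, "slope": 0.42}

# Serialize to bytes, much like saveRDS() snapshots an R object; these
# bytes are what you'd store as a pin for downstream apps to consume.
blob = pickle.dumps(model)

# Downstream, an application reads the pin back and deserializes it.
restored = pickle.loads(blob)
print(restored["slope"])  # 0.42
```

The usual caveat applies: only unpickle data from sources you trust, such as a pin your own scheduled job wrote.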

Now, the last part that I haven't really talked about much is Package Manager. We've been alluding to it a little bit throughout. What Package Manager does is supply R packages and Python packages to your environment. Take an R package like dplyr: you can install it directly from CRAN, sure, but Package Manager serves the package as a pre-compiled binary, ready to go. So when I install it from Package Manager, it's going to be much faster, and I don't have to build it from source, which helps avoid compilation errors.

Package Manager also lets me use specific versions. So I can install not only the latest version, 1.0.6, but also an older version from, say, 2015. So if I had a legacy project I was working on and there was some breaking change I wanted to avoid, I can install that old version.
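When you're pinning versions for a legacy project like that, it can also help to fail fast when the environment drifts. A small illustrative helper, not part of Package Manager itself; the `lookup` parameter is injectable purely so the check can be demonstrated without installing anything:

```python
from importlib import metadata

def require_version(dist_name, expected, lookup=metadata.version):
    """Raise if the installed distribution doesn't match the pinned version.

    `lookup` defaults to importlib.metadata.version; it's injectable so
    the check is easy to exercise without touching the real environment.
    """
    installed = lookup(dist_name)
    if installed != expected:
        raise RuntimeError(f"{dist_name}: expected {expected}, found {installed}")
    return installed

# Simulated lookup: a matching pin passes (a drifted one would raise).
ok = require_version("some-legacy-pkg", "1.0.6", lookup=lambda name: "1.0.6")
print(ok)  # 1.0.6
```

A guard like this at the top of a legacy script turns a subtle behavior change into an immediate, readable error.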

It also shows you system dependencies, which is for your IT admins. They're like, I don't know what R is, but what do I have to install to make it work? Specific packages may have system prerequisites that have to be installed in the environment, and your admin can install those on Connect and Workbench so that all the packages work as intended.

So let's do the sf package. Simple features is a package for making maps and doing geospatial analysis in R, and it's amazingly powerful, but it depends on a few other system libraries. You can see that when I go down to the install system prerequisites section, it actually lists the things your admin needs to install. Rather than you having to find all of these from scratch and say, oh, I think you need to install GDAL and some other stuff, your admin can just go in, copy this, and install those globally for everyone on RStudio Connect or RStudio Workbench.

IT and security considerations

So you've gone into the RStudio Team evaluation, you've created something amazing, you have these analyses or tools that you want to use, and you've gotten things off your laptop. That's the huge win: getting things into a production environment, sharing them across your organization, or embedding them in your other tools. The other hurdle, which we didn't really mention but are solving at the same time, is satisfying IT or your security team and their requirements.

So IT and security teams want very specific things. Maybe they don't know much about open source, they don't know R or Python, maybe they're Linux admins and that's all they know, and that's fine. They just want to know: can it integrate with my authentication? And the answer is yes. RStudio Workbench and RStudio Connect can integrate with single sign-on, so things like Okta, as well as more traditional authentication mechanisms like PAM or LDAP.

They also want to know that the assets you've created scale automatically. They don't want to go around creating environments for you, spinning things up and spinning things down. Connect handles spinning up new sessions to meet user demand as people visit the server, so it's handling some of the scaling for you.

They also want to know: okay, we're working with sensitive data, so I want to be able to audit everything that occurs. RStudio Workbench and Connect have auditing and monitoring of server resources and metrics, as well as user activity, so that requirement is covered as well.

And then a few different people asked about connections to databases, like actually seeing data from Postgres or Snowflake. Yes, you can connect to those and have them work on Workbench as well as RStudio Connect. So that is what RStudio Team provides: the ability to get things off your laptop and into production while maintaining security, scalability, and best practices, with the open-source software you already know and use in R and Python, in a single platform that can run on premises, on bare-metal servers, or in the virtual private cloud of your choice, like AWS, GCP, or Azure.

Git-based deployment

A very good question came in about version control, and that's a best practice, so I'm actually going to hop back out of here. Let's go into RStudio Connect, and I'm going to show you one more thing. Someone was saying, well, I have something in version control, and I don't want to use push-button deployment or terminal deployment. Yes, you can import directly from Git. I'll use an example from GitHub today, because that's where I have a lot of my code.

So let's just go to my GitHub and grab one of the repositories, the RStudio Team demo. I'll grab the URL for this; I've got a few different assets in here. I can paste this into RStudio Connect and say which branch I want to use. Let's use the dev branch, because I haven't pushed it to production yet, and then it shows the asset that you can publish directly from Git. So I'll title that Git Portfolio Dashboard, and now I deploy it.

It's going through the same process of rebuilding the environment, but I did all of it through version control; it pulled everything in directly, and this is the asset I just created in like 15 seconds, straight from version control. And part of the beauty here is that, because this was published from version control, Connect can say, I'm going to watch that repository and check for updates periodically. So this is a lightweight version of, not quite continuous integration, but semi-continuous integration, if I can use that term without Kelly getting mad at me. Connect is now watching that Git repository; it will rebuild if it sees changes, or I can force a build by clicking update now.
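Conceptually, that watch-the-repository behavior comes down to comparing the remote branch's HEAD commit with the last one built. Here's a rough stdlib sketch of the idea, my illustration rather than Connect's actual implementation; the `run` parameter is injectable so the logic can be exercised without network access:

```python
import subprocess

def remote_head_sha(repo_url, branch, run=subprocess.run):
    """Return the commit SHA the remote branch currently points at.

    By default this shells out to `git ls-remote`; `run` is injectable
    so the parsing logic can be tested without a real remote.
    """
    result = run(
        ["git", "ls-remote", repo_url, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    )
    out = result.stdout.strip()
    return out.split()[0] if out else None

def needs_rebuild(last_built_sha, repo_url, branch, run=subprocess.run):
    """True when the remote branch has moved past the last deployed commit."""
    return remote_head_sha(repo_url, branch, run=run) != last_built_sha
```

In practice you'd call `needs_rebuild` with the SHA recorded at the last deployment; Connect layers the polling schedule and the environment rebuild on top of this kind of check.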

So I have the ability to publish with push-button deployment from RStudio or Jupyter. I can also publish from the R console or the terminal, and I can publish from version control, or bring in something like continuous integration or continuous deployment if I wanted to go that route as well.

RStudio's open source mission

So in terms of closing and wrapping up, and who we are at RStudio and what makes us a little bit different: a lot of the things we're building are free and open-source software, and we absolutely want as many people as possible to use them. That's what the tidyverse is, that's R Markdown, Shiny, tidymodels, and all the different software that we create and give away for free. The professional software builds on some of that, and it's what allows us to give away so much of the software we create. So if you do need things like authentication, or you're trying to use Shiny in an organization and your IT team says it's a non-starter unless you have authentication or scalability, that's where something like RStudio Connect can come in and solve that problem for you.

In terms of how our funding works, more than 50% of our engineering resources go to free and open-source software. So more than half of our engineering effort goes into software we give away for free. Our pro products fund this ongoing effort so that we can achieve this mission of creating and giving away free and open-source software. And additionally, we contribute to organizations like NumFOCUS to extend our impact into communities like Python's, where we're not necessarily developing as much today, though maybe we'll do more in the future, but where we want to contribute financially as well.

And lastly, something I'm really proud of is we're a public benefit corporation, meaning that all of our decisions have to be balanced with the best interest of the community, our customers, the internal employees, and shareholders. We're not just here to improve stock prices for some company, we're here to actually serve the community and do the best job that we can at that.

In terms of wrapping up, as we're getting close to the top of the hour: if you want to learn more, you can read all about Serious Data Science on our blog. These slides (a slightly different version of which I published earlier) are available at the link at the bottom, which I'll post in the chat again, and that has links to everything I talked about in the slide deck. There are TrustRadius reviews, and you can book a live meeting or just send us an email if you don't want to talk live; that's fine. And if you want to evaluate RStudio Team and do something similar to what I showed you today, fill out this form and we'll send you an evaluation as soon as possible.

So that's really it for today. I'll try to answer a few more questions as they come in, but we are at the top of the hour, and I know some people may have only scheduled this hour; I'll stick around for a few more minutes so we can answer some more. Kelly, thank you so much for answering a lot of the questions as they came up; that was a huge help. Again, many thanks to Kelly O'Brien, the Product Manager for RStudio Connect, for her time today and for helping answer so many of the questions you all had as community members.

Q&A: database connections, long-running jobs, and licensing

There was a good question about databases: do the database connections need to be created on both RStudio Workbench and Connect? Yes, because they're separate environments. You can create an interactive connection on RStudio Workbench; if I go into RStudio and open my report portfolio, I can form a connection here with all these different databases, but that's interactive, and I'm basically killing off that connection whenever I close my session. RStudio Connect is more like a service account, where you're interacting with production code, so you may even need different login credentials. Rather than my personal Thomas Mock account, I would use something like an RStudio Connect service account to do that.

Can you close your browser and let your long-running analysis continue? Yes. I think I saw a meme on Twitter about someone driving around with a laptop open in their car. If a session is going to take a long time, you can run it as a background job and set a long timeout so it stays open; sessions have controllable timeouts, so some can be forced to die off quickly while others last longer. The other option, and what a lot of customers actually do for scripts that run that long, is to run them via RStudio Connect. So while I can kick off relatively lightweight jobs here, like the local job I'm starting (I'll rerun this because it takes like two minutes), if it took four or five hours, I could build a workflow around it and have it run on RStudio Connect. Connect will handle the entire execution and then generate some type of output: maybe an R Markdown report about the model fit, maybe the model saved out as a pin, or maybe an email with information once it's done.

What's the difference between Team Standard and Team Enterprise? That's a good question. The main difference is how many servers you need. Smaller teams can basically use one server: I have Workbench, Connect, and Package Manager, and I only need one server for those products. Larger customers often want multiple servers, and that's where Enterprise is a better fit, because you get as many servers as you want. So you could have a pre-production environment, dev, alpha, or staging environments, as well as your production environment. Or you can do load balancing across multiple servers, or have high availability, so if one server goes down, you have another to fail over to. Standard is where a lot of teams start; if they just need one server for everything, that's more than fine. Enterprise is really for when you want as many servers as you need without having to count them, and you can go that route as well.

So we're a few minutes over and looks like the questions have slowed down a little bit. So thank you so much for your time today. It was a pleasure to have this time with you and thanks for hanging around with me for an hour. Again, this video will be up on YouTube immediately after I end the call. So you'll be able to access it or re-watch components if I went a little bit quickly. If you have any questions, please reach out to RStudio. There's a bunch of different links that I put up in the slide deck where you can access some of the information I showed today. You can create an evaluation, email us, book a live call, whatever you want to do. We're here to help out and answer some questions.

There was one more question that came in that I definitely want to answer. Our products do come with professional support built into the price. So if you buy one of the products, that comes with support and maintenance for RStudio Workbench, RStudio Connect, Package Manager, or Team, whatever you're purchasing. You also get access to what we call customer success managers; that's where I used to work before taking on this role. That's more like a colleague or a peer who knows a little bit about data science and can help with R- or Python-related questions and point you in the right direction. So again, thank you so much for your time. Have a great and safe weekend, and we'll see you next time. Thanks again.