
RStudio Team Demo | Build & Share Data Products Like The World’s Leading Companies
You probably know that RStudio makes a free, open-source development environment for data scientists. It’s made with love and used by millions of people around the world. What you might not know is that we also make a professional platform, called RStudio Team.

Learn how RStudio Team can:
- Help you scale your data science work
- Seamlessly manage open-source data science environments
- Automate repetitive tasks
- Rapidly and securely share key insights and data science products with your entire organization

Timecodes:
0:00 - Intro
4:18 - Hard truth of data science
10:22 - Serious Data Science
16:46 - Model management with R and Python
18:48 - Live Demo / RStudio Workbench
23:09 - RStudio support for Jupyter Notebooks
24:40 - Live Demo / RStudio Connect
28:01 - RStudio support for VS Code
30:05 - R and Python within RStudio
32:33 - Scale and share data science results
36:55 - Sharing previous versions of presentations
38:16 - Data Science team knowledge sharing
40:36 - Scheduling and emailing data science content
43:55 - Live Demo / RStudio Package Manager
48:09 - Data Science stories
49:37 - RStudio Team
52:59 - What makes RStudio different?
55:12 - Q/A

Learn more: Leading organizations like NASA, Janssen Pharmaceuticals, the World Health Organization, financial institutions, government agencies, and insurance organizations around the globe use RStudio’s professional products to tackle world-changing problems, and we’re inviting you to learn how. You’ll learn how RStudio Team gives professional data science teams superpowers, with all of the bells and whistles that enterprises need.

You can try RStudio Team free here: https://www.rstudio.com/products/team/evaluation2/

If you'd like to access presentation slides, sign up for future events, provide feedback, and/or ask additional questions, we've bundled everything together for you here: https://docs.google.com/document/d/1HGt7LSohhyxpCvETvVEFHugrdaSnTcZaXbI0jV5g9ok/edit?usp=sharing
Transcript
This transcript was generated automatically and may contain errors.
Hey, everybody. Thanks for joining the RStudio YouTube Live for today. We're going to get started here in just a few minutes. I'm going to be going through quite a few different slides and we'll be having a lot of fun today. So we'll give some folks a couple of minutes.
Thank you so much for joining me today. My name is Tom Mock. I'm going to be your host for today. I'll be sharing some links and some comments in the chat. So if you do have any questions, feel free to post them there. Let me know how things are going and if you have any issues with the live stream.
So for today, we're going to be going over RStudio Team, which is one of the professional products and kind of the overall software suite that RStudio provides. You probably know RStudio from the self-named RStudio IDE or integrated development environment. And that's one of the core things we produce in addition to all the free open source software that we provide. Now we have to make money to kind of give away a lot of the things we create. So we also sell professional products that enhance some of the open source work that we produce and give away.
You may be familiar with the RStudio IDE. This just looks like any old RStudio IDE session, the difference being that, for what I'm doing today, I'm actually running RStudio Workbench. So you'll notice that I'm working through a browser as opposed to my desktop.
So let's knit some slides so we can create a quick slide presentation that I can actually talk about today. I'll take this file and confirm that I'm working on the latest one. Okay, this was created about 20 seconds ago. Great. Let's take that file and deploy it to RStudio Connect.
So in about five seconds, I went from working in my environment to having this hosted on RStudio Connect. And now I have a URL that anyone can access anywhere in the world. So these slides are now up at Colorado.RStudio.com.
The hard truth of data science
Part of what we're covering today is the hard truth that data science is really hard. It can be challenging to get value from the work you're doing, or to get executive buy-in or business value from it. In many cases, data science teams fail to live up to what they promise or what they want to deliver because of these challenges, this inertia they have to fight against.
So first off, data science teams find it difficult to create insights that impact decision-making, or find it difficult to create, maintain, and improve their data products, their models, their APIs, their applications over time. This shows up in a lot of different ways: it could be difficult to recruit data scientists because you're using a very specific proprietary tool, or you don't have all your data connected. For decision-making, maybe your stakeholders just don't understand the insights, or there's a long delay between a "what if this happens?" question and the actual answer.
It's also often difficult for teams to build on previous work for new use cases. They basically have to reinvent the wheel every time rather than kind of building off previous work. Also, they might have analysis constraints or delays imposed by some of those proprietary tools. This can lead to slow iteration, which hinders alignment between the business unit and the data science team, or even irrelevant or outdated insights because things have been slowed down.
Additionally, this can lead to insights becoming obsolete or difficult to reproduce. Again, reproducibility, a core tenet of RStudio, matters in both academia and business settings. Siloed teams often lead to redundant work and a lack of collaboration, with too much time spent just maintaining tools or wrestling with data science environments. Lastly, teams often can't self-deploy, so they have to go ask IT, "Can you take this and run with it?" or hand it off to another team. There's a disconnect between what you're actually creating and getting it into decision-making or into production.
Serious data science: open source, code-first, centralized
Here at RStudio, we try to solve these problems, the difficulties of creating insights, impacting decision-making, and maintaining and improving work over time, by focusing on our three core tenets: open-source work, a code-first mentality, and centralized or cloud-based data science environments and production.
So open source basically means widely used. It eases recruiting, retention and training. You have an entire massive community in R and Python creating content that you can learn from, as well as libraries and different code examples that you can adapt. Code-first meaning it's flexible. You can actually craft it into exactly what you want. There's not this black box mentality where you just kind of plug something in and get something back. You understand the process. And then centralized or cloud-based by reducing the necessary work and enhancing collaboration, again, I'm working from a browser so I can access my work from anywhere and collaborate with my colleagues even though we're all remote in this time.
Open source also means things like comprehensive. So based on community contributions, there's tens of thousands of R packages on CRAN and tens of thousands of Python libraries across PyPI or other locations. Code-first also means you can iterate quickly and update quickly and you can adapt the code and understand by just changing small portions. And then for centralized work, your deployment provides stakeholders self-service access. So you can actually say here's the data product I've created or the model that we're using and have them actually interact with that as well as have it work against, say, your website in production.
And then lastly, open source means interoperable. So you can break down some of these analytic silos. You're not stuck with a specific vendor. You can move between clouds or even move between products because you own the kind of data science code that you've written there in terms of you're able to adapt it. Code-first meaning reusable and fully extensible. So again, you can diff it. You can put into version control and check all your changes and see what did we do two years ago versus what we're doing today. And lastly, package management is something that we're very keen on to support, again, reproducibility and easing the administration of open source data science in your enterprise.
So, summarizing this down: this serious data science concept is open source, code-first, and centralized computation or a cloud environment. The leading languages we're focused on are R and Python. That's what we see most data science teams using today, and that's where we've built a lot of our tooling.
And that's where RStudio Team comes into play. RStudio Team is a suite of all three of our professional products: RStudio Workbench, which is basically the RStudio IDE running on a server, along with VS Code and Jupyter for Python users; RStudio Connect, for sharing insights with decision makers via web apps, emails, APIs working in production, and other assets; and RStudio Package Manager, which is really the backbone between those two products, controlling and managing the packages that data scientists need to create and share their insights.
Bridging the gap between data science work and outcomes
What we often see is this massive chasm or this gap between the hard work that the data science teams are doing and the actual outcomes of that work. So either influencing decision makers or informing them or making decisions, as well as automated decisions or things working, models in production, things working on your website, things making decisions automatically within your organization or informing your organization.
Data science teams are creating insights or sharing insights to impact decision-making, whether through interactive web applications, reports, or APIs and other assets that make automated decisions through machine learning.
These data science teams are probably wanting to use a data science workbench where they can choose to work in their primary language. Maybe it's R, maybe it's Python, maybe it's both. They want to have that flexibility of having all this available, ready to go in one environment. So they're building things in R and Python. These assets need to live somewhere. There's tailored applications that actually have a front end so the decision makers can go and interact with it. There's reports that need to go out kind of automatically.
There's APIs that kind of, again, can work against the website or other, you know, software languages like Java or Scala or anything else. So these need to be published to a deployment server, basically somewhere to host all these data science products you're interacting with or creating. Things that exist on this deployment server can then be delivered, again, by web apps and emails and reports that inform decision makers when they need it. And they can actually have self-service applications so they can kind of understand what you're doing in your data science role.
The last part of this puzzle is package management in terms of R and Python or open source libraries, which is their strength. But you also need to kind of manage those different environments because the open source landscape is changing over time. So package management is important for easing maintenance as well as reproducibility of your analyses over time.
Now, if we overlay exactly kind of what we provide here at RStudio, this is how it fits into the world. We do have a workbench environment in RStudio Workbench that allows you to analyze data in R and in Python. We have RStudio Connect that can host all the different R products as well as many Python products. So APIs in R and in Python via Plumber and Flask. Pins for datasets in R. Shiny applications with interactive front ends. R Markdown for reports, websites, or even entire emails and other things. As well as Jupyter, Streamlit, Dash, FastAPI, Bokeh, and Flask.
Real-life end-to-end example
Let's dive real quickly into a real-life example. This is an example that lives at solutions.rstudio.com, called Bike Predict, and it's basically an end-to-end example of the exact workflow I showed before, but with all those little images replaced by actual data science products.
So the first part of your data science workflow might be data import, cleaning, fitting a model, training a model, all the different things you're doing in your day-to-day work. This can be done inside R Markdown. And with RStudio Connect, you can host that R Markdown so you can reference it in the future, or even schedule it to be re-executed periodically. So you might do nightly model training, or batch training that runs against a database and then saves the model back somewhere.
You can also save existing data sets. So say you do have a subset that you want to pull into an application. You can also extract specific data sets and save them as a pin or a data set living on RStudio Connect. RStudio Connect can connect to your databases. So you can have a live database connection against either a production database or one specifically tailored just for your web applications.
The models will be served up as an API. So your model is trained, it's saved, and then it's actually put into production as an API via the Plumber package. So this will then serve both a front end that's interactive in terms of your business users or your decision makers can go to a Shiny application and interact with it and see how does the model work and they can change different parameters and see how it changes. But it's also working in production in terms of doing the predictions for you on your site.
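The demo's model is served by Plumber, and callers (a Shiny front end, or a production website) just make HTTP requests to it. As a rough sketch of what such a caller might look like, here is a stdlib-only Python snippet; the URL and input field names are invented for illustration and are not the demo's actual API:

```python
import json
from urllib import request

# Hypothetical Connect-hosted model endpoint (placeholder URL).
API_URL = "https://connect.example.com/bike-predict/predict"

def build_prediction_request(station_id: int, hour: int) -> request.Request:
    """Package the model inputs as a JSON POST request, the way a Shiny
    app or a website back end might call the deployed API."""
    body = json.dumps({"station_id": station_id, "hour": hour}).encode()
    return request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )

req = build_prediction_request(station_id=42, hour=8)
# Actually sending it would be: request.urlopen(req)  (not run here).
```

The point is that once the model lives behind an API, any HTTP-capable client, in any language, can consume the same predictions.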
RStudio Workbench demo
So the kind of the takeaway from the initial part of the presentation was I can take these slides that I needed to rebuild and send them essentially instantly to RStudio Connect. So that ability to quickly iterate is very powerful. You might notice that in this RStudio environment, I've got a couple new things that aren't available in RStudio Desktop because I'm running in the cloud and because I'm running RStudio Workbench as opposed to the open source desktop version.
Number one, I'm authenticated. So for your IT team or for security reasons, I'm authenticated through single sign-on. So I have enterprise-grade authentication, basically making sure that any of the data that I'm working with is secure and that all the different proprietary things I'm working on are protected. I also have the ability to change my version of R. So I can switch from R version 4.0 to the latest at 4.1.1.
The other benefit of working in the cloud, outside my desktop, is that I'm no longer tied to a specific compute environment. Sure, I've got a powerful laptop, but my cloud environment is almost always going to be much, much more powerful.
So specifically, you might have, say, like an AWS instance or an Azure or, you know, Google Cloud, whatever different kind of cloud provider you want to use is supported, and you can actually have support for things like Kubernetes for additional scaling. What that basically means is that I can actually, from within my session, run an additional background script against the Kubernetes cluster and have that scale up as needed.
You might think that's kind of silly: okay, I don't want to run an upload script as a background job. But what about a grid search or model tuning? This is a quick example of a long end-to-end grid search that runs across a bunch of different hyperparameters, fitting a model and saving it out. Of course, I could run this in my existing environment, but that would lock up my console, and I wouldn't be able to keep working. So what I can do with RStudio Workbench is start a launcher job.
It automatically pulls in the script I'm working on. So this tuning script that's probably going to run for a while. I'll set it to the maximum of three CPUs and eight gigs of memory, and then I'll start that. And now this is going to run in a background environment, and it's going to return back examples as it goes along. I'm still able to use my primary environment, so I can still do my math or do any exploratory data analysis or plotting, but I have this launcher job that's going to background.
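A Workbench launcher job is configured through the IDE, but the underlying idea, pushing a long grid search into the background so the interactive session stays free, can be sketched in plain Python. This illustrates the pattern only, with a made-up scoring function; it is not the launcher API itself:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def fit_one(params):
    """Stand-in for fitting one model in a grid search; a real tuning
    run would train a model here and return its validation score."""
    score = params["depth"] * params["learning_rate"]  # fake score
    return params, score

# A small hyperparameter grid (values invented for illustration).
grid = [
    {"depth": d, "learning_rate": lr}
    for d, lr in itertools.product([2, 4, 8], [0.1, 0.3])
]

# Run the whole grid on background workers; the "main session"
# could keep doing exploratory work in the meantime.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fit_one, grid))

best_params, best_score = max(results, key=lambda r: r[1])
```

With the launcher, the same idea scales out to a Kubernetes cluster instead of local workers, with CPU and memory limits set per job.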
Jupyter notebooks and VS Code in RStudio Workbench
So here's the RStudio IDE, where I just was. I backed out to the home page or landing page, which has a few different projects I'm working on, as well as a Jupyter Notebook and a VS Code session. Again, because we're trying to support both R and Python workflows, or a mix of the two, we also have support for things like Jupyter Notebooks.
So I can open up a Jupyter Notebook inside RStudio Workbench with the exact same authentication. I still have my home directory that is embedded here. So all the files that are available to me in RStudio and R are also available to me in Jupyter or VS Code.
Again, I didn't have to set anything else up. I just am working from inside RStudio Workbench. I now have a Jupyter Notebook, and I'm running Python code in here. So here is the output from just reading in a quick data set and getting a summary of the different columns. I can create a quick graphic in Matplotlib and look at that. And then I can get the raw data as a table here, all this in kind of an interactive notebook that I wanted to use, all behind my authentication.
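The notebook in the demo uses pandas and Matplotlib; as a self-contained sketch of that same read-then-summarize step, here is a stdlib-only version, with a tiny invented dataset standing in for the real one:

```python
import csv
import io
import statistics

# Invented stand-in for the dataset loaded in the notebook.
raw = io.StringIO(
    "date,price\n"
    "2021-01-01,10.0\n"
    "2021-01-02,12.0\n"
    "2021-01-03,11.0\n"
)

rows = list(csv.DictReader(raw))
prices = [float(row["price"]) for row in rows]

# A per-column summary, much like df.describe() in pandas.
summary = {
    "count": len(prices),
    "mean": statistics.mean(prices),
    "min": min(prices),
    "max": max(prices),
}
```

In the hosted notebook the same steps run against the real file in your home directory, with the plot rendered inline.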
Now, when I mentioned authentication, I said single sign-on, but we support a lot more than that: LDAP, SAML, and Active Directory or Azure Active Directory, among others. So most of the enterprise-grade authentication methods you might use in your organization are supported here, and I just have to log in one time to get access to all these different environments.
I can also take this Jupyter Notebook and publish it to RStudio Connect. So importantly, you can publish the finished document or the source document. So both in R Markdown and in Jupyter, you can publish these documents and actually have them rerun on a schedule or just post the finished document so that you can actually interact with them and rerun them in the future.
If I go to RStudio Connect, that's where all my content lives. We can let that pop up in a second; we've got a lot of content in RStudio Connect, so it'll take a little bit to load, and then we can look at the content I own. Right here are the actual slides I was using for today, and then we can see the Jupyter Notebook I was working on, published to RStudio Connect as well. In just under a minute, I was able to get the notebook I was working on interactively onto RStudio Connect, and now I can share it with specific users in my enterprise.
While we're here inside RStudio Connect looking at a notebook, you'll notice that it defaults to opening the access panel. With regard to authentication, once I'm authenticated I can do things, but I also want to control who else has access, whether that's specific users or my entire company. It defaults to only I can see the content I've published, which is a good idea; maybe you don't want to release it to the whole company because it has sensitive data. I can add specific people, like my colleague Alex, who can see this document because I need his review, or the entire solutions team, so they can look at it and give me feedback.
Of course, if you just wanted to have it available to anyone in the org, you could always save it and make it where it's available to all users, login required. Basically, as long as they authenticate, they can access it. Or for what I did with the slides so that y'all could actually see it, I made it where anyone in the world could see it. So I made it anyone can see it, no login required.
So we showed a quick Jupyter Notebook. It's interactive. We published it to Connect, and we were able to kind of control the authentication around it. But let's go one step further. So let's say that we want to, you know, we're not wanting to work inside Jupyter Notebook. We actually want to work in, say, VS Code, as we're a Python developer, and we want kind of a full-blown IDE as opposed to a specific notebook.
So I've logged into VS Code. Again, it's using my existing authentication. So I've logged into RStudio Workbench, and then I can hop directly into VS Code. It's got, again, my whole home directory. So all the files that are available in R are available in VS Code here as well.
While that's loading and connecting the kernel, it's connecting to Python 3.7.5. It's basically making a REPL, a read-eval-print loop, meaning I can interact with Python live. It has to do that the first time the environment is set up; once that's done, it'll go pretty quickly.
And because we've embedded things like Jupyter and JupyterLab and VS Code inside of here, you can fully customize it. So you can extend it with the extensions you'd like, or other kind of integrations that you want to show. And I can always go back to RStudio if I want to get out of here and go back to R.
Using R and Python together
Still within RStudio, we've published some Python products and some R products. Let's show a quick example of using R and Python together inside RStudio. You may not know this, but you can actually mix R and Python chunks inside RStudio. This uses a library called reticulate, a play on the reticulated python, which allows you to call Python code from R.
So I'll run this, and it's got my Python environment running. Inside an R Markdown notebook, I can get output similar to what we were seeing in the Jupyter notebook. We have the exact same kind of table that we built with year, and again, I can change this from year to 365 days and re-execute the whole thing.
Inside RStudio, you can actually see both R and Python objects. The R pane is just showing a connection to a Postgres database, but Python actually has the prices data frame, and you can explore that here.
Now, this might all feel a little overwhelming, with all these different things going on. Again, RStudio Workbench is providing the front end to all these environments. It provides RStudio, the IDE you know and love and are comfortable in. If you're a Python user, maybe you prefer Jupyter notebooks or VS Code, and that's fine; we support those as well.
Now, the data products you create in terms of R Markdown reports, Jupyter notebooks, Flask APIs, Plumber APIs, Shiny apps, Dash apps, Bokeh, Streamlit, all these different things you're creating in R and Python, those all get published to RStudio Connect. So you're doing all of your coding initially in Workbench. Once you've created something, then you host and publish it on RStudio Connect.
RStudio Connect: sharing, scheduling, and scaling
Let's open up my portfolio dashboard. This is a Shiny application running on RStudio Connect, and it might look like most Shiny apps you've seen. It's got some parameters on the side, and some built-in interactivity through, I believe, the dygraphs R package. Or no, this is actually Plotly. So you can hover on different bars, and when you change parameters, the graphics update in place based on the changes you're making in Shiny.
All these interactive web apps, Shiny in R or Dash and Streamlit in Python, are server-based products: they have an R or a Python runtime behind them. As a data scientist, you probably don't want to figure out how to run a Linux box, how to scale things automatically, or how to set up parallel work across multiple R or Python processes. That's not where you want to spend your time, and that's what RStudio Connect does for you.
It provides the ability to scale up the actual server, the actual compute environment, as more users come to visit. If we read this, it says this Shiny application can spin up three separate R processes that can each serve 20 connections, so just out of the gate it can serve 60 people. I could change these parameters and make it serve 100 or 1,000, or in some cases we have examples where tens of thousands of users could potentially be supported.
That's going to be based around how big your server is, but you as a data scientist don't have to worry about how does this actually scale in terms of figuring out all the different components. You have the ability to just publish your Shiny application and then change in these parameters if you need to.
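The capacity math behind those runtime settings is simple to check. Using the numbers quoted above (treated here as illustrative defaults):

```python
# Runtime settings as described for the demo app (illustrative numbers).
max_processes = 3               # R processes Connect may start for the app
max_connections_per_proc = 20   # concurrent users each process can serve

# Out-of-the-box concurrent capacity for this Shiny app.
capacity = max_processes * max_connections_per_proc

# Raising either setting raises capacity, bounded by server resources.
scaled = 50 * max_connections_per_proc  # e.g. 50 processes
```

The real limit is the server's CPU and memory, but the point stands: the publisher tunes two numbers rather than administering infrastructure.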
Let's go to a quick Dash application, this one from my colleague David. It's one of the examples from the Dash team, one of their richer hello-world examples. And again, I have the ability to change these parameters as I see fit. As a data scientist, I can publish my application and change different things, and the graphic updates on its own, but I don't have to worry about how Linux works, how authentication works, or how scaling an environment works. I just get to create the data science content I want to put out there.
Here's a Streamlit example. Streamlit's very similar to like a Shiny-based R Markdown in terms of they're kind of like a notebook style setup or a linear flow, but the same idea in that you change a parameter and then the code is re-executed to regenerate something. In this case, showing an output or a graphic, but this is running Python on the back end as well.
The other benefit relates to reproducibility. Let's go back to one of mine: I have a presentation on production databases, and you can see I actually have multiple versions of it. I published an older one at the very end of September, and I can go back to that previous version. So I can maintain history, or I can overwrite. Say I have a presentation, an R Markdown report, or a Jupyter report I'm building: Connect will handle the history of that report, so I can always look at the latest version, but I can also roll back to an older one if I want to show it to someone else.
This is an R Markdown report hosted on RStudio Connect that aggregates the pieces of an entire project. If we scroll down, this is that end-to-end workflow I showed in the slides earlier: bringing some data in, creating datasets that are saved to RStudio Connect as pins, saving a model onto RStudio Connect, and then serving that model up as an API that interacts with Shiny applications or a website.
So this aggregates all those different components. So as a data scientist, sure, I'm probably going to have some of this on version control and that's where I'm going to be like actually writing code. But if I want to just go interact with the things in production, this shows me all the different components of this very complex data science task.
RStudio Connect has support for email servers. So if a condition is met, or not met, it can actually send you an email and embed a ggplot or a table inline, or attach a CSV, a presentation, or a report. Whatever you want to produce in R, you can build it into the email and send it along.
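On Connect this conditional emailing is done from R (for example, inside R Markdown), but the send-only-when-a-condition-is-met logic is easy to sketch. Here is a hedged Python illustration using the stdlib's email module; the metric and threshold values are invented:

```python
from email.message import EmailMessage

def build_alert(metric_value: float, threshold: float):
    """Return an alert email only when the metric crosses the threshold;
    return None otherwise, meaning nothing gets sent."""
    if metric_value <= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Alert: metric {metric_value:.2f} above {threshold:.2f}"
    msg.set_content("See the full report on RStudio Connect.")
    return msg

# Below threshold: no email is produced.
quiet = build_alert(0.50, 0.90)
# Above threshold: an email is built (sending it would use smtplib).
alert = build_alert(0.95, 0.90)
```

Pairing a check like this with a schedule is what turns a report into an automated monitor: it runs daily and only emails you when something is worth looking at.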
Let's dive real quick though in terms of one of the things that resonates for me and what I use a lot is scheduling of content. I often have tasks where, you know, it's something I have to do all the time and I want it to rerun on a schedule. I don't want to have to manually pull from the database, click run, and then publish to connect. Like that sounds like a lot of me clicking a lot of buttons.
I can actually schedule this to run every single day. Built into Connect, I can say: rerun this exact code at this time every day, and publish the output. If we look at the history, there are many, many versions of this report, because it's been running every single day, doing some kind of cleaning or automating a model training step that I'd otherwise have to do manually. So now I can spend my time on things that are a better use of it than clicking run every day.
And importantly, because it's on RStudio Connect, the package environment is stable. In my dev environment I can install new packages and change my version of R, but once I publish something to RStudio Connect, it lives in its own little controlled environment, with a very specific version of R and very specific versions of all its packages, static and controlled. This report has been running for, I think, over a year, and I have other reports that have been running on RStudio Connect since I joined RStudio over three years ago. So it's nice to be able to put something in production and just let it run, even though my dev environment has changed, or the outside world has changed a bit.
RStudio Package Manager
I could talk about RStudio Connect basically all day, but I do want to talk a little bit about RStudio Package Manager, because it's the unsung hero behind all of this. The amazing part of the R and Python ecosystems is not only the core languages themselves but the package ecosystems that extend their capabilities. Again, there are tens of thousands of packages available, each of which can have many different versions. From an IT perspective, that sometimes makes people nervous when they hear open source: "I can't control it. How do I manage this?"
Package Manager allows you to store an entire copy of CRAN behind your firewall, air-gapped in your environment. Take ggplot2, for example, which I pulled up here: the instance of Package Manager we're using internally has the latest version as well as archived versions going back to 2007. So if I had a very old script, I could install a pre-compiled binary version of this library from 2007.
That binary component is really important. On Linux, R packages normally have to be compiled from source, but we provide pre-compiled binaries. So rather than taking, say, an hour to install a package, it's almost instantaneous, because you're just downloading and installing the package rather than compiling it. That's another benefit Package Manager provides.
You can also see here on the left that we have different repositories, or collections of packages. This means we're able to curate specific sets of packages. For example, I can have an internal-only repository for packages we've written inside RStudio that aren't public, or, for your organization, packages you're writing to do very specific things that you don't want to make fully available and are just using internally.
And then Package Manager also supports PyPI. So you can make a copy of PyPI, and let's see if we can get pandas. And we can: we can get the latest version, 1.3.3, as well as older versions going back through older releases. So again, both R and Python are supported.
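To make the PyPI mirror concrete, here is a minimal configuration sketch pointing pip at a Package Manager server. The hostname and path are placeholders, not a real endpoint; substitute your own server's PyPI repository URL.

```ini
# pip.conf (pip.ini on Windows); hypothetical sketch, replace the URL
# with your own Package Manager server's PyPI endpoint
[global]
index-url = https://packagemanager.example.com/pypi/latest/simple
```

With that in place, a command like `pip install pandas==1.3.3` would resolve the pinned version from your internal mirror rather than from the public PyPI.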
Wrapping up: customer stories and RStudio's mission
So while I'm really excited about RStudio Team and all the amazing things it can do, you often want to hear what your peers think. If you go to our customer stories page, which is linked here, or go to rstudio.com slash about slash customer stories, you can read about what other data science teams have done with our products: how they've used RStudio to harness the full capabilities of their data science team, streamline their deployment process, or merge software engineering and data science without making it painful for everyone involved.
People talk about using a mix of R and Python on RStudio, or about how RStudio products work for both their technical teams and their business team members; integrating both languages helps them be productive. And a lot of people use RStudio Connect as their one-stop shop for publishing: documents, apps, APIs, with a clean professional interface where you can work with all your different clients and stakeholders internally, along with your authentication and your professional database products.
So if we think about RStudio Team, it's really a modular platform. You can always choose just one of these if that's all you need. Some teams only have Workbench, some teams only have Connect, some have Connect and Package Manager. If you get them all together, there's actually a bundle discount, but you can always purchase them individually if that's the only product you need.
But importantly, this complements your existing analytic investments. Every organization has different needs, and all sorts of different data products or integrations they want to bring to the table. So your corporate data, all the different databases you're using, things like Spark or distributed computing: basically any data source you can connect to through ODBC from R or Python, you can pull into RStudio Workbench or RStudio Connect.
Now, to close out this part of the presentation, here's something I'm very passionate about: what makes RStudio different as a company? Our mission, our purpose for existing, is creating free and open-source software for data science, scientific research, and technical communication. Basically, we want to make sure that anyone with access to a computer can participate freely in a data-centric global economy. We want to enhance the production and consumption of knowledge, and we want to facilitate collaboration and reproducible research in science, education, and industry.
This is why we create so much open-source software, and why more than 50% of all our engineering resources go to free and open-source software: things like the tidyverse and the R packages we release, and some of the new support we're doing for Python, contributing either financially or with code to Python projects. Overall, revenue from our professional products goes into funding our open-source work so that we can achieve this mission.
Additionally, we're a public benefit corporation. This was announced at the 2020 RStudio Conference, and if you haven't seen that video, I highly recommend watching the keynote. JJ Allaire, our CEO and founder, converted RStudio from a standard corporation to a B Corp, a public benefit corporation. This means that our open-source mission is codified into our company charter: any decisions we make must balance the best interests of our community, customers, employees, and shareholders.
So I'll stick around for some more questions. But if you just want to follow up on your own, you can read all about serious data science and RStudio Team with these links. Again, this presentation is available at colorado.rstudio.com slash RSC slash RStudio dash team press. If you want to know what actual RStudio Team users think, you can read our reviews on TrustRadius; those are all public reviews of our products from customers. If you want specific questions answered, or you want to talk about something you didn't feel comfortable asking in a group setting, you can always book a live meeting with us, and we can connect you with one of our internal experts and talk about whatever you need. Again, no strings attached; let's just have a conversation.
And then lastly, maybe you don't even want a meeting; you just want to evaluate the product. You think, okay, this looked cool, but I want to actually try it out myself. You can go and evaluate RStudio Team today in a hosted environment: request access and try it out in the cloud. No strings attached, no purchase necessary.
Does RStudio Connect still require root-level privileges to run? If you're talking about running Connect inside a Docker container, it still requires root-level privileges. But if you want to talk further about that, please do reach out; we have some very exciting work happening in that area. So if you're interested in going a bit deeper on root-level privileges, reach out to us in chat and we can talk about it.
So there's a question, and I don't want to steal the thunder, but we've had a couple of questions about Tableau, so let's talk a little bit about that. Yes, in short, we do have integrations. There's the shinytableau package, which is a way of embedding Shiny apps into Tableau dashboards. And the plumbertableau package for R and the fastapitableau package for Python let you create RESTful APIs in R or Python that connect directly with Tableau via the Tableau analytics extensions. So R and Python become first-class citizens running within Tableau. And again, this is about merging those ideas of data science teams and business intelligence teams working happily together, as opposed to being at odds and unable to interact cleanly.
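To give a feel for what those packages are wrapping, here is a rough sketch of the JSON request/response contract a Tableau analytics-extension endpoint handles. This is a deliberate simplification under assumptions: the `_arg1` field name follows the TabPy-style convention, the `evaluate` function and the doubling "model" are placeholders, and the real plumbertableau and fastapitableau packages handle all of this plumbing (and the HTTP serving) for you.

```python
# Simplified sketch of a Tableau analytics-extension evaluation.
# (Hypothetical stand-in; plumbertableau / fastapitableau wrap this.)
import json

def evaluate(body: str) -> str:
    """Take a Tableau-style JSON request body, return a JSON array."""
    payload = json.loads(body)
    values = payload["data"]["_arg1"]   # TabPy-style argument naming
    scores = [v * 2 for v in values]    # placeholder for a real model
    return json.dumps(scores)

print(evaluate('{"script": "score", "data": {"_arg1": [1, 2, 3]}}'))
# prints [2, 4, 6]
```

Tableau sends a column of data as a JSON array and expects an array of the same length back, which is what lets a calculated field in a dashboard call out to an R or Python model hosted on Connect.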
Thank you again so, so much for joining me today. It's always a pleasure to connect with the community; thanks for taking the hour out of your day to hang out with me for a while. If you want to stick around for other things, our data science hangouts are happening, and those are great for more informal discussions about data science and being a data science practitioner. But thank you so much, and if you have further questions, please do reach out to us here at RStudio; we're always happy to chat. Other than that, have a great weekend, stay safe, and thank you for your time.
