
Matching Tools to Titans: Tailoring Posit Workbench for Every Cloud - posit::conf(2023)
Presented by James Blair

In an era of diverse cloud platforms, leveraging tools effectively is paramount. This talk highlights the adaptability of Posit Workbench within leading cloud platforms. Delve into strategic integrations, understand key challenges, and uncover practical solutions. By the end, attendees will be equipped with insights to harness Posit Workbench's capabilities seamlessly across varied cloud environments.

Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Data science infrastructure for your org. Session Code: TALK-1115
Transcript
This transcript was generated automatically and may contain errors.
Great. It's great to be with you all. I'm James Blair. I work at Posit as a product manager for cloud integrations. So today we're going to be talking about different ways that Posit Workbench can operate in different cloud environments through some of the partnerships that we're establishing today.
Now, at Posit, we work really hard to make sure that Posit Workbench meets the needs of the modern data science developer. This includes giving you tools and environments that allow you to do the day-to-day workloads that are important for your work. Exploratory data analysis, model training and tuning, deploying models, managing things of that nature, exploring data visually with tools like ggplot and other things like that, building shiny applications. Coding and working in other environments like VS Code and Jupyter Notebooks are all things that are supported under the Posit Workbench platform.
Now, what's not often apparent to developers as they're working in this environment is the underlying infrastructure requirements that are sometimes complicated in order to support an enterprise product like Posit Workbench. To illustrate some of this complication, I want to take a moment to look at some of the documentation that we frequently share with customers when they're setting up Posit Workbench for the first time.
It almost reads like a choose-your-own-adventure script. You can install Posit Workbench as its own standalone server. This is typically the simplest approach, and often the one most people breeze right past. You can also run Posit Workbench in a load-balanced environment, where multiple nodes are configured behind a load balancer and users are assigned to different nodes based on whatever rules are in place. This is more robust, but it adds complexity to the setup process.
I could use Posit Workbench with an external resource manager like Kubernetes. Now I have Workbench set up. I have a separate Kubernetes environment set up. Workbench communicates with that environment. Sessions launch in that environment. This is a supported infrastructure. Or I could run the whole thing in Kubernetes. Workbench runs there. The sessions run there. Everything runs in Kubernetes. And then if I really want to make things interesting, I could also run and integrate with Slurm.
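To make the external-resource-manager path a little more concrete, here is a minimal sketch of what the relevant Workbench configuration can look like when sessions launch in Kubernetes. The keys follow the pattern of the Workbench admin guide, but the specific values (address, port, cluster name) are placeholders, not a working setup:

```ini
# /etc/rstudio/rserver.conf -- enable the Workbench job launcher
# (placeholder values; consult the admin guide for your version)
launcher-address=localhost
launcher-port=5559
launcher-sessions-enabled=1

# /etc/rstudio/launcher.conf -- define a Kubernetes cluster for sessions
[cluster]
name=Kubernetes
type=Kubernetes
```

Even this simplified sketch hints at why the Kubernetes and Slurm paths add operational overhead: the launcher, the cluster, and Workbench itself all have to be configured and kept in sync.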
Now, some people look at this, some internal IT organizations look at this documentation and this is how they feel. They are salivating because they have the know-how. They have the expertise. They have the infrastructure. And it's almost a plug-and-play exercise to get Posit Workbench up and running. And we applaud those organizations.
However, I also recognize that lots of people look at this and they feel more like this. I'm overworked. In many cases, I may be underpaid and I don't have time to figure out what Kubernetes is, let alone figure out how to use it with this tool called Posit Workbench that if I'm in the IT organization, I probably have very limited familiarity with.
What we've noticed is that a lot of these administrators are starting to turn to the cloud to find solutions. They're looking for ready-made services or platforms that can alleviate some of this administrative burden and supply users with the experience they want without creating an undue maintenance overload for the IT professionals within the organization.
Today, we're going to talk about a number of different solutions that exist today and are coming in the near future that allow Posit Workbench and support Posit Workbench inside of these different cloud environments. We'll talk about some of the advantages, how they differ from one another, and provide some resources where you can learn more.
AWS and Amazon SageMaker
We'll start with AWS, or Amazon Web Services. We've partnered with Amazon SageMaker so that administrators can configure an Amazon SageMaker domain to have access to RStudio. In this process, an administrator either creates a new domain or selects an existing domain in their SageMaker environment, makes sure there's a Posit license in place, and configures access to Posit Workbench and RStudio within that environment.
Once that's been done, individual users can come into the SageMaker platform and can request RStudio sessions from within that platform. When users make this request, they're brought to the familiar Posit Workbench homepage, but there's a few key differences from what you might expect out of a traditional installation. One of the biggest things here is that users can request a specific compute instance type for their session to run in. This means that every user gets an isolated EC2 instance within AWS for their particular session.
This accomplishes two distinct things. One, users can request resources based on the workload they anticipate doing. If I have a huge data set and I plan on bringing it into memory and analyzing it, I know I'm going to need a lot of memory. So, I can make a choice when I start my session to have an instance that provides me with adequate memory for my analysis. On the other side, maybe I have an analysis that's going to require a lot of parallelization across multiple CPU cores. To improve efficiency, I can request a compute instance that has a high CPU count so that my workload finishes faster.
The other advantage of this architecture is that everybody's sessions run independently of one another. This is hugely advantageous if you're like me and you occasionally do something a little bit silly in RStudio and all of a sudden the entire server is locked up. I'm sure I'm not the only one who's tried to read a data set into memory that far exceeded the available memory in my environment. If that happens in a traditional single-server install, I've now brought the server down for everyone. I had to buy more than one round of drinks to make up for that.
Now, if you do the same thing here in the SageMaker environment, you need to reset your session, but other users are unaffected. If I do something in my session that causes me to exceed the available resources, I'll need to start over and make sure that my new session contains adequate resources, but I am not interrupting anyone else's workflow.
The advantage here, and this is true for all of these solutions that we talk about, is that this comes without additional IT administrative burden. The IT office does not need to worry about managing this environment. SageMaker handles the orchestration of the resources and everything behind the scenes.
Google Cloud Workstations
Let's talk about Google Cloud. We partnered with Google Cloud Workstations, which is a fairly new offering on GCP. The offering is divided into two personas. Administrators create workstation configurations that specify what resources are available and what tools run there, and users then come into the platform and request access to a workstation based on one of those configurations.
When a user makes this request, a dedicated environment is created just for that user that provides access to the tool set that was defined in the configuration running on a specific instance that was also defined in that same configuration. When users come into the platform, they can see existing workstations they have, they can request new workstations based on configurations they've been given access to, and then they can launch workstations once they're running. And in the case of Posit Workbench, they will then have access to all the tooling that they would expect to have access to inside of a Workbench environment.
VS Code, the RStudio IDE, Jupyter Notebook, and JupyterLab are all available within this integration. All IT has to do is set up these configurations to define what tools are available. Posit Workbench is available from a drop-down list of tools you might want to offer, and then users can access it in their own dedicated compute environments.
Databricks integration
Next up is Databricks. This is one that we're very excited about. This is fairly new. We've published a few blog posts. In fact, one was published just last week that highlights some of this. And there's a number of things that are happening here, so I'll spend a little bit more time here with Databricks.
One is we've made a lot of progress recently on improving the sparklyr R package to support some new and exciting developments on the Databricks side. Specifically, we're now able to take advantage of Databricks Connect from within sparklyr. What that means is I can remotely connect to a Databricks environment from anywhere. I don't have to be co-located, I don't have to use a Databricks notebook. I can have RStudio on my desktop, I can have Posit Workbench in AWS, I can have an R terminal on some dusty server in a closet. It doesn't matter. I can use sparklyr to connect to Databricks and run workloads that execute in the Databricks environment.
When I create these connections, the connections pane in the upper right-hand corner of the RStudio IDE will show me details about the data, the catalogs, the schemas, everything that's available to me from that Databricks context. I can explore that data, I can connect to it, I can manipulate it, and all of the computation happens on the Databricks end.
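As a rough sketch of what that remote workflow can look like with a recent version of sparklyr that supports Databricks Connect: the cluster ID and table below are placeholders, and the connection assumes `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are set in the environment.

```r
library(sparklyr)
library(dplyr)

# Connect to a remote Databricks cluster via Databricks Connect.
# "0123-456789-example" is a hypothetical cluster ID.
sc <- spark_connect(
  method     = "databricks_connect",
  cluster_id = "0123-456789-example"
)

# The dplyr pipeline below is translated to Spark and executed on
# Databricks; only the small summary result comes back locally.
tbl(sc, dbplyr::in_catalog("samples", "nyctaxi", "trips")) |>
  summarise(trips = n()) |>
  collect()

spark_disconnect(sc)
```

The key point is the one from the talk: none of this code cares where R itself is running, because the computation happens on the Databricks side.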
We're also excited because we're creating a new feature inside the RStudio IDE that's part of Posit Workbench that will allow Databricks users to manage their clusters directly from the IDE. This means that if I'm using Databricks and I have a cluster that I want to connect to, but it's not started yet, I don't have to open a browser, log into Databricks, visit the control plane, click on Start Cluster, come back to the IDE, wait for the cluster to start, connect to the cluster, and then work, I can start the cluster directly from RStudio. Once it's started, I can connect directly to it and begin working. And this will be available in an upcoming release of Posit Workbench.
The last thing that we're excited about, and this is more forward-looking, this is not something that's available yet, but we anticipate being available next year, is Posit Workbench as a Lakehouse application. Databricks back in June announced Lakehouse applications as a way to run third-party applications natively directly on Databricks infrastructure. What this means is that coming soon, like I mentioned sometime next year, you'll be able to run Posit Workbench directly on your Databricks infrastructure in a way that's supported, stable, and provides users with the experience they expect.
Snowflake and Posit Cloud
I'll mention Snowflake briefly. This is one that we are in the very early stages with, but we're working closely with Snowflake engineers and developers to make it possible to run Posit Workbench as part of Snowpark Container Services, which is a fairly recently announced new development over at Snowflake. The idea is that you'll be able to run Posit Workbench directly within your Snowflake environment.
And when you do that, you'll also have access to the same connections pane that we looked at previously that will allow you to explore the data that's available to you within Snowflake, run queries against that data, use that data for analysis, all while staying within the realm of your Snowflake architecture and infrastructure. This is still really early phases, so more documentation and more examples will be available soon as we continue to work through this integration.
Finally, last but not least, we have Posit Cloud. We've done a lot recently to make Posit Cloud more robust, more capable, and more feature-rich as we've listened to feedback from our users. This is a really great kind of low-barrier way to get exposure to some of the products and features that we offer to enterprise customers. Anyone can create an account on Posit Cloud. And from that account, you can create workspaces, you can create projects that are leveraging either the RStudio IDE or Jupyter Notebooks. And from those projects, you can then publish things to Posit Cloud and share and distribute those with others. And we continue to build out this platform to improve the user experience.
Looking ahead
As we look towards the future, it's clear that the cloud landscape is always shifting. There's new technologies that arrive. There's technologies that disappear. But here at Posit, we remain committed to making sure that regardless of where your organization chooses to operate, we are a natural fit, that our products work well in those environments, and that in many cases, we find ways to make sure that they not only work well, but that they work in a way that doesn't add additional burden to IT organizations that are often already overworked.
We look forward to a future where the whole Posit family of products, not just Posit Workbench but also Posit Connect and Posit Package Manager, is available in these different cloud environments, and that's something we are diligently working towards with all of these partnerships.
So what's next? If you attended today and didn't see your cloud provider or cloud service of choice in what we've talked about, feel free to reach out to me. You can reach me at james@posit.co. I'd love to chat or set up a discussion about things that you are doing within your organization, or tools that you're using that you would like to see work better with Posit tools and products.
Alternatively, if you are a user of one of these services, you use Snowflake, you use Databricks, you're operating in GCP or AWS, whatever the case is, if you have questions or if you have feedback on the existing integrations, again, happy to receive that and happy to have you reach out.
Here's a collection of resources. These are links that will take you to different blog posts, documentation things that outline some of the work that's happening right now, places that you can keep an eye on for the future. This is all available if you scan the QR code here. This will take you to a GitHub repository that contains the slides, links to everything that I've just shared, and things like that. But I appreciate your time today. Thank you very much, and we'll take some questions.
Q&A
So I have some questions here, actually. How does Posit choose which provider or features to prioritize, and is there any possibility of a technical roadmap?
Yeah, this is such a good question. It's more of an art than a science. What I mean by that is each of these partnerships has two sides to it. We obviously want to make sure that our products are as well represented and as well positioned as they can be in each of these different environments. But we're also balancing that against the needs, demands, and expectations of the group or company we're working with. And so a lot of it comes down to the requirements or expectations they might have and where we end up meeting in the middle.
An example of this would be SageMaker, right? SageMaker has SageMaker Studio, which is a JupyterLab-based interface for interacting with their platform. So when we came to them and when we started these discussions about Posit Workbench, one of the things that they didn't want to do was they didn't want to enable Jupyter Notebooks and JupyterLab and VS Code within Workbench because they wanted Python users to continue to rely on the solution they provide. And so if you use SageMaker and Posit Workbench today, you'll notice that the only editor that's available in that context is the RStudio editor because of some of the tradeoffs we made in our conversations with Amazon.
So one of the things that I focus on and think about a lot is how I can make sure that our product integrations are as consistent as they can be across the different partners that we partner with, but also while acknowledging that in some cases there's expectations and things that need to be taken into consideration that might make the experience a little bit different from one to the other.
Is only providing RStudio on SageMaker an AWS limitation? Would you be able to run VS Code in the near future?
Yes. See previous response. Yeah, so we've talked to them a lot about that, right? I'd love it if they opened up some additional functionality on the SageMaker side. In all honesty, I don't expect that that will change anytime in the near future. But the one thing that I'll say, and this is true of any of these types of questions, is that if you have questions like this around how come this isn't this way or why is it this way, let us know about it. Also, if you have a relationship with SageMaker, if you're a SageMaker customer or an Amazon customer or with Databricks or whatever, reach out to your reps on their end as well and provide that feedback for them. Because they're managing their own roadmaps and we're trying to kind of jointly work on that with them. But these providers getting that feedback directly is often the thing that will help move things forward the best.
So I work in a highly regulated industry. Do any of these cloud service providers provide encryption in transit, at rest, and even during use?
Yeah, that's a really good question. I know SageMaker has several different compliance requirements that it meets. I know it's SOC 2 compliant, there's HIPAA compliance, there's some other things that are there on the SageMaker side. It really depends, and that's a good question for the provider or the partner. One of the things that I don't have a clear answer for right now is as we look towards the future with Databricks and the work that we're doing there, being able to operate directly on Databricks infrastructure, which means that data wouldn't ever travel outside of your Databricks environment. Workbench would be there. As you analyze data, it would stay within the realm of Databricks. There might be some things there that meet some regulatory requirements, but that would be a question better suited for Databricks in that case.
Can these solutions connect to Package Manager and have the workbench environments use only packages available based on freeze dates?
So yes is the short answer, but it depends on which particular integration or partnership we're talking about. For example, SageMaker lets you configure this at the domain level: when you set up a SageMaker domain and configure RStudio, you can supply a default package repository URL. So if you have a snapshot from Package Manager that you want every user to use, you can supply it there, and every user session will point to that location by default. With other solutions, like the GCP integration or the work we're doing with Snowflake, things sit a little further down the stack: you're managing containers and image definitions to provide users with access to the tooling, and defaults for upstream repositories can be set at that level.
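Whichever mechanism delivers it, a SageMaker domain setting or a baked container image, the effective result is an R session whose default repository points at a frozen Package Manager snapshot. A minimal sketch of that end state, with a placeholder hostname and freeze date standing in for your own Package Manager instance:

```r
# Hypothetical site-wide default, e.g. set in Rprofile.site:
# install.packages() then resolves against the frozen snapshot
# rather than the moving head of CRAN.
options(repos = c(
  CRAN = "https://packagemanager.example.com/cran/2023-09-01"
))
```

With this in place, every user installing packages gets the package versions as of that date, which is what makes results reproducible across sessions and users.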
