
Connect on Kubernetes: Content-level Containerization - posit::conf(2023)
Presented by E. David Aja, not Kelly O'Briant

Running Connect with off-host content execution in Kubernetes is very cool and allows you to enable some powerful and sophisticated workflows. The question is: do you really need it? How do you evaluate and decide? Let's have a candid conversation about whether Connect content execution on Kubernetes is right for you and your organization. Moving to Kubernetes introduces complexity, so it's important to have a strong motivating reason for making the switch. This talk introduces new Connect features that are made possible by content-level containerization.

Presented at posit::conf(2023), September 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Data science infrastructure for your org. Session Code: TALK-1116
Transcript#
This transcript was generated automatically and may contain errors.
Hi, my name is David Aja. I'm a member of the Solutions Engineering team at Posit and the world's worst Kelly O'Briant impersonator. And with that said, let's talk about Connect.
So I like to think about deploying things to Connect as a magic trick. Not magic. In this house, we respect the laws of physics, and computers, large language models notwithstanding, are deterministic. But a magic trick in the taxonomy proposed by a particular Christopher Nolan movie. A trick comes in three phases. The first is the pledge: I show you something you've seen before. I encourage you to inspect it. I tell you that it's ordinary. It's a Shiny app that runs, at extraordinary zoom, on your laptop. Or alternatively, it's a Shiny app built in Python, which I guess is a little more novel, that runs, again, on your laptop.
I encourage you to look at the Shiny app. If it were cooler in here, I would remove my jacket. Or if it were warmer in here, I would remove my jacket. There's nothing up my sleeves.
The second phase of the trick is called the turn. In the turn, you begin to suspect that something about this is not ordinary. And, that's the wrong app; hooray for command history. So I'm going to deploy the Shiny app. Or I could jump over to the IDE and publish this application. What I'm showing you now is a bunch of logs. Something is clearly happening; it's not clear what. But a bunch of logs does not really make a magic trick. I can't just make the app disappear. It has to go somewhere else.
And so that's where we get to the third phase of the trick. The hardest part. Where I make the app reappear on Connect. And we call that part the prestige. It's a great movie. But in this case, the trick is not actually that the app is running on Connect. The trick, and I'm typing super fast right now, is that the app is running in a container. And so now, like any magician, what I will do is I'm going to take you behind the trick and then we'll talk a little bit about whether this is a thing you should try at home.
What Connect does
So to begin, let's start by talking about Connect. What is Connect? I like to define things by what they do, and what Connect does is drive down the cost of application publishing. People at this conference are probably pretty used to the idea that data scientists are people who, in the course of doing their work, build web applications. But that's not a view most people hold about what data science involves. And so part of what Connect does is make it easy for data scientists to get work off their machines by reducing the demands it places on them when they're trying to publish something.
Connect's deployment tools do a lot of work to inspect your local environment, gather up information about what you have there, and send that to the server, where Connect reconstructs the environment for you. And a single Connect server, like the one in the architecture diagram James pulled up before, can run multiple versions of R, multiple versions of Python, and multiple versions of Quarto. It can handle conflicting versions of packages for each of those runtimes. And it deploys applications in an isolated and atomic way. So it's trivial to publish a new thing. If you break it, you just publish again. You roll forward, you iterate, you make progress. And for many workflows, that's good enough.
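To make the environment-inspection step above concrete, here is a minimal sketch of the kind of metadata a deployment gathers before sending a bundle to the server. The field names loosely follow the manifest.json that the rsconnect tooling writes; treat the exact schema here as an illustrative assumption, not the real format.

```python
# Illustrative sketch: the kind of local-environment metadata a Connect
# deployment collects so the server can reconstruct the environment.
# Field names are modeled on (but not guaranteed to match) manifest.json.
import json
import sys

def build_manifest(app_mode: str, entrypoint: str, package_file: str) -> dict:
    """Gather just enough environment info for server-side reconstruction."""
    return {
        "version": 1,
        "metadata": {"appmode": app_mode, "entrypoint": entrypoint},
        "python": {
            # The interpreter version deployed from, so the server can
            # select a matching runtime.
            "version": "%d.%d.%d" % sys.version_info[:3],
            "package_manager": {"name": "pip", "package_file": package_file},
        },
    }

manifest = build_manifest("python-shiny", "app.py", "requirements.txt")
print(json.dumps(manifest, indent=2))
```

The point is that the publisher only supplies an entrypoint and a package file; everything else is inspected automatically, which is what keeps publishing cheap.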
When more complexity is required
There are, however, circumstances where more complexity is required. That might be because you're working with packages that are for some reason difficult to install. They might have complicated dependencies that are difficult for something running at the application layer to reconstruct. There might be conflicting package dependencies. You might have data stored somewhere that demands something different about the way you access it, so the Linux file system is no longer enough. For all of those reasons, we've put a lot of work into making it possible to run Connect in Kubernetes.
We've decoupled where Connect runs and where applications run. Applications sit in their own container context, and Connect is responsible for spawning them and managing some other things. And we call this mode off-host execution. And so what I want to do now is I want to show you around some of the Connect interfaces that reflect this change so that you can understand what kinds of things are possible when Connect is running in this mode.
Connect interfaces for off-host execution
So I'll jump back to our Shiny application here. If you look at the right side, you'll see that below the settings where we control access to the application or the sub-path we serve it on, we have some additional configuration settings. In addition to controlling, for example, the user executing the content, we might also provide a specific Kubernetes service account, which would have permissions to access particular data sources in a way that doesn't correspond to how we do that with the Linux file system. And that makes it easier to provide that fine-grained permission.
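For background on what assigning a service account means at the Kubernetes level, here is a conceptual sketch. `serviceAccountName` is a standard field on a pod spec; the account name, image, and content name below are made up for illustration, and the real pod specs Connect generates contain much more than this.

```python
# Conceptual sketch: content running under a specific Kubernetes
# service account. RBAC rules and cloud IAM bindings attach to that
# identity, giving the content fine-grained access to data sources
# without relying on Linux file-system permissions.
def content_pod_spec(image: str, service_account: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            # The identity this pod runs as inside the cluster.
            "serviceAccountName": service_account,
            "containers": [{"name": "content", "image": image}],
        },
    }

pod = content_pod_spec("ghcr.io/example/python:3.10", "sales-db-reader")
```

Binding permissions to a per-content service account is what replaces the "which Linux user owns this file" style of access control.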
The other new sort of option you're seeing here is the ability to configure execution environments on a per-deployment basis. And so what you're seeing here is I pushed a container to the GitHub container registry that just has a version of Python in it, but I can change that environment to be something more complicated. It might be the case that, for example, I have an image that has R and Python in it. And if I'm thinking about making changes to my application, I can select a different image for subsequent execution. And I can also select that as a default environment image.
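Image selection can also happen at deploy time rather than in the dashboard. Recent releases of the rsconnect-python command-line tool accept an `--image` flag when the server runs in off-host execution mode; confirm the flag against your installed version before relying on it. A sketch of building such a command:

```python
# Sketch: choosing a specific execution image at deploy time with the
# rsconnect-python CLI. The --image flag is available in recent releases
# for off-host Connect servers; verify against your installed version.
from typing import List, Optional

def deploy_command(app_dir: str, image: Optional[str] = None) -> List[str]:
    cmd = ["rsconnect", "deploy", "shiny", app_dir]
    if image is not None:
        # The image must already be known to the server (it appears on
        # the Environments page); Connect will not pull arbitrary images.
        cmd += ["--image", image]
    return cmd

cmd = deploy_command("./app", image="ghcr.io/example/python:3.10")
```

Omitting `image` lets Connect fall back to its default environment selection, which is usually what you want for simple content.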
If I come back to the Connect home page, you'll see there's an additional tab that explains the execution environments that are available. And so in addition to the default images that Connect ships with in the Helm chart I used to provision this instance, I've added this Python 3.10 image and also an image that has a version of Quarto 1.4 in it. I am extremely excited about Quarto 1.4. And if you want to understand why, you should go to Carlos Scheidegger's talk.
But that's not all, right? So in addition to wanting to, say, constrain the environment that we're constructing as Connect is being built, one of the other things we might need is the ability to control the resources that are being consumed by the application while it runs. And so in addition to memory limits, which you can apply now in both local and off-host execution mode, you can also apply CPU limits. This means that if you have applications that have a way of spinning out, taking down servers, and forcing you to buy people beers, you can put a little bit more guardrails around that in a way that's going to make your application safer to use.
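The CPU limits mentioned above map onto how Kubernetes expresses CPU quantities: whole cores or "millicores," where `500m` means half a core. That notation is what makes fractional CPU limits possible for content pods. A tiny sketch of the conversion:

```python
# Background sketch: Kubernetes CPU quantities are whole cores ("2")
# or millicores ("500m" = half a core). A fractional limit on a
# content pod is just a millicore value in the pod's resource spec.
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity string to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)

assert parse_cpu("500m") == 0.5   # half a core
assert parse_cpu("2") == 2.0      # two full cores
```

So a runaway app capped at `500m` can never consume more than half a core of the node it is scheduled on, which is the guardrail the talk is describing.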
And then the last thing, and this is one of the newest pieces of functionality that we've introduced to Connect, is the ability to bypass certain elements of environment reconstruction. You can see here on both the R environment and Python environment management, we have the option to select a setting which gives us control over who has responsibility for environment construction. So I pushed a container that has just a Python image in it, and so I allowed Connect to build the content in my container. It's also the case that I could construct an environment that has all the dependencies that my application requires, and then push that container to Connect, and Connect doesn't have to actually do that environment restoration step. And so that's giving you, again, more fine-grained control over what exactly is executing and where.
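When you take over environment construction yourself, the image you push carries every package the app needs, so Connect can skip its restore step. Here is a sketch of the kind of Dockerfile involved; the base image and file names are hypothetical, and real images for Connect will need whatever runtimes and system libraries your content expects.

```python
# Sketch of generating a Dockerfile for a pre-built environment image:
# dependencies are installed at image build time, not at deploy time,
# so the server has no environment-restore work left to do.
# Base image and paths here are hypothetical.
def prebuilt_env_dockerfile(base: str, requirements: str) -> str:
    return "\n".join([
        f"FROM {base}",
        f"COPY {requirements} /tmp/requirements.txt",
        # Bake all app dependencies into the image itself.
        "RUN pip install --no-cache-dir -r /tmp/requirements.txt",
    ])

dockerfile = prebuilt_env_dockerfile("python:3.10-slim", "requirements.txt")
print(dockerfile)
```

The trade-off is ownership: you gain exact control over what executes, but rebuilding and pushing the image becomes part of your release process.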
So that's, I think, a quick overview of some of the things that we've changed about Connect to make it a better citizen in Kubernetes. And I think I'll stop there and take any questions that anyone has.
Q&A
Thank you, David. The first question I have here reads, can we also run Connect workloads in the Slurm cluster in the future to match Workbench?
So that's an excellent question. One thing I would say on the question of matching Workbench is that if you're running Workbench in an off-host execution mode (there's some documentation we're polishing on this now), one of the things people often want to do is take the container image they've defined for Workbench and then pull that over to Connect and use it. It's not necessarily the fastest way to do it, but it gives you a very clear idea about what's executing where. So that is definitely something you can do. We are still thinking about other venues for this distributed execution model, so let's chat later about your interest in Slurm.
Thank you. All right. The next one says, is there a way to speed up the launch of content pods when they're opened? It takes a few seconds longer than I'd like.
I believe that if you essentially bake the content into the image, you will bypass some of the steps that add to the startup time in off-host execution mode. So that might be one way to think about reducing the startup time, if you're concerned about that.
Thank you. All right. On a single server instance, we'd normally patch the server, for example with apt-get, and the content would not be affected. Can I patch execution images without worry?
Can you patch execution images? That's an interesting question. I suppose you could patch the image, but I wouldn't. I guess you'd rebuild the image?
Yeah, I would think about that as an opportunity to build and publish a new image. Makes sense. Yep. All right. So do you find that some of your clients want more control of the resources that Connect apps or APIs consume, similar to how RStudio on SageMaker works?
I think some of the settings I showed in the runtime panel give you a little bit more control over the maximum CPU and memory that applications will consume. So that's one limit you can impose. The lifecycle is a bit different from SageMaker, because each SageMaker instance is essentially its own EC2 instance, whereas these are typically pods running on a node. So it's slightly different, but you get to control things either way.
Okay. Let's see here. So what is handling the workload resources for, say, fractional CPU cores or memory? Is Connect communicating with the Kubernetes engine or changing node pool specs?
So Connect embeds the Launcher service. If you've heard people talk about the Launcher, that's what's responsible for communicating with the Kubernetes API.
Makes sense. All right. Next one says, is there a case where one can use Kubernetes and open a Shiny server perhaps for local pod-based testing without ingress or egress consequences on Connect in a cloud?
I'm trying to parse that one too.
You certainly could. I think there's a, it's kind of a philosophy of testing question. And at that point, I'm not sure what you would be testing.
Can you elaborate on baking content to speed up content launches? Very interesting proposition.
In the place where I get to dump notes for this talk, I will put the link. There is a recipe in the Connect API Cookbook that illustrates some of the workflow I'm describing; it's in the cookbook under custom execution environments. If you're very curious about that, you can email me at david@posit.co and I'll send it to you.
All right. So can I build arbitrary content into an image and then have it hosted on Posit Connect on Kubernetes?
I mean, arbitrary is a large word, but that is the idea.
That was a very brave response on your side.
I'm going to eat that one later.
They're going to send a recording of this talk to us when we say no.
No. So, and I elided this because of the magic metaphor, but the environment restoration is happening on the NFS persistent volume that's attached to the cluster. So the concept you'll have seen in local execution modes of Connect, the runtime cache, is still preserved. The content is ultimately associated with a runtime cache that you can do surgery on as needed.
That's going to depend a lot on the environment, right? An application that doesn't have any complicated dependencies, like the ones I deployed, you saw running locally on my laptop; I deployed them into a cluster, into a container, and I didn't really have to do anything to make that happen. If you have more complicated dependencies, then you might need to be a little bit more thoughtful about how you describe your desired deployment. Both the rsconnect R package and the rsconnect-python command-line interface provide ways for you to select a specific image to execute the content, as long as that image is known to the server. So you get a lot of flexibility there, either letting Connect pick an image for you or specifying one if that's the desired outcome.
Thank you. Next question says, can we specify a base image for Connect to build a container from?
I would not say right now that Connect builds containers. So you can specify an image that you would like your R or Python environments to be restored into.
Let's see. The next one says, is there planned support for CUDA, ONNX, runtime selection, or any GPU allocations in the Posit Connect UI?
There is some work underway to make it easier to select GPU resources.
All right. So I think that's the questions. Thank you so much. Please join me in thanking David again.
