Resources

R & Python playing nice, in Production (Claudia Penaloza, Continental Tires) | posit::conf(2025)

R & Python playing nice, in production

Speaker(s): Claudia Penaloza

Abstract: Hi, I’m Claudia Peñaloza, a Data Scientist at Continental Tires, where going data-driven can be an adventure. What started as a proof of concept a few years ago evolved into Conti’s first-ever predictive machine learning model for R&D! A baby, a wedding, two lateral moves, and three hires later, our team had also evolved… from mostly R to mostly Python developers. Rewriting 1000+ commits? No thanks. Instead, we got R and Python to play nice. With renv, Poetry, and Docker, we keep things reproducible, portable, and deployable on various MLOps platforms. The takeaway? With the right tools, teams can mix and match languages, leveraging the best in each, and still build solid, scalable solutions.

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

Hi, everybody. Thanks for having me and thanks for sticking around this long. There's one unfortunate thing that I just realized happens when you're the last speaker in a session of people talking about the same thing. Everybody has already spoiled my talk. So if you learn anything new, it probably isn't going to be about the R & Python playing nice part. Anyway.

So in the next 15 to 18 minutes, I hope, I will give you guys an introduction and a project overview. I will talk about how life tends to interfere with even the best plans and what strategies we use to get over those challenges that life threw at us in the middle of our project.

About Continental Tires

So I work for Continental Tires, or Conti as we like to call it. It is a big company. It's been around for about 150 years and builds all kinds of different tires, and I have a soft spot for Conti, specifically because of that third tire in the row there, because those are the bicycle tires and I am a cyclist, as you can see here in this family photo of me with two of my favorite bikes. But no, we can't talk about bikes all day. We have to talk about tires.

So tires are surprisingly complex. An average passenger car tire can have up to 200 components, and those are built to certain specifications. If you look over here, aside from the rim size and such, we have the load index and the speed symbol for that tire. These are specifications we actually have to test for, to make sure that our tires are performing as we expect them to.

So in Conti, we have been testing tires, well, obviously making tires for 150 years, right? We've been testing for a long time, too. We have a lot of data on previous tire builds and tests, and we wanted to combine the knowledge that we already have of tire testing and tire build to be able to predict the performance of tires before they are actually made.

The project and team changes

So we worked on this POC for about 18 months when it was approved for industrialization, and at about that same time, my project lead and peer, our developer, was also getting ready to deliver something. Her human baby. And because this is a German company, I don't know if I skipped that part, but anyways, she took parental leave for two years. Yeah, I know.

So yes, some parts of the world actually put life first. Anyway, at that point the team changed, and we had a language flip among the developers. We went from roughly three-quarters R developers to mostly Python developers.

We must rewrite this all in Python. This was heard in some of those early meetings with that new team. Actually, somebody who was not going to contribute any developer time was the one who said this. But at this point, and here I will digress with Blake, we had already put about 1,000 commits into the repository. We had the machine learning model already deployed to our platform, and the only developer really available to do that would have been me. So it was a non-starter.

Rules of engagement

The developers really had to lay down some rules of engagement. We had to agree how we were going to get this off the ground. So yes, the guys ahead of me spoiled this: we had to agree on data exchange formats, right? First thing, inside the pipeline, we wanted to make sure that any one script could take outputs from any other script. So we agreed that our intermediate files would be parquet files. At the end of the pipeline, or at the beginning, we needed to make sure that the rest of the company could access our results, or that data from the rest of the company could come into the pipeline. For that, we had PostgreSQL databases.

Another thing we agreed on in our daily development was comfort-zone coding. No developer needed to develop in something they weren't comfortable in: if I wanted to code in R, I could code in R, and the other guys could code in Python. No problem, no hard feelings. But we did decide that for code review we were going to do cross-language reviews, because we wanted to enforce the four-eyes principle. We also wanted to make sure that everybody had a basic understanding of any new contribution or refactoring to the code.

But when you review in a language that you're not very comfortable in, you can't really make the code snazzier or super fancy, right? So we had to adhere to core principles, really the basics. Can I understand this code because the documentation is sufficient? So: making sure people were documenting their code. Making sure they were naming things in a way that future us, or some future team maintaining this, will be able to understand what these objects are and why they are called this or that. And also trying to maintain function-based coding.

Deployment and environment management

So after we got that settled, our day-to-day, we also had to think about deployment. When you deploy something, you have to make sure that your code will run reliably many times: every time it runs, it has to run well. So we really had to get beyond "it works on my machine." Thank you, Michael, I don't know where you are. Because this code is not going to run on your machine, right? It's going to be on the cloud, on some platform, blah, blah, blah. So how do you take care of making your code run reliably? You need to manage your environments. When I submitted this talk, for R we were using renv for package management, and for Python we were using Poetry. Soon after, uv came out, and I jumped on that bandwagon, and it has been beautiful. And here I disagree with Jeroen: package management in Python is extremely easy with uv.

I'm going to also make a little aside here as to how things are different once uv came into the picture. renv is very similar to Poetry, right? They both create lock files, which Docker then uses to create a Docker image. And those Docker images are those little containers in which we ship our R installation, our different packages, all of the dependencies, our code. Everything just lives inside those containers.
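The lockfile-to-image flow she describes could look roughly like this. These are sketches, not the team's actual Dockerfiles; the file names follow the tools' defaults, and the script name is invented.

```dockerfile
# Poetry/renv style: dependencies from the lock file are baked into
# the image at build time, so adding a package means a rebuild.
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml poetry.lock ./
RUN pip install poetry && poetry install --no-root
COPY . .
CMD ["poetry", "run", "python", "pipeline_step.py"]
```

```dockerfile
# uv style: ship only the base image plus the lock file; `uv run`
# resolves and installs the environment at container start, fast
# enough that the image itself rarely needs rebuilding.
FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
COPY . .
CMD ["uv", "run", "python", "pipeline_step.py"]
```

The practical difference is exactly the one in the talk: in the first pattern every new dependency triggers an image rebuild; in the second, the lock file changes but the image stays put.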

But when we moved over to uv, uv still uses a base Docker image, but it installs all packages and dependencies at runtime, extremely fast. And if anybody caught the talk about Rust and R this morning: that's because uv is written in Rust. It completely blows my mind how fast this runs. What this means for me on a daily basis is that I do not need to rebuild an image every time somebody adds something. Before, every time we used a new package, I would need to rebuild the image, because, remember, this is running somewhere else, not on my machine. That can take really long, and it's annoying. With uv, I do not need to do that anymore. I have heard that there is an rv in the works for R; it's actually already available, but I have not had time to use it yet, so I would love to be able to tell you that it works, but I can't yet.


Containerization and orchestration

Okay. The next step in this process is containerization, which I've already spoiled a little bit. We use Docker images to contain our code in isolated environments, and we created one for each language. All of the Python code runs in a Docker image, or rather, with uv, starts from a base Python Docker image, and then we have a separate Docker image for our R code. Once this all comes together, the orchestration is deployed on a language-agnostic MLOps platform; it doesn't care what you're running on it. All of the different steps in the pipeline are configured there: how the steps interact with each other, what kind of resources you need, which outputs go where, and which inputs get taken from where.
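A step configuration on such a platform might look roughly like this. This is an illustrative sketch loosely modeled on the valohai.yaml step format from Valohai's public docs; the step names, image names, scripts, and inputs are all invented, not taken from the talk.

```yaml
# Illustrative orchestration config: each step names its own image,
# so Python and R steps coexist in one pipeline.
- step:
    name: preprocess                 # runs in the Python image
    image: my-registry/pipeline-python:latest
    command: uv run python preprocess.py
    inputs:
      - name: raw-data
- step:
    name: train-model                # runs in the R image
    image: my-registry/pipeline-r:latest
    command: Rscript train_model.R
    inputs:
      - name: preprocessed           # parquet output of the previous step
```

The platform only sees images, commands, and data dependencies, which is what "language-agnostic" means in practice here.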

So our final infrastructure looks something like this. This is a diagram of what the machine learning pipeline looks like. We have a lot of AWS stuff in here. Valohai is the MLOps platform that we use, which is based on Kubernetes. We have the EC2 instances, which are the orange squares, and the S3 buckets, which are the green ones. And the combination of Python and R is something like this: some of these EC2s run Python code, some of them run R code, and in between we're just saving the outputs and grabbing them in the next node, whatever is coming next. The whole pipeline looks something like this, with, as I mentioned, those two databases on either side, and a Tableau dashboard at the end, which the tire developers are using.

Results and takeaways

So we were finally able to get to a deployed multilingual ML model that is nowadays used by over 100 developers a day, and we can give them overnight predictions instead of them having to wait about four months for a tire test. So how do you make R and Python play nice in production? Or at least, how did we do it? We standardized the data exchange formats. We took a pragmatic approach to coding and code review, and here I would say whatever works for your team is what works. Containerization for consistent environments, and language-agnostic orchestration. And yes, we can make R and Python work together and be productive, and we can leverage the benefits of both languages without having to compromise reliability.


Q&A

Excellent. We have a question or two. Why different Docker images for R and Python?

Faster.

Yeah. I mean, I don't want to dis R, but the Docker image was about twice the size for R. We also had everything that happened in the ML pipeline in the R Docker; we could probably have cut that up into ETL and preprocessing and so on. But the whole tidyverse gets loaded in one go, and having to load the whole tidyverse and tidymodels gets pretty fat and heavy. So separating the Python from the R makes both Dockers a little lighter and a little faster.

Okay. Are predictions on individual tires?

Yes. Yes, yes. And I'm going to elaborate a little bit. Right now we have a batch prediction for any tire that any developer has put together. We want to get to the point where they have an instantaneous prediction. Like, it's not a problem of our prediction speed. It's more of a platform inside the company problem that we haven't been able to give them an instant prediction.

Okay. All right. Let's thank Claudia and all the other speakers.