Resources

Extending the horizons of R with Rust (Andrés Quintero, ixpantia) | posit::conf(2025)

Speaker(s): Andrés Quintero

Abstract: Data volumes have skyrocketed for years, outpacing advances in hardware. When R users hit performance bottlenecks, the traditional remedy has been to reach for C++ and include it in their R code using Rcpp. In the last few years Rust has emerged as a modern, high-performance alternative for extending R. In this talk, we’ll explore why Rust is a natural fit for data teams, from its robust safety to its concurrency advantages. You’ll also see real-world case studies of how organizations are leveraging Rust and R together to tackle large-scale, compute-intensive challenges. Join us to learn how Rust is expanding the horizons of what’s possible in R, some tips on extending R with Rust, and how your team can benefit from these new possibilities.

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Transcript

This transcript was generated automatically and may contain errors.

Okay, hello everyone. My name is Andrés Quintero. I am a Colombian data professional, and today I'm gonna give you a little presentation called Extending the Horizons of R with Rust.

So, before I get into the presentation, I work at ixpantia. We help organizations become better through artificial intelligence, and we do it always with an ethos of sharing knowledge. So we try to share everything we do, and this presentation is part of that.

So, before I get into the presentation, I wanna answer a very big question, what is Rust? And this is a very complex question to ask. So I'm gonna go to a little bit of an easier question. Maybe why Rust? And more specifically, why Rust to me? So why is Rust important to Andrés? Why is this guy over here talking to us about this?

Why Rust matters: a Colombian healthcare story

Well, I'm from a beautiful nation called Colombia, as I mentioned a little bit earlier, and if you think of Colombia, you might think of a lot of things. You might think of, ah, the very cool beaches in Cartagena, or like the parties in Medellín. I don't know what you guys imagine, but this might be some of the images that come to mind.

But the reality is that most of Colombia looks like this. If you've ever been to Colombia, I don't know how many of you guys have been to Colombia. It looks mostly like this, right? And this sort of terrain leads to some very particular challenges, specifically in healthcare.

So, Colombian healthcare is very particular. First, it mostly has to reach rural areas. So we need to serve patients in rural areas, which is very, very expensive. Also, resources are very limited. We are a developing nation. Also, Colombia has a lot of people, over 50 million people. That's a lot of people, a lot of patients, and those patients are very well taken care of. Like, 99% of people are covered by the Colombian healthcare system, which a lot of healthcare systems can't say. So it's 99% of 50 million people.

And I started my career working in data science and healthcare in Colombia, and I did a lot of things, mainly data enrichment pipelines. Also, I worked in risk and fraud detection, so an invoice comes in. Is it fraudulent? Is it something that we should pay for? Or is it not? Again, the resources are quite scarce, so if we can avoid paying for fraudulent invoices, then perhaps we're saving a lot of people's lives. Also, this had to be done in real time, right? If you get an invoice, you gotta pay it very soon, right? You don't have like a year to get some data, analyze it, and then in a year, decide if you pay it or not. This had to be done almost in real time.

Also, this had to be done on-site. So this was about 2018, 2019. Back in those days, healthcare and cloud didn't mix very well. I think they still don't mix very well. This was before the pandemic, so if it doesn't mix very well now, it didn't back then. So this was the server that I had to run all of my data pipelines and everything on. That's the server. If I reached the resource limit of that server, too bad, it had to run on that. And there was also, well, a very limited budget, again, especially in terms of hardware. And all of this was written in our favorite programming language, R.

It was all written in R, which, very lovely programming language. We all here love R. But the truth is that this did not scale, right? So I was trying to run way too much on too little hardware. The hardware wasn't even that bad, but it was just a lot of things trying to run really quickly, and things did not scale. So what was the answer? How did we fix this problem? How did we make it scale? Of course, the answer is Rust. That's why I'm giving this presentation.

What is Rust?

So that's why Rust is important to me and why I love it so much. So now we can go back to the original question. What is Rust? And Rust is, I could start saying it's a programming language based on whatever. That doesn't really matter. Rust is a programming language empowering everyone to build reliable and efficient software. That is its main goal, and that's its definition. This is according to the website.

So how does Rust do this? Rust has three main promises. The first one is performance. So Rust has to be fast. It has to run very well. It also has reliability. So people need to be able to rely on it. Yeah, it can be fast, but if it's inconsistent, it leads to weird behavior. No, it needs to be very reliable. And it also needs to be productive. So yeah, sure, you can write Rust, but if it takes like 10 months to write anything in Rust, it's not worth it. It also needs to be productive. So these are the three main goals and promises of the Rust programming language.

And I want to focus on specifically one of them, and more specifically a single part of the message on the performance one, which is that it must easily integrate with other languages. So the point of Rust is not to replace everything, although for some people it might seem like that, but it's also to easily integrate with other languages. And that's why I'm here. We're talking about R and Rust together.

How the R ecosystem can benefit from Rust

So how can the R ecosystem benefit from Rust? There are many ways it can. One of them is packages, another one is tools, and another one is culture. We're gonna kind of go through them in order, and I'm gonna start with packages.

So to answer this very hard question of how can the R ecosystem benefit from Rust, I'm gonna take a little bit of a step back and look at one of our sibling data communities, which is the Python ecosystem. So how is the Python ecosystem benefiting from Rust today? This is a little plot I made. I went and downloaded all of the packages published to PyPI. For you guys that don't know what PyPI is, PyPI is like the CRAN of Python, so it's where people publish Python packages.

And we can see packages that use C++, that use Cython, which is like a domain-specific Python C language, and also Rust. And you can see Rust very quickly becoming one of the dominant languages that people extend Python with. Nowadays, in 2025, from what's come of the year, one out of every three packages that need native code in Python use Rust, which is huge for a language that's very new, and the exponential curve is very clearly seen right there.

So what are these packages? There are so many packages being published in the Python ecosystem using Rust. What are those packages about? Well, a lot of them are about AI. So anyone that uses AI or has used WhisperX, you've probably used a Rust package already. So OpenAI's tiktoken, the tokenizer, or Hugging Face's, or vector databases: if you are doing RAG and experimenting with RAG, you're probably using Rust already in the Python ecosystem. As you can see, there's a very dominant color, which is like this little pinkish-reddish color, which is Rust on GitHub.

Also, data lakes and data processing have a lot of Rust in them. So if you've used Polars, a lot of the demos we've seen here at posit::conf have used Polars. Polars is a data frame library that has a Python interface, but has a Rust core. Also, things with Delta tables, so if you've ever had to interact with a Delta lake, you've probably also used Rust from the Python ecosystem, and there are a bunch of other packages in data processing and data engineering on the Python side that, underneath the hood, use Rust to make themselves fast, reliable, and productive.

So by now, I think it's very clear that the Python ecosystem is leveraging Rust to make itself way better. So how about R? How can R tap into that success and become better as well?

Packages: extendr and real-world case studies

Well, one of the answers, because there are many, is extendr, or rextendr on CRAN, if you want to install it on your machine. extendr is a way of taking Rust code and running it inside your R sessions. So it's kind of like Rcpp, so it's a way of extending R with Rust, and it's very cool.

So here's a little R function that I wrote. I took some elements from a string, I split them up by a comma, converted them to numbers, and kind of summed them up. I thought of this function because it was complex enough that it would be beneficial to maybe add some performance with it using native code, but also not complex enough to fill out a whole screen. So I can take this code and magically transform it into Rust code, which is different. Rust code is not R code. But also, there's map, there's flatten. The functions don't look that different. There are some similarities in there. And there's this little extendr macro or extendr decorator on the top, and that means, hey, extendr, make this function available from R. And by adding this, we can magically use it from R.
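The slide's code is not part of the transcript, so what follows is only a minimal sketch of what such a function might look like in plain, standard-library Rust. The name sum_split and the decision to skip fields that fail to parse are assumptions made for illustration. With extendr, the function would additionally carry the #[extendr] attribute and take an R-compatible argument type; that part is left as a comment so the block compiles without any external crate.

```rust
// Hypothetical example: sum every comma-separated number in a set of strings.
// With extendr, an `#[extendr]` attribute on the function is what exposes it to R.
// It is omitted here so the sketch depends only on the standard library.
fn sum_split(inputs: &[&str]) -> f64 {
    inputs
        .iter()
        // Split each string on commas and chain all the pieces together.
        .flat_map(|s| s.split(','))
        // Try to read each piece as a number.
        .map(|field| field.trim().parse::<f64>())
        // A Result can be iterated, so flatten keeps the Ok values
        // and silently drops pieces that are not numbers (an assumption).
        .flatten()
        .sum()
}

fn main() {
    let total = sum_split(&["1,2,3", "4.5, 0.5"]);
    println!("total = {total}");
}
```

The map and flatten calls mirror the similarity with the R version that the talk points out: the shape of the pipeline is the same, only the types are explicit.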

So I wrote this little benchmark where I just generated a bunch of random strings to sum them up. I created a benchmark and compared the results, and you can kind of see the Rust line over there. It's very small. It's very fast code. It's native code. This is running on machine code. And it was as easy as that to integrate both of the languages.

This is an 18x improvement in performance, and for more complex logic, that would probably be better. Even if we added parallelization and stuff, we could go into the 100x's. We've actually had to do this in projects with different clients. So for a Fortune 500 client that we have, they needed to do very fast queries on a directed acyclic graph. For you guys that don't know what a directed acyclic graph is, it's finding relationships between nodes. And this is a problem that is, I mean, there are a lot of good libraries that solve this. igraph is one of them. But we needed it to be very, very performant, and we needed to specialize on these sort of directed acyclic graphs. So we built a package called orbweaver. It's on CRAN. It's mostly written in R. And there is some Rust in there. And that Rust is to specifically optimize those directed acyclic graph queries. And it's a very fast package.
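To make the kind of query concrete, here is a small sketch of one relationship query on a directed acyclic graph: collecting every node reachable from a starting node. It is not orbweaver's API or implementation; the Dag type and the add_edge and descendants names are hypothetical, and the traversal is an ordinary iterative depth-first search over adjacency lists.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical directed graph stored as parent -> children adjacency lists.
struct Dag {
    children: HashMap<String, Vec<String>>,
}

impl Dag {
    fn new() -> Self {
        Dag { children: HashMap::new() }
    }

    /// Record a directed edge from `from` to `to`.
    fn add_edge(&mut self, from: &str, to: &str) {
        self.children
            .entry(from.to_string())
            .or_default()
            .push(to.to_string());
    }

    /// Every node reachable from `start`, not counting `start` itself.
    fn descendants(&self, start: &str) -> BTreeSet<String> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![start];
        while let Some(node) = stack.pop() {
            if let Some(kids) = self.children.get(node) {
                for kid in kids {
                    // insert returns true only the first time a node is seen,
                    // so shared descendants are expanded once.
                    if seen.insert(kid.clone()) {
                        stack.push(kid.as_str());
                    }
                }
            }
        }
        seen
    }
}

fn main() {
    let mut dag = Dag::new();
    dag.add_edge("a", "b");
    dag.add_edge("a", "c");
    dag.add_edge("b", "d");
    dag.add_edge("c", "d");
    println!("{:?}", dag.descendants("a"));
}
```

A BTreeSet is used for the result only so that the output order is deterministic; a specialized package would likely choose its data layout for speed instead.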

Tools: RIG and Air

So I think it's very clear that we can benefit in the R ecosystem by making better packages, faster packages, and leveraging Rust to make them. But how about tools? Tools is one of these things that, to me, are very important. There's been a lot of talk about tools. There was even a keynote, a lot about tools and Positron being a tool. So what about tools?

I'm gonna talk about one tool that was just mentioned. So the R Installation Manager, or RIG. RIG is an amazing tool. If you've ever had to install multiple versions of R, you know it can be a headache. Like, it used to be a nightmare. Nowadays, it's a non-problem. Just rig add, and you'll get a new version. And this is written in Rust. So this is me typing up my password wrong like three times. But it was harder to type my password than it was to install a new R version.

There's also another one that, actually, Davis and Lionel gave a talk on yesterday that's called Air. Air is a very fast, very good R formatter, also written in Rust. So for these tools that need performance to make themselves good to the users, Rust is a very good answer. Rust is empowering them to build these sort of tools. This is me just typing out random code and I save immediately. And you can see, boom, it gets formatted immediately. This was shown in their talk yesterday, but again, it's super cool and it really does help. So now formatting is not something that my team is spending their pull request review time on. Formatting is just something that's done by default. Again, it's a no-brainer, it's a non-problem.

Culture: what Rust brings beyond code

So that's what tooling is. That's what Rust is enabling with tooling. So now let's move on to culture. Culture is a little bit more abstract. So tools are quite concrete, like, oh, I can install RIG, I can install Air, I can install Ark or whatever. For packages, it's the same. I can write better, faster packages. What does culture mean?

I'm gonna show a little bit of a quote from Linus Torvalds. For you guys that don't know who Linus Torvalds is, Linus Torvalds is the maintainer of the Linux kernel. So anyone that's ever deployed to Linux, you have to thank this guy for building that. He doesn't build everything anymore, he mostly does code review. But this is what he had to say about incorporating Rust into the Linux kernel. He said that it not only made technical sense, but it was mostly, we don't wanna stagnate as a kernel. The Linux kernel is an over 30-year-old project, and they are very hesitant about trying new things, but Rust is motivating them to try new things. And I think that speaks to the culture that Rust is bringing to the different ecosystems.

All of the tools that I'm showing on screen, and there are many more, are heavily inspired by things that might have come from other languages, but that Rust kind of pushed towards the community. So if anyone's tried Air, Air has a little notice that says, hey, this is very inspired by a lot of different packages, one of them being cargo fmt, so the formatter from Rust. Or rv, like if you tried dependency management with rv, that's heavily inspired by uv, and uv's heavily inspired by something like Cargo, which is the Rust package manager. There are a bunch of tools that Rust has enabled people to build, and that's what I mean when I talk about culture: how it empowers people. And that goes back to the definition of Rust.

So Rust is a language that, in a technical sense, yeah, it makes sense, whatever, but most importantly, it is a language that's empowering everyone to build efficient, performant, reliable software. And what does that mean for R? Well, that we can make R more reliable, we can make R more performant, and we can also make R much more productive.

Should your organization adopt Rust?

So this is a big question that I've actually been asked by many people, it's like, hey, should my organization consider adopting Rust? Or, eh, we don't know about it. So the question is, should your organization consider adopting Rust? My answer would be yes. And yes, not in the sense that, yeah, go all in on Rust, everyone, all your team needs to now write Rust instead of R or Python, that's not the case. Because what we try to do is extend instead of replace. We don't wanna rewrite all your data analysis tools and your data engineering from R or from Python to Rust, we want to extend. The project is literally called extendr, like, that's kind of the point.

So you don't need to know everything about Rust to start adopting Rust. So the barrier to entry, the cost to start using Rust in your organization, it's not really that high.

If you wanna start learning about how to use extendr and wanna write your first little package that maybe calls into a function in Rust, please scan that QR code. With that QR code, you will be able to get into the user guide. The user guide is written by a dear friend and it's very complete.

If you wanna get into more serious Rust, so if you wanna start doing, perhaps, tools, maybe you want to write a tool to fix a very specific problem, then I would highly advise you read the Rust Programming Language book. It covers a lot of things about Rust, not everything, but everything you need to start writing tools, for example. So if you wanna write the next formatter, it's kind of a solved problem, but if you wanna write the next Air or the next rv or the next whatever, I would highly suggest you start with the Rust Programming Language book.

And I would also advise you join the Rust in Data community. So this is a newsletter, and we'll be having a monthly hangout, where we send news and things about the intersection between Rust and data. The next iteration of the newsletter will actually have a newer package that uses Rust behind the scenes to make some things really fast and really performant and really efficient. So have a Rust evening. It's not really evening yet, but I've made this talk specifically a little bit shorter to have a bunch of time for Q&A because I know this is a topic that, for maybe a lot of you guys, it's the first time you hear about it. So yeah, let's get into Q&A.

Q&A

So the first question is: have you had the experience of submitting a package that contains Rust to CRAN? I've spoken to, like every single time I speak to someone, they're like, what, to CRAN? So, it's not that pleasant of an experience but it is doable. So, for example, when we were submitting orbweaver to CRAN, we probably had like five iterations of them coming back to us. Yeah, but there are certain rules in the way that CRAN validates in their automatic checking that use like a really old Rust version. Why? Because they want to guarantee backwards compatibility with really old systems, but that creates a challenge because things move fast, right? So we don't wanna work with like a three year old version of Rust, we wanna work with like, maybe like a two year old version of Rust would be better than a three year old version of Rust. So yeah, submitting packages to CRAN can be a little bit of a headache but it's doable. If you're a decent person and you talk to them like a decent person, you'll get through, but that's definitely something to improve.

How does extendr compare to Rcpp? So rextendr is really cool. There are many ways of extending R packages, but rextendr uses Rust's guarantees, which I don't wanna get into the details of, but it really maps the R data types into Rust really well and it works really well. So I'm gonna go back actually to the slide where I showed like the little snippet. So this is a snippet of extendr, right? And you have this type Strings. Strings is like the character vector in R, and you just can use it like you would use any type in Rust. You don't need to do anything else to implement it. Of course, there are edge cases, there are cases where things don't translate as well, but the fact that I can just write this function, and now I have an 18 times faster version than my, it's magical and it's really easy for certain things. It's harder for other things, but also the problems that you need to solve to do harder things are harder. So I think it's a really pleasant experience, especially compared to Rcpp, for example.

Okay, last question. In your experience, have you used LLMs to help translate R into Rust? Yes. So mostly, kind of, yeah. So we had this geospatial problem where we had to kind of, like, calculate what's the nearest beach, and we had all that code in R, and it basically one-shotted it, except for using like some deprecated functions, because the training data used like old GDAL functions instead of new GDAL functions, but that was an easy fix. Other than that, it basically one-shotted it. And other times, it just gets it absolutely wrong, but that's kind of what we've seen in talks: LLMs are sometimes, like, it's unpredictable whether they're gonna do good or bad. You do need to know Rust to evaluate the harder problems, but for the easier problems, again, for a function as small as this, it will one-shot this function.

All right, that's it from me. Thank you so much. Thank you.