Resources

quickr: Translate R to Fortran for Improved Performance - Tomasz Kalinowski

This talk introduces 'quickr', an R package designed to make numerical R code faster by translating R functions to Fortran. While R code offers great flexibility, it often comes at the expense of performance, especially for computationally intensive tasks. To achieve better speed, users typically need to rewrite performance-critical code in compiled languages like C or Fortran, which adds complexity and creates maintenance overhead. Quickr simplifies this process by allowing users to add simple type declarations to their existing R functions, which enables quickr to then automatically translate the entire function into efficient Fortran routines. The presentation will demonstrate quickr in practical applications, with benchmarks showing performance improvements comparable to native C implementations. The talk will also cover current limitations, including supported data types and language features, and show how quickr can be easily integrated into existing R packages. Participants will learn how quickr can help improve their R code performance without significantly increasing development complexity or sacrificing the readability of their code. https://github.com/t-kalinowski/quickr

Oct 28, 2025
14 min

Transcript

This transcript was generated automatically and may contain errors.

Thanks for coming. I'm very excited to be here to tell you about quickr. So quickr is an R package that you can use to compile an R function, and a quickr compiled function will typically run somewhere between 20 and 200 times faster than the R function. And in general, you get the same speedups that you would get if you were to rewrite that function into C or Rust or Fortran. And one bonus of compiling a function is you get really robust runtime type checking with some great error messages if you give the wrong inputs.

Motivating example: convolve

So to kick things off, I want to start with an example of a slow function, and I'm going to use convolve here. Convolve is a nice example because, well, it fits on one slide. It's a small function. It is, I imagine, an operation many of you are familiar with. It also has two nested for loops and an in-place modification of an array. So these are things that typically make for a slow R function.
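For reference, a convolve function of the kind described here might look like the sketch below. It is reconstructed from the description (two nested for loops and in-place updates of an output array), not copied from the slide, and the name slow_convolve is chosen only for illustration.

```r
# Plain R convolution: two nested loops that update `ab` in place.
slow_convolve <- function(a, b) {
  ab <- double(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

slow_convolve(c(1, 2, 3), c(0, 1, 0.5))
```

Each output element accumulates products from many (i, j) pairs, which is why the loop form is more natural here than a vectorized rewrite.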

And if you have a slow R function like this and you seek advice for how to speed it up, the first piece of advice you'll get is to vectorize it. And this is great advice, especially if it makes your R code simpler, but convolve makes the case that sometimes you don't want to vectorize it. You want to write the for loop. Trying to vectorize it is not going to be the answer.

The next piece of advice that you might get is to rewrite it in a lower-level language like C. And this is also great advice if you're a full-time software engineer and you already know about memory management and build systems and the foreign function interface. But that comes with a huge learning burden if you don't already know this stuff. And a side effect of rewriting your function in C like this is that now this portion of code is gated off from future collaborators who have allocated their learning efforts differently from you. You've basically boxed that code off, and probably from the collaborators you most want: the domain experts in something that you're interested in.

So that's the motivation behind quickr. Sometimes R is slow. Sometimes you want to write the for loop. Vectorization is not the solution. And another language is not the solution. So quickr is the alternative.

Using quickr

So coming back to convolve, to make convolve fast, we just have to do two things to it. One is you wrap the function in quick. And then the second thing you do is you add type annotations. So here's what that looks like in practice. Here's convolve. We wrap it in quick. Pretty simple. And then the next thing we do is we add some type annotations. So here we're saying that A is a doubles vector of any length and B is a doubles vector of any length.
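The two steps can be sketched as follows. This assumes the quickr package and a working Fortran compiler are installed; the function body is reconstructed from the talk's description rather than taken from the slide.

```r
library(quickr)

# Step 1: wrap the function in quick().
# Step 2: declare that `a` and `b` are double vectors of any length (NA).
convolve <- quick(function(a, b) {
  declare(type(a = double(NA)),
          type(b = double(NA)))

  ab <- double(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
})

convolve(c(1, 2, 3), c(0, 1, 0.5))
```

Apart from the quick() wrapper and the declare() call, the body is the same R code as before, so it stays readable to collaborators who only know R.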

And that's it. So if you take this new fast function and you benchmark it, you'll see that it's about 200 times faster. If the R function takes one second on a set of inputs, the quickr function takes five milliseconds. And the C function also takes five milliseconds. It's the same speedup.

So this is remarkable. What's the catch, you might ask? The catch is this doesn't work for all of R. This works on a subset of R and it only works with atomic types today. So the vocabulary is about 70 verbs that are supported. Here it is in broad strokes. It's roughly all the control flow operators, all the binary operators, all the vectorized math operators, and the math reduction operators. You have your allocators, some reflection functions, and some diagnostics. About 70 symbols. I expect this to grow over time, but this is where we are today.

The readme for the package has a few more substantial examples that don't fit in this talk or on a single slide. There is a probabilistic programming implementation of a hidden Markov model where you see a speedup of about 50 times. There's a heat diffusion simulation where you get 100x speedup and a handwritten weighted rolling mean where you get a 20x speedup, which also beats the C++ implementation.

Here's just a quick note on the declare syntax, which I imagine some of you are interested in. So declare comes from base R. It does nothing on its own. You can think of it as a comment that just lives in the parsed function body. And quickr looks for a comment of this specific shape where, within a type() call, you have the variable name, the mode of that variable, and the shape. And the mode can be one of these four right here today, and then the shape can be a number of things. It can be an integer for a fixed size array, NA for a variable size array. It can be a symbol for a named dimension or another variable that's an input argument. And you can also have simple expressions here for runtime computed shapes.
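A small sketch of what "a comment that lives in the parsed function body" means in practice. It assumes R 4.4.0 or later, where declare() is available in base R; the function name scale_by_two is made up for illustration.

```r
# Assumes R >= 4.4.0, where base R provides declare().
scale_by_two <- function(x) {
  declare(type(x = double(NA)))
  x * 2
}

# In plain R the declaration is ignored, so the function runs as usual.
scale_by_two(c(1, 2, 3))

# The declaration is still present in the parsed function body,
# which is where a tool such as quickr can find and read it.
decl <- body(scale_by_two)[[2]]
print(decl)
```

Because the declaration is ordinary R syntax, the same source file works whether or not the function is ever compiled.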

So here are examples of some fixed size arrays, a 1D vector and a matrix. You can mix and match that with variable size shapes. So here Y is going to be a four row matrix with any number of columns, for example. You can also declare size relationships. Here we're saying X is an integer vector of any length, and we'll call that length N, and Y is an integer vector that's also of length N, so X and Y have to be the same. And if they're not, you get a nice error message telling you that. And N can be an implicitly named dimension like this, or it can be one of the input arguments that you also declare. And then finally you can have expressions here. It's not just symbols and constants, but runtime derived expressions.
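The shape forms mentioned above can be sketched like this. The function names are hypothetical, and the functions are left unwrapped so the block runs in plain R (4.4.0 or later assumed); in that setting the declarations are ignored, and they would only be checked once each function is passed to quick().

```r
# Fixed sizes: a length-3 double vector and a 2 x 3 double matrix.
sum_fixed <- function(v, m) {
  declare(type(v = double(3)),
          type(m = double(2, 3)))
  sum(v) + sum(m)
}

# Mixed: y is a double matrix with 4 rows and any number of columns.
double_matrix <- function(y) {
  declare(type(y = double(4, NA)))
  y * 2
}

# Size relationship: x and y are integer vectors sharing the same length n.
add_same_length <- function(x, y) {
  declare(type(x = integer(n)),
          type(y = integer(n)))
  x + y
}

# Expression: y must be two elements longer than x.
sum_padded <- function(x, y) {
  declare(type(x = double(n)),
          type(y = double(n + 2)))
  sum(x) + sum(y)
}

add_same_length(1:3, 4:6)
```

Here n is an implicitly named dimension: it is never passed as an argument and only ties the two lengths together.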

How quickr works under the hood

So if you just want to use quickr, that's roughly all you need to know. But I'm sure many of you are curious, how does it work under the hood? So what happens when you pass a function to quickr is it takes that function and it translates it to Fortran, and then it lays down the plumbing to make that Fortran routine callable from R.

So this R function becomes this Fortran subroutine. And then this C function is built. This is the bridge between Fortran and R, and here we unpack the R arguments. This is where we do the type checking, call the Fortran subroutine, and then return the result. And then this compiled C function gets wrapped in an R closure, and that's returned to the user at runtime.

Now the sequence of events here has two modes of operating, which I'll call the JIT mode and the AOT mode. So JIT stands for just in time, and this is what happens when you just call quick in a normal R session. So it'll generate the Fortran and C code, write it out to a temporary directory, call the compiler, load the shared object, build the closure, and return it. And that happens quickly. It's fine. It's fractions of a second.

But in an R package, quick does almost nothing, right? Something very different happens. quick returns just the closure with basically a placeholder for the pointer to the compiled code. And the code generation of the Fortran and C actually happens in a separate motion when you do devtools::load_all(). So I think this is something most package developers use. If you don't, quickr also has a helper to help you invoke this code path.

So quickr collates all the quick functions that are in your package, it builds them into a submodule or a shared compilation unit, and then dumps them out into these C and Fortran files. And that's it. It doesn't do anything else after that point, because the R package machinery already knows what to do with an F90 file and a C file. So from the perspective of the R package machinery or external reviewers like CRAN, your package just has some Fortran code in it. It's not doing any kind of runtime compilation.

And so this overall design, I think, sidesteps a lot of issues that you might have with other approaches to building a compiler or a runtime compiler. Probably the two most successful ones that come to mind for me are Numba in Python and Julia, the language. So both of these ship basically most of LLVM as part of what they offer. And that comes with a lot of complexity. And it's like a lot of work. And here we're basically short-circuiting that. And we're using Fortran, the existing infrastructure that's already there, so we don't have to ship a full compiler as part of quickr.

And if you really want LLVM, you can also use Flang as basically a frontend to LLVM, which gives you a lot of the same things. The other thing this does is there's no compilation overhead at startup time. So Numba and Julia and sort of things in that space, when they're used to a sufficient amount, it really degrades the interactive experience. Because every time you start up your REPL, you're waiting many seconds for things to recompile every time. And so using the R package machinery like this, there's no degradation of the startup time.

Why Fortran?

So that's how it works in broad strokes. And now I'll talk about sort of the why and how. So if you're not familiar, actually, how many of you, show of hands, know what Fortran is or have used it in the past? Okay. About half. That's great. Fortran is the first high-level language that was... Before that, people wrote machine code directly. It predates R and Python and C by over two decades.

And many, many languages have come and aimed explicitly to displace Fortran and have failed and are just in the history bin. It consistently ranks in one of the top ten languages that's still in use today. And it's used especially in the physics community and the high-performance computing community. So any time you're doing any kind of simulation, like climate modeling, weather prediction, or plasma physics, whatever, things like that, you're probably running Fortran code. And about 23% of R itself is today Fortran sources.

So Fortran is very, very fast. And it's fast for a couple of reasons. One is the people writing Fortran code and running Fortran code really care about speed. And they've cared about it for a long time. There are decades of effort behind making Fortran fast. The language itself also makes it really easy for the compiler. So the arrays are first-class citizens in Fortran with declared sizes and types and a memory layout. And the array semantics means that the compiler can assume arrays don't overlap. So this leaves a lot of low-hanging fruit for the compiler to do things like pipeline instructions or use SIMD or skip bounds checks and so on. And these are things that you don't have to be aware of as an author. You just kind of drop into this pit of success for optimization.

And the language is very, very active still. There are new language additions every couple of years. And it continues to see compiler improvements and also new compilers coming out every few years. It is actually quite an active space.

So that's one reason. Fortran is fast. The second reason is Fortran comes with very, very strong support from R. So this is a talk John Chambers gave at this very conference in 2006 showing the sketch of S from when it was initially being proposed. And from conception, from day one, the goal was to provide an interface to Fortran subroutines. And this comes through in a lot of ways, including how R was designed and evolved. And one of the ways it comes through is that everything with Fortran just works. You don't have the growing pains that you have with something like Rust today or what C++ experienced in the past and sort of the inherent complexity that comes with that. Fortran just works. It works simply and it works well in an R package.

The other way it comes through is that there is quite a bit of overlap in semantics and syntax and even culture between Fortran and R. And this comes through in some technical ways, like, for example, one-based indexing, first class support for arrays with the same exact memory layout and very similar slicing operator or subsetting operator. First class complex values, vectorized operations on those arrays, scalar broadcasting. So if you multiply a scalar in an array, that just works. The base namespaces of both these languages come fully populated with everything you need to do math.
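The R side of these shared conventions can be seen in a few lines of base R. This is only an illustration of the R behavior; the corresponding Fortran is not shown.

```r
# A 2 x 3 matrix; R fills it column by column (column-major order).
m <- matrix(1:6, nrow = 2)

first <- m[1, 1]           # indexing starts at 1, so this is the first element
second_col <- m[, 2]       # slicing out a whole column
flat <- as.vector(m)       # elements in storage order, column by column
doubled <- 2 * m           # a scalar is broadcast across the whole array
roots <- sqrt(c(1, 4, 9))  # math functions are vectorized

z <- complex(real = 3, imaginary = 4)  # complex values are built in
modulus <- Mod(z)
```

Everything here comes from the base namespace, with no packages attached.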

Both of these languages, I think, also are designed for practicing researchers, which comes through as some design decisions that maybe people who aren't practicing researchers are dismissive of sometimes. And both Fortran and base R have a strong aversion to breaking changes. So if you come across Fortran code written 30 years ago, it probably runs just fine today, and the same holds true for R, for code well-written against base R.

So the way to think about it is R is a big language with a lot of features. And Fortran is a big language with a lot of features. And the area where they overlap is substantial. And this is the space where quickr sits. quickr will let you compile the parts of R that overlap with Fortran.

So that is essentially it. quickr is a compiler for R. It lets you trade R's dynamism for speed for just the context of a single function. You don't have to learn a whole new language. And it's a way to make your R code run faster. So thank you.