Resources

Jeroen Ooms | A preview of Rtools 4.0 | RStudio (2019)

Rtools is getting a major upgrade. In addition to the latest gcc, it now includes a full build system and package manager to build, install, and distribute external C/C++/Fortran libraries needed by R packages. It thereby bridges the long-standing gap between Windows and macOS/Linux with respect to the availability of high-quality, up-to-date system libraries. In this talk, we show how to build and install system libraries with Rtools, and how to manage your Rtools build environment. It should be interesting both for Windows users and for non-Windows package authors who are interested in reducing the pain of making things work on Windows. VIEW MATERIALS https://resources.rstudio.com/rstudio-conf-2019/a-preview-of-rtools-4-0 About the Author Jeroen Ooms Postdoc hacker for @ropensci at UC Berkeley


Transcript

This transcript was generated automatically and may contain errors.

My name is Jeroen Ooms. If you don't know me, I'm part of the rOpenSci group at UC Berkeley. This is our team.

And I maintain a lot of R packages, many of them wrapping external libraries, external C and C++ libraries. And in addition, I'm also the maintainer of the Windows installers for R base and Rtools. And the current talk will be on the intersection of these two things. And I will go into what's involved with building R on Windows, and in particular, how we're going to try to improve that for the next generation of the compiler tool chain.

So first I'm going to explain a little bit about how it currently works and how Rtools currently works. And then in the second part, I'll introduce the new tool chain, which is currently experimental in which we're beta testing, hopefully to be included with the next major release of R.

What is Rtools?

So what's Rtools? Well, to understand that, you need to take a step back. In order to compile things, both base R as well as packages, you need a compiler. And on Linux and macOS, there is a native compiler on the system: on Linux that is usually GCC, and on macOS we have Clang with Xcode. But on Windows, there's no such thing. So on Windows, we need to provide our own compiler toolchain. And that is Rtools.

So Rtools, if you've ever had to install a package from source on Windows, is this thing you install separately. And it helps you with building stuff on Windows.

So Rtools has been around for quite a while. I didn't invent this. As you can see from this retro 90s homepage on CRAN, there's an archive of all the Rtools releases going back a decade or two. So this is something that's been supported for quite a while, and I only took over maintenance last year. So please don't blame me for everything that's suboptimal.

So what does the workflow look like? You install this thing, and then you have this installation dialog where you can choose which components you want to install. There are the toolchains, which are the most important thing; some build utilities, mostly make and a shell and tar to extract stuff, which are also needed to install R packages; and then the other things are sort of optional. The last component is only needed if you want to build R itself, which you should never do.

So to summarize, Rtools is needed to build base R, and to build R packages on Windows that contain C, C++, or Fortran code; this is the part that most users are familiar with. But then there's also a very big third thing, which is external libraries. A lot of base R and a lot of R packages use external system libraries, and those system libraries need to be compiled with the same toolchain and the same configuration as we're using for base R. And that is sort of a dark piece of the ecosystem right now.

So if you've seen any of my previous talks, you've maybe seen this picture. This is what the dependency system looks like. You have all of your packages, and they depend on R. And R itself depends on a lot of external libraries that all need to be built with the same toolchain in order to build R for Windows. And many R packages also depend on a bunch of other external libraries. It's a lot of work to build all these things.

The current manual approach

So currently we're doing this manually. There's the rwinlib organization in which we manually release binaries for the most important external C and C++ libraries used by CRAN packages. As you can see there, there are 73 libraries in there right now. But that's not counting dependencies: one of these libraries can depend on a dozen more sub-libraries, so there are hundreds of external libraries in there in total.

And this is kind of suboptimal. Maintaining these builds is a lot of work. And if you want to build an R package that needs any of these libraries, it's a bit of a hack: the package author or the build server needs to manually download these libs, put them in a special location, and then fiddle with the compiler and linker flags on your system; or you can have a script in your package that automatically downloads them. But it's not the way the world is supposed to work.

Introducing Rtools 4.0

So for the next Rtools, I wanted to improve the situation, and I've looked around a bit at how this is done in other places. We want to take advantage of a proper build system, so that we can more easily build all of these external libraries using the same compiler toolchain and the same Rtools environment as we use to build R packages or base R itself, and then we want to automate that build process. And also, I would like to openly and collaboratively maintain these package build scripts, so that it's easier for other people to contribute a build script for a new external package if they want to use it in their R package.

So that is the goal of Rtools 4.0. And I want to emphasize again that this is something we're currently beta testing. It's not at all certain if and when it's going to ship, but hopefully it will, maybe for the upcoming release or otherwise the one after that.

So you can already test it. If you go to CRAN and Google "Rtools 4.0", there's a special page where you can install Rtools 4.0 for your system. And then there's also a special version of R, which I call R-testing. It's like the R-devel builds, if you've ever tried those, but it has been configured to use this new toolchain, so the path to the new toolchain is hardcoded, and it has a custom location where it installs packages. So you can install this and play with the new Rtools, and it won't conflict with any other package libraries that are using the current stable toolchain.

So what's the experience like? You install this thing, and then suddenly in your start menu there's this new entry, which wasn't there before, because before, Rtools was just a bundle that you installed over your system. Now there's actually a shell. So you're like: what if I click on that? If you click on it, it opens a terminal window. Rtools now actually includes a full terminal with Bash, a shell, and all of the tools that you're used to, or probably not if you're on Windows. It's a full Unix-like environment based on MSYS2 and Cygwin, which has all of the build tools you need to build all of these open source libraries.

So what's in the tin? Well, the most important thing is that there's a nice terminal, with the shell and everything that comes with that, like Bash, and then there are all of these utilities which come from Cygwin, like make and sed and Perl and whatnot. And of course there are the compilers, the MinGW compilers, one for 32-bit and one for 64-bit Windows. But the most important thing in the new Rtools is that there's a real package manager to build and install external libraries, just like apt on Debian, yum on Enterprise Linux, or Homebrew on macOS.


Using pacman to manage libraries

So what would that look like? The package manager is called pacman. In the first command, the -S stands for sync, and the y makes it sync with the repository index; pacman -Syu, for example, would upgrade all of the installed packages. The second block of code shows how you would install the libxml2 package. In pacman, there are separate packages for the 32-bit and the 64-bit builds, but you can install them both with this syntax. And I want to emphasize, I didn't invent pacman; this is the package manager from Arch Linux, which has been ported to Windows, and it's a really nice fit. It has all of the features you would expect from a modern package manager, so it really starts to feel like a proper operating system.
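The commands described above might look roughly like this in the Rtools shell (a sketch; the mingw-w64 package names follow MSYS2's naming convention, which the new Rtools inherits):

```shell
# Sync the local package index with the repository:
pacman -Sy

# Sync and upgrade all installed packages in one go:
pacman -Syu

# Install libxml2 for both architectures; the 32-bit and 64-bit
# builds are separate packages distinguished by their prefix:
pacman -S mingw-w64-i686-libxml2 mingw-w64-x86_64-libxml2
```

These commands assume you are inside the Rtools 4.0 terminal; on a stock Windows or Linux system, pacman and these package names are not available.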

So, I didn't have the confidence to do a live demo like Gabor, so what does that look like? There's a shell, and this is pacman. If you do pacman -Sy, it updates the repository index, and here I'm installing the curl package. It says, you know, curl depends on libssh2 and openssl, so we need these things as well, so it starts pulling those in, and then it's installing, blah, blah, blah. This is sped up, so it's slower in real time. And then it's done, and now it's there. So now you can run pkg-config for curl, and it's there. So it's really what you would expect from, you know, apt or yum or so.

All right, so which packages do we have? There's a repository under the r-windows organization called rtools-packages, which currently contains, I think, about 100 packages that I've ported to this system. This repository contains all of the build formulas for these packages, not the actual binaries; these are the pacman config files to build these things. If you look at what's in there, each of these subdirectories contains a file called PKGBUILD, and this defines the name of the package, the URL of the source, a bunch of patches, checksums and stuff, and what's needed to build and install the package. Usually there's some CMake or a configure line, and then make, and then make install.
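As a sketch of what such a build formula looks like, here is a hypothetical minimal PKGBUILD for an imaginary library called "foo" (the field and function names are pacman's standard PKGBUILD conventions; the URL, version, and library name are placeholders, not taken from the actual repository):

```shell
# Hypothetical PKGBUILD sketch for an external library "foo".
pkgname=mingw-w64-foo
pkgver=1.2.3
pkgrel=1
source=("https://example.org/foo-${pkgver}.tar.gz")
sha256sums=('SKIP')  # a real recipe pins the actual checksum here

build() {
  cd "foo-${pkgver}"
  # typically a configure or CMake line, then make:
  ./configure --prefix="${MINGW_PREFIX}"
  make
}

package() {
  cd "foo-${pkgver}"
  # install into the staging directory that pacman packages up:
  make DESTDIR="${pkgdir}" install
}
```

A PKGBUILD is just a Bash fragment sourced by makepkg, which is why patches, checksums, and arbitrary build steps all fit naturally into this format.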

So what does that look like? For example, here's a package that builds with CMake. You simply run makepkg-mingw in the directory that contains the PKGBUILD file, and everything else is done automatically: it starts pulling in the source for the library, it runs CMake on all of these things, and you get the output. And if everything works, in the end it will compress the libraries, strip out some debug symbols, and create an actual package, which is a .tar.xz file. So let's run it, and here it's done, and now it's tidying up the install, stripping the libs, and then you have an actual binary package, this new .tar.xz file that was created. So once the formula is in place, maintaining these things is hopefully going to be much easier than it is now.
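The manual build step described above can be sketched as follows (makepkg-mingw and the --syncdeps/--noconfirm flags are standard MSYS2/pacman tooling; the package file name is a placeholder):

```shell
# Run from the directory containing the PKGBUILD file.
# makepkg-mingw downloads and verifies the sources, builds the
# library for both architectures, strips debug symbols, and
# produces binary packages as .tar.xz files.
# --syncdeps installs any missing build dependencies first.
makepkg-mingw --syncdeps --noconfirm

# The resulting binary package can then be installed locally:
pacman -U mingw-w64-x86_64-foo-1.2.3-1-any.pkg.tar.xz
```

This is the same two-step model as on Arch Linux: makepkg turns a formula into a binary package, and pacman installs it.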

Automated builds and deployment

All right, so that is how you would do it manually, but of course you want to automate it. So if you send a pull request to this repository, or in my case, if you push straight to the master branch, AppVeyor will start building these things. They look like this, and in the AppVeyor log file you can see the latest things that you've built.

And this is sort of the most beautiful part of the system: it automatically deploys. If you look towards the end of the AppVeyor log file, for a pull request or a branch it will just say the build was complete, but on the master branch it will automatically deploy everything to Bintray. All of the binaries for these Windows packages are hosted on Bintray, and they are automatically deployed from AppVeyor once the build is successful. So there's nothing I have to do other than merging the pull request to actually release these libraries. Once the build is complete, at the bottom here you see AppVeyor starts uploading the binaries to Bintray and recalculating the index file, and then you can see it at this URL.

This is the URL where all of the binary packages are stored, and it's the URL that pacman in Rtools uses to grab the packages. So it's live immediately; there's no manual step anymore. You send the pull request, AppVeyor builds, if it succeeds it deploys to Bintray, and the next time any user does a pacman update, it starts pulling in these new libraries. So it ties up the circle really nicely.
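The deploy step might be sketched roughly like this (a hypothetical illustration only: the Bintray path segments and environment variable names here are invented for the example, not taken from the actual CI configuration):

```shell
# Hypothetical sketch of a CI deploy step. On a successful master
# build, upload the freshly built binary package to Bintray via
# its REST upload API; credentials come from CI secret variables.
PKG=mingw-w64-x86_64-foo-1.2.3-1-any.pkg.tar.xz

curl -T "$PKG" -u "$BINTRAY_USER:$BINTRAY_KEY" \
  "https://api.bintray.com/content/EXAMPLE_SUBJECT/EXAMPLE_REPO/$PKG;publish=1"
```

The point of the design is that this upload, plus regenerating the pacman repository index, happens entirely inside CI, so a merged pull request is the only human action needed to publish a library.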

Yeah, so I hope that this will make the maintenance much easier. And I should emphasize that once you have installed one of these system libraries using pacman, Rtools will automatically find it when you build an R package that needs that library. There's no need to set special compiler flags or linker flags to point to a particular location, because pacman installs it exactly where the compiler and linker expect to find it. So it's going to feel much more like apt, or yum, or brew: if you need a system dependency, you do the pacman install, the system dependency is there, and then your R package should just build without any further hacking.
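In practice, that workflow might look like this in the Rtools shell (a sketch; the package names and the R package tarball are illustrative):

```shell
# Install the 64-bit curl library; pacman puts it where the
# compiler and linker already look:
pacman -S --noconfirm mingw-w64-x86_64-curl

# pkg-config now reports the compile and link flags without any
# manual configuration:
pkg-config --cflags --libs libcurl

# So an R package that links against libcurl should build from
# source without setting any special flags:
R CMD INSTALL curl_*.tar.gz
```

This is the contrast with the rwinlib approach described earlier, where the package itself had to download the library and point the linker at it.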

Summary

So, in summary, last slide: the new version of Rtools will include a proper build environment that will make it easier for me, for us, to build and install all of the external libraries. Pacman makes it really nice to package, distribute, and install these libs, so there's no manual copying of files on the build server anymore. R will then automatically find those installed libs, so you don't need to pass special linker flags or include flags when you're installing an R package on Windows from source. And the most important part is that these libraries are all going to be transparent, reproducible, and automated via the CI, so there should be much less maintenance, but I think there's also going to be a bit more accountability. Because currently, you know, I'm just uploading these binaries to rwinlib, but nobody knows how they are built and what I put in there. This system is going to be fully open, fully transparent, fully automated; everything gets built in a public place and deployed, and everyone can contribute. So, yeah, that's it.


Thank you very much, Jeroen, for your talk and also for making my life as a Windows user so much easier than it otherwise has been. We have time for one question. This question at the back, I think it's Rich here with the golden shirt.

Thanks. Good throw. So this sounds like it makes it a lot easier to have more external dependencies, so are there going to be some new dependencies that R uses?

So, because it's now easier to have external dependencies for R, are there going to be some more dependencies in base R? Existing or new tools?

Yes, so I hope that this system will make it easier to take advantage of these external libraries in R, because previously you were sort of tied to what was available. I mean, on Linux and Fedora most of it is available, but on Windows the ecosystem was really limited by, you know, the people that were willing to build these things for you, and there are, like, only two of them that would probably do that. So I hope that this will, I hope not open the floodgates, but at least give R users better access to up-to-date, high-quality libraries that now also work on Windows.