
Enabling geospatial workflow management with targets (Eric Scott, UArizona) | posit::conf(2025)
Enabling geospatial workflow management with targets: an R package origin story
Speaker(s): Eric Scott
Abstract: {geotargets} is the latest addition to the ‘targetopia’ of extensions for the workflow management package {targets} allowing integration of geospatial packages such as {terra}. {geotargets} provides custom target constructors for raster and vector objects as well as ‘target factories’ for common geospatial workflows such as iterating over tiles or creating spatial datasets. {geotargets} is currently under review at rOpenSci. In addition to discussing features of {geotargets} and some technical challenges we experienced, I’m excited to talk about how {geotargets} was made possible by the extensible design of {targets} and a group of developers from across the globe brought together by the same cryptic error message.
Geotargets documentation: https://docs.ropensci.org/geotargets/
Transcript
This transcript was generated automatically and may contain errors.
So I'm going to tell you an open-source success story about the geotargets package, an R package that extends targets to make it work seamlessly with geospatial data. My name is Eric Scott. I'm a scientific programmer and educator at the University of Arizona. I also work as a mentor for Posit Academy, and I work with grantwitness.us.
And I want to introduce my package co-authors, Nick Tierney and Andrew Brown. Nick Tierney is the current maintainer of geotargets, and Andrew Brown is a contributor who brought a ton of knowledge about geospatial data types to the project. The thing all three of us have in common is our love for targets. Big thanks to Will Landau, who created it.
What targets does
Targets is a workflow management package for R, and it lets you define a workflow as a series of steps. Here, in this somewhat contrived example, we're reading in some data about penguins, fitting a model of, let's say, flipper length as a function of sex, and then plotting the model results along with the raw data. Targets figures out the dependencies between those steps automatically.
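targets finds these dependencies by statically analyzing each target's R code for the names of other targets. As a rough, language-neutral illustration (not targets' actual mechanism), the Python sketch below infers the penguin workflow's dependency graph from function parameter names, an assumption made only to keep the example short, and runs the steps in topological order. The step bodies are hypothetical stand-ins for reading a file, fitting a model, and plotting.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical steps mirroring the penguin example. Assumption: each
# parameter name is the name of the upstream step whose output it needs.
def file():
    return "penguins.csv"

def data(file):
    # Stand-in for reading the CSV: (sex, flipper_length_mm) rows.
    return [("female", 190), ("male", 200), ("female", 186), ("male", 210)]

def model(data):
    # Stand-in for a real model: mean flipper length by sex.
    groups = {}
    for sex, length in data:
        groups.setdefault(sex, []).append(length)
    return {sex: sum(lengths) / len(lengths) for sex, lengths in groups.items()}

def plot(model, data):
    return f"{len(data)} points with fitted means {model}"

STEPS = {fn.__name__: fn for fn in (file, data, model, plot)}

def infer_graph(steps):
    """Map each step name to the set of step names it depends on."""
    return {name: set(inspect.signature(fn).parameters) for name, fn in steps.items()}

def run(steps):
    graph = infer_graph(steps)
    results = {}
    # static_order() yields each step only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](**inputs)
    return results

results = run(STEPS)
print(results["model"])  # {'female': 188.0, 'male': 205.0}
```

Because the graph is derived from the code rather than written by hand, adding or rewiring a step cannot leave the execution order out of date.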
Then if you change something, let's say I change the flipper model function to add a random effect of island, targets knows that when you run the workflow again, it can safely skip the file and data steps and only needs to rerun the model step and anything downstream of it. This saves you a ton of time and makes your workflow more reproducible. It also lets you run independent targets in parallel, on multiple cores or on an HPC cluster.
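Roughly speaking, targets decides what to skip by comparing hashes recorded on the previous run (of each target's command, its upstream dependencies, and its stored output). The Python toy below imitates that idea; it is an illustration under stated assumptions, not how targets is implemented. A step's fingerprint combines a hash of its bytecode with the hashes of its upstream outputs, so swapping in a new model function reruns only the model and the plot. The island-grouped model is a hypothetical stand-in for "adding a random effect of island".

```python
import hashlib
import pickle

def fingerprint(fn, dep_hashes):
    """Toy hash of a step's code plus the hashes of its upstream outputs.

    Assumption: hashing bytecode, constants, and names is a rough stand-in
    for targets' own hashing; it ignores nested functions and closures.
    """
    code = fn.__code__
    h = hashlib.sha256()
    h.update(code.co_code)
    h.update(repr(code.co_consts).encode())
    h.update(repr(code.co_names).encode())
    for dep_hash in dep_hashes:
        h.update(dep_hash.encode())
    return h.hexdigest()

def make(steps, store):
    """Run steps (listed in dependency order), skipping up-to-date ones.

    store maps name -> (fingerprint, value, value_hash) and persists across
    runs. Returns the names of the steps that actually ran.
    """
    ran = []
    for name, fn, deps in steps:
        fp = fingerprint(fn, [store[d][2] for d in deps])
        if name in store and store[name][0] == fp:
            continue  # nothing relevant changed: skip this step
        value = fn(*(store[d][1] for d in deps))
        store[name] = (fp, value, hashlib.sha256(pickle.dumps(value)).hexdigest())
        ran.append(name)
    return ran

def read_file():
    return "penguins.csv"

def read_data(path):
    # Stand-in rows: (island, sex, flipper_length_mm).
    return [("Biscoe", "female", 190), ("Dream", "male", 200), ("Biscoe", "male", 210)]

def fit_model(data):
    # Version 1: mean flipper length by sex.
    groups = {}
    for island, sex, length in data:
        groups.setdefault(sex, []).append(length)
    means = {}
    for key, lengths in groups.items():
        means[key] = sum(lengths) / len(lengths)
    return means

def fit_model_by_island(data):
    # Version 2 (hypothetical): group by (sex, island) instead of sex alone.
    groups = {}
    for island, sex, length in data:
        groups.setdefault((sex, island), []).append(length)
    means = {}
    for key, lengths in groups.items():
        means[key] = sum(lengths) / len(lengths)
    return means

def plot(model, data):
    return f"{len(data)} points, fitted means: {model}"

def pipeline(model_fn):
    return [
        ("file", read_file, []),
        ("data", read_data, ["file"]),
        ("model", model_fn, ["data"]),
        ("plot", plot, ["model", "data"]),
    ]

store = {}
print(make(pipeline(fit_model), store))            # ['file', 'data', 'model', 'plot']
print(make(pipeline(fit_model_by_island), store))  # ['model', 'plot']
print(make(pipeline(fit_model_by_island), store))  # []
```

Hashing upstream outputs, not just upstream code, means a step that reruns but produces identical output does not force its downstream steps to rerun.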
The problem with terra and targets
So when Nick and I, independently but around the same time, first started working with geospatial data using the terra package in R, we naturally thought terra and targets would be a magical combination. Instead, we got this mysterious error message: "external pointer is not valid".
So let me explain a little bit about where this error comes from. When you read data into R with terra, it creates R objects that don't actually contain any data. Instead, they contain a pointer to where that data lives, which in this example is my computer's memory. If you save that R object to disk as a .rds file and then read it back in a fresh R session, the pointer is still there and still points to the same place, but the data isn't there anymore. The pointer is no longer valid, and you get this error message.
And this is exactly what targets does by default. It writes the output of each step to disk as a .rds file and then reads it back in a later step when that step needs it, or when you ask for it. That's where this error bubbles up from.
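The failure mode can be reproduced with a toy analogy in Python. This is not terra's internals: session memory is modeled as a process-local dict, a "pointer" is just a key into it, and `pickle` plays the role of saving and reading an .rds file. The class and variable names are hypothetical.

```python
import itertools
import pickle

# Toy analogy, not terra's internals: "memory" owned by the current session
# is a process-local dict, and a "pointer" is just a key into it.
_MEMORY = {}
_addresses = itertools.count(1)

class ExternalPointerError(RuntimeError):
    pass

class Raster:
    """Handle that, like a terra SpatRaster, holds no pixel data itself."""

    def __init__(self, pixels):
        self.address = next(_addresses)
        _MEMORY[self.address] = pixels

    def values(self):
        if self.address not in _MEMORY:
            raise ExternalPointerError("external pointer is not valid")
        return _MEMORY[self.address]

r = Raster([[1, 2], [3, 4]])
saved = pickle.dumps(r)         # like saving to .rds: only the handle is written

_MEMORY.clear()                 # simulate a fresh session: the old memory is gone
restored = pickle.loads(saved)  # like reading the .rds: the address comes back...
try:
    restored.values()           # ...but nothing lives there anymore
except ExternalPointerError as err:
    print(err)                  # external pointer is not valid
```

Serialization faithfully preserved the handle; the bug is that the handle was never the data.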
Building geotargets
Fortunately, targets has a really active discussion board on GitHub, so I went there looking for answers. I saw that Nick had recently asked questions similar to mine, and that Andrew had helped answer them. Together, the three of us, with help from Will, figured out a workaround that looks something like this. It's a lot to write for every step of your workflow. It works because targets was designed to be extensible, but we wanted to make it easier. So I suggested to Nick that this should be an R package. And ta-da, now we have geotargets.
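The talk doesn't show the workaround's code. targets does let a target supply its own read and write functions (for example through `format = tar_format(read = ..., write = ...)`), so one plausible shape for the fix is: write the actual data to a real file instead of serializing the pointer, and build a fresh object when reading it back. The Python toy below sketches that idea, plus a small constructor that presets the format the way a function like tar_terra_rast() spares you the boilerplate. `Format`, `Target`, `raster_target`, and the JSON file (standing in for a real raster format such as GeoTIFF) are all hypothetical.

```python
import itertools
import json
import os
import pickle
import tempfile
from dataclasses import dataclass
from typing import Any, Callable

# Same toy "session memory" idea: a Raster handle only stores a key.
_MEMORY = {}
_addresses = itertools.count(1)

class ExternalPointerError(RuntimeError):
    pass

class Raster:
    def __init__(self, pixels):
        self.address = next(_addresses)
        _MEMORY[self.address] = pixels

    def values(self):
        if self.address not in _MEMORY:
            raise ExternalPointerError("external pointer is not valid")
        return _MEMORY[self.address]

@dataclass(frozen=True)
class Format:
    """Hypothetical per-target storage format: how to write and read one value."""
    write: Callable[[Any, str], None]
    read: Callable[[str], Any]

def pickle_write(obj, path):
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def pickle_read(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def raster_write(raster, path):
    # Materialize the pixels while the handle is still valid
    # (stand-in for writing a real raster file).
    with open(path, "w") as f:
        json.dump(raster.values(), f)

def raster_read(path):
    # Create a fresh handle in the *current* session.
    with open(path) as f:
        return Raster(json.load(f))

DEFAULT = Format(pickle_write, pickle_read)  # like the default .rds storage
RASTER = Format(raster_write, raster_read)

@dataclass
class Target:
    name: str
    command: Callable[[], Any]
    fmt: Format = DEFAULT

def raster_target(name, command):
    """Hypothetical constructor in the spirit of tar_terra_rast(): presets the format."""
    return Target(name, command, fmt=RASTER)

def build_then_load(target, directory):
    """Run a target, store its value, then load it back as a later step would."""
    path = os.path.join(directory, target.name)
    target.fmt.write(target.command(), path)
    _MEMORY.clear()  # simulate the downstream step running in a fresh session
    return target.fmt.read(path)

error_message = None
with tempfile.TemporaryDirectory() as tmp:
    plain = Target("plain", lambda: Raster([[5]]))
    try:
        build_then_load(plain, tmp).values()
    except ExternalPointerError as err:
        error_message = str(err)

    elevation = raster_target("elevation", lambda: Raster([[1, 2], [3, 4]]))
    restored_values = build_then_load(elevation, tmp).values()

print(error_message)    # external pointer is not valid
print(restored_values)  # [[1, 2], [3, 4]]
```

The constructor is the design point: the read/write logic is written once, and each step in a workflow only has to say which kind of object it produces.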
Which, wow, simplifies that workaround from the last slide, which was a big mess, into something that looks more familiar and is easier to use, like this tar_terra_rast() function at the bottom. Right now, geotargets works with the terra package: raster objects, vector objects, and the other data types terra provides. But it has also evolved beyond that, with a bunch of functions that let you take advantage of features of targets that, we think, work really well with geospatial workflows.
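One such pattern, mentioned in the abstract, is iterating over tiles. In targets, this kind of fan-out is typically expressed with dynamic branching, so each tile can be cached, skipped, or run in parallel on its own. The Python sketch below imitates the split-process-stitch pattern with a thread pool standing in for parallel workers. The function names, tile sizes, and per-tile computation are hypothetical, and this is not how geotargets implements tiling.

```python
from concurrent.futures import ThreadPoolExecutor

def make_tiles(n_rows, n_cols, tile_rows, tile_cols):
    """Split a grid into (row_start, row_end, col_start, col_end) extents."""
    extents = []
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            extents.append((r0, min(r0 + tile_rows, n_rows),
                            c0, min(c0 + tile_cols, n_cols)))
    return extents

def crop(grid, extent):
    r0, r1, c0, c1 = extent
    return [row[c0:c1] for row in grid[r0:r1]]

def process_tile(tile):
    # Hypothetical per-tile work: convert metres to centimetres.
    return [[value * 100 for value in row] for row in tile]

def mosaic(n_rows, n_cols, extents, tiles):
    """Stitch processed tiles back into one grid."""
    out = [[None] * n_cols for _ in range(n_rows)]
    for (r0, r1, c0, c1), tile in zip(extents, tiles):
        for i, row in enumerate(tile):
            out[r0 + i][c0:c1] = row
    return out

# A 4 x 5 toy "raster" cut into 2 x 3 tiles (edge tiles are smaller).
grid = [[r * 5 + c for c in range(5)] for r in range(4)]
extents = make_tiles(4, 5, 2, 3)

# Tiles are independent, so they can be processed in parallel,
# much as independent branches can be in targets.
with ThreadPoolExecutor() as pool:
    processed = list(pool.map(process_tile, [crop(grid, e) for e in extents]))

result = mosaic(4, 5, extents, processed)
print(len(extents))  # 4
```

Tiling also helps with caching: if only one tile's inputs change, only that tile needs to be reprocessed before stitching.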
So there's a lot of cool stuff in there. It's been reviewed by rOpenSci, and it's part of the growing R Targetopia of packages that extend targets to work with different data types and workflows.
If you want to read more about the functionality or get involved in development, that QR code leads to the documentation. I have geotargets logo stickers if anybody's interested. And there's my info. I also want to thank the R Consortium for funding open-source projects like this one. So thanks.
