Resources

Davis Vaughn | Sliding Windows and Calendars | RStudio (2020)

A number of R packages exist to make computing moving averages on a single numeric series straightforward. But generally “real” life is much messier than that! Try computing a moving average over a twenty-day sliding window when you have a time series with missing data. Oh! By the way, you should also skip over weekends when looking back twenty days. And you know that random holiday that your company celebrates that no one else does? Skip over that too. These are hard but realistic problems, and until now there has been a lack of tools to solve them. In this talk, I’ll present two packages designed to tackle these issues: slider and almanac. slider is a package designed to perform arbitrary sliding window calculations, the simplest example being a moving average. What makes slider unique is its support for sliding relative to an index, such as a date vector, which allows you to correctly compute the boundaries of that twenty-day window. almanac is a package for creating custom business calendars and then adjusting dates relative to them. Inspired by lubridate, almanac allows you to shift dates by a set number of “business” days while respecting the weekends and holidays defined by a user-specified calendar. For example, shifting a Friday forward by 1 business day would land on a Monday, unless that Monday happened to be a holiday, in which case the next business day would actually be Tuesday. Together, slider and almanac provide the tooling necessary to solve the problem mentioned earlier. Additionally, because slider works with any arbitrary function, we can use the same procedure to compute rolling regressions, cumulative sums, and any other sliding computation.

A 5-minute presentation in our Lightning Talks series.


Transcript

This transcript was generated automatically and may contain errors.

So, my name is Davis Vaughn, I'm a software engineer at RStudio. As Max said, this was supposed to be my beautiful e-poster that was going to be shown at the introduction, and it was not. So I'm going to talk about this, but this looks horrible, so I'm just going to scrap it entirely and start over.

So this is a talk about two packages. The first one is called slider, and the second one is called almanac. Yes, I'm going to do two packages in five minutes.

Slider: rolling and expanding windows

So slider is a package for rolling and expanding windows. If you've ever used zoo's rollapply() or tsibble's slide() before, this is kind of a more supercharged version of those. With the function slide(), you take a vector as input and a function that you want to apply to different sliding windows of that vector, and then you can control the window size with some extra arguments.

The only one I'm going to talk about today is the .before argument. What this says is: for each element, take the current element of the vector and also look one element before it, and that makes up the sliding window. At the bottom here there's another example of a different type of rolling window, with .before = 2, and if you can see the animation going, you're seeing six back to four, ten back to five. Then, with the expanding window, you can do things such as cumulative sums or other kinds of expanding calculations. For that, you set .before = Inf, which says give me the current element plus everything before it.
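The window logic described above can be sketched in a few lines. This is a minimal Python illustration of the idea, not slider's R API: the helper name slide and its before parameter are hypothetical and only loosely mirror slider's slide() and .before. Windows at the start of the vector are allowed to be shorter than requested, similar to slider's default behaviour of still evaluating incomplete windows.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def slide(x: Sequence[T], f: Callable[[List[T]], R], before: float = 0) -> List[R]:
    """Hypothetical helper: apply f to the window ending at each element of x.

    The window for position i covers positions i - before through i,
    clipped at the start of x. Pass before=math.inf for an expanding window.
    """
    out: List[R] = []
    for i in range(len(x)):
        start = 0 if math.isinf(before) else max(0, i - int(before))
        out.append(f(list(x[start:i + 1])))
    return out


x = [1, 2, 3, 4, 5]
print(slide(x, sum, before=1))         # current + 1 previous: [1, 3, 5, 7, 9]
print(slide(x, sum, before=math.inf))  # expanding window (cumulative sum): [1, 3, 6, 10, 15]
```

Because the function is arbitrary, the same loop covers rolling sums, moving averages, and cumulative calculations; only the window bounds change.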

This is supposed to have very, very similar syntax to purrr. slide() is very much like map(). In fact, the defaults are the same: it always returns a list. It's completely type- and size-stable. There are variants such as slide_dbl(), slide_dfr(), slide2(), and pslide(); everything you might expect from purrr is in slider as well.
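"Type- and size-stable" means a typed variant always returns exactly one value of the promised type per input element, or fails loudly. A rough Python sketch of that contract, using a hypothetical slide_dbl helper (this only imitates the idea behind slider's slide_dbl(); it is not its implementation):

```python
import math
from numbers import Real
from typing import Callable, List, Sequence


def slide_dbl(x: Sequence, f: Callable[[list], object], before: float = 0) -> List[float]:
    """Hypothetical type-stable variant: one float per input element, or an error."""
    out: List[float] = []
    for i in range(len(x)):
        start = 0 if math.isinf(before) else max(0, i - int(before))
        result = f(list(x[start:i + 1]))
        # Type stability: every window must produce a single number.
        if not isinstance(result, Real):
            raise TypeError(
                f"window {i} returned {type(result).__name__}, expected a single number"
            )
        out.append(float(result))
    # Size stability: the loop appends exactly one value per input element.
    return out


def window_mean(window: list) -> float:
    return sum(window) / len(window)


print(slide_dbl([1, 2, 3, 4], window_mean, before=1))  # [1.0, 1.5, 2.5, 3.5]
```

A function that returns a list for some window raises a TypeError instead of silently producing a ragged result.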

I'm more excited, though, about the idea of time-aware sliding built into this package. slide_index() is another function, and it lets you pass a secondary index, normally something date-like. What this means is that in this example here, whether or not you can see it, we've got a vector and then a secondary index of one, two, four, and five. You can think of those as days of the month: the first, second, fourth, and fifth.

If I want to take the current element and one day backwards, that might be problematic here because we have an irregular gap in this series. But slide_index() is smart enough to know that when you're on the second day, you want days one to two, and when you're on the fourth day, you want days three to four, so in that third function call you don't want day two. slide() can't do this on its own because it doesn't have that extra index, but slide_index() does, which solves a completely new problem that's really useful in time series analysis.
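The key difference is that the window is defined by index values rather than by positions. A minimal Python sketch of that rule, with a hypothetical slide_index helper that assumes a numeric index sorted in ascending order (it scans the whole vector for each element, which is quadratic and chosen for clarity, not speed):

```python
from typing import Callable, List, Sequence


def slide_index(x: Sequence, i: Sequence[float], f: Callable[[list], object],
                before: float = 0) -> List[object]:
    """Hypothetical helper: for each position k, apply f to every element
    whose index value lies in [i[k] - before, i[k]].

    Gaps in the index are respected because membership is decided by
    index values, not by how many positions back an element sits.
    """
    out = []
    for k in range(len(x)):
        lo, hi = i[k] - before, i[k]
        window = [x[j] for j in range(len(x)) if lo <= i[j] <= hi]
        out.append(f(window))
    return out


x = ["a", "b", "c", "d"]
days = [1, 2, 4, 5]  # days of the month; day 3 is missing
print(slide_index(x, days, list, before=1))
# [['a'], ['a', 'b'], ['c'], ['c', 'd']]
```

In the third call (day 4) the window covers days 3 to 4, so day 2 ("b") is excluded, whereas a purely positional window of one element back would have returned ['b', 'c'].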


Almanac: holidays and business day logic

So almanac is a completely different package. It came out of a use case for slider where I really needed to control weekends and holidays. You can create a holiday object with almanac by starting with a base frequency, like yearly or weekly, and then adding recurrence conditions. Labor Day happens in September, on the first Monday of the month. Then you can do fun things with that object, like ask whether given days are Labor Day, which gives a yes-or-no answer for each date.
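The recurrence rule for Labor Day ("yearly, in September, on the first Monday") can be expressed as a predicate plus an expansion into concrete dates. The Python sketch below hand-codes that rule with hypothetical helpers is_labor_day and labor_days; it illustrates the concept only and is not almanac's API for building rules.

```python
import datetime as dt
from typing import List


def is_labor_day(d: dt.date) -> bool:
    """Yearly rule with two conditions: the month is September, and the day
    is the first Monday. The first Monday of any month falls on day 1 to 7."""
    return d.month == 9 and d.weekday() == 0 and d.day <= 7


def labor_days(first_year: int, last_year: int) -> List[dt.date]:
    """Expand the rule into concrete event dates for a range of years."""
    events = []
    for year in range(first_year, last_year + 1):
        for day in range(1, 8):
            d = dt.date(year, 9, day)
            if is_labor_day(d):
                events.append(d)
    return events


print(is_labor_day(dt.date(2020, 9, 7)))   # True
print(is_labor_day(dt.date(2020, 9, 14)))  # False (second Monday)
print(labor_days(2019, 2021))
```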

I'm more excited, though, about almanac's lubridate extensions. There's a bdays object you can use. In the bottom right corner there, say that we're on a Friday and we want to go one day ahead. You might use lubridate's days() and just add one day. That takes you to Saturday. But in the business world, one business day after Friday is actually Monday, and if Monday is a holiday, it might actually be Tuesday. With almanac, you can bake in this scheduling logic (what the holidays are, what the weekends are, which events you're interested in) and then create the bdays object in the bottom left with that schedule, which says to skip over those events. So in the bottom right there, Friday plus one business day takes me to Monday.
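The business-day arithmetic described above amounts to stepping one calendar day at a time and only counting days that are neither weekends nor holidays. A minimal Python sketch, assuming a hypothetical holiday set that contains only Labor Day 2020 (the helper names are illustrative, not almanac's):

```python
import datetime as dt
from typing import AbstractSet


def is_business_day(d: dt.date, holidays: AbstractSet[dt.date]) -> bool:
    """Monday to Friday, and not in the user-supplied holiday set."""
    return d.weekday() < 5 and d not in holidays


def add_business_days(d: dt.date, n: int,
                      holidays: AbstractSet[dt.date] = frozenset()) -> dt.date:
    """Step n business days forward (n > 0) or backward (n < 0) from d."""
    step = 1 if n >= 0 else -1
    remaining = abs(n)
    while remaining > 0:
        d += dt.timedelta(days=step)
        if is_business_day(d, holidays):
            remaining -= 1
    return d


friday = dt.date(2020, 8, 28)
print(friday + dt.timedelta(days=1))  # 2020-08-29 (Saturday): plain calendar days
print(add_business_days(friday, 1))   # 2020-08-31 (Monday)

# Hypothetical calendar with one holiday: Labor Day 2020 (Monday, September 7).
holidays = {dt.date(2020, 9, 7)}
print(add_business_days(dt.date(2020, 9, 4), 1, holidays))  # 2020-09-08 (Tuesday)
```

Keeping the holiday calendar as a separate input means the same stepping logic works for any organisation's schedule, including that one holiday nobody else celebrates.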

Slider and Almanac together

Lastly, slider and almanac are made to play very well together when you use slide_index() in conjunction with these bdays objects. For example, you can say: if I'm on Monday, look back one business day. That creates a range from Friday to Monday, which gives that nuanced difference in the third function call there on the right.
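Combining the two ideas means computing each window's lower bound by stepping back in business days, then selecting elements by date. A Python sketch with hypothetical helpers (slide_index_bdays is illustrative only; it does not reproduce how slider and almanac are wired together):

```python
import datetime as dt
from typing import AbstractSet, Callable, List, Sequence


def add_business_days(d: dt.date, n: int,
                      holidays: AbstractSet[dt.date] = frozenset()) -> dt.date:
    """Step n business days (weekdays not in holidays) forward or backward."""
    step = 1 if n >= 0 else -1
    remaining = abs(n)
    while remaining > 0:
        d += dt.timedelta(days=step)
        if d.weekday() < 5 and d not in holidays:
            remaining -= 1
    return d


def slide_index_bdays(x: Sequence, dates: Sequence[dt.date], f: Callable[[list], object],
                      before_bdays: int, holidays: AbstractSet[dt.date] = frozenset()) -> List[object]:
    """For each date, apply f to every element dated between
    (date minus before_bdays business days) and the date itself."""
    out = []
    for k in range(len(x)):
        lo = add_business_days(dates[k], -before_bdays, holidays)
        hi = dates[k]
        out.append(f([x[j] for j in range(len(x)) if lo <= dates[j] <= hi]))
    return out


# Thursday, Friday, Monday, Tuesday around a weekend in 2020.
dates = [dt.date(2020, 8, 27), dt.date(2020, 8, 28), dt.date(2020, 8, 31), dt.date(2020, 9, 1)]
values = [1, 2, 3, 4]
print(slide_index_bdays(values, dates, sum, before_bdays=1))  # [1, 3, 5, 7]
```

On Monday (the third call) the window runs from Friday to Monday and sums 2 + 3. A plain one-calendar-day lookback would instead start on Sunday and contain only Monday's value.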

And that is slider and almanac. Thank you.