Resources

Earo Wang | Melt the clock: Tidy time series analysis | RStudio (2019)

Time series can be frustrating to work with, particularly when processing raw data into model-ready data. This work presents two new packages that address a gap in existing methodology for time series analysis (raised at rstudio::conf 2018). The tsibble package supports organizing and manipulating modern time series, leveraging tidy data principles along with contextual semantics: index and key. The tsibble data structure seamlessly flows into forecasting routines. The fable package is a tidy renovation of the forecast package. It promotes transparent forecasting practices and concise model representations, empowering analysts to tackle a broad domain of forecasting problems. This collection of packages forms the tidyverts, which facilitates a fluent and fluid workflow for analyzing time series.

View materials: https://slides.earo.me/rstudioconf19

About the Author

Earo Wang: I'm currently doing my Ph.D. on statistical visualisation of temporal-context data at Monash University, supervised by Professor Di Cook and Professor Rob J Hyndman. I enjoy developing open-source tools with R, and am the (co)author of some widely used R packages, including anomalous, hts, sugrrants, rwalkr, and tsibble. My research areas involve data visualisation, time series analysis, and computational statistics.


Transcript

This transcript was generated automatically and may contain errors.

Good day, everybody. Last year I was at the RStudio conference as well, and several questions were raised about tidy time series analysis during the conference, especially forecasting with tidy objects. So today I'm going to present a solution to that: a streamlined workflow for time series under the tidyverse framework using two packages. I'll introduce you to three big ideas behind those two packages, tsibble, mable, and fable, and how they link together. I hope to explain them concretely with data stories.

So I believe this diagram isn't foreign to you. This is the tidyverse model, and each module here is powered by one of the tidyverse packages. The tidyverse packages play seamlessly with each other, and one of the fundamental reasons is that they all share the same underlying data structure, which is the data frame, or tibble. So the data is actually placed in the center of the diagram. But why can't we bring this workflow easily into time series? Because the current time series objects in R are model-focused. By model, I mean not only statistical models, but also forecasting, decomposition, autocorrelation functions, and other time series tools. All those methods or functions expect matrices as inputs. But the data arrives at the very beginning of this process, and it rarely comes in matrix form. We have to write so much ad hoc code to get the data into a time series model-ready object. It is a pain because of the mismatch between temporal data and time series models.

So we hope to change that. Some new tools are provided to streamline this workflow and make time series analysis a bit easier, more fun, and more intuitive. The tsibble package focuses on the tidying and transformation parts, the fable package does time series forecasting, and visualization is done with ggplot2 and its extensions. To make this workflow work, they share the same underlying data structure, which is a new data abstraction for time series called a tsibble. A tsibble is a time series tibble: ts is the native object for representing time series in R, and tibble is the tidyverse data frame, and that's where the name tsibble comes from.

Introducing the tsibble data structure

So now we are going to have some fun with an open data set about residential electricity consumption in Australia. It's a table with 46 million observations and eight variables. The column customer ID contains unique identifiers for each household, and there are thousands of households in this data set. The reading datetime gives the time stamps when the meter reading is recorded every 30 minutes. General supply is the variable we are interested in forecasting, and there are some other measurements in the table as well. So first, how are we going to turn this data into a tsibble?

So what makes time series special, or different from a normal data frame? It has its own semantics. The first one is obviously the index, the variable that represents time. In this case, it's the reading datetime, and tsibble supports a wide range of time classes, from numerics to date-times to nanotime. The second component is what we call the key: the identifying variables that define series or entities over time. For this example, each household is a series, and the key variable is the customer ID. You can also include multiple variables in the key. So basically, the index defines the time and the key defines the series. The key together with the index uniquely identifies each observation in a tsibble. Basically, it means each customer has to have unique time stamps, and for this data, yes, they do.
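The coercion step described above can be sketched as follows. This is a minimal example with toy data; the column names `customer_id`, `reading_datetime`, and `general_supply` are assumptions based on the talk's description.

```r
library(tsibble)
library(dplyr)

# Toy stand-in for the electricity data (real column names are assumptions)
elec <- tibble(
  customer_id      = rep(c("A", "B"), each = 4),
  reading_datetime = rep(seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"),
                             by = "30 min", length.out = 4), times = 2),
  general_supply   = runif(8)
)

# key + index must uniquely identify rows; as_tsibble() validates this
elec_ts <- elec %>%
  as_tsibble(key = customer_id, index = reading_datetime)

elec_ts  # prints the 30-minute interval, time zone, and key (2 customers)
```

The print method surfaces the contextual information mentioned next: the detected interval, the index's time zone, and the number of series under the key.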

So now we have a tsibble. The tibble already gives a very nice printing method, and tsibble enhances that by adding contextual information. We have a tsibble with the data dimensions, and it recognizes the 30-minute interval and the time zone associated with the index. Here we have UTC, but you might have time zones with daylight saving, for example, and tsibble will respect any time zone in your data. The key variable is reported with the number of series, so we have 2,924 customers in this big table. We also have some time gaps in this data, so there are implicit missing values. Since ggplot2 isn't aware of those gaps, it always draws a straight line between data segments if you use geom_line. tsibble comes with a very handy function called fill_gaps(), which fills in those gaps with NA by default, or you can fill by values or functions. So those lines can be removed from the plot easily with fill_gaps().

Besides fill_gaps(), tsibble provides three other verbs to handle implicit missing values, all suffixed with _gaps. Nearly 20% of the customers have time gaps, and in the output shown here you can see the last two customers with gaps. Almost all model functions require complete time series, so it's good practice to look at and fill in those gaps first.
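A small sketch of the gap-handling verbs on a toy series with one implicit gap (the 10:00 reading is missing from a 30-minute series):

```r
library(tsibble)

# Three readings at a 30-minute interval, with 10:00 implicitly missing
x <- tsibble(
  time  = as.POSIXct(c("2019-01-01 09:00", "2019-01-01 09:30",
                       "2019-01-01 10:30"), tz = "UTC"),
  value = c(1, 2, 4),
  index = time
)

has_gaps(x)              # one row per key: does the series have gaps?
count_gaps(x)            # where each gap starts/ends and how long it is
scan_gaps(x)             # the missing time stamps themselves
fill_gaps(x)             # make gaps explicit: value becomes NA at 10:00
fill_gaps(x, value = 0)  # ...or fill with a constant instead of NA
```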

Wrangling time series with tsibble

tsibble works nicely with the dplyr and tidyr verbs. A new verb you will use quite often is index_by(). It's similar to group_by() in that it prepares a grouping structure, but it only groups the index. Combined with summarise(), the data can be aggregated to any coarser time resolution. For example, if I want to work with hourly data instead of 30-minute data, I use floor_date(), ceiling_date(), or round_date() from the lubridate package on the index variable inside index_by(), followed by a summarise(). I get the hourly average electricity usage across all the households. The result is a single time series with a one-hour interval, and the key is implicit.
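The aggregation step can be sketched like this, again with toy data and assumed column names:

```r
library(tsibble)
library(dplyr)
library(lubridate)

# Two customers, four half-hourly readings each (names are assumptions)
elec_ts <- tsibble(
  customer_id      = rep(c("A", "B"), each = 4),
  reading_datetime = rep(seq(ymd_hm("2019-01-01 00:00", tz = "UTC"),
                             by = "30 min", length.out = 4), times = 2),
  general_supply   = c(1, 2, 3, 4, 2, 2, 2, 2),
  key   = customer_id,
  index = reading_datetime
)

# index_by() groups the index only; summarise() then collapses both the
# key (all customers) and the finer time stamps into hourly averages
elec_hourly <- elec_ts %>%
  index_by(hour = floor_date(reading_datetime, "hour")) %>%
  summarise(avg_supply = mean(general_supply))

elec_hourly  # a single series with a one-hour interval; the key is gone
```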

Another big chunk of operations for time series is definitely the rolling window, which not only iterates over each element, like purrr's map() does, but also needs to roll. So what I'm going to do is wake up this cat and let her roll. There are actually three different types of rolling operations: sliding, tiling, and stretching. And it's just like purrr: slide() takes one input, slide2() takes two inputs, and pslide() handles multiple inputs. If you have a data frame to roll over by observations, you need to use pslide(), and for the purpose of type stability, they all return a list. Other variants return integers, characters, logicals, et cetera. In the recent version, I've added parallel support: they're all prefixed with future_, using the furrr package by Davis Vaughan as a backend. So if you're doing some rolling regressions, it's nice to save some time by rolling them in parallel.

You can put an arbitrary function through those rolling windows. A simple example is a rolling average. It uses purrr-like syntax, so you can pass a function name or an anonymous function using the tilde. I specify the window size to be 24, so basically one day, and it rolls forward from left to right. If you want to move in the opposite direction, just change the window size to be negative. Those three rolling windows, as you can see, result in different averages. A more advanced but extremely useful example is rolling forecasts. I define a custom function called expand_forecast and also enable parallel processing by using future_pstretch(). It's never been that easy to do expanding-window forecasting. A nice thing about functional programming is that we can focus on writing expressions instead of writing a long for loop.
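Note that tsibble's rolling verbs have since been split out into the slider package, so the exact names differ from the talk. A minimal rolling-average sketch with slider, covering the same three flavors of window:

```r
library(slider)

x <- c(1, 3, 5, 7, 9, 11)

# Sliding: overlapping trailing windows of size 3 (current point + 2 before)
slide_dbl(x, mean, .before = 2, .complete = TRUE)
# -> NA NA 3 5 7 9

# The "opposite direction": look ahead instead of behind
slide_dbl(x, mean, .after = 2, .complete = TRUE)
# -> 3 5 7 9 NA NA

# Stretching: an expanding window from the start (cumulative mean),
# using the tilde/anonymous-function syntax mentioned in the talk
slide_dbl(x, ~ mean(.x), .before = Inf)
# -> 1 2 3 4 5 6

# Tiling (non-overlapping blocks) is available via the .step argument
```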

Tidy forecasting with fable

So what sort of code goes into expand_forecast? This question brings us to the next part of the presentation: tidy forecasting. How many of you have used the forecast package before? Quite a lot. Thank you. So fable is a tidy replacement of the forecast package. Why do we call it fable? First, it makes forecasting tables. Second, a fable is like a forecast: it's never true, but it tells you something useful.


So let's take a look at the data for the first 30 days of January. Each facet gives a daily snapshot of hourly electricity demand, and the peak in the late afternoon is driven by the use of air conditioning; January is summertime in Australia. You can see some days have much higher usage, colored red, because they are very hot days with maximum temperatures greater than 32 degrees. I'm going to use this subset to forecast the demand one day ahead, and I hold out the data of January 31 as a test set.

So let's model the data. I construct two models for the energy consumption with the model() verb. The first model is a naive model as a benchmark: the naive method simply uses the observed values from yesterday as forecasts. The second model is ETS, exponential smoothing, which can be thought of as a weighted average of past values; ETS is also short for error, trend, and seasonality. The model() function uses a formula interface. On the left-hand side, we specify the average supply as the response variable, and on the right-hand side, we can put some specials related to the method. I've specified the naive function to use the 24 values from yesterday instead of the value from the previous hour. And if we don't specify the right-hand side, as with ETS, it will do automatic model selection and pick the best model for you.
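A sketch of the modelling step with fable, using simulated hourly data in place of the talk's electricity subset; the column names and the exact specials are assumptions:

```r
library(fable)
library(tsibble)
library(dplyr)

# Simulated hourly demand with a daily cycle (stand-in for the real data)
n <- 24 * 30
elec_hourly <- tsibble(
  hour       = seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"),
                   by = "1 hour", length.out = n),
  avg_supply = 1 + 0.5 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n, sd = 0.05),
  index = hour
)

fit <- elec_hourly %>%
  model(
    snaive = SNAIVE(avg_supply ~ lag(24)),  # repeat yesterday's 24 values
    ets    = ETS(avg_supply)                # no RHS: automatic selection
  )

fit  # a mable: one row per series, one column per model
```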

So now we get a mable back. A mable is a model table that contains model objects. Each cell shows a succinct model representation, saying I have a seasonal naive model and an ETS with three selected components. Models are reduced forms of the data, and the model() function is an analog of summarise() because they use the same semantics: model() also reduces the data down to a single summary, but that summary happens to be a model object. To look at things like parameter estimates, information criteria, or residuals from the model objects, we just use the familiar broom functions: tidy(), glance(), and augment(). By applying tidy() to the mable, we get the parameter estimates for the models we've built. We get a bunch of parameters for ETS, like alpha and beta, and I know those are boring parameters.
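The broom-style verbs on a mable can be sketched on a tiny toy fit (the real mable comes from the electricity data above):

```r
library(fable)
library(tsibble)

# A tiny toy mable standing in for the one built in the talk
y   <- tsibble(idx = 1:50, value = rnorm(50, mean = 10), index = idx)
fit <- model(y, ets = ETS(value))

tidy(fit)     # parameter estimates (e.g. alpha) as a tidy table
glance(fit)   # one row per model: AIC, AICc, BIC, ...
augment(fit)  # fitted values and residuals aligned with the observations
```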

So it's time to forecast. We pipe the mable into the forecast() function, and we're doing a one-day-ahead forecast, equivalent to 24 steps ahead. It supports human-friendly input, so it reads more naturally, like forecasting with a one-day horizon. It's convenient because we no longer need to mentally compute how many hours, minutes, or seconds there are in a day, but it can still do h = 24. And we're done with the forecasts: we have a forecasting table, which is a fable. It's a special table that includes the future predictions. It not only tells you the point forecasts, but also the underlying prediction distribution that captures the uncertainty, because we are forecasters, not fortune tellers. You can see the normal distribution with its mean and standard deviation in the last column, .distribution. This is one of my favorite features of fable, reporting distribution forecasts, because you're able to produce any level of prediction interval you like.
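A sketch of the forecasting step; note that for a time-indexed tsibble the horizon can be given as a human-friendly string like "1 day" instead of a step count:

```r
library(fable)
library(tsibble)

y   <- tsibble(idx = 1:100, value = 10 + rnorm(100), index = idx)
fit <- model(y, ets = ETS(value))

fc <- forecast(fit, h = 24)  # h = "1 day" also works on time-indexed data
fc                           # a fable: point forecasts + distribution column

# Distribution forecasts mean intervals can be produced at any level
hilo(fc, level = c(80, 95))
```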


We'll see the forecasts more clearly with plots using geom_forecast(). The naive method repeats yesterday's pattern, but the 80% and 95% prediction intervals are quite large, and some even go below zero. How about ETS? ETS nicely captures the daily trend and produces much narrower prediction intervals. And which model performs better? Use the accuracy() function to compare the predictions with the test set I held out before. Looking at the accuracy measures, ETS does slightly better than the naive method in terms of root mean squared error, but they both tend to give underestimated predictions. The black line in the plot is the actual data, so it looks like it's another hot day in January. If we had weather information like temperatures, we could include it to improve the forecast, but we'd need a different model that allows for exogenous regressors, for example an ARIMA model.
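The hold-out comparison can be sketched end to end; the train/test split mirrors the talk's "hold out the last day" setup on toy data:

```r
library(fable)
library(tsibble)
library(dplyr)

# 5 "days" of toy hourly data; hold out the last day as a test set
y     <- tsibble(idx = 1:120, value = 10 + sin(2 * pi * (1:120) / 24),
                 index = idx)
train <- filter(y, idx <= 96)

fit <- model(train,
             snaive = SNAIVE(value ~ lag(24)),
             ets    = ETS(value))
fc  <- forecast(fit, h = 24)

# accuracy() matches forecasts against the full data to score the test set
accuracy(fc, y)  # RMSE, MAE, MAPE, ... one row per model
```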

Scaling to many time series

So far, I have shown all the steps from model building to model assessment for just one time series. Would it be any different if we had multiple time series in a table? No, because models are fundamentally scalable. tsibble is a modern reimagining of time series, designed for hosting many time series together. Essentially, the series have already been defined when we created the table, and we are obviously interested in forecasting the demand for each household here. I've removed some troublesome series and ended up with 1,480 households. No extra steps are needed to forecast at scale: just as before, we can directly pipe them into the model() function. It will fit an ETS model for each customer at once, and then happily forecast. I also take a log transformation of the response variable to ensure that I get positive forecasts back, and the forecast() function will take care of the back-transformation for you. You can see 1,480 models have been fitted in the mable, and the key variable is always the key to refer to a series across tsibble, mable, and fable. Models are scalable, but visualization is not, so I just plot four customers with their forecasts here. At the individual level there's lots of noise, producing much larger prediction intervals as well.
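Fitting per-series models with a transformed response can be sketched like this; two toy households stand in for the 1,480, and the column names are assumptions:

```r
library(fable)
library(tsibble)
library(dplyr)

# Two toy households; in the talk there are 1,480
elec <- tsibble(
  customer_id = rep(c("A", "B"), each = 48),
  hour        = rep(1:48, times = 2),
  supply      = rexp(96, rate = 1) + 0.1,
  key   = customer_id,
  index = hour
)

# One ETS model per key; log() on the left-hand side guarantees positive
# forecasts, and forecast() handles the back-transformation automatically
fit_all <- elec %>% model(ets = ETS(log(supply)))
fc_all  <- fit_all %>% forecast(h = 24)

fit_all  # a mable with one row per customer_id
```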

So I've shown a portion of what tsibble and fable can do. There are also decomposition, simulation based on model fits, interpolation of missing values, and model support for streaming data, so please check them out. It's joint work with Di Cook, Rob Hyndman, and Mitch O'Hara-Wild. I need to mention that tsibble is on CRAN, but fable is on GitHub at the moment, and they all belong to tidyverts.org. Those are the useful links to the packages, my slides, and the source code behind the slides. That's all from me. Thank you.

Q&A

Thank you so much, Earo. We have time for a couple of questions, and we have throwable mics moving around, so just raise your hand.

Hello? Can you hear me? Okay. I'm wondering if any of these tools are well-suited for irregular time series?

Not yet, but tsibble will support irregular data structures.

Hi. I was wondering if this supports hierarchical time series reconciliation?

Yes, we're considering that, probably in the second half of this year.

What is the level of maturity of the package compared with the models that are available today in the forecast package?

We're thinking of going to CRAN sometime in March or April, and replacing the whole forecast package by, I'd say, the end of this year.

I noticed that there's a rival tibbletime package, and I'm just wondering what the difference between tsibble and tibbletime is.

So basically there are two time series structures, and they've got a couple of things in common. First, both are built on top of tibble, and second, both declare the index variable. For tsibble, we also require the key, and we check that the key and index give distinct observations. This makes tsibble different from tibbletime, and the function interfaces are also different. Does that answer your question?

So, kind of an offshoot of the last question: do you have any plans to incorporate the ability to specify time windows using shorthand, like xts does or like tibbletime does?

Yes, there is such a function. It's called filter_index(), and it does support the shorthand. Just check the documentation, please.