On performance of temporal aggregation in time series forecasting

.large[.alert-bottom1[CMAF FTT] <br> <br> <br> <br>]
.center[.title[On performance of temporal aggregation in time series forecasting]]
.sticker-float[![logo](resources/carbts_t.png)]

.bottom[
Bahman Rostami-Tabar (<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#1da1f2;overflow:visible;position:relative;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg>[@Bahman_R_T](https://twitter.com/Bahman_R_T)) <br>
Website [www.bahmanrt.com](https://www.bahmanrt.com/)
]

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- How do temporal aggregation approaches perform on the M4 competition data?

- Does combining forecasts generated by temporal aggregation improve forecast accuracy, and how should they be combined? (**Working paper 1**)

- How does temporal aggregation of data change time series features, and how might these features affect the forecasting performance of AD versus AF? (**Working paper 2**)
]

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- .remember[Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?]

- How do temporal aggregation approaches perform on the M4 competition data?

- Does combining forecasts generated by temporal aggregation improve forecast accuracy, and how should they be combined? (**Working paper 1**)

- How does temporal aggregation of data change time series features, and how might these features affect the forecasting performance of AD versus AF? (**Working paper 2**)
]

---

## Using time series forecasting to inform decisions

.center[<img src="figs/Framework.png" width="700px">]

.footnote[Babai, M. Zied, John E. Boylan, and Bahman Rostami-Tabar. "Demand forecasting in supply chains: a review of aggregation and hierarchical approaches." International Journal of Production Research (2021): 1-25.]

---
## Data and forecast time granularity

* The time granularity of the forecast and its horizon are determined by the decisions that the forecast informs.

* A common assumption is that the time series granularity matches the forecast requirement, e.g. daily time series are used to produce daily forecasts.

* However, the level of time series granularity .remember[does not necessarily match] the level of forecast granularity.

* The temporal granularity required for the forecast might be coarser than that of the available time series. For instance, a forecast might be required at the annual level while a monthly time series is available. With advances in IT, data are often recorded at the finest temporal granularity (e.g. individual arrival times).

---
## Time series forecasting problem
<br><br>

* We consider a time series forecasting problem where an original time series has a higher temporal granularity (e.g. monthly) than the required forecast (e.g. annual).

* We aim to forecast the total value over a number of periods ahead, known as .remember[forecast horizon aggregation], i.e. the forecast over the lead-time period.

.footnote[1 Mohammadipour, Maryam, and John E. Boylan. "Forecast horizon aggregation in integer autoregressive moving average (INARMA) models." Omega 40.6 (2012): 703-712.]

---
class: middle

**A key question to be answered is then:**

Should we use the original series to generate forecasts for each period of the required horizon and then sum them to obtain the forecast horizon aggregation (lead-time forecast), i.e. .remember[Aggregate Forecast (AF)]? Or should we first aggregate the time series to match the required forecast granularity and then extrapolate directly at that level, i.e. .remember[Aggregate Data (AD)]?

**I will illustrate these approaches using a simple example.**

.footnote[**There is no disaggregation to the original time granularity**]

---
class: inverse
## Terminology

**One time series**

- Data time granularity (e.g. daily, monthly, annual)
- Forecast time granularity (e.g. daily, monthly, annual)
- Forecast horizon (e.g. 12 months ahead)
- Forecast horizon aggregation / lead-time (e.g. 1 week, 1 quarter, 1 year)
- Temporal aggregation
    * Aggregate Forecast (or Bottom-Up)
    * Aggregate Data
      - Non-overlapping temporal aggregation (NOA)
      - Overlapping temporal aggregation (OA)
          
---
## Forecast horizon aggregation: an example
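A minimal Python sketch of forecast horizon aggregation, assuming monthly point forecasts are already available (the values below are made up): the lead-time forecast is the sum of the 1- to L-step-ahead forecasts. The helper `horizon_aggregate` is hypothetical.

```python
def horizon_aggregate(forecasts, lead_time):
    """Sum the 1..lead_time step-ahead point forecasts into one lead-time total."""
    if not 1 <= lead_time <= len(forecasts):
        raise ValueError("lead_time must be between 1 and len(forecasts)")
    return sum(forecasts[:lead_time])

# Illustrative monthly point forecasts for the next 6 months (hypothetical values).
monthly_forecasts = [120.0, 115.0, 130.0, 125.0, 140.0, 135.0]

# Forecast of the total over a 3-month (one-quarter) lead-time.
print(horizon_aggregate(monthly_forecasts, 3))  # 365.0
```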

---
## Temporal aggregation: aggregate forecast
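A minimal Python sketch of the AF approach: forecast at the original granularity, then sum over the lead-time. It assumes simple exponential smoothing (SES) with a fixed smoothing parameter and a level initialised at the first observation, as a stand-in for the ETS models used in the study; `ses_forecast`, `aggregate_forecast` and the data are hypothetical.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing: return the final smoothed level.
    SES point forecasts are flat, so this is the forecast for every future period.
    The level is initialised at the first observation (an assumption)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def aggregate_forecast(y, lead_time, alpha=0.3):
    """AF: forecast each future period at the original granularity, then sum."""
    per_period = [ses_forecast(y, alpha)] * lead_time
    return sum(per_period)

monthly = [10, 12, 11, 13]
print(aggregate_forecast(monthly, lead_time=3, alpha=0.5))  # 36.0
```

Because SES point forecasts are flat, the AF total here is just `lead_time` times the final level; with trended or seasonal ETS models the per-period forecasts differ, and summing them matters.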

---
## Temporal aggregation: aggregate forecast

---
## Non-overlapping temporal aggregation: aggregate data
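A minimal Python sketch of the AD approach with non-overlapping aggregation (NOA): sum the series into consecutive buckets whose length equals the lead-time, so that a one-step-ahead forecast at the aggregate level is the lead-time forecast. Dropping the oldest incomplete bucket is an assumption, and SES again stands in for ETS; all names are hypothetical.

```python
def non_overlapping_aggregate(y, m):
    """Sum consecutive, non-overlapping blocks of m periods.
    The oldest len(y) % m observations are dropped so that the last block
    ends at the most recent observation."""
    start = len(y) % m
    return [sum(y[i:i + m]) for i in range(start, len(y), m)]

def ses_forecast(y, alpha):
    """Simple exponential smoothing; returns the flat point forecast."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def aggregate_data_noa(y, m, alpha=0.3):
    """AD (NOA): aggregate to the lead-time granularity, then forecast one step ahead."""
    return ses_forecast(non_overlapping_aggregate(y, m), alpha)

y = [1, 2, 3, 4, 5, 6, 7]
print(non_overlapping_aggregate(y, 3))        # [9, 18]
print(aggregate_data_noa(y, m=3, alpha=0.5))  # 13.5
```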

---
## Overlapping temporal aggregation
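A minimal Python sketch of overlapping aggregation (OA): a window of m periods (the lead-time) moves forward one period at a time, and the forecast of the next window total is the lead-time forecast. SES stands in for ETS; the names and data are hypothetical.

```python
def overlapping_aggregate(y, m):
    """Rolling sums over a window of m periods, moving one period at a time."""
    return [sum(y[i:i + m]) for i in range(len(y) - m + 1)]

def ses_forecast(y, alpha):
    """Simple exponential smoothing; returns the flat point forecast."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def aggregate_data_oa(y, m, alpha=0.3):
    """AD (OA): forecast the next m-period total from the rolling sums."""
    return ses_forecast(overlapping_aggregate(y, m), alpha)

y = [1, 2, 3, 4, 5, 6]
print(overlapping_aggregate(y, 3))           # [6, 9, 12, 15]
print(aggregate_data_oa(y, m=3, alpha=0.5))  # 12.375
```

OA keeps n - m + 1 aggregated observations versus n // m under NOA, but consecutive rolling sums share m - 1 original observations, so they are strongly autocorrelated.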

---
## Using information at multiple levels of time granularity instead of a single level - .remember[MAPA]
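MAPA fits ETS models at several aggregation levels and combines their structural components. The Python sketch below is deliberately simplified and is not MAPA itself: it keeps only a level component estimated by SES, rescales each aggregation level k back to the original frequency by dividing by k, and averages the rescaled levels. All names are hypothetical.

```python
def non_overlapping_aggregate(y, m):
    """Sum non-overlapping blocks of m periods, dropping the oldest leftovers."""
    start = len(y) % m
    return [sum(y[i:i + m]) for i in range(start, len(y), m)]

def ses_level(y, alpha):
    """Final SES level, initialised at the first observation."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def mapa_level_sketch(y, max_level, alpha=0.3):
    """Simplified MAPA-style combination (level component only).
    For each aggregation level k = 1..max_level: aggregate without overlap,
    estimate the level with SES, rescale to the original frequency by
    dividing by k, then average the rescaled levels across all k."""
    per_period_levels = []
    for k in range(1, max_level + 1):
        aggregated = non_overlapping_aggregate(y, k)
        per_period_levels.append(ses_level(aggregated, alpha) / k)
    return sum(per_period_levels) / len(per_period_levels)

print(mapa_level_sketch([1, 2, 3, 4, 5, 6], max_level=2, alpha=0.5))  # 4.515625
```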

.footnote[Kourentzes, Nikolaos, Fotios Petropoulos, and Juan R. Trapero. "Improving forecasting by estimating time series structural components across multiple frequencies." International Journal of Forecasting 30.2 (2014): 291-302.]

---
## Using information at multiple levels of time granularity instead of a single level - .remember[temporal hierarchies]
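A minimal Python sketch of reconciling forecasts in a temporal hierarchy for one year of quarterly data (annual, semi-annual and quarterly levels). It assumes OLS reconciliation, y_tilde = S (S'S)^{-1} S' y_hat, which is only one of the combination options; the temporal hierarchies framework also uses scaling-based weights, so this is an illustration rather than its recommended estimator. The base forecasts are made up.

```python
# Summing matrix for one year of quarterly data: rows are the annual total,
# the two half-years and the four quarters; columns are the four quarters.
S = [
    [1, 1, 1, 1],  # annual
    [1, 1, 0, 0],  # first half-year
    [0, 0, 1, 1],  # second half-year
    [1, 0, 0, 0],  # Q1
    [0, 1, 0, 0],  # Q2
    [0, 0, 1, 0],  # Q3
    [0, 0, 0, 1],  # Q4
]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def ols_reconcile(base):
    """OLS reconciliation: y_tilde = S (S'S)^{-1} S' y_hat.
    `base` holds independent base forecasts ordered like the rows of S."""
    k = len(S[0])
    sts = [[sum(row[i] * row[j] for row in S) for j in range(k)] for i in range(k)]
    sty = [sum(row[i] * y for row, y in zip(S, base)) for i in range(k)]
    bottom = solve(sts, sty)  # reconciled quarterly forecasts
    return [sum(s * b for s, b in zip(row, bottom)) for row in S]

# Hypothetical base forecasts: the annual model says 110, but the quarters sum to 100.
base = [110, 30, 70, 10, 20, 30, 40]
print([round(v, 2) for v in ols_reconcile(base)])
```

Because every reconciled vector is S times a set of quarterly values, the annual and half-year forecasts always add up, whatever the base forecasts were.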

.footnote[Athanasopoulos, George, et al. "Forecasting with temporal hierarchies." European Journal of Operational Research 262.1 (2017): 60-74.]

---
class: inverse, center, middle

**It is often recommended to aggregate the data first and then forecast when a time series is recorded at a finer time granularity (e.g. monthly) than the one at which the forecast is required (e.g. annual).**

For an example, see page 153 of *Profit from Your Forecasting Software* by Paul Goodwin.

**Let's examine the performance of the aggregate data versus aggregate forecast approaches using the M4 competition dataset.**

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- .remember[How do temporal aggregation approaches perform on the M4 competition data?]

- Does combining forecasts generated by temporal aggregation improve forecast accuracy, and how should they be combined? (**Working paper 1**)

- How does temporal aggregation of data change time series features, and how might these features affect the forecasting performance of AD versus AF? (**Working paper 2**)
]

---
## Time series data

.pull-left[
- M4 competition time series
    - 24,000 quarterly
    - 48,000 monthly
    - 4,227 daily
    
- Time series features
    - 42 features
    - Extract features using `tsfeatures::tsfeatures()` in R
]

.pull-right[
- Forecasting methods: exponential smoothing state space models (ETS); ARIMA is also considered.
- Point forecast accuracy measures: Mean Absolute Scaled Error (MASE), Root Mean Squared Scaled Error (RMSSE), and others.
- Time series cross-validation is performed.
]
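
A minimal Python sketch of the accuracy measures and of rolling-origin cross-validation. MASE scales the mean absolute error by the in-sample mean absolute m-step naive error (for monthly data a seasonal scaling with m = 12 is common, as in the M4 competition); RMSSE does the same with squared errors. The expanding window moving one period at a time is an assumption, since the study's exact origins are not stated here; the function names are hypothetical.

```python
import math

def naive_differences(train, m=1):
    """In-sample m-step differences y_t - y_{t-m} used to scale the errors."""
    if len(train) <= m:
        raise ValueError("training series must be longer than m")
    return [train[t] - train[t - m] for t in range(m, len(train))]

def mase(train, actual, forecast, m=1):
    """Mean absolute error scaled by the in-sample mean absolute naive error."""
    diffs = naive_differences(train, m)
    scale = sum(abs(d) for d in diffs) / len(diffs)
    errors = [a - f for a, f in zip(actual, forecast)]
    return (sum(abs(e) for e in errors) / len(errors)) / scale

def rmsse(train, actual, forecast, m=1):
    """Root of the mean squared error scaled by the in-sample mean squared naive error."""
    diffs = naive_differences(train, m)
    scale = sum(d * d for d in diffs) / len(diffs)
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt((sum(e * e for e in errors) / len(errors)) / scale)

def rolling_origins(n, min_train, horizon):
    """(train_end, test_end) index pairs for an expanding-window evaluation
    that moves the forecast origin forward one period at a time."""
    return [(end, end + horizon) for end in range(min_train, n - horizon + 1)]

train = [1, 2, 3, 4, 5]
print(mase(train, actual=[6, 7], forecast=[5, 9]))   # 1.5
print(rolling_origins(n=10, min_train=6, horizon=2))  # [(6, 8), (7, 9), (8, 10)]
```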

.footnote[https://supplychainanalytics.shinyapps.io/Evaluation_of_ML_models/.]

---
## M4 Monthly time series features

---
## M4 Monthly time series features

---
### Percentage of series for which each approach was more accurate (using MASE)

---
## Performance of AF vs. AD (based on non-overlapping temporal aggregation)

---
## Questions

Given the comparative performance of the temporal aggregation approaches:

* Does combining forecasts generated by the Bottom-Up (BU), Non-overlapping (NOA) and Overlapping (OA) approaches improve forecast accuracy? How should they be combined?

* How does temporal aggregation of data change time series features, and is there any association between time series features and the forecasting performance of AD versus AF?

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- How do temporal aggregation approaches perform on the M4 competition data?

- .remember[Does combining forecasts generated by temporal aggregation improve forecast accuracy, and how should they be combined? (**Working paper 1**)]

- How does temporal aggregation of data change time series features, and how might these features affect the forecasting performance of AD versus AF? (**Working paper 2**)
]

---
## Experiment design - 1

---
## Combining algorithm
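The cited potential-based family includes the exponentially weighted average forecaster. The Python sketch below shows that one instance, combining the BU/AF, NOA and OA forecasts; the absolute-error loss, the learning rate `eta` and the data are assumptions, not the working paper's settings.

```python
import math

def ewa_combine(expert_forecasts, actuals, eta=0.5):
    """Exponentially weighted average forecaster.

    expert_forecasts[t][i] is the forecast of approach i (e.g. BU/AF, NOA, OA)
    for period t. Each combined forecast uses only losses observed before
    period t; absolute error is the (assumed) loss function."""
    n_experts = len(expert_forecasts[0])
    cum_loss = [0.0] * n_experts
    combined = []
    for preds, actual in zip(expert_forecasts, actuals):
        # Shift by the smallest loss for numerical stability; normalised weights are unchanged.
        best = min(cum_loss)
        weights = [math.exp(-eta * (loss - best)) for loss in cum_loss]
        total = sum(weights)
        combined.append(sum(w * p for w, p in zip(weights, preds)) / total)
        # Reveal the actual and update each approach's cumulative loss.
        for i, p in enumerate(preds):
            cum_loss[i] += abs(p - actual)
    return combined

# Hypothetical lead-time forecasts from three approaches over four periods.
forecasts = [[100, 110, 105], [102, 115, 104], [98, 112, 101], [101, 118, 103]]
actuals = [103, 104, 102, 103]
print([round(c, 2) for c in ewa_combine(forecasts, actuals, eta=0.5)])
```

The first combined forecast is the simple average, since no losses have been observed yet; weight then shifts towards the approaches with the smallest cumulative error.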

.footnote[Cesa-Bianchi, Nicolo, and Gábor Lugosi. "Potential-based algorithms in on-line prediction and game theory." Machine Learning 51.3 (2003): 239-261.]

---
### Mean (median) MASE for M4 monthly series with ETS forecasting method

---
background-image: url("resources/hierarchy-left.jpeg")
background-size: contain
background-position: left
class: middle

.pull-right2[
## Outline

- Temporal aggregation: why do we need it in time series forecasting and what are the common approaches?

- How do temporal aggregation approaches perform on the M4 competition data?

- Does combining forecasts generated by temporal aggregation improve forecast accuracy, and how should they be combined? (**Working paper 1**)

- .remember[How does temporal aggregation of data change time series features, and how might these features affect the forecasting performance of AD versus AF? (**Working paper 2**)]

]

---
## Experiment design - 2

---
## How does non-overlapping TA change time series features?
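A minimal Python sketch of measuring how non-overlapping aggregation changes two features, the coefficient of variation and the lag-1 autocorrelation. The study extracted its 42 features with `tsfeatures` in R; these hand-written Python definitions and the toy series are illustrative assumptions, not the study's feature code or results.

```python
import statistics

def non_overlapping_aggregate(y, m):
    """Sum non-overlapping blocks of m periods, dropping the oldest leftovers."""
    start = len(y) % m
    return [sum(y[i:i + m]) for i in range(start, len(y), m)]

def coefficient_of_variation(y):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(y) / statistics.fmean(y)

def acf1(y):
    """Lag-1 sample autocorrelation."""
    mean = statistics.fmean(y)
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, len(y)))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

# Toy series with a strong 2-period oscillation.
original = [1, 3] * 6
aggregated = non_overlapping_aggregate(original, 2)  # every block sums to 4

print(coefficient_of_variation(original), acf1(original))
print(coefficient_of_variation(aggregated))
```

In this toy case aggregating over the oscillation period removes it entirely, so the variability vanishes; real series change by feature-specific amounts.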

---
## How does non-overlapping TA change time series features (continued)?

---
## Features relationship and AD/AF performance

---
## MCB test for all classifiers

We also use misclassification error, the F-score and the area under the ROC curve (AUC).
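A minimal Python sketch of these classifier measures, assuming labels coded 1 = AF more accurate and 0 = AD more accurate (a hypothetical coding) and the balanced F-score (F1). AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def misclassification_error(y_true, y_pred):
    """Share of series whose predicted class is wrong."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores, positive=1):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]  # hypothetical predicted P(AF better)
print(misclassification_error(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```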

---
## Important features
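A minimal Python sketch of permutation feature importance: shuffle one feature column, re-predict, and record the drop in accuracy. `toy_classifier` is a hypothetical stand-in for a fitted model (it uses only the first feature), and the data are made up.

```python
import random

def accuracy(model, X, y):
    """Share of rows the model classifies correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_classifier(row):
    """Hypothetical classifier: 1 = AF better, 0 = AD better; uses only feature 0."""
    return int(row[0] > 0.5)

X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 7.0],
     [0.3, 2.0], [0.7, 6.0], [0.4, 4.0], [0.6, 8.0]]
y = [toy_classifier(row) for row in X]
print(permutation_importance(toy_classifier, X, y))
```

The ignored second feature gets zero importance, while shuffling the first feature breaks the link between inputs and labels and lowers accuracy.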

???
Feature importance (or variable importance) helps us understand which features are most important in driving the predictions of these two models overall, aggregated over the whole training set.
One way to compute variable importance is to permute the features (Breiman 2001a): we shuffle the values of one feature, predict from the model, and measure how much worse the model fits the data compared with before shuffling.

---
## Partial dependence plot
### Probability of AF performing better
<img src="figure/pfinal.png" width="45%" style="display: block; margin: auto;" />

---
## Partial dependence plot (continued)
### Probability of AF performing better
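A minimal Python sketch of how a partial dependence curve is computed: set one feature to each grid value in every row, keep the other features as observed, and average the predicted probability. `toy_model` is a hypothetical logistic stand-in for the fitted classifier, with made-up coefficients.

```python
import math

def partial_dependence(predict_proba, X, feature, grid):
    """Average predicted probability when `feature` is set to each grid value
    in every row of X (the other features keep their observed values)."""
    curve = []
    for value in grid:
        probs = [predict_proba(row[:feature] + [value] + row[feature + 1:]) for row in X]
        curve.append(sum(probs) / len(probs))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted classifier: P(AF better | features).
    row = [feature_a, feature_b]; the coefficients are made up."""
    z = 2.0 * row[0] - 1.5 * row[1]
    return 1.0 / (1.0 + math.exp(-z))

X = [[0.2, 0.1], [0.8, 0.6], [0.5, 0.3]]
grid = [0.0, 0.5, 1.0]
print([round(p, 3) for p in partial_dependence(toy_model, X, feature=0, grid=grid)])
```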

---
## Summary and conclusions

- Although aggregating the time series seems intuitive, it does not always improve forecast accuracy. Our results indicate that Aggregate Forecast is a competitive approach, but neither approach dominates; both have merit.

- Combining forecasts from the aggregate data (non-overlapping and overlapping) and aggregate forecast approaches improves forecast accuracy: combination works here too.

- Aggregating data by temporal aggregation changes the features of a time series, and the magnitude of the change varies across features. In particular, as the aggregation level increases, we observe that the strength of seasonality, autocorrelation, coefficient of variation, linearity, curvature and the KPSS unit root statistic decrease, whereas non-linearity, mean, variance, ARCH.LM, trend and the PP unit root statistic increase. Entropy is the only measure that can either increase or decrease, depending on its initial value.

---
## Summary and conclusions (continued)

- The Random Forest model is the most accurate of the ML classifiers considered for predicting which approach provides the more accurate forecast, given a set of time series features as input.

- The most important features for predicting whether AF or AD should be used for a given monthly M4 time series include *curvature*, *nonlinearity*, *seas_pacf*, *unitroot_pp*, *mean*, *ARCH.LM*, *coefficient of variation*, *stability*, *linearity* and *max_level_shift*.

- Higher values of trend, ARCH.LM, hurst, lag-1 autocorrelation, unitroot_pp and seas_pacf may increase the chance of AF performing better.

- Higher values of lumpiness, entropy, non-linearity, curvature and strength of seasonality may increase the chance of AD performing better, so a strong presence of these features may favor AD over AF.

---

.pull-left[
### Work in progress
- Rostami-Tabar, B., Goltsos, T., Wang, S. (2022), Forecasting for lead-time period by temporal aggregation: Whether to combine and how

- Rostami-Tabar, B., Mircetic, D. (2022), On time series features and the performance of temporal aggregation
]

.pull-right[
### Published recently
- Mircetic, D., et al. (2021), "[Forecasting hierarchical time series in supply chains: an empirical investigation](https://www.tandfonline.com/doi/full/10.1080/00207543.2021.1896817)." International Journal of Production Research, 1-20.
- Babai, M.Z., Boylan, J., Rostami-Tabar, B. (2021), "[Demand Forecasting in Supply Chains: A Review of Aggregation and Hierarchical Approaches](https://www.tandfonline.com/doi/full/10.1080/00207543.2021.2005268)", International Journal of Production Research, 1-25.
]

---
## References for temporal aggregation forecasting

- [An aggregate–disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis](https://www.tandfonline.com/doi/full/10.1057/jors.2010.32?casa_token=FLX_iKeIDXcAAAAA%3ACXYWY6jICM_1_ayaadc8GXxN05kAFo5I_qqmt7XvBjEMTHBUTWLA8kziBWQhUVj-BdNWTwJnIw), *Journal of the Operational Research Society*.
- [Improving forecasting via multiple temporal aggregation](https://www.sciencedirect.com/science/article/pii/S0169207013001477?casa_token=PhrGiXHJJzsAAAAA:-PU7metoOVL4G7avKR6NT9m5kzGNHPy5Lo14iEhVHqtju_L_hRUatM0M3CV3UilcBA47EuU), *International Journal of Forecasting*.
- [Demand forecasting by temporal aggregation](https://onlinelibrary.wiley.com/doi/full/10.1002/nav.21546?casa_token=wfP5AIk8wAQAAAAA%3A4skkyZgQCyVdftE194ZG_16CgG7CfL6-6_kb2Sqi0aiJ0aC4cWL4x2bmmRMPdupj4P4_9lihPLj3), *Naval Research Logistics*.
- [Forecasting with temporal hierarchies](https://www.sciencedirect.com/science/article/pii/S0377221717301911?casa_token=wVe_QYpCEFoAAAAA:LT-rFP_KTK8Wbr1iQnqpGpNXjKiocfoSBuM4-0SfYTEB_6njOQcELohyPLiuPQuSgEkstCc), *European Journal of Operational Research*.

---
class: inverse, middle

- Slides and papers: [www.bahmanrt.com](https://www.bahmanrt.com/)
- Also check out [www.f4sg.org](https://www.f4sg.org/)

<br><br>
<svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M459.37 151.716c.325 4.548.325 9.097.325 13.645 0 138.72-105.583 298.558-298.558 298.558-59.452 0-114.68-17.219-161.137-47.106 8.447.974 16.568 1.299 25.34 1.299 49.055 0 94.213-16.568 130.274-44.832-46.132-.975-84.792-31.188-98.112-72.772 6.498.974 12.995 1.624 19.818 1.624 9.421 0 18.843-1.3 27.614-3.573-48.081-9.747-84.143-51.98-84.143-102.985v-1.299c13.969 7.797 30.214 12.67 47.431 13.319-28.264-18.843-46.781-51.005-46.781-87.391 0-19.492 5.197-37.36 14.294-52.954 51.655 63.675 129.3 105.258 216.365 109.807-1.624-7.797-2.599-15.918-2.599-24.04 0-57.828 46.782-104.934 104.934-104.934 30.213 0 57.502 12.67 76.67 33.137 23.715-4.548 46.456-13.32 66.599-25.34-7.798 24.366-24.366 44.833-46.132 57.827 21.117-2.273 41.584-8.122 60.426-16.243-14.292 20.791-32.161 39.308-52.628 54.253z"/></svg> Say hello: [@Bahman_R_T](https://twitter.com/Bahman_R_T)