Modern Time Series Forecasting with Python - Manu Joseph - E-Book


Manu Joseph

Description

We live in a serendipitous era where the explosion in the quantum of data collected, together with a renewed interest in data-driven techniques such as machine learning (ML), has changed the landscape of analytics and, with it, time series forecasting. This book, filled with industry-tested tips and tricks, takes you beyond commonly used classical statistical methods such as ARIMA and introduces you to the latest techniques from the world of ML.
This is a comprehensive guide to analyzing, visualizing, and creating state-of-the-art forecasting systems, complete with common topics such as ML and deep learning (DL) as well as rarely touched-upon topics such as global forecasting models, cross-validation strategies, and forecast metrics. You’ll begin by exploring the basics of data handling, data visualization, and classical statistical methods before moving on to ML and DL models for time series forecasting. This book takes you on a hands-on journey in which you’ll develop state-of-the-art ML (linear regression to gradient-boosted trees) and DL (feed-forward neural networks, LSTMs, and transformers) models on a real-world dataset along with exploring practical topics such as interpretability.
By the end of this book, you’ll be able to build world-class time series forecasting systems and tackle problems in the real world.

The e-book can be read in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 787

Year of publication: 2022




Modern Time Series Forecasting with Python

Explore industry-ready time series forecasting using modern machine learning and deep learning

Manu Joseph

BIRMINGHAM—MUMBAI

Modern Time Series Forecasting with Python

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Dhruv Kataria

Senior Editors: Roshan Ravikumar, Tazeen Shaikh

Content Development Editor: Shreya Moharir

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Subalakshmi Govindhan

Production Designer: Alishon Mendonca

Marketing Coordinator: Shifa Ansari

First published: November 2022

Production reference: 1181122

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80324-680-2

www.packt.com

For my son, Zane,

For his boundless curiosity,

For his endless questions,

And for his innocent love of learning.

(All great qualities for adults who read this book as well.)

Contributors

About the author

Manu Joseph is a self-made data scientist with more than a decade of experience working with many Fortune 500 companies, enabling digital and AI transformations, specifically in machine learning-based demand forecasting. He is considered an expert, thought leader, and strong voice in the world of time series forecasting. Currently, Manu leads applied research at Thoucentric, where he advances research by bringing cutting-edge AI technologies to the industry. He is also an active open source contributor and has developed an open source library—PyTorch Tabular—which makes deep learning for tabular data easy and accessible. Originally from Thiruvananthapuram, India, Manu currently resides in Bengaluru, India, with his wife and son.

About the reviewers

Dr. Julien Siebert is currently working as a researcher at the Fraunhofer Institute for Experimental Software Engineering (IESE), in Kaiserslautern, Germany. He studied engineering sciences and AI and obtained a PhD in computer science on the topic of modeling and simulation of complex systems. After several years of research both in computer science and theoretical physics, Dr. Julien Siebert worked as a data scientist for an e-commerce fashion company. Since 2018, he has been working at the intersection between software engineering and data science.

Gerzson David Boros is the owner and CEO of Data Science Europe and a senior data scientist who has been involved in data science for more than 10 years. He has an MSc and is a candidate for an MBA. In the last 5 years, he and his team have made business proposals for 100 different executives and worked on more than 30 different projects on the topic of data science and artificial intelligence. His motto is “Social responsibility is also achievable with the help of data.”

Table of Contents

Preface

Part 1 – Getting Familiar with Time Series

1

Introducing Time Series

Technical requirements

What is a time series?

Types of time series

Main areas of application for time series analysis

Data-generating process (DGP)

Generating synthetic time series

Stationary and non-stationary time series

What can we forecast?

Forecasting terminology

Summary

Further reading

2

Acquiring and Processing Time Series Data

Technical requirements

Understanding the time series dataset

Preparing a data model

pandas datetime operations, indexing, and slicing – a refresher

Converting the date columns into pd.Timestamp/DatetimeIndex

Using the .dt accessor and datetime properties

Slicing and indexing

Creating date sequences and managing date offsets

Handling missing data

Converting the half-hourly block-level data (hhblock) into time series data

Compact, expanded, and wide forms of data

Enforcing regular intervals in time series

Converting the London Smart Meters dataset into a time series format

Mapping additional information

Saving and loading files to disk

Handling longer periods of missing data

Imputing with the previous day

Hourly average profile

The hourly average for each weekday

Seasonal interpolation

Summary

3

Analyzing and Visualizing Time Series Data

Technical requirements

Components of a time series

The trend component

The seasonal component

The cyclical component

The irregular component

Visualizing time series data

Line charts

Seasonal plots

Seasonal box plots

Calendar heatmaps

Autocorrelation plot

Decomposing a time series

Detrending

Deseasonalizing

Implementations

Detecting and treating outliers

Standard deviation

Interquartile range (IQR)

Isolation Forest

Extreme studentized deviate (ESD) and seasonal ESD (S-ESD)

Treating outliers

Summary

References

Further reading

4

Setting a Strong Baseline Forecast

Technical requirements

Setting up a test harness

Creating holdout (test) and validation datasets

Choosing an evaluation metric

Generating strong baseline forecasts

Naïve forecast

Moving average forecast

Seasonal naive forecast

Exponential smoothing (ETS)

ARIMA

Theta Forecast

Fast Fourier Transform forecast

Evaluating the baseline forecasts

Assessing the forecastability of a time series

Coefficient of Variation (CoV)

Residual variability (RV)

Entropy-based measures

Kaboudan metric

Summary

References

Further reading

Part 2 – Machine Learning for Time Series

5

Time Series Forecasting as Regression

Understanding the basics of machine learning

Supervised machine learning tasks

Overfitting and underfitting

Hyperparameters and validation sets

Time series forecasting as regression

Time delay embedding

Temporal embedding

Global forecasting models – a paradigm shift

Summary

References

Further reading

6

Feature Engineering for Time Series Forecasting

Technical requirements

Feature engineering

Avoiding data leakage

Setting a forecast horizon

Time delay embedding

Lags or backshift

Rolling window aggregations

Seasonal rolling window aggregations

Exponentially weighted moving averages (EWMA)

Temporal embedding

Calendar features

Time elapsed

Fourier terms

Summary

7

Target Transformations for Time Series Forecasting

Technical requirements

Handling non-stationarity in time series

Detecting and correcting for unit roots

Unit roots

The Augmented Dickey-Fuller (ADF) test

Differencing transform

Detecting and correcting for trends

Deterministic and stochastic trends

Kendall’s Tau

Mann-Kendall test (M-K test)

Detrending transform

Detecting and correcting for seasonality

Detecting seasonality

Deseasonalizing transform

Detecting and correcting for heteroscedasticity

Detecting heteroscedasticity

Log transform

Box-Cox transform

AutoML approach to target transformation

Summary

References

Further reading

8

Forecasting Time Series with Machine Learning Models

Technical requirements

Training and predicting with machine learning models

Generating single-step forecast baselines

Standardized code to train and evaluate machine learning models

FeatureConfig

MissingValueConfig

ModelConfig

MLForecast

Helper functions for evaluating models

Linear regression

Regularized linear regression

Decision trees

Random forest

Gradient boosting decision trees

Training and predicting for multiple households

Using AutoStationaryTransformer

Summary

References

Further reading

9

Ensembling and Stacking

Technical requirements

Combining forecasts

Best fit

Measures of central tendency

Simple hill climbing

Stochastic hill climbing

Simulated annealing

Optimal weighted ensemble

Stacking or blending

Summary

References

Further reading

10

Global Forecasting Models

Technical requirements

Why Global Forecasting Models (GFMs)?

Sample size

Cross-learning

Multi-task learning

Engineering complexity

Creating GFMs

Strategies to improve GFMs

Increasing memory

Using time series meta-features

Tuning hyperparameters

Partitioning

Bonus – interpretability

Summary

References

Further reading

Part 3 – Deep Learning for Time Series

11

Introduction to Deep Learning

Technical requirements

What is deep learning and why now?

Why now?

What is deep learning?

Perceptron – the first neural network

Components of a deep learning system

Representation learning

Linear transformation

Activation functions

Output activation functions

Loss function

Forward and backward propagation

Summary

References

Further reading

12

Building Blocks of Deep Learning for Time Series

Technical requirements

Understanding the encoder-decoder paradigm

Feed-forward networks

Recurrent neural networks

The RNN layer in PyTorch

Long short-term memory (LSTM) networks

The LSTM layer in PyTorch

Gated recurrent unit (GRU)

The GRU layer in PyTorch

Convolution networks

Convolution

Padding, stride, and dilations

The convolution layer in PyTorch

Summary

References

Further reading

13

Common Modeling Patterns for Time Series

Technical requirements

Tabular regression

Single-step-ahead recurrent neural networks

Sequence-to-sequence (Seq2Seq) models

RNN-to-fully connected network

RNN-to-RNN

Summary

Reference

Further reading

14

Attention and Transformers for Time Series

Technical requirements

What is attention?

The generalized attention model

Alignment functions

The distribution function

Forecasting with sequence-to-sequence models and attention

Transformers – Attention is all you need

Attention is all you need

Transformers in time series

Forecasting with Transformers

Summary

References

Further reading

15

Strategies for Global Deep Learning Forecasting Models

Technical requirements

Creating global deep learning forecasting models

Preprocessing the data

Understanding TimeSeriesDataset from PyTorch Forecasting

Building the first global deep learning forecasting model

Using time-varying information

Using static/meta information

One-hot encoding and why it is not ideal

Embedding vectors and dense representations

Defining a model with categorical features

Using the scale of the time series

Balancing the sampling procedure

Visualizing the data distribution

Tweaking the sampling procedure

Using and visualizing the dataloader with WeightedRandomSampler

Summary

Further reading

16

Specialized Deep Learning Architectures for Forecasting

Technical requirements

The need for specialized architectures

Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (N-BEATS)

The architecture of N-BEATS

Forecasting with N-BEATS

Interpreting N-BEATS forecasting

Neural Basis Expansion Analysis for Interpretable Time Series Forecasting with Exogenous Variables (N-BEATSx)

Handling exogenous variables

Exogenous blocks

Neural Hierarchical Interpolation for Time Series Forecasting (N-HiTS)

The Architecture of N-HiTS

Forecasting with N-HiTS

Informer

The architecture of the Informer model

Forecasting with the Informer model

Autoformer

The architecture of the Autoformer model

Forecasting with Autoformer

Temporal Fusion Transformer (TFT)

The Architecture of TFT

Forecasting with TFT

Interpreting TFT

Interpretability

Probabilistic forecasting

Probability Density Function (PDF)

Quantile functions

Other approaches

Summary

References

Further reading

Part 4 – Mechanics of Forecasting

17

Multi-Step Forecasting

Why multi-step forecasting?

Recursive strategy

Training regime

Forecasting regime

Direct strategy

Training regime

Forecasting regime

Joint strategy

Training regime

Forecasting regime

Hybrid strategies

DirRec Strategy

Iterative block-wise direct strategy

Rectify strategy

RecJoint

How to choose a multi-step forecasting strategy?

Summary

References

18

Evaluating Forecasts – Forecast Metrics

Technical requirements

Taxonomy of forecast error measures

Intrinsic metrics

Extrinsic metrics

Investigating the error measures

Loss curves and complementarity

Bias towards over- or under-forecasting

Experimental study of the error measures

Using Spearman’s rank correlation

Guidelines for choosing a metric

Summary

References

Further reading

19

Evaluating Forecasts – Validation Strategies

Technical requirements

Model validation

Holdout strategies

Window strategy

Calibration strategy

Sampling strategy

Cross-validation strategies

Choosing a validation strategy

Validation strategies for datasets with multiple time series

Summary

References

Further reading

Index

Other Books You May Enjoy

Part 1 – Getting Familiar with Time Series

We dip our toes into time series forecasting by understanding what a time series is, how to process and manipulate time series data, and how to analyze and visualize time series data. This part also covers classical time series forecasting methods, such as ARIMA, to serve as strong baselines.

This part comprises the following chapters:

Chapter 1, Introducing Time Series

Chapter 2, Acquiring and Processing Time Series Data

Chapter 3, Analyzing and Visualizing Time Series Data

Chapter 4, Setting a Strong Baseline Forecast

Introducing Time Series

2

Acquiring and Processing Time Series Data

In the previous chapter, we learned what a time series is and established a few standard notations and terminologies. Now, let’s switch tracks from theory to practice. In this chapter, we are going to get our hands dirty and start working with data. Although we said time series data is everywhere, we have yet to work with an actual time series dataset. We will start with the dataset we have chosen to use throughout this book, process it in the right way, and learn a few techniques for dealing with missing values.

In this chapter, we will cover the following topics:

Understanding the time series dataset

pandas datetime operations, indexing, and slicing – a refresher

Handling missing data

Mapping additional information

Saving and loading files to disk

Handling longer periods of missing data

Technical requirements

You will need to set up the Anaconda environment following the instructions in the Preface of the book to get a working environment with all the packages and datasets required for the code in this book.

The code for this chapter can be found at https://github.com/PacktPublishing/Modern-Time-Series-Forecasting-with-Python-/tree/main/notebooks/Chapter02.

Handling time series data is like handling any other tabular dataset, but with a focus on the temporal dimension. Just as it does for other tabular data, pandas handles time series data very well.
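As a minimal illustration of that temporal focus, the sketch below builds a small half-hourly series with synthetic values (not taken from the London Smart Meters dataset; the name `energy_kwh` is purely illustrative) and uses a `DatetimeIndex` for date-based selection and daily aggregation.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly readings covering two full days (illustrative values only)
index = pd.date_range("2012-01-01", periods=96, freq="30min")
energy = pd.Series(np.arange(96, dtype=float), index=index, name="energy_kwh")

# Partial-string indexing on the DatetimeIndex: every reading on 2 January 2012
jan_2 = energy.loc["2012-01-02"]

# Aggregate the half-hourly readings into daily totals
daily_total = energy.resample("D").sum()

print(len(jan_2))  # 48 half-hourly readings in one day
print(daily_total)
```

Because the index carries the timestamps, selecting a day or changing the granularity needs no manual date arithmetic.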

Let’s start getting our hands dirty and work through a dataset from the beginning. We are going to use the London Smart Meters dataset throughout this book. If you have not downloaded the data already as part of the environment setup, go to the Preface and do that now.

Understanding the time series dataset

This is the key first step with any new dataset you come across, even before Exploratory Data Analysis (EDA), which we will cover in Chapter 3, Analyzing and Visualizing Time Series Data. Understanding where the data comes from, the data-generating process behind it, and the source domain is essential to a good understanding of the dataset.

The London Data Store, a free and open data-sharing portal, provided this dataset, which was collected and enriched by Jean-Michel D and uploaded to Kaggle.

The dataset contains energy consumption readings for a sample of 5,567 London households that took part in the UK Power Networks-led Low Carbon London project between November 2011 and February 2014. Readings were taken at half-hourly intervals. Some metadata about the households is also available as part of the dataset. Let’s look at what metadata is available as part of the dataset:

CACI UK segmented the UK’s population into demographic types using a classification called Acorn. For each household in the data, we have the corresponding Acorn classification. The Acorn classes (Lavish Lifestyles, City Sophisticates, Student Life, and so on) are grouped into parent classes (Affluent Achievers, Rising Prosperity, Financially Stretched, and so on). A full list of Acorn classes can be found in Table 2.1. The complete documentation detailing each class is available at https://acorn.caci.co.uk/downloads/Acorn-User-guide.pdf.

The dataset contains two groups of customers: one group that was subjected to dynamic time-of-use (dToU) energy prices throughout 2013, and another group that was on flat-rate tariffs. The dToU tariff prices were communicated a day ahead via the smart meter in-home display (IHD) or via text message.

Jean-Michel D also enriched the dataset with weather and UK bank holiday data.
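Because every Acorn class belongs to a parent class, deriving the parent for each household is a simple lookup. The sketch below is illustrative only: the three class-to-parent pairs are taken from the parallel lists above (consult Table 2.1 or the CACI user guide for the authoritative mapping), and the household IDs, column names, and tariff labels are hypothetical rather than the dataset's real ones.

```python
import pandas as pd

# Illustrative subset of Acorn classes and their parent classes,
# assumed from the parallel lists in the text (see Table 2.1 for the full list)
ACORN_PARENT = {
    "Lavish Lifestyles": "Affluent Achievers",
    "City Sophisticates": "Rising Prosperity",
    "Student Life": "Financially Stretched",
}

# Hypothetical household metadata (IDs, column names, and tariff labels are made up)
households = pd.DataFrame({
    "household_id": ["H1", "H2", "H3", "H4"],
    "acorn_class": ["Lavish Lifestyles", "Student Life",
                    "City Sophisticates", "Student Life"],
    "tariff": ["dToU", "Flat", "Flat", "dToU"],
})

# Derive the parent class; any class missing from the dict would become NaN
households["acorn_parent"] = households["acorn_class"].map(ACORN_PARENT)

# Count households per parent class and tariff group
print(pd.crosstab(households["acorn_parent"], households["tariff"]))
```

Using `Series.map` with a dictionary keeps the mapping in one place, and any unmapped class shows up as a missing value instead of failing silently.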

The following table shows the Acorn classes:

Table 2.1 – ACORN classification

Important note

The Kaggle dataset also provides a preprocessed, daily version of the time series data and combines all the separate files. Here, we will ignore those files and start with the raw files, which can be found in the hhblock_dataset folder. Learning to work with raw files is an integral part of working with real-world datasets in industry.
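Raw half-hourly files are often stored in a wide layout, with one row per household per day. As a minimal sketch, assuming columns named `LCLid` and `day` followed by 48 half-hourly columns `hh_0` to `hh_47` (verify these names against your copy of the files), the code below reshapes such rows into a long time series with one timestamp per reading. The values and the household ID are synthetic so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Assumed wide layout: one row per household-day, 48 half-hourly columns
hh_cols = [f"hh_{i}" for i in range(48)]
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((2, 48)), columns=hh_cols)  # synthetic values
wide.insert(0, "day", pd.to_datetime(["2012-01-01", "2012-01-02"]))
wide.insert(0, "LCLid", "MAC000001")  # hypothetical household ID

# Melt the 48 reading columns into rows: one row per household, day, and slot
long = wide.melt(id_vars=["LCLid", "day"], value_vars=hh_cols,
                 var_name="slot", value_name="energy")

# Assume column hh_k holds the reading for the half-hour starting
# k * 30 minutes after midnight
slot_number = long["slot"].str.replace("hh_", "", regex=False).astype(int)
long["timestamp"] = long["day"] + slot_number * pd.Timedelta(minutes=30)

# Keep a tidy, time-ordered series per household
long = (long.sort_values(["LCLid", "timestamp"])
            .loc[:, ["LCLid", "timestamp", "energy"]]
            .reset_index(drop=True))
print(long.head())
```

The long form, with an explicit timestamp per reading, is what most time series tooling in pandas expects, while the wide form is more compact on disk.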

Preparing a data model

Once we understand where the data is coming from, we can look at the data, understand the information present in the different files, and form a mental model of how the files relate to each other. You may call it old school, but Microsoft Excel is an excellent tool for gaining this first-level understanding. If a file is too big to open in Excel, we can read it in Python, save a sample of the data to an Excel file, and open that instead. However, keep in mind that Excel sometimes changes the format of the data, especially dates, so take care not to save the file from Excel, which would write those formatting changes back to it. If you are allergic to Excel, you can do the same exploration in Python, albeit with a lot more keystrokes. The purpose of this exercise is to see what the different data files contain, explore the relationships between them, and so on. We can make this more formal and explicit by drawing a data model, similar to the one shown in the following diagram:

Figure 2.1 – Data model of the London Smart Meters dataset
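Once a data model identifies the keys that link the files, those relationships can also be checked in pandas. The sketch below uses small hypothetical tables (readings linked to household information by a household ID, and to daily weather by date; all names and values are made up) and merges them with `validate` and `indicator`, so that unexpected duplicate keys raise an error and unmatched keys become visible.

```python
import pandas as pd

# Hypothetical tables mirroring a simple data model
readings = pd.DataFrame({
    "household_id": ["H1", "H1", "H2", "H3"],
    "date": pd.to_datetime(["2012-01-01", "2012-01-02",
                            "2012-01-01", "2012-01-01"]),
    "energy": [10.2, 11.5, 7.8, 9.1],
})
household_info = pd.DataFrame({
    "household_id": ["H1", "H2"],
    "acorn_parent": ["Affluent Achievers", "Financially Stretched"],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "temperature_max": [8.5, 6.1],
})

# Many readings per household: validate raises if household_info has
# duplicate keys; indicator marks readings whose household has no metadata
merged = readings.merge(household_info, on="household_id", how="left",
                        validate="many_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "household_id"].unique()
print("Households without metadata:", list(unmatched))

# Weather is shared by all households on the same date
merged = merged.drop(columns="_merge").merge(weather, on="date", how="left",
                                             validate="many_to_one")
print(merged)
```

Declaring the expected cardinality with `validate` turns an assumption in the data model into a check that fails loudly if the files do not match it.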

The data model is more for us to understand the data rather than any data engineering purpose. Therefore, it only contains bare-minimum information, such