# Time Series in 5-Minutes, Part 5: Anomaly Detection

Written by Matt Dancho

Have 5-minutes? Then let’s learn time series. In this short article series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing on analyzing anomalies in time series data.

## Time Series in 5-Minutes Articles in this Series

I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. We’ll discuss some of the key pieces in this article series:

# Have 5-Minutes? Then let’s learn Time Series Anomaly Detection

Anomaly detection is an important part of time series analysis:

1. Detecting anomalies can signify special events
2. Cleaning anomalies can reduce forecast error

In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.

# Let’s Get Started

First, set up the libraries we’ll use:
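A minimal setup sketch, assuming the dplyr and timetk packages are already installed. The `interactive` flag is my own convenience variable (an assumption, not something timetk requires) for switching between plotly and ggplot2 output later on:

```r
# Assumes both packages are installed, e.g. install.packages(c("dplyr", "timetk"))
library(dplyr)   # group_by(), filter(), and the %>% pipe
library(timetk)  # tk_anomaly_diagnostics(), plot_anomaly_diagnostics()

# Hypothetical helper flag: TRUE for plotly charts, FALSE for static ggplots
interactive <- FALSE
```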

# Data

This tutorial will use the walmart_sales_weekly dataset:

• Weekly
• Sales spikes at various events
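Here is a quick way to peek at the data, sketched under the assumption that the `id`, `Date`, and `Weekly_Sales` columns are the ones we need. The `sales_tbl` name is just for illustration:

```r
library(dplyr)
library(timetk)

# Keep only the time series identifier, the date, and the value column
sales_tbl <- walmart_sales_weekly %>%
  select(id, Date, Weekly_Sales)

# Inspect the column types and the first few values
glimpse(sales_tbl)

# Count the weekly observations available for each time series
sales_tbl %>% count(id)
```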

# Automatic Anomaly Detection

To get the data on the anomalies, we use tk_anomaly_diagnostics(), the preprocessing function.

The tk_anomaly_diagnostics() method for anomaly detection implements a 2-step process to detect outliers in time series.
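A sketch of what the call can look like, assuming we group by the `id` column so each time series is analyzed on its own. The `anomalies_tbl` name is hypothetical, and `.message = FALSE` simply silences the frequency/trend messages:

```r
library(dplyr)
library(timetk)

# Run the 2-step anomaly detection once per group (one group per time series)
anomalies_tbl <- walmart_sales_weekly %>%
  group_by(id) %>%
  tk_anomaly_diagnostics(
    .date_var = Date,
    .value    = Weekly_Sales,
    .message  = FALSE
  )

# The result includes the decomposition, an anomaly flag, and the
# recomposed_l1 / recomposed_l2 boundaries
anomalies_tbl

# Keep only the rows flagged as anomalies
anomalies_tbl %>% filter(anomaly == "Yes")
```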

#### Step 1: Detrend & Remove Seasonality using STL Decomposition

The decomposition separates the “season” and “trend” components from the “observed” values, leaving the “remainder” for anomaly detection.
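To see the idea in isolation, here is a small base R sketch that calls `stats::stl()` directly on the built-in `co2` series. The settings (`s.window = "periodic"`, `robust = TRUE`) are illustrative assumptions, not necessarily what timetk uses internally:

```r
# Base R only: decompose the built-in monthly co2 series with STL
fit <- stats::stl(co2, s.window = "periodic", robust = TRUE)

# One column each for the seasonal, trend, and remainder components
components <- fit$time.series
head(components)

# The remainder is what is left once season and trend are removed
remainder <- components[, "remainder"]

# Adding the three components back together recovers the observed values
recomposed <- rowSums(components)
all.equal(as.numeric(recomposed), as.numeric(co2))
```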

The user can control two parameters: .frequency and .trend.

1. .frequency: Adjusts the “season” component that is removed from the “observed” values.
2. .trend: Adjusts the trend window (the t.window parameter from stats::stl()) that is used.

The user may supply both .frequency and .trend as time-based durations (e.g. “6 weeks”), numeric values (e.g. 180), or “auto”, which predetermines the frequency and/or trend based on the scale of the time series using tk_time_scale_template().
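As a sketch of overriding the defaults, the example below passes numeric values (a count of observations). The specific numbers, 13 weeks for the season and 52 weeks for the trend, are my own illustrative choices for weekly data, not recommendations:

```r
library(dplyr)
library(timetk)

custom_tbl <- walmart_sales_weekly %>%
  group_by(id) %>%
  tk_anomaly_diagnostics(
    .date_var  = Date,
    .value     = Weekly_Sales,
    .frequency = 13,   # illustrative: roughly one quarter of weekly data
    .trend     = 52,   # illustrative: roughly one year of weekly data
    .message   = FALSE
  )

# Compare against the "auto" defaults to see how the remainder changes
custom_tbl %>% select(id, Date, observed, season, trend, remainder)
```

Leaving both arguments at “auto” is usually a reasonable starting point; override them when the automatically selected season or trend window does not match what you know about the data.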

#### Step 2: Anomaly Detection

Once the “trend” and “season” (seasonality) components are removed, anomaly detection is performed on the “remainder”. Anomalies are identified, and boundaries (recomposed_l1 and recomposed_l2) are determined.

The anomaly detection method uses the interquartile range (IQR) as its baseline: the span between the 25th and 75th percentiles, i.e. +/-25 percentiles around the median.

With the default alpha = 0.05, the limits are established by expanding the 25/75 baseline by an IQR Factor of 3 (3X). The IQR Factor = 0.15 / alpha (hence 3X with alpha = 0.05):

• To increase the IQR Factor controlling the limits, decrease the alpha, which makes it more difficult to be an outlier.
• Increase alpha to make it easier to be an outlier.
• The IQR outlier detection method is used in forecast::tsoutliers().
• A similar outlier detection method is used by Twitter’s AnomalyDetection package.
• Both the Twitter and forecast::tsoutliers() methods have been implemented in Business Science’s anomalize package.
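The IQR arithmetic described above can be sketched in base R. The `iqr_limits_sketch()` helper is hypothetical (it is not a timetk function), and it is a simplified illustration that only applies the 25/75 baseline and the 0.15 / alpha factor to a remainder vector:

```r
# Hypothetical helper, not part of timetk: IQR-based limits on a remainder vector
iqr_limits_sketch <- function(remainder, alpha = 0.05) {
  # 25th and 75th percentiles form the baseline
  q <- stats::quantile(remainder, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]

  # IQR Factor = 0.15 / alpha (3X when alpha = 0.05)
  iqr_factor <- 0.15 / alpha

  # Expand the baseline by the IQR Factor on both sides
  lower <- q[[1]] - iqr_factor * iqr
  upper <- q[[2]] + iqr_factor * iqr

  list(
    lower   = lower,
    upper   = upper,
    anomaly = remainder < lower | remainder > upper
  )
}

# Toy remainder with one obvious spike in the last position
remainder <- c(-2, -1, 0, 1, 2, -1, 0, 1, 50)

# Default alpha = 0.05: limits are 3 IQRs beyond the quartiles
default_limits <- iqr_limits_sketch(remainder)
which(default_limits$anomaly)

# A much smaller alpha widens the limits, so it is harder to be an outlier
strict_limits <- iqr_limits_sketch(remainder, alpha = 0.003)
which(strict_limits$anomaly)
```

With the toy data, the quartiles are -1 and 1, so the default limits land at -7 and 7 and only the spike is flagged. Shrinking alpha to 0.003 pushes the factor to 50X and the spike is no longer outside the limits.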

# Anomaly Visualization

Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.

plot_anomaly_diagnostics() is a visualization wrapper for tk_anomaly_diagnostics() with group-wise anomaly detection, implementing the 2-step process from above.
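A sketch of the plotting call, assuming we again group by `id` and lay the facets out in 3 columns. I set `.interactive = FALSE` here to get a static ggplot; switch it to `TRUE` for the interactive plotly version. The `anomaly_plot` name is just for illustration:

```r
library(dplyr)
library(timetk)

anomaly_plot <- walmart_sales_weekly %>%
  group_by(id) %>%
  plot_anomaly_diagnostics(
    .date_var    = Date,
    .value       = Weekly_Sales,
    .facet_ncol  = 3,      # illustrative layout choice
    .interactive = FALSE,  # static ggplot; TRUE returns a plotly chart
    .message     = FALSE
  )

# Print the chart: one panel per time series with anomalies highlighted
anomaly_plot
```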

# Have questions on using Timetk for time series?

Make a comment in the chat below. 👇

And, if you plan on using timetk for your business, it’s a no-brainer - Join the Time Series Course.