| Top | SciKit-Learn Way | SKTime Way | Multivariate | Panel Data | SKLearn & SKTime | Univariate Forecasting | Advanced Workflow | Forecasting with Exogeneous | Building a Forecaster | Time Series Classification | Time Series Regression |
SKTime
| Documentation | PyData Global 2021 | SKTime GitHub |


The SciKit-Learn Way

Diabetes Dataset (documentation)

Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.

Inputs and Targets
Initial Visualization
Visualization: age vs disease progression
Visualization: bmi vs disease progression
Visualization: bp vs disease progression
Workflow with SciKit-Learn
  1. Model Specification
  2. Fitting
  3. Prediction
  4. Evaluation
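
The four steps above can be sketched on the diabetes dataset as follows. This is a minimal illustration, assuming a plain `LinearRegression` model, a 75/25 random train/test split, and mean squared error as the metric; none of these choices are taken from the original notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Inputs (10 baseline variables) and target (disease progression)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0  # assumed split, for illustration
)

# 1. Model specification
model = LinearRegression()

# 2. Fitting
model.fit(X_train, y_train)

# 3. Prediction
y_pred = model.predict(X_test)

# 4. Evaluation
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse:.1f}")
```

The same specify, fit, predict, evaluate pattern applies to any scikit-learn estimator, which is what makes the interface easy to swap models in and out of.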
Model
Modular Model Building & Pipelines with SciKit-Learn
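
As a rough sketch of modular model building, a preprocessing step and a regressor can be chained into a single estimator with `make_pipeline`. The particular components here (`StandardScaler` followed by `Ridge`) are assumptions chosen for illustration, not the steps used in the original notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline is itself an estimator: fit/predict run every step in order.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

# R^2 on the held-out data
r2 = pipe.score(X_test, y_test)
print(f"Test R^2: {r2:.2f}")
```

Because the pipeline exposes the same `fit` and `predict` methods as a single model, it can be dropped into the four-step workflow unchanged.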
Summary


The SKTime Way

Lynx Dataset (documentation)

The annual numbers of lynx trappings for 1821–1934 in Canada. This time-series records the number of skins of predators (lynx) that were collected over several years by the Hudson’s Bay Company. The dataset was taken from Brockwell & Davis (1991) and appears to be the series considered by Campbell & Walker (1977).

* Dimensionality: univariate
* Series length: 114
* Frequency: Yearly
* Number of cases: 1

This data shows aperiodic, cyclical patterns, as opposed to periodic, seasonal patterns.


Multivariate Data
* Repeated observations over time from multiple related variables or kinds of measurement

Longley Dataset (documentation)

This multivariate time series dataset contains various US macroeconomic variables from 1947 to 1962 that are known to be highly collinear.

* Dimensionality: multivariate, 6
* Series length: 16
* Frequency: Yearly
* Number of cases: 1

Variable description:

* TOTEMP - Total employment
* GNPDEFL - Gross national product deflator
* GNP - Gross national product
* UNEMP - Number of unemployed
* ARMED - Size of armed forces
* POP - Population

Visualization: multiple input series


Panel Data
Arrowhead Dataset (documentation)

* Dimensionality: univariate
* Series length: 251
* Train cases: 36
* Test cases: 175
* Number of classes: 3

The arrowhead data consists of outlines of the images of arrowheads. The shapes of the projectile points are converted into a time series using the angle-based method. The classification of projectile points is an important topic in anthropology. The classes are based on shape distinctions such as the presence and location of a notch in the arrow. The problem in the repository is a length-normalised version of that used in Ye09shapelets. The three classes are called “Avonlea”, “Clovis” and “Mix”.

Visualization: multiple input samples


SKLearn + SKTime: Multiple Learning Tasks
Reduction: From one learning task to another

**Overview**

**Example: From forecasting to regression**
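
One way to picture this reduction is a sliding window: each run of past values becomes a feature row, and the value that follows becomes the regression target. The sketch below is hand-rolled with NumPy and scikit-learn on a toy series; the helper `make_windows` and the window length of 3 are hypothetical choices for illustration, not sktime's implementation (sktime packages the same idea in `make_reduction` from `sktime.forecasting.compose`).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_windows(y, window_length):
    """Turn a 1D series into (lagged-window features, next-step targets)."""
    X = np.array(
        [y[i : i + window_length] for i in range(len(y) - window_length)]
    )
    targets = y[window_length:]
    return X, targets


# Toy series with a linear trend (illustrative data, not from the notebook)
y = np.arange(20, dtype=float)

# Forecasting problem -> tabular regression problem
X, targets = make_windows(y, window_length=3)

# Any scikit-learn regressor can now be used as the forecaster's engine
regressor = LinearRegression().fit(X, targets)

# One-step-ahead forecast from the last observed window
next_value = regressor.predict(y[-3:].reshape(1, -1))[0]
print(next_value)
```

Multi-step forecasts can be built on top of this by feeding each prediction back in as the newest window value (a recursive strategy), or by fitting one regressor per step ahead (a direct strategy).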

Creating a unified framework
What's a framework?

From the sktime glossary of common terms:

A collection of related and reusable software design templates that practitioners can copy and fill in. Frameworks emphasize design reuse. They capture common software design decisions within a given application domain and distill them into reusable design templates. This reduces the number of design decisions practitioners must make, allowing them to focus on application specifics. Not only can practitioners write software faster as a result, but their applications will also share a similar structure. Frameworks often offer additional functionality like toolboxes. Compare with toolbox and application.

Check out our extension templates!


Univariate Forecasting

In forecasting, we're interested in using past data to make temporal forward predictions. sktime provides common statistical forecasting algorithms and tools for building composite machine learning models.

The basic workflow
  1. Specify data
  2. Specify task
  3. Specify model
  4. Fit
  5. Predict
Data Specification

Shampoo Sales Dataset (documentation)

This dataset describes the monthly number of sales of shampoo over a 3 year period. The units are a sales count.

* Dimensionality: univariate
* Series length: 36
* Frequency: Monthly
* Number of cases: 1

Task specification

Next we will define a **forecasting task**.

The Forecasting Horizon

When we want to generate forecasts, we need to specify the forecasting horizon and pass that to our forecasting algorithm. We can specify the forecasting horizon as a numpy array of the steps ahead relative to the end of the training series:

Init signature:
ForecastingHorizon(
    values: Union[int, list, numpy.ndarray, pandas.core.indexes.base.Index] = None,
    is_relative: bool = None,
    freq=None,
)
Docstring:     
Forecasting horizon.

Parameters
----------
values : pd.Index, pd.TimedeltaIndex, np.array, list, pd.Timedelta, or int
    Values of forecasting horizon
is_relative : bool, optional (default=None)
    - If True, a relative ForecastingHorizon is created:
            values are relative to end of training series.
    - If False, an absolute ForecastingHorizon is created:
            values are absolute.
    - if None, the flag is determined automatically:
        relative, if values are of supported relative index type
        absolute, if not relative and values of supported absolute index type
freq : str, pd.Index, pandas offset, or sktime forecaster, optional (default=None)
    object carrying frequency information on values
    ignored unless values is without inferrable freq
Converting between absolute and relative forecast horizons

to_relative(): the cutoff argument sets the reference point (typically the end of the training series) from which absolute horizon values are converted to relative ones.

Signature: ForecastingHorizon.to_relative(self, cutoff=None)
Docstring:
Return forecasting horizon values relative to a cutoff.

Parameters
----------
cutoff : pd.Period, pd.Timestamp, int, or pd.Index, optional (default=None)
    Cutoff value required to convert a relative forecasting
    horizon to an absolute one (and vice versa).
    If pd.Index, last/latest value is considered the cutoff

Returns
-------
fh : ForecastingHorizon
    Relative representation of forecasting horizon.