Quick Start#

Knowledge discovery in time series data#


KnowIt (Knowledge discovery In Time series data) is an extendable toolkit for building and interpreting deep time series models. It is developed in Python and uses PyTorch Lightning for model building and Captum for model interpretation.

Installation#

  • Clone the KnowIt repository locally

  • Create a conda environment with the provided KnowIt/environment.yml file

  • An installable package is not yet available (TBD)

Usage overview#

# Import the KnowIt module
from knowit import KnowIt

# Instantiate a KnowIt object linked to an output directory
KI = KnowIt(custom_exp_dir='my_output_path')

# Define your dataset
data_args = {'name': 'synth_2',
             'task': 'regression',
             'in_components': ['x1', 'x2', 'x3', 'x4'],
             'out_components': ['x5'],
             'in_chunk': [-32, 0],
             'out_chunk': [0, 0],
             'split_portions': [0.6, 0.2, 0.2],
             'batch_size': 64}

# Define your architecture
arch_args = {'task': 'regression',
             'name': 'MLP'}

# Define your trainer
trainer_args = {'loss_fn': 'mse_loss',
                'optim': 'Adam',
                'max_epochs': 10,
                'learning_rate': 0.01,
                'task': 'regression'}

# Train your model
KI.train_model(model_name='my_new_model_name',
               kwargs={'data': data_args,
                       'arch': arch_args,
                       'trainer': trainer_args})

# Generate model predictions
KI.generate_predictions(model_name='my_new_model_name',
                        kwargs={'predictor': {'prediction_set': 'valid'}})

# Interpret model predictions
KI.interpret_model(model_name='my_new_model_name',
                   kwargs={'interpreter':
                               {'interpretation_method': 'DeepLift',
                                'interpretation_set': 'valid',
                                'selection': 'random',
                                'size': 100}})
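
The in_chunk and out_chunk arguments above describe which time steps, relative to a prediction point, go into the model and which come out of it. The following is a minimal sketch of one plausible reading, not KnowIt's actual implementation. It assumes that both bounds are offsets relative to the prediction point t and that both are inclusive, and it uses a single series for inputs and outputs to stay short. The function name make_windows is hypothetical.

```python
def make_windows(series, in_chunk, out_chunk):
    """Pair an input window with an output window for each valid time step t.

    Assumption (not confirmed by KnowIt): chunk bounds are offsets relative
    to t, and both the lower and upper bound are inclusive.
    """
    in_lo, in_hi = in_chunk
    out_lo, out_hi = out_chunk
    # t must leave room for the earliest and the latest offset.
    first_t = max(0, -min(in_lo, out_lo))
    last_t = len(series) - 1 - max(0, in_hi, out_hi)
    windows = []
    for t in range(first_t, last_t + 1):
        x = series[t + in_lo : t + in_hi + 1]
        y = series[t + out_lo : t + out_hi + 1]
        windows.append((x, y))
    return windows


series = list(range(40))
windows = make_windows(series, in_chunk=[-32, 0], out_chunk=[0, 0])
first_x, first_y = windows[0]
```

Under these assumptions, in_chunk=[-32, 0] yields 33 input steps per window and out_chunk=[0, 0] yields a single output step aligned with the last input step.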

Current Features#

  • A flexible problem class definition supports a range of time series modeling tasks, including uni- and multivariate, single- and multistep time series regression, classification, forecasting, and detection.

  • A growing list of default deep time series architectures. Currently includes: MLP, CNN, TCN, and LSTM.

  • Convenient entry points to import custom architectures (PyTorch-based) and custom datasets (Pandas-based).

  • Automated data management (storing, splitting, scaling, sampling, loading, padding, etc.) of complex time series data.

  • Convenient feature attribution extraction and storage.

  • Hyperparameter tuning through PyTorch Lightning and the Weights & Biases API.

  • Extendable design for additional future training and interpretability paradigms.
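
To make the splitting and scaling steps in the list above more concrete, here is a minimal sketch of a chronological split followed by z-score scaling. It is an illustration only: KnowIt's own splitting and scaling options may work differently. The helper names split_chronologically and fit_scaler are hypothetical, and the sketch assumes the three portions map to train, validation, and held-out evaluation sets in that order.

```python
from statistics import fmean, pstdev


def split_chronologically(values, portions):
    """Split a sequence into three consecutive parts without shuffling."""
    if abs(sum(portions) - 1.0) > 1e-9:
        raise ValueError("portions must sum to 1")
    n = len(values)
    train_end = round(n * portions[0])
    valid_end = train_end + round(n * portions[1])
    return values[:train_end], values[train_end:valid_end], values[valid_end:]


def fit_scaler(train_values):
    """Return a z-score scaling function fitted on the training part only."""
    mean = fmean(train_values)
    std = pstdev(train_values) or 1.0  # avoid division by zero for constant data
    return lambda vs: [(v - mean) / std for v in vs]


values = [float(v) for v in range(100)]
train, valid, evaluation = split_chronologically(values, [0.6, 0.2, 0.2])
scale = fit_scaler(train)
train_scaled, valid_scaled = scale(train), scale(valid)
```

Fitting the scaler on the training part alone keeps statistics from the validation and evaluation parts out of training, which is a common precaution with time series data.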

Current Limitations#

  • Data is assumed to be time-indexed and sampled at equal intervals. Gaps are removed during preprocessing, leaving contiguous blocks of data for model training.

  • Data is stored on and loaded from disk in contiguous slices. These slices have to fit into memory.
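
The gap handling described above can be pictured as cutting the time index wherever two consecutive timestamps are not exactly one sampling step apart. The sketch below shows that idea on plain numeric timestamps. It is an assumption about the general technique, not KnowIt's preprocessing code, and the function name contiguous_blocks is hypothetical.

```python
def contiguous_blocks(timestamps, step):
    """Group sorted timestamps into runs where neighbours are `step` apart."""
    if not timestamps:
        return []
    blocks = []
    current = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev == step:
            current.append(cur)
        else:
            # A gap (or irregular spacing) closes the current block.
            blocks.append(current)
            current = [cur]
    blocks.append(current)
    return blocks


blocks = contiguous_blocks([0, 1, 2, 5, 6, 10], step=1)
```

Each resulting block can then be treated as an independent stretch of equally spaced data, so no training window spans a gap.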

Coming soon#

  • Adding Transformer-based models to default architectures

  • Package installation options

  • Testbench for synthetic data experimentation

  • Additional feature attribution visualizations

We are open to any suggestions.

Citation#

If you make use of KnowIt, we kindly request that you cite it using the provided citation file. The citation link will be replaced with an official one soon.

Acknowledgments#

This project is made possible by funding and support from:

  • MUST Deep Learning

  • North-West University (NWU), South Africa

  • National Institute for Theoretical & Computational Sciences (NITheCS), South Africa

  • Centre for Artificial Intelligence Research (CAIR), South Africa

  • National Research Foundation (NRF), South Africa

Contributors:

  • Marthinus Wilhelmus Theunissen (tiantheunissen@gmail.com)

  • Randle Rabe (randlerabe@gmail.com)

  • Harmen Potgieter (potgieterharmen@gmail.com)