Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting
This repository is the official PyTorch implementation of the paper "Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting", which has been accepted by the KDD 2026 Datasets and Benchmarks Track.
TSCOMP is the first large-scale benchmark that systematically deconstructs deep multivariate time-series forecasting (MTSF) methods into their core, fine-grained components, spanning series preprocessing, encoding strategies, network backbones (including specific, LLM, and TSFM models), and optimization methods.
- Key Features & Innovations
- Prerequisites
- Quick Start
- Supported Components & Design Space
- Supported Datasets
- Supported Baseline
- Repository Structure
- Citation
- Acknowledgments
TSCOMP stands out as a pioneering benchmark and framework for multivariate time-series forecasting (MTSF) with three core academic contributions:
- Comprehensive Benchmark via Hierarchical Deconstruction: Rather than evaluating models holistically as indivisible "black boxes," TSCOMP deconstructs deep forecasting methods into a multi-stage modeling pipeline (4 stages, 11 dimensions, and 49 fine-grained components). To rigorously assess these elements, we implement a constrained orthogonal experimental protocol that systematically isolates the core mechanisms driving forecasting performance, reducing over $10^6$ combinatorial variants to a computationally tractable pool.
- Rigorous Multi-View Analysis & Insights: We conduct a large-scale analysis using a multi-tiered statistical framework to examine component-level dynamics. Beyond general performance rankings, we extensively investigate component sensitivities and interaction synergies across diverse backbones (including MLPs, RNNs, Transformers, and emerging LLMs/TSFMs) and data characteristics.
- Open-Sourced Corpus & Automated Zero-Shot Construction: We release a massive, fine-grained performance corpus consisting of over 20,000 evaluations. Leveraging this corpus, TSCOMP pre-trains a meta-predictor on TabPFN-extracted meta-features to adaptively construct optimal component configurations for unseen datasets in a zero-shot manner, consistently outperforming prevailing SOTA forecasting models and AutoML tools.
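To make the idea of a component-level performance ranking concrete, here is a minimal sketch that ranks the options of one dimension within each dataset and then averages the ranks. The record layout, the `average_rank` helper, and all error values are assumptions made up for illustration. They are not taken from the released corpus or from `notebooks/analyze_orthogonal_pool.py`.

```python
from collections import defaultdict

# Hypothetical corpus rows: (dataset, normalization component, test MSE).
# The numbers are invented for illustration and are not results from the paper.
RECORDS = [
    ("ETTh1", "RevIN", 0.40), ("ETTh1", "Stat", 0.42), ("ETTh1", "w/o Norm", 0.55),
    ("Weather", "RevIN", 0.25), ("Weather", "Stat", 0.24), ("Weather", "w/o Norm", 0.30),
    ("Traffic", "RevIN", 0.45), ("Traffic", "Stat", 0.47), ("Traffic", "w/o Norm", 0.60),
]


def average_rank(records):
    """Rank components within each dataset (1 = lowest error), then average the ranks."""
    by_dataset = defaultdict(list)
    for dataset, component, mse in records:
        by_dataset[dataset].append((mse, component))

    ranks = defaultdict(list)
    for rows in by_dataset.values():
        # Sorting by (mse, component) puts the best component first; ties are
        # broken alphabetically here rather than given a shared rank.
        for rank, (_, component) in enumerate(sorted(rows), start=1):
            ranks[component].append(rank)

    return {component: sum(r) / len(r) for component, r in ranks.items()}


if __name__ == "__main__":
    for component, rank in sorted(average_rank(RECORDS).items(), key=lambda kv: kv[1]):
        print(f"{component}: {rank:.2f}")
```

Averaging ranks rather than raw errors keeps datasets with large error scales from dominating the comparison, which is one common reason to report rankings in benchmarks of this kind.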
- Python 3.8+ (recommended via Conda)
- PyTorch 2.0+
- CUDA-enabled GPU (Highly recommended for running large-scale experiment pools)
- Dependencies listed in environment.yml
Get TSCOMP up and running quickly with this step-by-step guide.
# Clone the repository and enter directory
git clone https://github.com/SUFE-AILAB/TSCOMP.git
cd TSCOMP
# Create and activate conda environment
conda env create -f environment.yml
conda activate tscomp

Generate the batch execution shell scripts for short-term and long-term forecasting:
# Generate short-term forecasting execution scripts
python notebooks/bash_generator_short_term_forecasting_sota_seed.py
# Generate long-term forecasting execution scripts
python notebooks/bash_generator_long_term_forecasting_sota_seed.py

This populates the scripts/ directory with ready-to-run .sh files. Run a generated script with:

bash scripts/<generated_script_name>.sh

You can run the statistical analysis directly on the corpus hosted on our Hugging Face dataset page or on your local experimental logs:

python notebooks/analyze_orthogonal_pool.py

Based on our performance corpus, you can train the meta-learner, extract meta-features, and perform zero-shot model selection:
- Run meta-learning experiments (train the meta-predictor):
  python meta/run.py --mode simple --test_dataset ETTh2 --meta_model_type mlp
- Extract meta-features for datasets:
  python meta/meta_features/get_meta_features_LTF.py --meta_feature_type tabpfn
- Apply meta-selection (zero-shot component recommendation) to new datasets:
  python meta/run_custom.py --new_dataset my_dataset --checkpoint_path <path> --new_dataset_path <csv_path> --scripts_root <scripts_dir>
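The idea behind zero-shot component recommendation can be sketched as follows: describe each dataset by a meta-feature vector, look up how candidate configurations performed on similar datasets, and pick the configuration with the lowest predicted error. The sketch below swaps in a plain k-nearest-neighbour predictor for simplicity. It is not the MLP meta-predictor or the TabPFN feature extractor used in `meta/`, and the dataset names, meta-features, configuration labels, and error values are all hypothetical.

```python
import math

# Hypothetical two-dimensional meta-features per source dataset.
META_FEATURES = {
    "A": (0.9, 0.1),
    "B": (0.2, 0.8),
    "C": (0.8, 0.2),
}

# Hypothetical corpus: error of each candidate configuration on each dataset.
CORPUS = {
    "A": {"RevIN+Patching": 0.30, "Stat+Inverted": 0.40},
    "B": {"RevIN+Patching": 0.50, "Stat+Inverted": 0.35},
    "C": {"RevIN+Patching": 0.32, "Stat+Inverted": 0.45},
}


def recommend(new_features, k=2):
    """Return the configuration with the lowest mean error over the k nearest datasets."""
    # Order source datasets by Euclidean distance in meta-feature space.
    neighbours = sorted(
        META_FEATURES, key=lambda name: math.dist(META_FEATURES[name], new_features)
    )[:k]
    configs = CORPUS[neighbours[0]].keys()
    # Predicted error of a configuration = mean error on the neighbouring datasets.
    predicted = {
        cfg: sum(CORPUS[name][cfg] for name in neighbours) / len(neighbours)
        for cfg in configs
    }
    return min(predicted, key=predicted.get), predicted


if __name__ == "__main__":
    best, scores = recommend((0.85, 0.15))
    print(best, scores)
```

No model is trained on the new dataset itself, which is what makes the selection zero-shot: only its meta-features are needed at recommendation time.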
The meta-features extracted by TabPFN are closer to normally distributed than those produced by traditional statistical methods, which significantly improves the prediction accuracy of our meta-learning predictor.
TSCOMP systematically maps the MTSF pipeline into a standardized, modular design space:
| Pipeline Stage | Component Dimension | Supported Components | Reference Methods |
|---|---|---|---|
| Series Preprocessing | Series Normalization | w/o Norm, Stat, RevIN, DishTS | RevIN, DishTS |
| Series Preprocessing | Series Decomposition | w/o Decomp, Moving Average (MA), MoEMA, DFT | MoEMA, TimeMixer |
| Series Preprocessing | Series Sampling/Mixing | w/o Mixing, w/ Mixing | TimeMixer |
| Series Encoding | Channel Dependency | Channel Dependent (CD), Channel Independent (CI) | PatchTST, iTransformer |
| Series Encoding | Series Tokenization | Point Encoding, Series Patching, Inverted Encoding, Ortho Encoding | PatchTST, iTransformer, OLinear |
| Series Encoding | Timestamp Embedding | w/o Embedding, w/ Embedding | - |
| Network Architecture | Network Backbone | MLP: DNN, NormLin <br>RNN: GRU, xLSTM <br>Transformer: w/o Attn, SelfAttn, AutoCorr, SparseAttn, FrequencyAttn, DestationaryAttn <br>LLM: GPT4TS, TimeLLM <br>TSFM: Timer, Moment, TimeMoE, Chronos | Informer, Autoformer, FEDformer, GPT4TS, TimeLLM, Timer, Moment, TimeMoE, Chronos |
| Network Architecture | Feature Attention | w/o Attn, SelfAttn, SparseAttn | - |
| Network Architecture | Retrieval Augmented (RAG) | w/o RAG, w/ RAG | RAFT |
| Network Optimization | Sequence Length | 48, 96, 192, 512 | - |
| Network Optimization | Loss Function | MSE, MAE, HUBER, DBLoss, PSLoss, FreDFLoss | DBLoss, PSLoss, FreDFLoss |
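To illustrate what two of the preprocessing components in the table do, the following dependency-free sketch shows per-window instance normalization with an exact inverse (the core idea behind RevIN, without its learnable affine parameters) and a moving-average trend/residual decomposition. The function names, the edge-padding choice, and the use of plain Python lists are assumptions for illustration. The repository itself operates on PyTorch tensors and its implementations may differ.

```python
from statistics import fmean, pstdev


def instance_normalize(series, eps=1e-5):
    """Normalize one input window by its own mean and standard deviation."""
    mean = fmean(series)
    std = pstdev(series) + eps  # eps avoids division by zero on flat windows
    return [(x - mean) / std for x in series], (mean, std)


def instance_denormalize(series, stats):
    """Undo instance_normalize so that forecasts return to the original scale."""
    mean, std = stats
    return [x * std + mean for x in series]


def moving_average_decompose(series, kernel=3):
    """Split a series into a moving-average trend and the remaining residual.

    The ends are padded by repeating the first and last values so the trend
    has the same length as the input (kernel is assumed to be odd).
    """
    half = kernel // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [fmean(padded[i:i + kernel]) for i in range(len(series))]
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual


if __name__ == "__main__":
    window = [1.0, 2.0, 3.0, 4.0, 5.0]
    normed, stats = instance_normalize(window)
    print(instance_denormalize(normed, stats))
    print(moving_average_decompose(window, kernel=3))
```

In a forecasting pipeline the statistics saved by the normalization step are reused on the model output, so the backbone only ever sees inputs on a standardized scale.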
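To navigate the massive combinatorial design space (over $10^6$ variants), TSCOMP applies a constrained orthogonal experimental protocol that reduces it to a computationally tractable pool.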
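As a rough cross-check of the figures quoted in this README (11 dimensions, 49 components, over $10^6$ variants), the options listed in the table above can be tallied directly. The sketch assumes every combination of options is valid, which is a simplification: some combinations may be incompatible in practice, and the paper may count the space differently.

```python
from math import prod

# Component options per design dimension, transcribed from the table above.
DESIGN_SPACE = {
    "Series Normalization": ["w/o Norm", "Stat", "RevIN", "DishTS"],
    "Series Decomposition": ["w/o Decomp", "MA", "MoEMA", "DFT"],
    "Series Sampling/Mixing": ["w/o Mixing", "w/ Mixing"],
    "Channel Dependency": ["CD", "CI"],
    "Series Tokenization": ["Point", "Patching", "Inverted", "Ortho"],
    "Timestamp Embedding": ["w/o Embedding", "w/ Embedding"],
    "Network Backbone": [
        "DNN", "NormLin", "GRU", "xLSTM",
        "w/o Attn", "SelfAttn", "AutoCorr", "SparseAttn",
        "FrequencyAttn", "DestationaryAttn",
        "GPT4TS", "TimeLLM", "Timer", "Moment", "TimeMoE", "Chronos",
    ],
    "Feature Attention": ["w/o Attn", "SelfAttn", "SparseAttn"],
    "Retrieval Augmented (RAG)": ["w/o RAG", "w/ RAG"],
    "Sequence Length": [48, 96, 192, 512],
    "Loss Function": ["MSE", "MAE", "HUBER", "DBLoss", "PSLoss", "FreDFLoss"],
}

n_dimensions = len(DESIGN_SPACE)
n_components = sum(len(options) for options in DESIGN_SPACE.values())
# Full factorial size: one option chosen independently per dimension.
n_variants = prod(len(options) for options in DESIGN_SPACE.values())

print(n_dimensions, n_components, n_variants)
```

Under that assumption the tally gives 11 dimensions, 49 components, and 1,179,648 full-factorial combinations, which is consistent with the "over $10^6$" figure and shows why exhaustive evaluation is impractical.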
TSCOMP includes 14 benchmark datasets covering various domains and forecasting settings:
- Long-Term Forecasting (LTF) Datasets:
  - ETT (ETTh1, ETTh2, ETTm1, ETTm2): Electricity power transformer datasets containing load and oil temperature measurements.
  - ECL (Electricity): Hourly electricity consumption records of 321 clients.
  - Traffic: Hourly road occupancy rates measured by 862 sensors on SF Bay Area freeways.
  - Weather: Meteorological dataset featuring 21 indicators recorded at 10-minute intervals.
  - Exchange: Daily exchange rates of 8 different countries.
  - Stock (NASDAQ, NYSE): Daily stock market trading records (Open, Close, Volume, High, Low).
  - FRED-MD: Monthly macroeconomic indicators from the Federal Reserve Bank.
  - ILI: Weekly influenza-like illness patient tracking data from the CDC.
  - Covid-19: Daily infectious disease transmission tracking data.
- Short-Term Forecasting (STF) Datasets:
  - M4: The classic M4 Competition dataset containing 100,000 unaligned time series across Yearly, Quarterly, Monthly, Weekly, Daily, and Hourly frequencies.
TSCOMP deconstructs and benchmarks 28 state-of-the-art baselines across four major architectural paradigms:
- MLP-Based Models: DLinear, OLinear, FiLM, TSMixer, LightTS, FreTS, Koopa, TimeMixer
- RNN/SSM-Based Models: SegRNN, Mamba, xLSTM
- CNN-Based Models: TimesNet, SCINet, MICN
- Transformer-Based Models: Informer, Autoformer, FEDformer, PatchTST, iTransformer, Reformer, PyraFormer, NSTransformer, ETSformer, Crossformer, RAFT, TimeXer, PAttn, DUET
TSCOMP/
├── data_provider/ # Dataset loading and preprocessing pipelines
├── models/ # Forecasting model architectures and deconstructed backbones
│   ├── DNN.py # MLP baseline implementations
│   ├── GRU.py # RNN baseline implementations
│   ├── Informer.py # Informer and variant baseline implementations
│   ├── TimeLLM.py # TimeLLM baseline implementations
│   └── ... # 28+ other forecasting baseline models
├── layers/ # Reusable neural network building blocks (attention, patches, etc.)
├── exp/ # Experiment engines for training, validation, and testing
├── scripts/ # Generated batch execution scripts for benchmarking
├── meta/ # Meta-feature extractors and meta-learning selection model
│   ├── meta_features/ # TabPFN and statistical feature extraction scripts
│   ├── run.py # Simple meta-learning trainer and predictor
│   └── run_custom.py # Apply zero-shot meta-selection to custom user datasets
├── figures/ # Framework charts, innovation diagrams, and analysis plots
├── notebooks/ # Batch generator notebooks and analysis scripts
├── environment.yml # Virtual environment package lists
├── run.py # Main entry point for custom/standard forecasting runs
└── README.md # Repository documentation (this file)
If you find this benchmark or the TSCOMP framework helpful in your research, please consider citing our paper:
@inproceedings{liang2026beyond,
title={Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting},
author={Liang, Shuang and Hou, Chaochuan and Yao, Xu and Wang, Shiping and Huang, Hailiang and Han, Songqiao and Jiang, Minqi},
booktitle={Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)},
year={2026},
doi={10.1145/3770855.3817551}
}

We thank the developers of the Time-Series-Library (TSL) and all baseline models incorporated in this benchmark (e.g., Informer, Autoformer, FEDformer, PatchTST, iTransformer, GPT4TS, etc.) for open-sourcing their outstanding implementations.


