Abstract
Time series data in real-world applications such as healthcare, climate modeling, and finance are often irregular, multimodal, and messy, with varying sampling rates, asynchronous modalities, and pervasive missingness. However, existing benchmarks typically assume clean, regularly sampled, unimodal data, creating a significant gap between research and real-world deployment. We introduce Time-IMM, a dataset specifically designed to capture cause-driven irregularity in multimodal multivariate time series. Time-IMM represents nine distinct types of time series irregularity, categorized into trigger-based, constraint-based, and artifact-based mechanisms. Complementing the dataset, we introduce IMM-TSF, a benchmark library for forecasting on irregular multimodal time series, enabling asynchronous integration and realistic evaluation. IMM-TSF provides two specialized fusion modules, a timestamp-to-text fusion module and a multimodality fusion module, which support both recency-aware averaging and attention-based integration strategies. Empirical results demonstrate that explicitly modeling multimodality on irregular time series data leads to substantial gains in forecasting performance. Time-IMM and IMM-TSF provide a foundation for advancing time series analysis under real-world conditions.
Time-IMM Dataset
Cause-driven taxonomy and dataset overview for irregular multimodal time series.

Taxonomy of irregularities across trigger-, constraint-, and artifact-based causes.

Overview of the nine real-world multimodal datasets in Time-IMM.
IMM-TSF Benchmark Library
Modular framework for forecasting irregular multimodal time series.

Fusion architecture combining asynchronous text and numerical data.

Timestamp-preserving pre-alignment for irregular time series models.
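
To illustrate the recency-aware averaging strategy named in the abstract, here is a minimal sketch. It is not the IMM-TSF implementation. It assumes that each text observation has already been encoded into a fixed-length embedding and that weights decay exponentially with the age of the text. The function name `recency_weighted_average`, the decay parameter `tau`, and the plain-list embedding format are hypothetical choices made for this example only.

```python
import math
from typing import List, Sequence


def recency_weighted_average(
    text_times: Sequence[float],
    text_embeddings: Sequence[Sequence[float]],
    query_time: float,
    tau: float = 1.0,
) -> List[float]:
    """Average the embeddings of texts observed at or before query_time.

    Each text gets weight exp(-(query_time - t) / tau), so more recent
    texts contribute more. The exponential form is an assumption of this
    sketch, not something confirmed by the source.
    """
    if len(text_times) != len(text_embeddings):
        raise ValueError("text_times and text_embeddings must have equal length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(text_embeddings[0]) if text_embeddings else 0
    weighted_sum = [0.0] * dim
    total_weight = 0.0

    for t, emb in zip(text_times, text_embeddings):
        if t > query_time:
            continue  # skip texts from the future to avoid leakage
        weight = math.exp(-(query_time - t) / tau)
        total_weight += weight
        for j, value in enumerate(emb):
            weighted_sum[j] += weight * value

    if total_weight == 0.0:
        return [0.0] * dim  # no text available yet: fall back to zeros
    return [s / total_weight for s in weighted_sum]


# Usage: three texts arrive at irregular times; the numeric series is
# queried at time 2.0, so the text at time 5.0 is ignored.
times = [0.0, 2.0, 5.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
fused = recency_weighted_average(times, embeddings, query_time=2.0, tau=1.0)
print(fused)
```

The result is a single context vector per query timestamp, dominated by the most recent text, which can then be combined with the numerical representation. An attention-based strategy would replace the fixed exponential weights with learned ones.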
Experimental Results
Multimodality consistently improves forecasting across irregularity types.

Radar plot: multimodal forecasting improves accuracy across all baselines.

Dataset-wise performance gains from textual context.

Fusion strategy comparison: GR-Add yields the most stable results.

Text encoder comparison: GPT-2, BERT, Llama-3.1, and DeepSeek perform similarly.
BibTeX
@inproceedings{
chang2025timeimm,
title={Time-{IMM}: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series},
author={Ching Chang and Jeehyun Hwang and Yidan Shi and Haixin Wang and Wei Wang and Wen-Chih Peng and Tien-Fu Chen},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2025},
url={https://openreview.net/forum?id=yeqrrn51TL}
}