This repository is the official implementation of the paper: MixMamba: Time Series Modeling with Adaptive Expertise
The heterogeneity and non-stationary characteristics of time series data continue to challenge a single model's ability to capture complex temporal dynamics, especially in long-term forecasting. Therefore, we propose MixMamba, which:
- Leverages the Mamba model as an expert within a mixture-of-experts (MoE) framework. This framework decomposes modeling into a pool of specialized experts, enabling the model to learn robust representations and capture the full spectrum of patterns present in time series data.
- Introduces a dynamic gating network that adaptively allocates each data segment to the most suitable expert based on its characteristics, allowing the model to adjust to temporal changes in the underlying data distribution.
- Incorporates a load-balancing loss function to prevent bias towards a limited subset of experts (see the sketch below).
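To make the routing and load-balancing idea concrete, here is a minimal PyTorch sketch of one common way to implement top-k gating with an auxiliary balancing loss. It is illustrative only: names such as `TopKGating` are hypothetical, and the exact formulation used in the paper and repository may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKGating(nn.Module):
    """Routes each patch embedding to the k most suitable experts."""
    def __init__(self, d_model, n_experts, k=2):
        super().__init__()
        self.w_gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k
        self.n_experts = n_experts

    def forward(self, x):                       # x: (batch, n_patches, d_model)
        logits = self.w_gate(x)                 # (batch, n_patches, n_experts)
        probs = logits.softmax(dim=-1)
        topk_val, topk_idx = probs.topk(self.k, dim=-1)
        # Renormalize the selected gate values so they sum to 1 per patch.
        weights = topk_val / topk_val.sum(dim=-1, keepdim=True)

        # Load-balancing auxiliary loss: push both the fraction of patches
        # routed to each expert and the mean gate probability per expert
        # towards the uniform value 1 / n_experts.
        importance = probs.mean(dim=(0, 1))                                  # mean gate prob per expert
        load = F.one_hot(topk_idx[..., 0], self.n_experts).float().mean(dim=(0, 1))
        aux_loss = self.n_experts * (importance * load).sum()
        return weights, topk_idx, aux_loss
```

During training, the returned `aux_loss` would typically be scaled by a small coefficient and added to the main forecasting loss.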
MixMamba is a time series forecasting model built on a mixture-of-experts (MoE) approach. The model's architecture consists of four primary stages:
- Pre-processing: Raw time series data undergoes normalization and segmentation to create patches.
- Embedding and Augmentation: Patches are embedded and augmented with positional information to provide context.
- MoM Block: This central component consists of multiple Mamba experts coordinated by a gating network. Each Mamba expert employs a series of projections, convolutions, selective SSM, and a skip connection to learn temporal dependencies.
- Prediction Head: A linear prediction head is used to generate final outputs based on the learned representations.
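For orientation, the following self-contained PyTorch sketch shows one way these four stages could be wired together. It is not the repository's actual code: it assumes the official `mamba_ssm` package for the expert blocks, replaces the paper's gating scheme with simplified dense per-sequence routing for readability, and all module names and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba   # assumes the official mamba-ssm package is installed

class MixMambaSketch(nn.Module):
    """Illustrative flow only: normalize -> patch -> embed -> MoM block -> head."""
    def __init__(self, seq_len=96, pred_len=720, patch_len=16, stride=8,
                 d_model=128, n_experts=4):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        n_patches = (seq_len - patch_len) // stride + 1
        self.embed = nn.Linear(patch_len, d_model)                 # patch embedding
        self.pos = nn.Parameter(torch.zeros(n_patches, d_model))   # positional information
        self.experts = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)  # each expert is a Mamba block
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)                  # gating network
        self.head = nn.Linear(n_patches * d_model, pred_len)       # linear prediction head

    def forward(self, x):                        # x: (batch, seq_len, n_vars)
        # Pre-processing: instance normalization and channel-independent patching.
        mean, std = x.mean(1, keepdim=True), x.std(1, keepdim=True) + 1e-5
        x = ((x - mean) / std).permute(0, 2, 1)  # (batch, n_vars, seq_len)
        b, v, _ = x.shape
        z = x.unfold(-1, self.patch_len, self.stride)              # (b, v, n_patches, patch_len)
        z = self.embed(z).reshape(b * v, -1, self.embed.out_features) + self.pos

        # MoM block: experts mixed by gate weights (dense routing for simplicity).
        w = self.gate(z.mean(dim=1)).softmax(dim=-1)               # (b*v, n_experts)
        z = sum(w[:, i, None, None] * expert(z) for i, expert in enumerate(self.experts))

        # Prediction head, then undo the instance normalization.
        y = self.head(z.flatten(1)).reshape(b, v, -1).permute(0, 2, 1)
        return y * std + mean
```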
Please follow the steps below to prepare the environment on a Linux OS.
- Clone this repository
git clone https://github.com/KhaledAlkilane89/MixMamba.git
cd MixMamba
- Create the environment and install the required packages:
conda create -n mixmamba python=3.10 -y
conda activate mixmamba
pip install -r requirements.txt
- Datasets can be downloaded from either Google Drive or Baidu Drive. After downloading, place the data in the ./dataset folder.
Train and evaluate the model using the scripts provided in the ./scripts/ directory.
Please refer to the following examples to reproduce the experimental results:
- Long-term forecasting:
bash ./scripts/long_term_forecast/ETT_script/mixmamba_ETTh1.sh
- Short-term forecasting:
bash ./scripts/short_term_forecast/mixmamba_M4.sh
- Classification:
bash ./scripts/classification/mixmamba.sh
- MixMamba performance under varied look-back window lengths $L \in \{96, 192, 336, 720\}$ on the PEMS03 dataset ($T = 720$) (upper left).
- Comparison of memory usage (top) and computation time (bottom) on the ETTm2 dataset with batch size 32 (upper right).
- Comparison of learned representations of different experts on the ETTm1 dataset with $L = 96$, $T = 720$ (lower left).
- Hyperparameter analysis on the Exchange and ILI datasets ($L = 96$, $T = 720$) (lower right).
If you use this code or data in your research, please cite:
@article{ALKILANE2024102589,
  title   = {MixMamba: Time series modeling with adaptive expertise},
  author  = {Khaled Alkilane and Yihang He and Der-Horng Lee},
  journal = {Information Fusion},
  volume  = {112},
  pages   = {102589},
  year    = {2024},
  issn    = {1566-2535},
  doi     = {10.1016/j.inffus.2024.102589},
  url     = {https://www.sciencedirect.com/science/article/pii/S1566253524003671}
}
For inquiries or to discuss potential code usage, please reach out to the following researchers:
- Khaled (khaledalkilane@outlook.com)
- Yihang (yihang.23@intl.zju.edu.cn)
We'd like to express our gratitude to the following GitHub repositories for their exceptional codebases: