REC4TS is the first benchmark that evaluates the effectiveness of reasoning strategies for zero-shot time series forecasting (TSF) tasks.
Specifically, REC4TS tries to answer two research questions:
- RQ1: Can zero-shot TSF benefit from enhanced reasoning ability?
- RQ2: What kind of reasoning strategies does zero-shot TSF need?
REC4TS covers three cognitive systems: direct System 1 (e.g., GPT-4o); test-time-enhanced System 1 (e.g., GPT-4o with Chain-of-Thought); and System 2 (e.g., o1-mini).
*Since reasoning strategies for foundation time-series models have not yet been studied and are difficult to implement directly, we reuse foundation language models to explore effective TSF reasoning strategies. We envision that our benchmark and insights offer promising potential for future research on understanding and designing effective reasoning strategies for zero-shot TSF.
- Good News: reasoning is helpful for zero-shot TSF!
- Self-consistency is currently the most effective reasoning strategy
- DeepSeek-R1, enabled by group-relative policy optimization, is the only effective System 2 reasoning strategy
- Multimodal TSF benefits more from reasoning strategies than unimodal TSF
- The TimeThinking dataset: contains reasoning trajectories from multiple advanced LLMs
- A new, simple test-time scaling method for foundation time-series models, based on the self-consistency reasoning strategy and inspired by our insights
- TimeThinking: Contains 1.5K filtered reasoning-annotated time series forecasting samples.
- Exp_Log: Records all experimental results from benchmarking, including the complete outputs of various models and performance metrics.
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/Rec4Time.git
  cd Rec4Time
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up your API keys: open `A_Demo.py` or `A_Run_Exp.py` and add your OpenRouter API key:

  ```python
  openrouter_api_key = "your_openrouter_api_key"  # Replace with your API key
  ```
To run a simple demo with default parameters:

```bash
python A_Demo.py
```

For more control over experiment parameters, use the `A_Run_Exp.py` script:

```bash
python A_Run_Exp.py --future_months 6 --data_ids 0 1 2 --llm_ids 0 --multimodal 1
```

`A_Run_Exp.py` supports the following command-line arguments:
| Argument | Description | Default |
|---|---|---|
| `--future_months` | List of future months to predict | `[6]` |
| `--data_ids` | List of dataset IDs to use | `[0]` |
| `--llm_ids` | List of LLM IDs to use (0 = OpenAI, 1 = Gemini, 2 = DeepSeek) | `[0]` |
| `--multimodal` | Whether to run multimodal experiments (0 = No, 1 = Yes) | `[0, 1]` |
| `--repeat_times` | Number of times to repeat each experiment | `3` |
| `--lookback_window_size` | Size of the historical data window | `96` |
| `--significant_figures` | Number of significant figures for data | `5` |
| `--k_text` | Number of text samples to use | `10` |
| `--unimodal_methods` | Unimodal reasoning strategies | `["naive", "cot", "self_consistency", "self_correction", "lrm"]` |
| `--multimodal_methods` | Multimodal reasoning strategies | `["naive", "cot", "self_consistency", "self_correction", "lrm"]` |
The following datasets are available:

| ID | Dataset Name |
|---|---|
| 0 | Retail Broiler Composite |
| 1 | Drought Level |
| 2 | International Trade Balance |
| 3 | Gasoline Prices |
| 4 | Influenza Patients Proportion |
| 5 | Disaster and Emergency Grants |
| 6 | Unemployment Rate |
| 7 | Travel Volume |
The following System 1 and System 2 models are supported:

| ID | Models |
|---|---|
| 0 | OpenAI (System 1: GPT-4o & System 2: o1-mini) |
| 1 | Google (System 1: Gemini-2.0-flash & System 2: Gemini-2.0-flash-thinking) |
| 2 | DeepSeek (System 1: DeepSeek-V3 & System 2: DeepSeek-R1) |
- `naive`: Direct System 1 generation
- `cot`: Chain-of-thought reasoning
- `self_consistency`: Multiple predictions with averaging (see the sketch after this list)
- `self_correction`: Self-correction of initial predictions
- `lrm`: Using System 2 models, also known as Large Reasoning Models
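Since self-consistency is the strategy the benchmark finds most effective, here is a minimal sketch of the idea: sample several independent forecasts and aggregate them element-wise. The helper `query_llm_forecast` and the choice of mean aggregation are illustrative assumptions, not the repository's exact implementation.

```python
import numpy as np

def self_consistency_forecast(prompt, n_samples=5, temperature=0.7):
    """Sketch of self-consistency: sample several forecasts, then average them.

    `query_llm_forecast` is a hypothetical helper that sends `prompt` to an
    LLM (e.g., via OpenRouter) and parses the reply into a list of future values.
    """
    forecasts = []
    for _ in range(n_samples):
        values = query_llm_forecast(prompt, temperature=temperature)  # hypothetical call
        forecasts.append(values)
    # Aggregate element-wise across the sampled forecasts; the mean is one simple choice.
    return np.mean(np.asarray(forecasts, dtype=float), axis=0)
```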
Experiment results are saved in the `experiment_results` directory in JSON format.
To view experiment results, use the Jupyter notebook `A_DrawLatex.ipynb`, which renders the results as LaTeX tables.
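If you prefer to inspect the raw outputs programmatically, a small sketch like the following lists and loads the JSON files; the exact file names and fields depend on your runs, so no particular schema is assumed here.

```python
import json
from pathlib import Path

# Iterate over all JSON result files produced by the experiments.
results_dir = Path("experiment_results")
for result_file in sorted(results_dir.glob("*.json")):
    with result_file.open() as f:
        result = json.load(f)
    # The available fields depend on the output schema of your runs;
    # printing the top-level keys is a safe way to explore them.
    keys = list(result) if isinstance(result, dict) else type(result).__name__
    print(result_file.name, keys)
```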
To use your own data, prepare CSV files with the following format:
- Time series data: a date column and a value column
- Text data: a date column and a fact/news column

Place your data files in the `data` directory.
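For concreteness, a sketch such as the one below writes files in that shape with pandas; the column names (`Date`, `value`, `fact`) and file names are illustrative assumptions and should be matched to what the loading code in the repository expects.

```python
import pandas as pd

# Numerical time series: one date column and one value column.
series = pd.DataFrame({
    "Date": ["2024-01-01", "2024-02-01", "2024-03-01"],
    "value": [1.02, 1.10, 1.07],
})
series.to_csv("data/my_series.csv", index=False)

# Accompanying text data: one date column and one fact/news column.
texts = pd.DataFrame({
    "Date": ["2024-01-15", "2024-02-20"],
    "fact": ["Example news item relevant to the series.",
             "Another dated text snippet."],
})
texts.to_csv("data/my_texts.csv", index=False)
```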
To use different LLM providers, modify the `LLM_list` and `LRM_list` variables in `A_Demo.py` or `A_Run_Exp.py`:

```python
LRM_list = ["openai/o1-mini-2024-09-12", "google/gemini-2.0-flash-thinking-exp-1219:free", "deepseek/deepseek-r1"]
LLM_list = ["openai/gpt-4o-2024-05-13", "google/gemini-2.0-flash-001", "deepseek/deepseek-chat"]
```
Contributions are welcome! Please feel free to submit a Pull Request.
If you find this repo useful, please cite our paper.
```bibtex
@misc{liu2025evaluating1vs2,
  title={Evaluating System 1 vs. 2 Reasoning Approaches for Zero-Shot Time-Series Forecasting: A Benchmark and Insights},
  author={Haoxin Liu and Zhiyuan Zhao and Shiduo Li and B. Aditya Prakash},
  year={2025},
  eprint={2503.01895},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.01895},
}
```
If you have any questions or suggestions, feel free to contact: hliu763@gatech.edu
This project is licensed under the CC-BY (Creative Commons Attribution 4.0 International) license.