Time-R1 is a framework designed to endow Large Language Models (LLMs) with comprehensive temporal reasoning capabilities, enabling them to progressively cultivate sophisticated temporal logic: understanding past events, predicting future occurrences, and creatively generating plausible future scenarios.
This repository contains the official code, the Time-Bench dataset, and fine-tuned model checkpoints for our paper:
> Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
> Zijia Liu, Peixuan Han, Haofei Yu, Haoru Li, Jiaxuan You

Figure: Time-R1 output examples: Event Prediction (Stage 2, Left) & Future Scenario Generation (Stage 3, Right).
Large Language Models (LLMs) demonstrate impressive capabilities but often lack robust temporal intelligence. Time-R1 addresses this by introducing a novel three-stage reinforcement learning (RL) curriculum driven by a meticulously designed dynamic rule-based reward system. Our approach progressively builds:
- (Stage 1: Comprehension) Foundational temporal understanding and logical event-time mappings from historical data.
- (Stage 2: Prediction) Skills to predict future event times, especially for events beyond the model's knowledge cutoff, using synthetic data to ensure rigorous training and prevent information leakage.
- (Stage 3: Generation) Strong generalization to creatively generate plausible future scenarios without direct fine-tuning for this task, leveraging capabilities from the first two stages.
Our experiments show that a 3B-parameter Time-R1 model significantly outperforms models over 200 times its size on challenging future event prediction and creative scenario generation benchmarks.
- Time-Bench Dataset:
  - Contains over 200,000 examples with explicit temporal annotations, obtained and processed through the [New York Times Archive API](https://developer.nytimes.com/docs/archive-product/1/overview).
  - Covers diverse tasks: timestamp inference, time-gap estimation, event ordering, and masked time entity completion.
  - Further details on dataset construction can be found in Appendix B of our paper.
  - The training scripts expect the dataset files (e.g., `.parquet` files) to be placed in a `datasets` subdirectory within the `OUTPUT_BASE_DIR` specified in the scripts (e.g., `Time-R1/datasets/`); see the loading sketch after this list.
- Time-R1 Model Checkpoints:
  - Progressive checkpoints from Stage 1 (Phases 1, 2, and 3, where Phase 3 yields $\theta_1$).
  - An ablation model from Stage 1 ($\theta_1'$, trained without the dynamic reward design).
  - The final model after Stage 2 ($\theta_2$).
- Source Code: For training Time-R1 and evaluating on Time-Bench.
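As a quick sanity check of the dataset layout described above, the sketch below loads one Time-Bench parquet split with pandas. The file name and column names are illustrative assumptions, not guaranteed by the repository; check the actual files under your `datasets/` directory.

```python
# Minimal sketch: inspect a Time-Bench parquet split.
# Assumes the files live under <OUTPUT_BASE_DIR>/datasets/ as described above;
# the file name below is a placeholder, not a name fixed by this repo.
import pandas as pd

df = pd.read_parquet("datasets/train_inference_easy.parquet")  # hypothetical file name
print(df.shape)                 # number of examples and columns
print(df.columns.tolist())      # discover the actual column names
print(df.iloc[0])               # one example with its prompt and temporal annotation
```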
- Comprehensive Temporal Reasoning: Unified capabilities for understanding, prediction, and creative generation related to time.
- Novel 3-Stage RL Curriculum: Progressively builds temporal skills from foundational understanding to advanced future-oriented reasoning.
- Dynamic Reward System: Meticulously designed rewards guide the LLM effectively (see details below).
- State-of-the-Art Performance: Our 3B Time-R1 model surpasses significantly larger models on key temporal tasks.
- Time-Bench Dataset: A new large-scale, multi-task temporal reasoning dataset derived from ten years of news data.
- Staged Model Checkpoints: The training process yields distinct model checkpoints ($\theta_1$ and $\theta_2$) representing different stages of temporal reasoning capability, which can be reproduced using the provided scripts.
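If you only want to run inference with one of the released checkpoints, a standard Hugging Face loading snippet is enough, assuming the checkpoint directory is in the usual Hugging Face format. The local path and prompt below are placeholders.

```python
# Minimal inference sketch with a downloaded Time-R1 checkpoint.
# The path is a placeholder; any Hugging Face-compatible causal LM directory works.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "checkpoints/time-r1-theta2"  # placeholder local path
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

prompt = "When did the event described below most likely occur? ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```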
A core component of Time-R1 is its dynamic rule-based reward system. This system is crucial for guiding the LLM through the complexities of temporal reasoning. The detailed implementation of our reward functions can be found in the following files:
- `Time-R1/verl/utils/reward_score/time_reasoning.py` (over 1,200 lines of code dedicated to various temporal comprehension reward calculations)
- `Time-R1/verl/utils/reward_score/time_prediction.py` (handles rewards for future time prediction tasks)
These modules implement nuanced scoring for different temporal aspects, adapting to the learning stage and task difficulty.
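For intuition only, here is a heavily simplified sketch of the kind of rule-based scoring such modules implement: an exponential-decay accuracy term on the month gap between the predicted and ground-truth date, scaled by a difficulty coefficient alpha. The actual reward functions in `time_reasoning.py` and `time_prediction.py` are far more detailed (format checks, task-specific logic, dynamic adjustment), so treat this as a mental model, not the implementation.

```python
# Simplified illustration of a rule-based temporal reward (not the actual implementation).
# alpha controls how sharply the reward decays with the month gap between
# the predicted date and the ground truth.
import math
import re

def parse_year_month(text: str):
    """Extract a 'YYYY-MM' date from a piece of text, if present."""
    m = re.search(r"(\d{4})-(\d{2})", text)
    return (int(m.group(1)), int(m.group(2))) if m else None

def temporal_reward(prediction: str, target: str, alpha: float = 0.1) -> float:
    pred = parse_year_month(prediction)
    gold = parse_year_month(target)
    if pred is None or gold is None:
        return 0.0  # unparseable answers get no accuracy reward
    gap = abs((pred[0] - gold[0]) * 12 + (pred[1] - gold[1]))  # gap in months
    return math.exp(-alpha * gap)  # 1.0 for an exact match, decaying with the gap

print(temporal_reward("The event happened in 2023-06.", "2023-08"))  # ~0.82 with alpha=0.1
```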
- RL Workflow: Refer to `verl/trainer/main_ppo*.py` (e.g., `main_ppo_s1_p1.py`, `main_ppo_s2.py`) and `verl/trainer/ppo/ray_trainer.py`.
- Reward Design: As detailed above, see `verl/utils/reward_score/`.
- Hyperparameters: General PPO hyperparameters can be found in `verl/trainer/config/ppo_trainer.yaml`, with task-specific overrides in the training scripts.
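To see which PPO defaults the training scripts can override, you can load the YAML config directly. This is a generic inspection sketch; the section names it prints depend on the actual structure of `ppo_trainer.yaml`.

```python
# Sketch: inspect PPO hyperparameters before a run.
# Prints the top-level sections of the config so you know what the scripts can override.
import yaml

with open("verl/trainer/config/ppo_trainer.yaml") as f:
    cfg = yaml.safe_load(f)

for section, values in cfg.items():
    print(section, "->", type(values).__name__)
```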
- Create Conda Environment (Recommended):

  ```bash
  # python=3.9 is recommended as per the verl framework.
  conda create -n timer1 python=3.9
  conda activate timer1
  ```
- Install Dependencies:

  ```bash
  # Install PyTorch (or you can skip this step and let vLLM install the correct version for you)
  pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

  # Install vLLM and Ray
  # Note: vllm==0.6.3 is specified; other versions like 0.5.4, 0.4.2, 0.3.1 might also work.
  pip3 install vllm==0.6.3
  pip3 install ray

  # Install the verl framework (local editable install)
  pip install -e .

  # Install Flash Attention 2
  pip3 install flash-attn --no-build-isolation

  # Install other useful packages (e.g., for logging and plotting)
  pip install wandb IPython matplotlib

  # Install remaining dependencies from requirements.txt if any are not covered above
  pip install -r requirements.txt
  ```
- Environment Configuration: Ensure that system variables (like `CUDA_VISIBLE_DEVICES`, `WANDB_ENTITY` if using Weights & Biases, and paths such as `BASE_MODEL` or `OUTPUT_BASE_DIR`) are correctly configured in your environment or within the bash scripts before execution. These are often marked with "###" in the scripts and require user-specific values.
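A small pre-flight check like the following can catch missing variables before a long run. It is purely illustrative; adjust the list to whatever variables your chosen script actually reads.

```python
# Illustrative pre-flight check for the environment variables the bash scripts expect.
# The exact variable names a given script reads may differ; adjust as needed.
import os

required = ["CUDA_VISIBLE_DEVICES", "WANDB_ENTITY", "BASE_MODEL", "OUTPUT_BASE_DIR"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Please set these variables before running the scripts: {missing}")
print("Environment looks good.")
```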
The Time-R1 model is trained progressively through a series of stages and phases.
This stage focuses on building foundational temporal understanding. It is divided into three phases, each building upon the previous:
- Phase 1 (Inference Easy): Trains on simpler temporal inference tasks.
  - Script: `scripts/stage1_phase1.sh`
- Phase 2 (All Comprehension Tasks, Different Alphas): Introduces all comprehension tasks with varied difficulty (alpha values for reward calculation).
  - Script: `scripts/stage1_phase2.sh`
  - This phase uses the checkpoint from Phase 1 as its `BASE_MODEL`.
- Phase 3 (All Comprehension Tasks, Dynamic Alpha): Fine-tunes with a dynamic alpha in the reward system, further enhancing nuanced understanding. This phase yields the final $\theta_1$ checkpoint; a rough sketch of how such a dynamic coefficient might behave follows this list.
  - Script: `scripts/stage1_phase3.sh`
  - This phase uses the checkpoint from Phase 2 as its `BASE_MODEL`.
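The exact dynamic-alpha logic lives in the reward modules referenced earlier. As a rough mental model only (not the actual Time-R1 schedule), you can think of alpha tightening as recent accuracy improves, so near-misses that were acceptable early in training earn less reward later on:

```python
# Rough mental model of a dynamic difficulty coefficient (NOT the actual Time-R1 schedule):
# as recent accuracy improves, alpha grows, so an exponential-decay reward becomes
# stricter about the allowed month gap.
def dynamic_alpha(recent_accuracy: float,
                  alpha_min: float = 0.05,
                  alpha_max: float = 0.2) -> float:
    """Interpolate between a lenient and a strict decay rate based on recent accuracy."""
    recent_accuracy = min(max(recent_accuracy, 0.0), 1.0)
    return alpha_min + (alpha_max - alpha_min) * recent_accuracy

for acc in (0.2, 0.5, 0.8):
    print(acc, "->", round(dynamic_alpha(acc), 3))
```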
This stage trains the model to predict future event times, building on the temporal comprehension learned in Stage 1.
- Script: `scripts/stage2.sh`
- This stage uses the $\theta_1$ checkpoint (from Stage 1, Phase 3) as its `BASE_MODEL`.
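For a quick, back-of-the-envelope check of Stage 2 outputs outside the RL loop, a simple month-gap metric is often enough. This is an illustrative metric, not the official evaluation code, and the "YYYY-MM" date format is an assumption; adapt the parsing to your outputs.

```python
# Illustrative evaluation metric for Stage 2 (not the official scorer):
# mean absolute error, in months, between predicted and true event dates.
def month_index(date: str) -> int:
    year, month = date.split("-")
    return int(year) * 12 + int(month)

def mean_month_gap(pairs):
    gaps = [abs(month_index(pred) - month_index(gold)) for pred, gold in pairs]
    return sum(gaps) / len(gaps)

examples = [("2024-07", "2024-09"), ("2025-01", "2024-12")]  # (prediction, ground truth)
print(mean_month_gap(examples))  # 1.5
```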
While Stage 3 (Creative Future Scenario Generation) does not involve direct fine-tuning, its capabilities are an emergent property of the model trained through Stages 1 and 2 (i.e., the $\theta_2$ checkpoint).
- Script: `future_news_generation/stage3.sh`
- This script uses the $\theta_2$ checkpoint for news generation and then runs analysis scripts on the generated content.
For detailed hyperparameters and further experimental setup, please refer to the main paper and the provided scripts.
Through careful reward design and a phased training curriculum, our Stage 1 results show significant improvements, particularly in tasks like time-difference estimation, substantially surpassing the results reported in the main paper; the overall average performance of the final $\theta_1$ checkpoint improves accordingly.
The impact of our dynamic reward mechanism is also illustrated below. The blue curve represents the training progression of the ablation model $\theta_1'$, trained without the dynamic reward design (`scripts/stage1_no_dynamic_reward.sh`).

Figure: Training curves for Stage 1. Red: Time-R1 (Phase 3) with dynamic rewards. Blue: Ablation without dynamic reward design.
The concept of Time-R1 is inspired by DeepSeek-R1 and TinyZero. Its implementation is built upon veRL. We sincerely appreciate these teams' contributions to open-source research and development.
@article{liu2025time,
title={Time-R1: Towards Comprehensive Temporal Reasoning in LLMs},
author={Liu, Zijia and Han, Peixuan and Yu, Haofei and Li, Haoru and You, Jiaxuan},
journal={arXiv preprint arXiv:2505.13508},
year={2025}
}