CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling

📖 Introduction

This is the official repository for the paper "CALM BEFORE THE STORM: UNLOCKING NATIVE REASONING FOR OPTIMIZATION MODELING".

STORM (Smart Thinking Optimization Reasoning Model) is an advanced Large Language Model designed for automating Operations Research (OR) and optimization modeling tasks. Traditional domain adaptation methods often force models into a rigid, non-reflective generation pattern, which suppresses the powerful, native multi-step reasoning abilities of modern Large Reasoning Models (LRMs).

To address this, we introduce CALM (Corrective Adaptation with Lightweight Modification). CALM utilizes lightweight, expert-aligned hints to dynamically correct and guide a model's reasoning trajectories, rather than overwriting them. This approach generates high-quality training data that mirrors an expert's thought process.

Building on CALM, we transform a 4B parameter base model into STORM through a two-stage training pipeline: Supervised Fine-Tuning (SFT) + Reinforcement Learning (RL).

✨ Core Highlights

🚀 SOTA Performance with High Efficiency: STORM, with only 4B parameters, achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks. Its performance matches or surpasses that of a 671B parameter model, demonstrating exceptional parameter efficiency.
🧠 Preserving and Enhancing Native Reasoning: Our CALM framework preserves and amplifies the model's inherent multi-step, iterative reasoning abilities through 'lightweight correction' rather than 'forced instruction,' allowing it to reason more like a true domain expert.
🛠️ Powerful Code-Integrated Reasoning: STORM can autonomously leverage a wide range of scientific computing libraries (e.g., pulp, sympy, numpy) during inference to aid its modeling and solving process, showcasing strong tool-use capabilities.
💡 Emergent Abilities: After reinforcement learning, STORM uses tools it did not see during training (for example, rdkit for chemistry problems) to solve complex tasks, which suggests strong generalization.

🔧 Installation

1. Environment Setup

We highly recommend using Conda to manage your Python environment.

conda create -n storm python=3.10
conda activate storm

2. Inference Engine

For high-performance inference, we support vLLM and SGLang. Please choose one to install based on your preference and environment.

Option 1: vLLM (Recommended)

pip install "vllm>=0.8.5.post1"

Option 2: SGLang

pip install "sglang>=0.4.6.post1"

3. Core Dependencies

These are the essential Python packages required to run this project.

pip install math_verify transformers datasets pebble

4. Scientific Computing Environment (Crucial for Code-Integrated Reasoning)

STORM's power lies in its ability to dynamically call external Python libraries to solve problems. To unlock its full potential, ensure the following common scientific computing packages are installed in your environment.

# Operations Research & Optimization Solvers
pip install pulp gurobipy cvxpy pyomo osqp scikit-optimize optuna hyperopt ortools

# Scientific Computing & Data Analysis
pip install numpy scipy sympy pandas matplotlib scikit-learn statsmodels networkx autograd torch

# Other Specialized Libraries (Optional, depending on your tasks)
pip install pymc3 pydstool shapely pygeos seaborn plotly mpmath

Important Note: The model's tool-use capabilities are open-ended. We found that when faced with specialized problems (e.g., GPQA Diamond Chemistry), STORM attempts to use more specific libraries like rdkit. Therefore, we encourage you to install other relevant scientific packages based on your application domain to further enhance the model's capabilities.
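Before running inference, it can help to check which of these libraries are actually importable in the active environment. The snippet below is a minimal sketch, not part of this repository: the `PACKAGES` list and the `missing_packages` helper are illustrative names, and the list covers only a subset of the packages above. Note that a package's import name can differ from its pip name (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names (not pip names) for a subset of the packages listed above.
# This list is an illustrative assumption; extend it for your own domain.
PACKAGES = [
    "pulp", "gurobipy", "cvxpy", "pyomo", "ortools",
    "numpy", "scipy", "sympy", "pandas", "networkx",
    "sklearn", "skopt", "rdkit",
]


def missing_packages(names):
    """Return the entries of `names` that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(PACKAGES)
    print("Missing packages:", ", ".join(absent) if absent else "none")
```

Anything reported as missing can then be installed with pip before evaluation.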

5. (Optional) Full Environment Replication

If you wish to create an environment identical to the one used in our experiments, you can install all dependencies from the requirements.txt file. Please be aware that this list is very extensive and includes many task-specific packages.

pip install -r requirements.txt

🤖 Model Weights

We have open-sourced the STORM-Qwen3-4B model weights. You can download them from either source:

Hugging Face: tangzhy/STORM-Qwen3-4B
ModelScope: tangzhy/STORM-Qwen3-4B
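As one possible way to fetch the weights from Hugging Face, the sketch below uses `snapshot_download` from the `huggingface_hub` package. The `download_storm` helper and the default `./models/STORM-Qwen3-4B` target directory are illustrative assumptions, not part of this repository; any local path works as long as you pass the same path to the inference script.

```python
REPO_ID = "tangzhy/STORM-Qwen3-4B"  # Hugging Face repository id listed above


def download_storm(local_dir="./models/STORM-Qwen3-4B"):
    """Download the model files into `local_dir` and return the local path.

    Hypothetical helper for illustration; requires the huggingface_hub
    package and network access when called.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example call (performs the actual download):
#   path = download_storm()
#   print(path)
```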

🚀 Inference & Evaluation

We provide a convenient script to reproduce the evaluation results from our paper.

Script Usage

The run_inference.sh script accepts three arguments:

MODEL_NAME_OR_PATH: The local path to your downloaded model weights.
TEST_SET_NAME: The name of the benchmark to evaluate. Options include: nl4opt, mamo_easy, mamo_complex, industryor, OptMath.
GPU_ID: The ID of the GPU device you wish to use (e.g., 0).

The script (run_inference.sh):

#!/bin/bash

# $1: Local model path, e.g., /path/to/your/STORM-Qwen3-4B
# $2: Test set name, e.g., nl4opt
# $3: GPU ID to use, e.g., 0

MODEL_NAME_OR_PATH="$1"
TEST_SET="test.tir_prompt.$2"

# Input prompts are read from data/, and outputs are written under the model directory.
INPUT_FILE="data/$TEST_SET.jsonl"
OUTPUT_TAG="STORM_infer_outputs/$TEST_SET"
MODEL_OUTPUT_DIR="$MODEL_NAME_OR_PATH/$OUTPUT_TAG"

CUDA_VISIBLE_DEVICES="$3" TOKENIZERS_PARALLELISM=false python -m infer.inference_and_eval \
    --input_file "$INPUT_FILE" \
    --output_dir "$MODEL_OUTPUT_DIR" \
    --model_name_or_path "$MODEL_NAME_OR_PATH" \
    --engine "vllm" \
    --tensor_parallel_size 1

To use SGLang instead of vLLM, change the value of the --engine parameter in the script from "vllm" to "sglang".
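To make the script's path handling explicit, the following sketch mirrors in Python how the input file and output directory are derived from the first two arguments. The `resolve_paths` function is a hypothetical helper written for illustration; the path patterns are taken directly from the script above.

```python
def resolve_paths(model_path, test_set_name):
    """Mirror the path logic of run_inference.sh.

    Returns (input_file, output_dir) for a given model path and benchmark name.
    """
    test_set = f"test.tir_prompt.{test_set_name}"
    input_file = f"data/{test_set}.jsonl"
    # Outputs are written inside the model directory itself.
    output_dir = f"{model_path}/STORM_infer_outputs/{test_set}"
    return input_file, output_dir


if __name__ == "__main__":
    input_file, output_dir = resolve_paths("./models/STORM-Qwen3-4B", "nl4opt")
    print("input :", input_file)
    print("output:", output_dir)
```

Because results land under the model directory, that directory needs to be writable by the user running the script.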

Example

Assuming you have downloaded the model to ./models/STORM-Qwen3-4B and want to evaluate it on the nl4opt test set using GPU 0, run the following command:

bash run_inference.sh ./models/STORM-Qwen3-4B nl4opt 0
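To evaluate all five benchmarks in sequence, one option is a small driver that calls the script once per test set. This is a sketch under stated assumptions rather than a utility shipped with the repository: `build_commands` and `run_all` are hypothetical names, and the driver is assumed to run from the repository root so that `run_inference.sh` and `data/` resolve correctly.

```python
import subprocess

# Benchmark names accepted by run_inference.sh, as listed under Script Usage.
BENCHMARKS = ["nl4opt", "mamo_easy", "mamo_complex", "industryor", "OptMath"]


def build_commands(model_path, gpu_id=0, benchmarks=BENCHMARKS):
    """Return one argument list per benchmark, matching the script's three arguments."""
    return [
        ["bash", "run_inference.sh", model_path, name, str(gpu_id)]
        for name in benchmarks
    ]


def run_all(model_path, gpu_id=0, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(model_path, gpu_id):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if one evaluation fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("./models/STORM-Qwen3-4B", gpu_id=0, dry_run=True)
```

With `dry_run=True` the driver only prints the commands, which is a cheap way to confirm the arguments before starting a long evaluation.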

Acknowledgements

Our Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) training pipeline draws on the work presented in the CoRT paper, and we thank its authors. Their official repository is available on GitHub (CoRT).

📜 Citation

If you find our work helpful for your research, please consider citing our paper:

@misc{tang2025calmstormunlockingnative,
      title={CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling}, 
      author={Zhengyang Tang and Zihan Ye and Chenyu Huang and Xuhan Huang and Chengpeng Li and Sihang Li and Guanhua Chen and Ming Yan and Zizhuo Wang and Hongyuan Zha and Dayiheng Liu and Benyou Wang},
      year={2025},
      eprint={2510.04204},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.04204}, 
}
