CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling

tangzhy/STORM


Paper Hugging Face ModelScope

📖 Introduction

This is the official repository for the paper "CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling".

STORM (Smart Thinking Optimization Reasoning Model) is an advanced Large Language Model designed for automating Operations Research (OR) and optimization modeling tasks. Traditional domain adaptation methods often force models into a rigid, non-reflective generation pattern, which suppresses the powerful, native multi-step reasoning abilities of modern Large Reasoning Models (LRMs).

To address this, we introduce CALM (Corrective Adaptation with Lightweight Modification). CALM utilizes lightweight, expert-aligned hints to dynamically correct and guide a model's reasoning trajectories, rather than overwriting them. This approach generates high-quality training data that mirrors an expert's thought process.

Building on CALM, we transform a 4B parameter base model into STORM through a two-stage training pipeline: Supervised Fine-Tuning (SFT) + Reinforcement Learning (RL).

✨ Core Highlights

  • 🚀 SOTA Performance with High Efficiency: With only 4B parameters, STORM achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks, matching or surpassing the performance of a 671B-parameter model.
  • 🧠 Preserving and Enhancing Native Reasoning: Our CALM framework preserves and amplifies the model's inherent multi-step, iterative reasoning abilities through 'lightweight correction' rather than 'forced instruction,' allowing it to reason more like a true domain expert.
  • 🛠️ Powerful Code-Integrated Reasoning: STORM can autonomously leverage a wide range of scientific computing libraries (e.g., pulp, sympy, numpy) during inference to aid its modeling and solving process, showcasing strong tool-use capabilities.
  • 💡 Emergent Abilities: After reinforcement learning, STORM can use tools not seen during its training (for example, rdkit for chemistry problems) to solve complex tasks, indicating strong generalization and autonomous learning.

🔧 Installation

1. Environment Setup

We highly recommend using Conda to manage your Python environment.

conda create -n storm python=3.10
conda activate storm

2. Inference Engine

For high-performance inference, we support vLLM and SGLang. Please choose one to install based on your preference and environment.

Option 1: vLLM (Recommended)

pip install "vllm>=0.8.5.post1"

Option 2: SGLang

pip install "sglang>=0.4.6.post1"
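If you are unsure which engine is available in the active environment, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: `pick_engine` and `is_installed` are hypothetical helpers, the preference order (vLLM first) simply follows the recommendation above, and it assumes the packages are importable as `vllm` and `sglang`.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


def pick_engine(installed=is_installed):
    """Hypothetical helper: prefer vLLM, fall back to SGLang, else None."""
    for engine in ("vllm", "sglang"):
        if installed(engine):
            return engine
    return None


if __name__ == "__main__":
    print(pick_engine() or "no supported inference engine found")
```

The returned string matches the values accepted by the `--engine` parameter of the inference script.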

3. Core Dependencies

These are the essential Python packages required to run this project.

pip install math_verify transformers datasets pebble

4. Scientific Computing Environment (Crucial for Code-Integrated Reasoning)

STORM's power lies in its ability to dynamically call external Python libraries to solve problems. To unlock its full potential, ensure the following common scientific computing packages are installed in your environment.

# Operations Research & Optimization Solvers
pip install pulp gurobipy cvxpy pyomo osqp scikit-optimize optuna hyperopt ortools

# Scientific Computing & Data Analysis
pip install numpy scipy sympy pandas matplotlib scikit-learn statsmodels networkx autograd torch

# Other Specialized Libraries (Optional, depending on your tasks)
pip install pymc3 pydstool shapely pygeos seaborn plotly mpmath

Important Note: The model's tool-use capabilities are open-ended. We found that when faced with specialized problems (e.g., GPQA Diamond Chemistry), STORM attempts to use more specific libraries like rdkit. Therefore, we encourage you to install other relevant scientific packages based on your application domain to further enhance the model's capabilities.
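Because the model may call any of these libraries at inference time, it can be useful to confirm which ones are importable before running an evaluation. The sketch below is an illustration under stated assumptions, not a script shipped with this repository: `missing_packages` is a hypothetical helper, and the table covers only a subset of the packages listed above. Note that some pip names differ from their import names (for example, scikit-learn is imported as `sklearn` and scikit-optimize as `skopt`).

```python
from importlib.util import find_spec

# pip name -> import name (illustrative subset of the packages listed above).
IMPORT_NAMES = {
    "pulp": "pulp",
    "gurobipy": "gurobipy",
    "cvxpy": "cvxpy",
    "pyomo": "pyomo",
    "ortools": "ortools",
    "scikit-optimize": "skopt",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "sympy": "sympy",
    "networkx": "networkx",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in import_names.items()
        if find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("all listed packages are importable")
```

Extend the mapping with any domain-specific packages (such as rdkit) that matter for your tasks.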

5. (Optional) Full Environment Replication

If you wish to create an environment identical to the one used in our experiments, you can install all dependencies from the requirements.txt file. Please be aware that this list is very extensive and includes many task-specific packages.

pip install -r requirements.txt

🤖 Model Weights

We have open-sourced the STORM-Qwen3-4B model weights. You can download them from either Hugging Face or ModelScope.

🚀 Inference & Evaluation

We provide a convenient script to reproduce the evaluation results from our paper.

Script Usage

The run_inference.sh script accepts three arguments:

  1. MODEL_NAME_OR_PATH: The local path to your downloaded model weights.
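  2. TEST_SET_NAME: The name of the benchmark to evaluate. Options include: nl4opt, mamo_easy, mamo_complex, industryor, OptMath. The name is substituted into the input path data/test.tir_prompt.<TEST_SET_NAME>.jsonl, so it must match the file name exactly.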
  3. GPU_ID: The ID of the GPU device you wish to use (e.g., 0).

The script (run_inference.sh):

#!/bin/bash

# $1: Local model path, e.g., /path/to/your/STORM-Qwen3-4B
# $2: Test set name, e.g., nl4opt
# $3: GPU ID to use, e.g., 0

MODEL_NAME_OR_PATH="$1"
TEST_SET="test.tir_prompt.$2"

# Prompts are read from data/; outputs are written under the model directory.
INPUT_FILE="data/$TEST_SET.jsonl"
OUTPUT_TAG="STORM_infer_outputs/$TEST_SET"
MODEL_OUTPUT_DIR="$MODEL_NAME_OR_PATH/$OUTPUT_TAG"

# Expansions are quoted so that paths containing spaces stay single arguments.
CUDA_VISIBLE_DEVICES="$3" TOKENIZERS_PARALLELISM=false python -m infer.inference_and_eval \
    --input_file "$INPUT_FILE" \
    --output_dir "$MODEL_OUTPUT_DIR" \
    --model_name_or_path "$MODEL_NAME_OR_PATH" \
    --engine "vllm" \
    --tensor_parallel_size 1

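You can switch between vllm and sglang by modifying the --engine parameter. Outputs are written under the model directory, in <MODEL_NAME_OR_PATH>/STORM_infer_outputs/test.tir_prompt.<TEST_SET_NAME>.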
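To make the path and argument handling explicit, the following sketch mirrors what the script assembles, with the engine exposed as a parameter. It is an illustration only: `build_inference_command` is a hypothetical helper, not a function in this repository, and it builds the command without running it.

```python
def build_inference_command(model_path, test_set_name, gpu_id, engine="vllm"):
    """Mirror run_inference.sh: return (argv, env_overrides) without running it."""
    if engine not in ("vllm", "sglang"):
        raise ValueError(f"unsupported engine: {engine}")
    test_set = f"test.tir_prompt.{test_set_name}"
    input_file = f"data/{test_set}.jsonl"
    output_dir = f"{model_path}/STORM_infer_outputs/{test_set}"
    argv = [
        "python", "-m", "infer.inference_and_eval",
        "--input_file", input_file,
        "--output_dir", output_dir,
        "--model_name_or_path", model_path,
        "--engine", engine,
        "--tensor_parallel_size", "1",
    ]
    # Environment variables the script sets for the Python process.
    env = {"CUDA_VISIBLE_DEVICES": str(gpu_id), "TOKENIZERS_PARALLELISM": "false"}
    return argv, env


if __name__ == "__main__":
    argv, env = build_inference_command("./models/STORM-Qwen3-4B", "nl4opt", 0)
    print(env)
    print(" ".join(argv))
```

To launch the run from Python instead of the shell, `argv` could be passed to `subprocess.run` from the repository root, with `env` merged into a copy of `os.environ`.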

Example

Assuming you have downloaded the model to ./models/STORM-Qwen3-4B and want to evaluate it on the nl4opt test set using GPU 0, run the following command:

bash run_inference.sh ./models/STORM-Qwen3-4B nl4opt 0
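To evaluate all five benchmarks in turn, the same command can be generated once per test set. The sketch below only prints the command lines; `benchmark_commands` is a hypothetical helper, and the benchmark names are taken from the options listed under Script Usage.

```python
import shlex

BENCHMARKS = ["nl4opt", "mamo_easy", "mamo_complex", "industryor", "OptMath"]


def benchmark_commands(model_path, gpu_id=0, benchmarks=BENCHMARKS):
    """Return one shell command line per benchmark, with arguments shell-quoted."""
    return [
        shlex.join(["bash", "run_inference.sh", model_path, name, str(gpu_id)])
        for name in benchmarks
    ]


if __name__ == "__main__":
    for line in benchmark_commands("./models/STORM-Qwen3-4B"):
        print(line)
```

Printing the commands rather than executing them lets you review them first, or distribute them across GPUs by varying `gpu_id`.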

Acknowledgements

Our Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) training pipeline references the work presented in the CoRT paper, and we thank its authors. Their official repository can be found here: CoRT GitHub.

📜 Citation

If you find our work helpful for your research, please consider citing our paper:

@misc{tang2025calmstormunlockingnative,
      title={CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling}, 
      author={Zhengyang Tang and Zihan Ye and Chenyu Huang and Xuhan Huang and Chengpeng Li and Sihang Li and Guanhua Chen and Ming Yan and Zizhuo Wang and Hongyuan Zha and Dayiheng Liu and Benyou Wang},
      year={2025},
      eprint={2510.04204},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.04204}, 
}
