
# OpenSIR: Open-Ended Self-Improving Reasoner

## Setup

```bash
pip install vllm==0.7.2 setuptools rootutils strictfire
pip install flash-attn --no-build-isolation
pip install flashinfer-python==0.2.2 -i https://flashinfer.ai/whl/cu124/torch2.5/
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
```
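To confirm the pinned builds installed cleanly, a quick import check helps (a minimal sketch; it only touches the packages installed above):

```python
# Minimal sanity check that the pinned dependencies import cleanly.
import vllm
import flash_attn
import flashinfer

print("vLLM", vllm.__version__)  # should print 0.7.2
```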

## Train

We first generate a small set of questions whose solve rates fall within the target range (set by `--lower_ratio` and `--higher_ratio` below) to seed the initial example pool.

```bash
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.2-3B-Instruct --gpu-memory-utilization 0.60 --port 8000

python src/opensir/generate_initial_problems.py \
    --main_model meta-llama/Llama-3.2-3B-Instruct \
    --max_tokens 2048 \
    --main_port_number 8000 \
    --lower_ratio 0.5 \
    --scale_factor 5 \
    --higher_ratio 0.9 \
    --n_target_problems 512 \
    --initial_batch_size 32 \
    --output_filename "src/opensir/data/initial_data_new.jsonl"
```
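The `--lower_ratio`/`--higher_ratio` flags bound the per-question solve rate, so the pool keeps questions the model answers correctly between 50% and 90% of the time. A minimal sketch of that filtering logic, assuming a hypothetical `is_correct(question)` sampler (the names here are illustrative, not the script's API):

```python
def solve_rate(question: str, is_correct, n: int = 8) -> float:
    # Fraction of n sampled attempts that are judged correct.
    return sum(is_correct(question) for _ in range(n)) / n

def filter_pool(questions, is_correct, lower=0.5, higher=0.9):
    # Keep questions that are neither too hard nor too easy for the model,
    # mirroring --lower_ratio 0.5 / --higher_ratio 0.9 above.
    return [q for q in questions
            if lower <= solve_rate(q, is_correct) <= higher]
```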

Then we train the model with GRPO using self-play:

```bash
CUDA_VISIBLE_DEVICES=0 vllm serve Linq-AI-Research/Linq-Embed-Mistral --task embed --gpu-memory-utilization 0.25 --port 8001

VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=0 trl vllm-serve \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.6 \
  --max-model-len 8192

CUDA_VISIBLE_DEVICES=1,2 NCCL_DEBUG=WARN HF_HUB_ENABLE_HF_TRANSFER=1 ACCELERATE_LOG_LEVEL=info TRANSFORMERS_VERBOSITY=info accelerate launch \
    --config_file recipes/accelerate_configs/zero3.yaml \
    --num_processes 2 \
    src/opensir/train.py --config configs/opensir_train.yaml
```
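For reference, GRPO scores each completion relative to the other completions sampled for the same prompt: group rewards are standardized to zero mean and unit variance and used as advantages. A minimal sketch of that normalization (illustrative only; the actual update is handled by the `trl` GRPO trainer configured in `configs/opensir_train.yaml`):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages.

    rewards: (num_prompts, G) tensor, G completions sampled per prompt.
    Each group is standardized so completions compete only within their prompt.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```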

## Evaluate

```bash
python evaluate.py \
    --model saves/Opensir_Llama-3.2-3B-Instruct/checkpoint-200 \
    --output_fn results/Opensir_Llama-3.2-3B-Instruct_eval-outputs.jsonl \
    --result_fn results/Opensir_Llama-3.2-3B-Instruct_eval-results.jsonl \
    --n 16 --temp 0.6 --top_p 0.95
```
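With `--n 16`, sixteen completions are sampled per problem; a common way to aggregate such results is the unbiased pass@k estimator. A sketch of that aggregation (the output format is an assumption; `correct_counts` below is hypothetical data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: n samples drawn, c of them correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

correct_counts = [16, 9, 0, 4]  # hypothetical per-problem correct counts
print(sum(pass_at_k(16, c, 1) for c in correct_counts) / len(correct_counts))
```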
