This repository contains the official implementation for the paper "STARS: Segment-level Token Alignment via Rejection Sampling in Large Language Models", accepted at the 2nd Workshop of Frontiers in Probabilistic Inference: Sampling Meets Learning at NeurIPS 2025.
This repository implements three different decoding strategies for large language models:
- STARS - Segment-level Token Alignment via Rejection Sampling (Ours)
- Best-of-N Sampling - Generate N candidates and select the best one using a reward model (a minimal sketch follows this list)
- Vanilla Decoding - Standard greedy/sampling-based generation
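For intuition, here is a minimal Best-of-N sketch. It is not the repository's `bon.py`: `generate` and `reward` are placeholder callables standing in for the policy LLM and the reward model, and the toy stand-ins at the bottom exist only so the example runs.

```python
# Minimal Best-of-N sketch (illustrative only, not the repository's bon.py).
# `generate` and `reward` are placeholders for the policy LLM and reward model.
import random
from typing import Callable, Tuple

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    num_samples: int = 8,  # analogous to `num_samples` in bon_config.yaml
) -> Tuple[str, float]:
    """Draw num_samples candidate responses and keep the highest-reward one."""
    best_response, best_score = "", float("-inf")
    for _ in range(num_samples):
        response = generate(prompt)
        score = reward(prompt, response)
        if score > best_score:
            best_response, best_score = response, score
    return best_response, best_score

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    toy_generate = lambda p: p + " " + " ".join(random.choices(["good", "bad", "fine"], k=5))
    toy_reward = lambda p, r: float(r.count("good") - r.count("bad"))
    print(best_of_n("The movie was", toy_generate, toy_reward, num_samples=4))
```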
STARS performs adaptive rejection sampling at the segment level, enabling efficient alignment of LLM outputs with reward models during inference without requiring additional training.
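The sketch below illustrates the general segment-wise accept-or-resample idea under simple assumptions. It is not the paper's exact procedure: the acceptance rule here is a plain reward threshold, the roles of `alpha` and `beta` are not modeled, and `generate_segment`/`reward` are placeholders for the policy LLM and the reward model.

```python
# Rough sketch of segment-level rejection sampling (the general idea only;
# NOT the exact STARS algorithm from the paper). The response grows one
# segment at a time; each segment is resampled until the reward model
# accepts it or `max_attempts` is exhausted, at which point the
# best-scoring attempt so far is kept.
from typing import Callable

def segment_level_rejection_sampling(
    prompt: str,
    generate_segment: Callable[[str, int], str],  # (context, segment_size) -> next segment
    reward: Callable[[str, str], float],          # (prompt, partial response) -> score
    segment_size: int = 32,
    max_attempts: int = 10,
    reward_threshold: float = 0.0,
    max_segments: int = 8,
) -> str:
    response = ""
    for _ in range(max_segments):
        best_segment, best_score = "", float("-inf")
        for _ in range(max_attempts):
            segment = generate_segment(prompt + response, segment_size)
            score = reward(prompt, response + segment)
            if score > best_score:
                best_segment, best_score = segment, score
            if score >= reward_threshold:  # accept this segment and move on
                break
        response += best_segment           # fall back to the best rejected attempt
    return response

if __name__ == "__main__":
    import random
    # Toy stand-ins so the sketch runs end to end.
    toy_generate = lambda ctx, n: " " + " ".join(random.choices(["kind", "rude", "words"], k=3))
    toy_reward = lambda p, r: float(r.count("kind") - r.count("rude"))
    print(segment_level_rejection_sampling("Reply politely:", toy_generate, toy_reward, max_segments=3))
```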
Install the required dependencies:
```bash
pip install -r requirements.txt
```

Before running the scripts, configure the appropriate YAML files in the `configs/` directory:
- `configs/stars_config.yaml` - Configuration for the STARS algorithm (`segment_size`, `max_attempts`, `alpha`, `beta`, `reward_threshold`)
- `configs/bon_config.yaml` - Configuration for Best-of-N sampling (`num_samples`)
- `configs/vanilla_config.yaml` - Configuration for vanilla decoding
- `configs/win_percentage_config.yaml` - Configuration for evaluation
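As a rough illustration (this is not the repository's own config loader), the STARS config could be read with PyYAML as follows; only the key names come from the list above, and the fallback values are assumptions.

```python
# Illustrative config loading with PyYAML (not the repository's loader).
# Key names follow the list above; the fallback values are assumptions.
import yaml

with open("configs/stars_config.yaml") as f:
    cfg = yaml.safe_load(f)

segment_size = cfg.get("segment_size", 32)            # tokens per segment (assumed default)
max_attempts = cfg.get("max_attempts", 10)            # resamples allowed per segment (assumed default)
alpha = cfg.get("alpha", 1.0)                         # algorithm hyperparameter
beta = cfg.get("beta", 1.0)                           # algorithm hyperparameter
reward_threshold = cfg.get("reward_threshold", 0.0)   # minimum reward to accept a segment
print(segment_size, max_attempts, alpha, beta, reward_threshold)
```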
Run the STARS algorithm to generate responses:
```bash
python stars.py
```

Run the Best-of-N sampling approach:
```bash
python bon.py
```

Run standard vanilla decoding:
```bash
python vanilla.py
```

Compare any two output files (from STARS, Best-of-N, or vanilla decoding) with the win percentage evaluator:
```bash
python win-percentage-evaluator.py
```

This script uses an LLM judge (GPT-4.1) to compare the outputs of two methods and compute win percentages. Adjust `configs/win_percentage_config.yaml` to configure the evaluation.
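For reference, a simplified sketch of how an LLM-judge comparison might look with the OpenAI Python client is shown below. The output-file format (a JSON list of records with `prompt` and `response` fields), the judge prompt, and the file paths are assumptions, not the script's actual behavior.

```python
# Simplified LLM-judge win-rate sketch (not the repository's
# win-percentage-evaluator.py). Assumes each output file is a JSON list of
# {"prompt": ..., "response": ...} records aligned by index.
import json
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def judge(prompt: str, response_a: str, response_b: str) -> str:
    """Ask the judge model which response is better; returns 'A' or 'B'."""
    question = (
        f"Prompt:\n{prompt}\n\nResponse A:\n{response_a}\n\nResponse B:\n{response_b}\n\n"
        "Which response is more helpful and harmless? Answer with a single letter: A or B."
    )
    out = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": question}],
    )
    answer = out.choices[0].message.content.strip().upper()
    return "A" if answer.startswith("A") else "B"

def win_percentage(file_a: str, file_b: str) -> float:
    """Percentage of prompts on which file_a's response beats file_b's."""
    with open(file_a) as fa, open(file_b) as fb:
        outputs_a, outputs_b = json.load(fa), json.load(fb)
    wins = sum(
        judge(a["prompt"], a["response"], b["response"]) == "A"
        for a, b in zip(outputs_a, outputs_b)
    )
    return 100.0 * wins / len(outputs_a)

if __name__ == "__main__":
    # Hypothetical output paths; point these at your generated files.
    print(win_percentage("outputs/stars_outputs.json", "outputs/vanilla_outputs.json"))
```

A fuller evaluator would typically also randomize or swap the A/B order across comparisons to reduce position bias in the judge.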
We have extensively evaluated our method on three alignment directions:
- HarmfulQA (`data/harmfulqa-300.json`) - 300 sampled responses from the HarmfulQA dataset.
- HH-RLHF (`data/hh-rlhf-300.json`) - 300 sampled responses from the HH-RLHF dataset.
- IMDB (`data/imdb_300_samples.csv`) - Movie review sentiment data from a Kaggle dataset, used for generating positive-sentiment text.
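To sanity-check the data files before running the scripts, a quick inspection like the following can help; the field and column names are not documented here, so it only prints the structure.

```python
# Quick, schema-agnostic look at the evaluation data files.
import csv
import json

with open("data/harmfulqa-300.json") as f:
    harmfulqa = json.load(f)
print(type(harmfulqa).__name__, len(harmfulqa))

with open("data/imdb_300_samples.csv", newline="") as f:
    rows = list(csv.DictReader(f))
print(len(rows), list(rows[0].keys()) if rows else [])
```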
This project is licensed under the MIT License - see the LICENSE file for details.
