STARS: Segment-level Token Alignment via Rejection Sampling in Large Language Models

This repository contains the official implementation for the paper "STARS: Segment-level Token Alignment via Rejection Sampling in Large Language Models", accepted at the 2nd Workshop of Frontiers in Probabilistic Inference: Sampling Meets Learning at NeurIPS 2025.

STARS Overview

Overview

This repository implements three different decoding strategies for large language models:

  1. STARS - Segment-level Token Alignment via Rejection Sampling (Ours)
  2. Best-of-N Sampling - Generate N candidates and select the best one using a reward model
  3. Vanilla Decoding - Standard greedy/sampling-based generation

STARS performs adaptive rejection sampling at the segment level, enabling efficient alignment of LLM outputs with reward models during inference without requiring additional training.
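The segment-level rejection loop can be sketched as follows. This is an illustrative toy, not the repo's actual implementation in `stars.py`: the proposal generator and reward model are stand-in functions, and the fallback-to-best-candidate behavior is an assumption about how rejection is resolved when every attempt fails.

```python
# Illustrative sketch of segment-level rejection sampling (NOT the repo's
# actual stars.py). A candidate segment is drawn from a proposal generator
# and accepted only if a reward model scores it above a threshold;
# otherwise it is resampled, up to `max_attempts` times.
import random

def sample_segment(generate, reward, threshold, max_attempts):
    """Rejection-sample one segment, falling back to the best candidate seen."""
    best, best_score = None, float("-inf")
    for _ in range(max_attempts):
        candidate = generate()
        score = reward(candidate)
        if score >= threshold:
            return candidate                  # accepted
        if score > best_score:
            best, best_score = candidate, score
    return best                               # all rejected: keep the best one

def stars_like_decode(generate, reward, threshold=0.5,
                      max_attempts=4, num_segments=3):
    """Build a response segment by segment, aligning each one as it is produced."""
    return [sample_segment(generate, reward, threshold, max_attempts)
            for _ in range(num_segments)]

# Toy stand-ins: segments are random floats, the reward is the value itself.
random.seed(0)
print(stars_like_decode(random.random, lambda s: s))
```

Because each segment is accepted or rejected as soon as it is generated, low-reward continuations are discarded early instead of after a full response is produced, which is what distinguishes this from response-level Best-of-N.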

Requirements

Install the required dependencies:

pip install -r requirements.txt

Configuration

Before running the scripts, configure the appropriate YAML files in the configs/ directory:

  • configs/stars_config.yaml - Configuration for STARS algorithm (segment_size, max_attempts, alpha, beta, reward_threshold)
  • configs/bon_config.yaml - Configuration for Best-of-N sampling (num_samples)
  • configs/vanilla_config.yaml - Configuration for vanilla decoding
  • configs/win_percentage_config.yaml - Configuration for evaluation

Run Decoding Algorithms

Step 1: Generate STARS Results

Run the STARS algorithm to generate responses:

python stars.py

Step 2: Generate Best-of-N Results

Run the Best-of-N sampling approach:

python bon.py

Step 3: Generate Vanilla Results

Run standard vanilla decoding:

python vanilla.py

Evaluation

Compare any two output files (from STARS, Best-of-N, or vanilla decoding) with the win-percentage evaluator:

python win-percentage-evaluator.py

This script uses an LLM judge (GPT-4.1) to compare outputs from the different methods and compute win percentages. Adjust configs/win_percentage_config.yaml to configure the evaluation.
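The aggregation step can be sketched as follows. The repo's evaluator queries GPT-4.1 as the judge; here the judge is abstracted as a plain function, and counting ties as half a win is one common convention (the script's exact scheme may differ).

```python
# Sketch of win-percentage aggregation. The real evaluator
# (win-percentage-evaluator.py) queries an LLM judge (GPT-4.1); here the
# judge is any function returning "A", "B", or "tie" for an output pair.
def win_percentage(pairs, judge):
    """Percentage of comparisons won by method A, counting ties as half a win."""
    wins = ties = 0
    for output_a, output_b in pairs:
        verdict = judge(output_a, output_b)
        if verdict == "A":
            wins += 1
        elif verdict == "tie":
            ties += 1
    return 100.0 * (wins + 0.5 * ties) / len(pairs)

# Toy judge that prefers the longer output, for illustration only.
toy_judge = lambda a, b: "A" if len(a) > len(b) else ("tie" if len(a) == len(b) else "B")
pairs = [("longer answer", "short"), ("same", "size"), ("hi", "detailed reply")]
print(win_percentage(pairs, toy_judge))  # → 50.0
```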

Alignment Directions (Datasets)

We have extensively evaluated our method on the following alignment directions:

1. Harmlessness

  • HarmfulQA (data/harmfulqa-300.json) - 300 sampled responses from the HarmfulQA dataset.
  • HH-RLHF (data/hh-rlhf-300.json) - 300 sampled responses from the HH-RLHF dataset.

2. Positive Sentiment

  • IMDB (data/imdb_300_samples.csv) - Movie review data sampled from a Kaggle IMDB dataset, used for generating positive-sentiment text.

License

This project is licensed under the MIT License - see the LICENSE file for details.
