STARS: Segment-level Token Alignment via Rejection Sampling in Large Language Models

This repository contains the official implementation for the paper "STARS: Segment-level Token Alignment via Rejection Sampling in Large Language Models", accepted at the 2nd Workshop of Frontiers in Probabilistic Inference: Sampling Meets Learning at NeurIPS 2025.

STARS Overview

Overview

This repository implements three different decoding strategies for large language models:

  1. STARS - Segment-level Token Alignment via Rejection Sampling (Ours)
  2. Best-of-N Sampling - Generate N candidates and select the best one using a reward model
  3. Vanilla Decoding - Standard greedy/sampling-based generation

STARS performs adaptive rejection sampling at the segment level, enabling efficient alignment of LLM outputs with reward models during inference without requiring additional training.
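The segment-level rejection loop can be sketched as follows. This is an illustrative toy, not the repo's actual implementation in `stars.py`: the proposal generator and reward model are stand-in functions, and the fallback-to-best-candidate behavior is an assumption about how rejection is resolved when every attempt fails.

```python
# Illustrative sketch of segment-level rejection sampling (NOT the repo's
# actual stars.py). A candidate segment is drawn from a proposal generator
# and accepted only if a reward model scores it above a threshold;
# otherwise it is resampled, up to `max_attempts` times.
import random

def sample_segment(generate, reward, threshold, max_attempts):
    """Rejection-sample one segment, falling back to the best candidate seen."""
    best, best_score = None, float("-inf")
    for _ in range(max_attempts):
        candidate = generate()
        score = reward(candidate)
        if score >= threshold:
            return candidate                  # accepted
        if score > best_score:
            best, best_score = candidate, score
    return best                               # all rejected: keep the best one

def stars_like_decode(generate, reward, threshold=0.5,
                      max_attempts=4, num_segments=3):
    """Build a response segment by segment, aligning each one as it is produced."""
    return [sample_segment(generate, reward, threshold, max_attempts)
            for _ in range(num_segments)]

# Toy stand-ins: segments are random floats, the reward is the value itself.
random.seed(0)
print(stars_like_decode(random.random, lambda s: s))
```

Because each segment is accepted or rejected as soon as it is generated, low-reward continuations are discarded early instead of after a full response is produced, which is what distinguishes this from response-level Best-of-N.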

Requirements

Install the required dependencies:

pip install -r requirements.txt

Configuration

Before running the scripts, configure the appropriate YAML files in the configs/ directory:

  • configs/stars_config.yaml - Configuration for STARS algorithm (segment_size, max_attempts, alpha, beta, reward_threshold)
  • configs/bon_config.yaml - Configuration for Best-of-N sampling (num_samples)
  • configs/vanilla_config.yaml - Configuration for vanilla decoding
  • configs/win_percentage_config.yaml - Configuration for evaluation

Run Decoding Algorithms

Step 1: Generate STARS Results

Run the STARS algorithm to generate responses:

python stars.py

Step 2: Generate Best-of-N Results

Run the Best-of-N sampling approach:

python bon.py

Step 3: Generate Vanilla Results

Run standard vanilla decoding:

python vanilla.py

Evaluation

Compare any two output files (from STARS, Best-of-N, or vanilla decoding) with the win-percentage evaluator:

python win-percentage-evaluator.py

This script uses an LLM judge (GPT-4.1) to compare outputs from the different methods and compute win percentages. Adjust configs/win_percentage_config.yaml to configure the evaluation.
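The aggregation step can be sketched as follows. The repo's evaluator queries GPT-4.1 as the judge; here the judge is abstracted as a plain function, and counting ties as half a win is one common convention (the script's exact scheme may differ).

```python
# Sketch of win-percentage aggregation. The real evaluator
# (win-percentage-evaluator.py) queries an LLM judge (GPT-4.1); here the
# judge is any function returning "A", "B", or "tie" for an output pair.
def win_percentage(pairs, judge):
    """Percentage of comparisons won by method A, counting ties as half a win."""
    wins = ties = 0
    for output_a, output_b in pairs:
        verdict = judge(output_a, output_b)
        if verdict == "A":
            wins += 1
        elif verdict == "tie":
            ties += 1
    return 100.0 * (wins + 0.5 * ties) / len(pairs)

# Toy judge that prefers the longer output, for illustration only.
toy_judge = lambda a, b: "A" if len(a) > len(b) else ("tie" if len(a) == len(b) else "B")
pairs = [("longer answer", "short"), ("same", "size"), ("hi", "detailed reply")]
print(win_percentage(pairs, toy_judge))  # → 50.0
```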

Alignment Directions (Datasets)

We have extensively evaluated our method on the following alignment directions:

1. Harmlessness

  • HarmfulQA (data/harmfulqa-300.json) - 300 sampled responses from the HarmfulQA dataset.
  • HH-RLHF (data/hh-rlhf-300.json) - 300 sampled responses from the HH-RLHF dataset.

2. Positive Sentiment

  • IMDB (data/imdb_300_samples.csv) - Movie review data sampled from a Kaggle IMDB dataset, used for generating positive-sentiment text.

License

This project is licensed under the MIT License - see the LICENSE file for details.
