
Embedding-based LLM Alignment:

A Minimalist, Efficient, and Effective Infrastructure for Reward Modeling Research.

Codebase for the paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs


🚀 Example Usage

# Specify Task, Embedding Model, Response Generation Model
# (load_embd_data, pair_annotate, BT_MLP, calc_bon, and calc_spearmanr below are provided by this repo;
#  `args` can be any simple config object, e.g. an argparse.Namespace)
import argparse
args = argparse.Namespace()
args.task = 'Harmless'
args.res_gen_model = 'Gemma2b-sft'
args.embed_model = 'Gemma2b'

# Load Training Data
train_embeddings, train_rewards = load_embd_data(task=args.task, res_gen_model=args.res_gen_model, embed_model=args.embed_model, split='train') 
### train_embeddings.shape = (40000, 10, 2048), 40000 prompts, 10 responses for each prompt, Gemma2b has a 2048-dim embedding space
### train_rewards.shape = (40000, 10, 1), corresponding reward

# Load Testing Data
test_embeddings, test_rewards = load_embd_data(task=args.task, res_gen_model=args.res_gen_model, embed_model=args.embed_model, split='test')
### test_embeddings.shape = (2000, 500, 2048)
### test_rewards.shape = (2000, 500, 1)

# Generation of Pairwise Comparisons
train_comparisons, train_labels = pair_annotate(train_embeddings, train_rewards, annotation_quality = 0.1)
# annotation noise can be adjusted through "annotation_quality"

# Train Embedding-based Reward Model (e.g., use a Bradley-Terry MLP)
reward_model = BT_MLP()
reward_model.fit(train_comparisons, train_labels)

# Make Predictions with the Reward Model on Testset
rm_predictions = reward_model.predict(test_embeddings)
print(rm_predictions.shape) 
### (2000, 500, 1)

# Calculate Evaluation Metrics on Testset
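# (Presumed semantics, inferred from the function names: calc_bon scores the reward model's
#  best-of-N pick per prompt by that pick's gold reward; calc_spearmanr computes the rank
#  correlation between predicted and gold rewards.)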
bon_500 = calc_bon(rm_predictions, test_rewards, N=500)
spearmanr = calc_spearmanr(rm_predictions, test_rewards)
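
For reference, here is a minimal sketch of what a Bradley-Terry reward model over frozen embeddings can look like. It is an illustration only, not this repo's BT_MLP implementation: the class name, hidden size, PyTorch dependency, and the toy training loop on random tensors are all assumptions; only the 2048-dim embedding size is taken from the example above.

import torch
import torch.nn as nn

class TinyBTRewardModel(nn.Module):
    """Hypothetical example: a small MLP mapping a response embedding to a scalar reward."""
    def __init__(self, embed_dim=2048, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, emb):
        # emb: (batch, embed_dim) -> (batch,) scalar rewards
        return self.net(emb).squeeze(-1)

def bradley_terry_loss(model, emb_chosen, emb_rejected):
    # Bradley-Terry: P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    margin = model(emb_chosen) - model(emb_rejected)
    return nn.functional.softplus(-margin).mean()  # equals -log sigmoid(margin)

# Toy training loop on random "chosen"/"rejected" embedding pairs
model = TinyBTRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
emb_chosen, emb_rejected = torch.randn(64, 2048), torch.randn(64, 2048)
for _ in range(20):
    optimizer.zero_grad()
    loss = bradley_terry_loss(model, emb_chosen, emb_rejected)
    loss.backward()
    optimizer.step()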

🔨 Build (TBD)

pip install 

📊 Embedding Data Downloading

Here is a Google Drive link to the data for a single experiment setup (about 10 GB), which can be used for a quick start or reproduction:

Google Drive Link (10GB)

The full set of embedding files (about 300 GB) can be found at:

Google Drive Link (300GB)


Demonstrative Use Cases (TBD)

1. A Quick Implementation of Reward Model Ensemble

This Repo.
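
As a sketch of the idea (not this repo's ensemble code): because the embeddings are precomputed, a reward-model ensemble amounts to fitting several small reward models on bootstrap resamples of the comparison data and aggregating their predictions. The snippet below assumes each fitted model exposes the same fit / predict interface as BT_MLP in the example above.

import numpy as np

def fit_ensemble(make_model, comparisons, labels, n_members=5, seed=0):
    """Fit n_members reward models on bootstrap resamples of the comparison data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_members):
        idx = rng.integers(0, len(labels), size=len(labels))  # bootstrap indices
        member = make_model()
        member.fit(comparisons[idx], labels[idx])
        models.append(member)
    return models

def ensemble_predict(models, embeddings):
    """Average the members' reward predictions; the std is a cheap disagreement signal."""
    preds = np.stack([m.predict(embeddings) for m in models], axis=0)
    return preds.mean(axis=0), preds.std(axis=0)

# e.g.  models = fit_ensemble(BT_MLP, train_comparisons, train_labels)
#       mean_rewards, disagreement = ensemble_predict(models, test_embeddings)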

2. A Quick Implementation of Active Reward Modeling

This Repo.
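
Again only an illustrative sketch, not this repo's method: a simple form of active reward modeling scores unlabeled candidate pairs by how much an ensemble of reward models disagrees about them and requests annotations only for the most uncertain pairs. The pair representation (two embedding arrays) and the .predict interface are assumptions.

import numpy as np

def select_pairs_to_annotate(models, emb_a, emb_b, budget):
    """Return indices of the `budget` candidate pairs the ensemble is least sure about.

    emb_a, emb_b: (n_pairs, embed_dim) embeddings of the two responses in each candidate pair.
    """
    margins = np.stack([m.predict(emb_a) - m.predict(emb_b) for m in models], axis=0)
    disagreement = margins.std(axis=0)        # spread of the predicted preference margin
    return np.argsort(-disagreement)[:budget]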

3. A Quick Implementation of Classification-based Reward Models

This Repo.
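
A sketch of the idea rather than the repo's code: with frozen embeddings, a classification-based reward model can be as small as a logistic-regression classifier that predicts whether a response is the preferred one, with the predicted probability used as the reward score. The scikit-learn call and the random placeholder data below are for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 2048))        # placeholder response embeddings
preferred = (rng.random(1000) > 0.5).astype(int)  # placeholder 0/1 preference labels

clf = LogisticRegression(max_iter=1000).fit(embeddings, preferred)
reward_scores = clf.predict_proba(embeddings)[:, 1]  # P(preferred) used as the reward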

4. Exciting Future Work!

  • (Input) More RM data formats other than (pairwise) preferences?
  • (Input) Optimizing the embeddings for discriminative tasks?
  • (Objective) Beyond order consistency --- partial order consistency?
