
Embedding-based LLM Alignment:

A Minimalist, Efficient, and Effective Infrastructure for Reward Modeling Research.

Codebase for the paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs


🚀 Example Usage

# Specify Task, Embedding Model, Response Generation Model
# (load_embd_data, pair_annotate, BT_MLP, calc_bon, and calc_spearmanr below are provided by this repo;
#  `args` can be any simple config object, e.g. an argparse.Namespace)
import argparse
args = argparse.Namespace()
args.task = 'Harmless'
args.res_gen_model = 'Gemma2b-sft'
args.embed_model = 'Gemma2b'

# Load Training Data
train_embeddings, train_rewards = load_embd_data(task=args.task, res_gen_model=args.res_gen_model, embed_model=args.embed_model, split='train') 
### train_embeddings.shape = (40000, 10, 2048), 40000 prompts, 10 responses for each prompt, Gemma2b has a 2048-dim embedding space
### train_rewards.shape = (40000, 10, 1), corresponding reward

# Load Testing Data
test_embeddings, test_rewards = load_embd_data(task=args.task, res_gen_model=args.res_gen_model, embed_model=args.embed_model, split='test')
### test_embeddings.shape = (2000, 500, 2048)
### test_rewards.shape = (2000, 500, 1)

# Generation of Pairwise Comparisons
train_comparisons, train_labels = pair_annotate(train_embeddings, train_rewards, annotation_quality = 0.1)
# annotation noise can be adjusted through "annotation_quality"

# Train Embedding-based Reward Model (e.g., use a Bradley-Terry MLP)
reward_model = BT_MLP()
reward_model.fit(train_comparisons, train_labels)

# Make Predictions with the Reward Model on Testset
rm_predictions = reward_model.predict(test_embeddings)
print(rm_predictions.shape) 
### (2000, 500, 1)

# Calculate Evaluation Metrics on Testset
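# (Presumed semantics, inferred from the function names: calc_bon scores the reward model's
#  best-of-N pick per prompt by that pick's gold reward; calc_spearmanr computes the rank
#  correlation between predicted and gold rewards.)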
bon_500 = calc_bon(rm_predictions, test_rewards, N=500)
spearmanr = calc_spearmanr(rm_predictions, test_rewards)
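
For reference, here is a minimal sketch of what a Bradley-Terry reward model over frozen embeddings can look like. It is an illustration only, not this repo's BT_MLP implementation: the class name, hidden size, PyTorch dependency, and the toy training loop on random tensors are all assumptions; only the 2048-dim embedding size is taken from the example above.

import torch
import torch.nn as nn

class TinyBTRewardModel(nn.Module):
    """Hypothetical example: a small MLP mapping a response embedding to a scalar reward."""
    def __init__(self, embed_dim=2048, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, emb):
        # emb: (batch, embed_dim) -> (batch,) scalar rewards
        return self.net(emb).squeeze(-1)

def bradley_terry_loss(model, emb_chosen, emb_rejected):
    # Bradley-Terry: P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    margin = model(emb_chosen) - model(emb_rejected)
    return nn.functional.softplus(-margin).mean()  # equals -log sigmoid(margin)

# Toy training loop on random "chosen"/"rejected" embedding pairs
model = TinyBTRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
emb_chosen, emb_rejected = torch.randn(64, 2048), torch.randn(64, 2048)
for _ in range(20):
    optimizer.zero_grad()
    loss = bradley_terry_loss(model, emb_chosen, emb_rejected)
    loss.backward()
    optimizer.step()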

🔨 Build (TBD)

pip install 

📊 Embedding Data Downloading

Here is a Google Drive link to the data for a single experiment setup (about 10 GB), which can be used for a quick start or reproduction:

Google Drive Link (10GB)

The full set of embedding files (about 300 GB) can be found at:

Google Drive Link (300GB)


Demonstrative Use Cases (TBD)

1. A Quick Implementation of Reward Model Ensemble

This Repo.
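
As a sketch of the idea (not this repo's ensemble code): because the embeddings are precomputed, a reward-model ensemble amounts to fitting several small reward models on bootstrap resamples of the comparison data and aggregating their predictions. The snippet below assumes each fitted model exposes the same fit / predict interface as BT_MLP in the example above.

import numpy as np

def fit_ensemble(make_model, comparisons, labels, n_members=5, seed=0):
    """Fit n_members reward models on bootstrap resamples of the comparison data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_members):
        idx = rng.integers(0, len(labels), size=len(labels))  # bootstrap indices
        member = make_model()
        member.fit(comparisons[idx], labels[idx])
        models.append(member)
    return models

def ensemble_predict(models, embeddings):
    """Average the members' reward predictions; the std is a cheap disagreement signal."""
    preds = np.stack([m.predict(embeddings) for m in models], axis=0)
    return preds.mean(axis=0), preds.std(axis=0)

# e.g.  models = fit_ensemble(BT_MLP, train_comparisons, train_labels)
#       mean_rewards, disagreement = ensemble_predict(models, test_embeddings)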

2. A Quick Implementation of Active Reward Modeling

This Repo.
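
Again only an illustrative sketch, not this repo's method: a simple form of active reward modeling scores unlabeled candidate pairs by how much an ensemble of reward models disagrees about them and requests annotations only for the most uncertain pairs. The pair representation (two embedding arrays) and the .predict interface are assumptions.

import numpy as np

def select_pairs_to_annotate(models, emb_a, emb_b, budget):
    """Return indices of the `budget` candidate pairs the ensemble is least sure about.

    emb_a, emb_b: (n_pairs, embed_dim) embeddings of the two responses in each candidate pair.
    """
    margins = np.stack([m.predict(emb_a) - m.predict(emb_b) for m in models], axis=0)
    disagreement = margins.std(axis=0)        # spread of the predicted preference margin
    return np.argsort(-disagreement)[:budget]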

3. A Quick Implementation of Classification-based Reward Models

This Repo.
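
A sketch of the idea rather than the repo's code: with frozen embeddings, a classification-based reward model can be as small as a logistic-regression classifier that predicts whether a response is the preferred one, with the predicted probability used as the reward score. The scikit-learn call and the random placeholder data below are for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 2048))        # placeholder response embeddings
preferred = (rng.random(1000) > 0.5).astype(int)  # placeholder 0/1 preference labels

clf = LogisticRegression(max_iter=1000).fit(embeddings, preferred)
reward_scores = clf.predict_proba(embeddings)[:, 1]  # P(preferred) used as the reward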

4. Exciting Future Work!

  • (Input) More RM data formats other than (pairwise) preferences?
  • (Input) Optimizing the embeddings for discriminative tasks?
  • (Objective) Beyond order consistency --- partial order consistency?
