This repository contains the code for our work "References Improve LLM Alignment in Non-Verifiable Domains."
Please run `pip install -r requirements.txt` to install the required packages.
For training, you will need at least 8 GPUs with 48 GB of memory each. The code has been tested on a machine with 8 NVIDIA A6000 Ada GPUs.
To run the self-improvement training experiment, please use the following command: `bash self_improve.sh`.
The `self_improve.sh` script performs preference optimization in which the model improves itself using an LLM judge. It runs the following steps:
- Sampling candidate outputs from the LLM.
- Scoring the candidate outputs using the model itself as a judge.
- Data processing and precomputing the log probabilities of the output pairs.
- Training: fine-tuning the LLM with DPO on the resulting preference pairs (a minimal loss sketch is shown after this list).
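For orientation, the sketch below shows the standard DPO objective that the training step optimizes, given precomputed log probabilities of the chosen and rejected outputs under the policy and a frozen reference model. The function and tensor names are illustrative and are not taken from the repository's code.

```python
# Minimal sketch of the DPO objective; tensor names are illustrative,
# and the actual dpo.py / losses.py implementation may differ.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each input is a 1-D tensor of summed token log probabilities of the
    chosen / rejected completion under the policy or the frozen reference.
    """
    # Log-ratio of policy vs. reference for each completion.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```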
Files in this repository:
- `self_improve.sh`: Script for running the self-improvement training experiment.
- `data_processing.py`: Code for post-processing the preference-model annotations into training data.
- `data_utils.py`: Utility functions for training data loading.
- `get_logprobs.py`: Script for extracting log probabilities from an LLM/policy.
- `losses.py`: Loss functions.
- `dpo.py`: DPO training.
- `mle.py`: MLE training.
- `sampling.py`: Sampling candidate outputs from an LLM.
- `scoring.py`: Scoring output pairs using a preference model.
- `utils.py`: Utility functions.
- `vllm_model.py`: vLLM model definition.
- `deepspeed.conf`: DeepSpeed configuration file.
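As an illustration of the candidate-sampling step handled by `sampling.py` and `vllm_model.py`, the snippet below shows how candidates could be drawn with vLLM. The model name, prompt, and sampling parameters are placeholders, not the repository's actual settings.

```python
# Minimal sketch of candidate sampling with vLLM; model name, prompt, and
# sampling parameters are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed example policy
sampling_params = SamplingParams(n=4, temperature=0.8, top_p=0.95, max_tokens=512)

prompts = ["Summarize the key findings of the attached report."]
outputs = llm.generate(prompts, sampling_params)

# Each prompt yields n candidate completions, which are later scored by the LLM judge.
for request_output in outputs:
    candidates = [completion.text for completion in request_output.outputs]
    print(f"Sampled {len(candidates)} candidates for prompt: {request_output.prompt!r}")
```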