RefEval

This repository is for our work "References Improve LLM Alignment in Non-Verifiable Domains."

Outline

  • How to Run
  • File Structure

How to Run

Installation and Requirements

Please run pip install -r requirements.txt to install the required packages.

For training, you will need at least 8 GPUs with 48GB of memory each. The code is tested on a machine with 8 NVIDIA A6000 Ada GPUs.

Running Experiments

To run the self-improvement training experiment, please use the following command: bash self_improve.sh.

The script self_improve.sh performs preference optimization, using the model as its own LLM judge to self-improve, in the following steps (a rough sketch of the loop follows the list):

  1. Sampling candidate outputs from the LLM.
  2. Scoring the candidate outputs using the model itself as a judge.
  3. Processing the data and precomputing the log probabilities of the output pairs.
  4. Training the LLM with DPO.
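
As a rough illustration of the loop these steps describe, the sketch below shows how candidate sampling, self-judging, and preference-pair construction could fit together before DPO training. All names here (sample_candidates, judge_score, build_preference_pairs, the output path dpo_pairs.jsonl, and the model object with a generate method) are hypothetical placeholders rather than this repository's actual API; the real pipeline is driven by self_improve.sh and the scripts it calls.

# Hypothetical sketch of the self-improvement loop described above.
# None of these helpers are part of this repository; they only illustrate
# the data flow: sample -> judge -> build preference pairs -> DPO.

import json
from typing import Dict, List


def sample_candidates(model, prompt: str, n: int = 4) -> List[str]:
    """Step 1: sample n candidate outputs from the LLM (placeholder model.generate)."""
    return [model.generate(prompt) for _ in range(n)]


def judge_score(model, prompt: str, candidate: str) -> float:
    """Step 2: score a candidate with the model itself as judge.

    In the repository, the judge instructions come from the prompts/ directory;
    this inline prompt is only a stand-in.
    """
    judge_prompt = (
        "Rate the following response to the prompt on a 1-10 scale.\n"
        f"Prompt: {prompt}\nResponse: {candidate}\nScore:"
    )
    return float(model.generate(judge_prompt).strip())


def build_preference_pairs(prompt: str, candidates: List[str], scores: List[float]) -> Dict:
    """Step 3: pair the best- and worst-scored candidates as chosen/rejected."""
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return {"prompt": prompt, "chosen": ranked[0][0], "rejected": ranked[-1][0]}


def build_dpo_dataset(model, prompts: List[str], out_path: str = "dpo_pairs.jsonl") -> None:
    """Steps 1-3: write a preference dataset that step 4 (DPO training) would consume."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            cands = sample_candidates(model, prompt)
            scores = [judge_score(model, prompt, c) for c in cands]
            f.write(json.dumps(build_preference_pairs(prompt, cands, scores)) + "\n")

The resulting JSONL of chosen/rejected pairs is the kind of input a standard DPO trainer consumes in step 4, after the log probabilities of both responses have been precomputed.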

File Structure

Files

  • self_improve.sh: The main script that runs the self-improvement training pipeline.

Directories

  • data: Contains the training data, which will be provided in the future.
  • exps: Contains the results of the experiments. A new directory is created for each experiment, with the name specified in self_improve.sh.
  • prompts: Contains the prompts used by the LLM-judge.
