FSPO: Few-Shot Preference Optimization

Code for FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users.

Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptation to user preferences with minimal labeled data, leveraging synthetic datasets for scalability, and achieving high success rates in personalized content generation across multiple domains.

What is in this repo?

This repo provides code to train personalized models with FSPO. It is built on top of Eric Mitchell's DPO codebase; the core modifications are in preference_datasets.py, which contains the FSPO dataloaders. Because FSPO prompts are much longer than standard DPO prompts (each one carries a user's few-shot preference examples ahead of the new query), we use Flash Attention to speed up training.
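
For intuition, the sketch below shows one way a few-shot preference prompt could be assembled: a handful of labeled preference pairs from a single user are prepended to the query so the model can adapt to that user in context. The function name and prompt template are illustrative only, not the actual format used in preference_datasets.py.

# Illustrative sketch of few-shot preference prompt construction.
# The template and function name are hypothetical; see
# preference_datasets.py for the format this repo actually uses.
def build_fewshot_prompt(user_examples, query):
    """user_examples: list of (prompt, chosen, rejected) triples
    labeled by one user; query: the new prompt to personalize for."""
    parts = []
    for prompt, chosen, rejected in user_examples:
        parts.append(
            f"Prompt: {prompt}\nPreferred: {chosen}\nDispreferred: {rejected}"
        )
    # The model is then trained (SFT followed by IPO, as in the
    # commands below) to respond consistently with these preferences.
    parts.append(f"Prompt: {query}\nResponse:")
    return "\n\n".join(parts)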

Setup

Create an environment with the requirements listed in requirements.txt, ideally with Python 3.12. For example, with conda:

conda create --name FSPO python=3.12
conda activate FSPO
pip install -r requirements.txt

Additionally, set the HF_TOKEN and WANDB_API_KEY environment variables.
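
For example (the values below are placeholders):

export HF_TOKEN=hf_your_token_here        # Hugging Face access token
export WANDB_API_KEY=your_wandb_key_here  # Weights & Biases logging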

Running

To train a model, use train.py from the direct-preference-optimization codebase. We added a dataloader so you can pass the path to a preference dataset as the datasets argument.

First, run supervised fine-tuning (SFT) on the preference data:

python -u train.py model=llama3-2-3b datasets=[roleplay] n_epochs=1 loss=sft lr=1e-7 exp_name=roleplay_prefft trainer=FSDPTrainer sample_during_eval=false eval_every=10000 do_first_eval=false debug=false wandb.project=personalization batch_size=4 max_prompt_length=8192 max_length=8192 eval_batch_size=4

Then run preference optimization with the IPO loss, initializing from the SFT checkpoint via model.archive:

python -u train.py model=llama3-2-3b datasets=[roleplay] n_epochs=1 loss=ipo lr=1e-6 loss.beta=0.01 exp_name=roleplay_ipo trainer=FSDPTrainer sample_during_eval=false eval_every=10000 do_first_eval=false debug=false wandb.project=personalization batch_size=4 max_prompt_length=8192 max_length=8192 eval_batch_size=4 model.archive=/PATH_TO_SFT_OUTPUT/LATEST/policy.pt
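
To train on your own data, a local path can be passed in place of a named dataset. The path and exp_name below are placeholders, and the accepted file format is determined by the FSPO dataloader in preference_datasets.py:

python -u train.py model=llama3-2-3b datasets=[/path/to/custom_preference_dataset] n_epochs=1 loss=sft lr=1e-7 exp_name=custom_prefft trainer=FSDPTrainer batch_size=4 max_prompt_length=8192 max_length=8192 eval_batch_size=4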

Data

Along with this codebase, we also release the accompanying preference datasets on HuggingFace.
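
The released datasets can be loaded with the Hugging Face datasets library. The dataset id below is a placeholder, not a confirmed name of one of the releases:

from datasets import load_dataset  # pip install datasets

# Placeholder id; substitute the actual dataset name from the release.
# Assumes the dataset exposes a "train" split.
ds = load_dataset("Asap7772/fspo-example-dataset")
print(ds["train"][0])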

BibTeX

@misc{singh2025fspofewshotpreferenceoptimization,
      title={FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users}, 
      author={Anikait Singh and Sheryl Hsu and Kyle Hsu and Eric Mitchell and Stefano Ermon and Tatsunori Hashimoto and Archit Sharma and Chelsea Finn},
      year={2025},
      eprint={2502.19312},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.19312}, 
}
