KTOTrainer should work when actual batch size==1

https://github.com/huggingface/trl/blob/edabe0a2d8fdd790319ce8862bb8e17336b85df1/trl/trainer/kto_trainer.py#L662-L665

This check was introduced in #2153.
However, the KL logits are computed from `prompt_input_ids` and `answer_input_ids` that have been unlinked, i.e. the prompt is no longer paired with its own completion, which means the KL term is not equivalent to the reward term.
Accordingly, `KTOTrainer` should work when the actual batch size is 1.
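
To illustrate the argument, here is a minimal sketch in plain Python. It assumes the unlinking is done by rotating completions by one position within a chunk of examples before training batches are formed; `rotate_completions` is a hypothetical helper written for this issue, not TRL's actual code, and the chunk size of 4 is arbitrary.

```python
def rotate_completions(prompts, completions):
    """Pair each prompt with the previous example's completion (shift by one).

    Hypothetical helper for illustration only; not part of TRL.
    """
    if len(prompts) != len(completions):
        raise ValueError("prompts and completions must have the same length")
    # Move the last completion to the front so every prompt gets a neighbour's completion.
    shifted = completions[-1:] + completions[:-1]
    return list(zip(prompts, shifted))


# A chunk of 4 examples, rotated before any training batch is drawn (assumed setup).
prompts = ["p0", "p1", "p2", "p3"]
completions = ["c0", "c1", "c2", "c3"]

matched = list(zip(prompts, completions))            # pairs used for the reward term
kl_pairs = rotate_completions(prompts, completions)  # pairs used for the KL term

# Reading the examples one at a time (training batch size 1), the KL pair
# still differs from the matched pair at every index.
all_differ = all(m != k for m, k in zip(matched, kl_pairs))

# The two terms coincide only if the rotation itself runs over a single example.
single = rotate_completions(["p0"], ["c0"])
```

Under this assumption, the KL pair equals the reward pair only when the rotation chunk contains one example, which is a separate quantity from `per_device_train_batch_size`.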

Thank you!




KTOTrainer should work when actual batch size==1 #2554


	if args.per_device_train_batch_size <= 1:
		raise ValueError(
			"Actual (not effective) batch size must be > 1. KTO will not work properly because the KL term will be equivalent to the implied reward."
		)