Hello author, in the "Batch Size" section of the KTO paper it says: "KTO needs a microbatch size ≥ 2 to estimate the reference point in a single step. The experiments in this paper all use an effective batch size of 32, and in general we recommend using a batch size between 8 and 128." In the code repository, I noticed that when running the KTO algorithm on four GPUs, the batch size is set to 32 (i.e., a microbatch size of 8 per GPU). Does the "effective batch size of 32" mentioned in the paper refer to this global (macro) batch size of 32?
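
For clarity, here is a minimal sketch of the arithmetic I'm assuming (the function and parameter names below are illustrative only, not taken from the repository's config):

```python
# Sketch of how I understand the per-GPU microbatch, GPU count, and
# gradient-accumulation steps combining into one effective (global) batch size.
# All names here are hypothetical, for illustration only.

def effective_batch_size(per_gpu_microbatch: int,
                         num_gpus: int,
                         grad_accum_steps: int = 1) -> int:
    """Total number of examples contributing to a single optimizer step."""
    return per_gpu_microbatch * num_gpus * grad_accum_steps

# The setup described above: 4 GPUs, microbatch of 8 per GPU, no accumulation.
print(effective_batch_size(per_gpu_microbatch=8, num_gpus=4))  # -> 32

# The paper's constraint: each microbatch needs >= 2 examples so the
# reference point can be estimated within a single step; 8 >= 2 holds here.
```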