Decouple first prompt evaluation and block size in CAPO #66

Description

@timo282

In CAPO, the block size for racing is controlled by a single parameter. The (sensible) default is 30, and we should not go below it, so that the statistical test also holds during the first evaluation of a prompt. However, in very expensive settings (e.g., expensive or time-consuming reward computation), we might want to add fewer than 30 evaluations per racing iteration.

My suggestion is to decouple the number of initial evaluations from the block size. For example, we could have two parameters:

  • block_size: how many evaluations are added in each racing iteration
  • init_block_evals: how many blocks are used for the first evaluation of a prompt

In my setting, we could set block_size=5 and init_block_evals=6 ($6 \times 5 = 30$). The statistical test would then always remain valid, and we could increase the number of evaluations in finer steps during racing.
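
To make the proposal concrete, here is a minimal sketch of the evaluation schedule the two parameters would produce. It is not CAPO's actual implementation: the function name `evaluation_schedule` and the arguments `n_blocks_total` and `min_initial_evals` are hypothetical, and only `block_size` and `init_block_evals` come from the suggestion above.

```python
def evaluation_schedule(
    block_size: int,
    init_block_evals: int,
    n_blocks_total: int,
    min_initial_evals: int = 30,  # assumption: threshold taken from the current default
) -> list[int]:
    """Return the cumulative number of evaluations per prompt after each racing step.

    The first entry is the size of the initial evaluation; every later entry
    adds exactly one block of `block_size` evaluations.
    """
    if block_size < 1 or init_block_evals < 1:
        raise ValueError("block_size and init_block_evals must be positive")
    if init_block_evals > n_blocks_total:
        raise ValueError("init_block_evals cannot exceed n_blocks_total")
    if block_size * init_block_evals < min_initial_evals:
        raise ValueError("initial evaluation is too small for the statistical test")
    return [b * block_size for b in range(init_block_evals, n_blocks_total + 1)]


# Proposed decoupled setting: start at 30 evaluations, then grow in steps of 5.
decoupled = evaluation_schedule(block_size=5, init_block_evals=6, n_blocks_total=10)

# Current coupled behavior, expressed with the same function: steps of 30.
coupled = evaluation_schedule(block_size=30, init_block_evals=1, n_blocks_total=2)
```

With these inputs, `decoupled` is `[30, 35, 40, 45, 50]` and `coupled` is `[30, 60]`, so both start at the 30 evaluations the test needs, but the decoupled variant can stop racing after far fewer additional evaluations.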

Labels: enhancement