In CAPO, the block size for racing is controlled by a single parameter. The (sensible) default is 30, and we should not go below it, so that the statistical test also holds for the first evaluation of a prompt. However, in very expensive settings (e.g., expensive or time-consuming reward computation), we might want to add fewer than 30 evaluations per racing iteration.
My suggestion is to decouple the number of initial evaluations from the block size. For example, we could have two parameters:
block_size: how many evaluations are added in each racing iteration
init_block_evals: how many blocks are used for the first evaluation of a prompt
In my setting, we could set block_size=5 and init_block_evals=6 ($6 \times 5 = 30$). The statistical test would then always remain valid, and we could increase the number of evaluations in finer steps during racing.
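
To make the proposal concrete, here is a rough sketch of the evaluation schedule the two parameters would produce. This is not CAPO's actual code: the function name `evals_after_iteration` is hypothetical, and it assumes that each racing iteration adds exactly one block of `block_size` evaluations after the initial `init_block_evals` blocks.

```python
def evals_after_iteration(block_size: int, init_block_evals: int, racing_iteration: int) -> int:
    """Cumulative evaluations of a prompt after `racing_iteration` racing steps.

    Hypothetical helper, not part of CAPO. Iteration 0 is the first
    evaluation of a prompt, which uses `init_block_evals` blocks; each
    later iteration is assumed to add one more block of `block_size`.
    """
    if block_size < 1 or init_block_evals < 1:
        raise ValueError("block_size and init_block_evals must be >= 1")
    if racing_iteration < 0:
        raise ValueError("racing_iteration must be >= 0")
    return block_size * (init_block_evals + racing_iteration)


# Proposed setting: start at 30 evaluations, then grow in steps of 5.
proposed = [evals_after_iteration(5, 6, i) for i in range(4)]

# Current behaviour expressed with the same parameters: steps of 30.
current = [evals_after_iteration(30, 1, i) for i in range(4)]

print(proposed)  # [30, 35, 40, 45]
print(current)   # [30, 60, 90, 120]
```

Both settings start at 30 evaluations, so the first statistical test sees the same sample size; only the step size during racing differs.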