Open
Description
Thank you for your work and for the open-source code (it is reproducible). I have a small question about Table 1 of the paper on arXiv. In the paper, $D^{bd}_{val}$ and $D^{cl}_{val}$ are used to evaluate attack performance. Are these two subsets (with 1000 test samples) sampled from openo1_sft_filter.json? I could not find this information in the paper (I may have overlooked it). I would be very grateful if you could respond. 🤔