
@soldni
Member

@soldni soldni commented Nov 17, 2025

Wildchat and Ultrachat PPL don't have task aliases. This adds a quick hack to get around the issue by dropping the hash.


Note

Normalize wildchat/ultrachat masked PPL task aliases by stripping known hash suffixes and setting the base alias in task metadata.

  • Eval results processing:
    • Hotfix: normalize wildchat_masked_ppl-67b0e9 and ultrachat_masked_ppl-831470 by stripping the hash suffix and setting task_config.metadata.alias to the base alias (wildchat_masked_ppl, ultrachat_masked_ppl).

Written by Cursor Bugbot for commit 4de343e.
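
The normalization described in the note can be sketched as a standalone function. This is only an illustration under stated assumptions: the `Metric` dataclass, the `KNOWN_HASHED_ALIASES` constant, and the `normalize_alias` helper are hypothetical names, not the project's actual types. Only the two hashed aliases and the `rsplit`/`setdefault` logic come from the PR.

```python
from dataclasses import dataclass, field

# The two hash-suffixed aliases named in the PR.
KNOWN_HASHED_ALIASES = {"wildchat_masked_ppl-67b0e9", "ultrachat_masked_ppl-831470"}


@dataclass
class Metric:
    """Hypothetical stand-in for the real metric record (assumption)."""
    alias: str
    task_config: dict = field(default_factory=dict)


def normalize_alias(metric: Metric) -> Metric:
    """Store the base alias in task metadata for the known hashed names."""
    if metric.alias in KNOWN_HASHED_ALIASES:
        # Split once from the right so only the trailing hash is dropped.
        base_alias, _ = metric.alias.rsplit("-", 1)
        # Create "metadata" if missing, then set "alias" only if it is absent.
        metric.task_config.setdefault("metadata", {}).setdefault("alias", base_alias)
    return metric
```

Aliases outside the known set are left untouched, which keeps the hotfix from stripping a legitimate `-suffix` from some unrelated task name.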

@soldni soldni requested a review from davidheineman November 17, 2025 01:06
# so hash gets added at the end of the name. We strip if the hash matches the known one.
if metric.alias in {'wildchat_masked_ppl-67b0e9', 'ultrachat_masked_ppl-831470'}:
    metric_alias, _ = metric.alias.rsplit('-', 1)
    metric.task_config.setdefault("metadata", {}).setdefault("alias", metric_alias)

Bug: Alias Overwrite Fails with setdefault

Using setdefault for the alias key won't override an existing value. If metric.task_config["metadata"]["alias"] already contains one of the hash-suffixed names like wildchat_masked_ppl-67b0e9, the setdefault call won't replace it with the cleaned version, causing the assertion on line 111 to fail. Direct assignment ["alias"] = metric_alias is needed instead of .setdefault("alias", metric_alias) to ensure the cleaned alias always replaces any existing value.


Member Author


Not an issue. If the task had an alias already, we wouldn't need this trick.
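
The exchange above turns on how `dict.setdefault` behaves, which a small sketch may make concrete. The helper names `set_alias_default` and `set_alias_assign` are hypothetical and exist only for this comparison. The first mirrors the call in the diff, the second is the direct assignment the bot suggested.

```python
def set_alias_default(task_config: dict, alias: str) -> dict:
    """Mirror of the PR's call: only fills in "alias" when it is missing."""
    task_config.setdefault("metadata", {}).setdefault("alias", alias)
    return task_config


def set_alias_assign(task_config: dict, alias: str) -> dict:
    """The suggested alternative: always overwrite "alias"."""
    task_config.setdefault("metadata", {})["alias"] = alias
    return task_config


# Case the author describes: no alias recorded, so setdefault sets it.
no_alias = set_alias_default({}, "wildchat_masked_ppl")

# Case the bot describes: a hashed alias is already stored and is kept.
stale = set_alias_default(
    {"metadata": {"alias": "wildchat_masked_ppl-67b0e9"}}, "wildchat_masked_ppl"
)

# Direct assignment replaces the stored value regardless.
forced = set_alias_assign(
    {"metadata": {"alias": "wildchat_masked_ppl-67b0e9"}}, "wildchat_masked_ppl"
)
```

Both behaviors are standard Python. Which one is appropriate depends on whether these tasks can ever arrive with an alias already set, and the author's reply states they cannot.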

Signed-off-by: Luca Soldaini <lucas@allenai.org>
@davidheineman
Member

Ah 😅 thanks for adding
