[CI] Revert back prepare_prompts and check_answers #25087
Conversation
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Code Review
This pull request reverts a previous change, moving the `prep_prompts` and `check_answers` test utility functions back into `tests/utils.py`.
I've reviewed the reverted code and found two high-severity issues:
- In `prep_prompts`, the use of `random.seed()` affects the global random state, which can lead to flaky tests. I've suggested using an isolated `random.Random` instance to ensure deterministic behavior without side effects.
- In `check_answers`, the logic for parsing model outputs is brittle and can fail with leading whitespace. I've proposed a more robust parsing method.
These changes will improve the reliability and maintainability of the test suite.
```python
random.seed(1)
for _ in range(batch_size):
    idx = random.randint(30, 90)
    indices.append(idx)
    prompt = "```python\n# We set a number of variables, " + \
        f"x{idx} will be important later\n"
    ln = random.randint(*ln_range)
    for k in range(30, ln):
        v = random.randint(10, 99)
```
Using random.seed() has a global effect and can interfere with other tests that rely on the random number generator. It's better to use an isolated random.Random instance for deterministic generation within this function.
Additionally, the prompt string construction can be simplified into a single f-string for better readability.
Suggested change:
```diff
-random.seed(1)
-for _ in range(batch_size):
-    idx = random.randint(30, 90)
-    indices.append(idx)
-    prompt = "```python\n# We set a number of variables, " + \
-        f"x{idx} will be important later\n"
-    ln = random.randint(*ln_range)
-    for k in range(30, ln):
-        v = random.randint(10, 99)
+rng = random.Random(1)
+for _ in range(batch_size):
+    idx = rng.randint(30, 90)
+    indices.append(idx)
+    prompt = f"```python\n# We set a number of variables, x{idx} will be important later\n"
+    ln = rng.randint(*ln_range)
+    for k in range(30, ln):
+        v = rng.randint(10, 99)
```
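To make the side effect concrete, here is a minimal standalone sketch (not part of the PR; the helper name is made up for illustration). Both generators yield the same deterministic sequence for the same seed, but only the module-level call mutates state shared with every other test that uses `random.*`:

```python
import random

def demo_isolated_rng():
    # Hypothetical demonstration: seeding the module-level generator
    # changes the sequence every other caller of random.* will see.
    random.seed(1)
    global_draws = [random.randint(10, 99) for _ in range(3)]

    # An isolated random.Random instance is just as deterministic,
    # but leaves the shared module-level generator untouched.
    rng = random.Random(1)
    local_draws = [rng.randint(10, 99) for _ in range(3)]

    # Same seed, same sequence; only the first block had a global effect.
    assert global_draws == local_draws

demo_isolated_rng()
```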
```python
        answer: list[int],
        outputs: list[str],
        accept_rate: float = 0.7):
    answer2 = [int(text[0:2].strip()) for text in outputs]
```
The current logic for parsing the model's output is brittle. If the output has leading whitespace, for example " 55" instead of "55", text[0:2] would be " 5", which incorrectly parses to 5.
A more robust approach is to strip whitespace from the whole string first, then split it and take the first part. This correctly handles leading/trailing whitespace.
Note that this can still raise an IndexError for empty outputs or a ValueError if the output is not a number, but this might be an acceptable risk for this test's context.
Suggested change:
```diff
-    answer2 = [int(text[0:2].strip()) for text in outputs]
+    answer2 = [int(text.strip().split()[0]) for text in outputs]
```
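As a quick illustration of the difference (a standalone sketch, not part of the PR; the sample strings are hypothetical model outputs):

```python
outputs = ["55 is the answer", " 55 is the answer"]

# Original parsing: slice the first two characters, then strip.
old = [int(text[0:2].strip()) for text in outputs]
# -> [55, 5]: the second output is truncated to " 5" before stripping.

# Suggested parsing: strip the whole string, then take the first token.
new = [int(text.strip().split()[0]) for text in outputs]
# -> [55, 55]: leading whitespace no longer corrupts the parsed value.

print(old, new)
```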
No description provided.