[CI] Revert back prepare_prompts and check_answers #25087
```diff
@@ -8,6 +8,7 @@
 import importlib
 import json
 import os
+import random
 import signal
 import subprocess
 import sys
```
```diff
@@ -1150,3 +1151,49 @@ def override_cutlass_fp8_supported(value: bool):
             "vllm.model_executor.layers.quantization.utils.w8a8_utils.cutlass_fp8_supported",
             return_value=value):
         yield
+
+
+def prep_prompts(batch_size: int, ln_range: tuple[int, int] = (800, 1100)):
+    """
+    Generate prompts consisting of a bunch of variable assignments,
+    then ask for the value of one of them.
+    The prompt is just under 10k tokens; the sliding window is 4k,
+    so the answer falls outside the sliding window but should still
+    be recalled correctly.
+
+    Args:
+        batch_size: number of prompts to generate
+        ln_range: controls the number of assignments per prompt
+    """
+    prompts: list[str] = []
+    answer: list[int] = []
+    indices: list[int] = []
+    random.seed(1)
+    for _ in range(batch_size):
+        idx = random.randint(30, 90)
+        indices.append(idx)
+        prompt = "```python\n# We set a number of variables, " + \
+            f"x{idx} will be important later\n"
+        ln = random.randint(*ln_range)
+        for k in range(30, ln):
+            v = random.randint(10, 99)
+            if k == idx:
+                answer.append(v)
+            prompt += f"x{k} = {v}\n"
+        prompt += f"# Now, we check the value of x{idx}:\n"
+        prompt += f"assert x{idx} == "
+        prompts.append(prompt)
+    return prompts, answer, indices
```
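For reference, here is a standalone check of what `prep_prompts` produces; the function body is copied from the diff above so the sketch runs on its own, and a small `ln_range` keeps the run fast:

```python
import random


def prep_prompts(batch_size: int, ln_range: tuple[int, int] = (800, 1100)):
    # Copied from the diff above so this sketch is self-contained.
    prompts: list[str] = []
    answer: list[int] = []
    indices: list[int] = []
    random.seed(1)
    for _ in range(batch_size):
        idx = random.randint(30, 90)
        indices.append(idx)
        prompt = "```python\n# We set a number of variables, " + \
            f"x{idx} will be important later\n"
        ln = random.randint(*ln_range)
        for k in range(30, ln):
            v = random.randint(10, 99)
            if k == idx:
                answer.append(v)
            prompt += f"x{k} = {v}\n"
        prompt += f"# Now, we check the value of x{idx}:\n"
        prompt += f"assert x{idx} == "
        prompts.append(prompt)
    return prompts, answer, indices


# A small ln_range keeps the prompts short while preserving the structure.
prompts, answer, indices = prep_prompts(2, ln_range=(100, 120))
for p, a, i in zip(prompts, answer, indices):
    # Each prompt ends mid-assert, and the expected value for x{idx}
    # was recorded in `answer` when that assignment was generated.
    assert p.endswith(f"assert x{i} == ")
    assert f"x{i} = {a}\n" in p
```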
```diff
+def check_answers(indices: list[int],
+                  answer: list[int],
+                  outputs: list[str],
+                  accept_rate: float = 0.7):
+    answer2 = [int(text[0:2].strip()) for text in outputs]
```
Review comment: The current logic for parsing the model's output is brittle. If the output has leading whitespace, `text[0:2]` can capture only part of the number and silently parse the wrong value. A more robust approach is to strip whitespace from the whole string first, then split it and take the first part; this correctly handles leading and trailing whitespace. Note that it can still raise an exception on malformed output.

Suggested change: `answer2 = [int(text.strip().split()[0]) for text in outputs]`
```diff
+
+    print(list(zip(indices, zip(answer, answer2))))
+    numok = 0
+    for a1, a2 in zip(answer, answer2):
+        if a1 == a2:
+            numok += 1
+    frac_ok = numok / len(answer)
+    print(f"Num OK: {numok}/{len(answer)} {frac_ok}")
+    assert frac_ok >= accept_rate
```
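To make the parsing concern in the review comment above concrete, here is a standalone comparison of the two strategies on outputs with and without leading whitespace (the sample strings are illustrative, not taken from a real model run):

```python
outputs = ["42 # done", " 42 is the value", "42"]

# Current approach: look at only the first two characters, then strip.
naive = [int(text[0:2].strip()) for text in outputs]

# Reviewer's suggestion: strip the whole string, then take the first token.
robust = [int(text.strip().split()[0]) for text in outputs]

print(naive)   # [42, 4, 42] -- the leading-space case truncates "42" to "4"
print(robust)  # [42, 42, 42]
```

The naive slice does not crash on the leading-space case; it silently yields the wrong number, which is worse for a correctness test than a loud failure.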
Review comment: Using `random.seed()` has a global effect and can interfere with other tests that rely on the random number generator. It's better to use an isolated `random.Random` instance for deterministic generation within this function. Additionally, the prompt string construction can be simplified into a single f-string for better readability.
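A minimal sketch of the isolated-RNG pattern this comment suggests; `prep_indices` is a hypothetical stand-in for the index-generation part of `prep_prompts`:

```python
import random


def prep_indices(batch_size: int, seed: int = 1) -> list[int]:
    # Isolated generator: deterministic without touching global random state.
    rng = random.Random(seed)
    return [rng.randint(30, 90) for _ in range(batch_size)]


# Determinism comes from the local seed, not from random.seed().
assert prep_indices(4) == prep_indices(4)

# The global RNG stream is unaffected by calling the helper.
random.seed(123)
before = [random.random() for _ in range(3)]
random.seed(123)
prep_indices(8)
after = [random.random() for _ in range(3)]
assert before == after
```

Had the helper called `random.seed(1)` internally, the second `assert` would fail, which is exactly the cross-test interference the reviewer is warning about.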