# feat: Enhance sglang adapter and OpenAI api compatibility #158
**New file** (`@@ -0,0 +1,96 @@`):

````markdown
# Dataset Preset Testing

Unit tests for dataset preset transforms. These tests verify that presets correctly transform dataset columns without requiring end-to-end benchmark runs.

## Quick Start

```bash
# Run all preset tests
pytest tests/unit/dataset_manager/test_dataset_presets.py -v

# Run tests for a specific dataset
pytest tests/unit/dataset_manager/test_dataset_presets.py::TestCNNDailyMailPresets -v

# Exclude slow tests (Harmonize transform requires transformers)
pytest tests/unit/dataset_manager/test_dataset_presets.py -m "not slow" -v
```

## Preset Coverage

| Dataset      | Presets                         | Tests |
| ------------ | ------------------------------- | ----- |
| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 6     |
````

**Copilot AI** commented (Mar 24, 2026):

> This table's CNNDailyMail test count appears incorrect. `tests/unit/dataset_manager/test_dataset_presets.py` currently defines 5 tests under `TestCNNDailyMailPresets` (3 regular + 2 `@pytest.mark.slow`), not 6. Please update the count (or remove the numeric column) so the doc stays accurate.

Suggested change:

```diff
-| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 6 |
+| CNNDailyMail | `llama3_8b`, `llama3_8b_sglang` | 5 |
```
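The column-level checks these unit tests perform can be sketched without the repo's fixtures. Everything below is an illustrative stand-in (the real `UserPromptFormatter` and test module live in the repo and may differ):

```python
# Sketch of a preset unit test: apply a prompt-formatting transform to a
# sample row and assert the expected output column appears. The class here
# is a minimal stand-in, not the repo's actual implementation.

class UserPromptFormatter:
    """Stand-in transform: fills a format string from row columns."""

    def __init__(self, user_prompt_format: str, output_column: str):
        self.user_prompt_format = user_prompt_format
        self.output_column = output_column

    def process_row(self, row: dict) -> dict:
        # Substitute row columns (e.g. {article}) into the template.
        row[self.output_column] = self.user_prompt_format.format(**row)
        return row


def test_prompt_column_added():
    transform = UserPromptFormatter(
        user_prompt_format="Summarize:\n\n{article}\n\nSummary:",
        output_column="prompt",
    )
    row = transform.process_row({"article": "Some news text."})
    assert row["prompt"].startswith("Summarize:")
    assert "Some news text." in row["prompt"]


test_prompt_column_added()
```

Tests of this shape exercise only the row-level transform logic, which is why they run without a model endpoint or a full benchmark harness.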
**New file** (`@@ -0,0 +1,56 @@`):

```yaml
# Offline Throughput Benchmark
name: "offline-llama3.1-8b-cnn-benchmark"
version: "1.0"
type: "offline"

model_params:
  name: "meta-llama/Llama-3.1-8B-Instruct"  # Path to the model
  temperature: 0.0
  top_p: 1.0
  max_new_tokens: 128

datasets:
  - name: cnn_dailymail::llama3_8b_sglang
    type: accuracy
    samples: 13368
    parser:
      input: prompt
    accuracy_config:
      eval_method: "rouge"
      ground_truth: "highlights"
      extractor: identity_extractor
    num_repeats: 1
  - name: cnn_dailymail::llama3_8b_sglang
    type: "performance"
    samples: 13368
    parser:
      input: prompt

settings:
  runtime:
    min_duration_ms: 60000      # 1 minute
    max_duration_ms: 360000     # 6 minutes (Arbitrary here, and doesn't have counterpart in legacy loadgen)
    scheduler_random_seed: 137  # For Poisson/distribution sampling
    dataloader_random_seed: 111 # For dataset shuffling (Will be updated after rng seeds are finalized for submission)
    n_samples_to_issue: 13368   # Number of samples to issue (for offline, this should match the dataset samples)

load_pattern:
  type: "max_throughput"

client:
  workers: 4  # Number of client workers

metrics:
  collect:
    - "throughput"
    - "latency"
    - "ttft"
    - "tpot"

endpoint_config:
  endpoints:
    - "http://localhost:8080"
  api_type: "sglang"
  api_key: null

report_dir: logs/llama3_8b_cnn_sglang_offline  # Directory to save the benchmark report
```
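One consistency rule the config above states in a comment is that, for offline runs, `n_samples_to_issue` should match each dataset's `samples`. A minimal validator sketch of that rule follows; the dict mirrors the YAML by hand and the function name is hypothetical, not part of the repo:

```python
def check_offline_sample_counts(config: dict) -> list[str]:
    """Return mismatch warnings for an offline benchmark config (sketch)."""
    warnings = []
    if config.get("type") != "offline":
        return warnings  # Rule only applies to offline runs.
    issued = config["settings"]["runtime"]["n_samples_to_issue"]
    for ds in config["datasets"]:
        if ds["samples"] != issued:
            warnings.append(
                f"dataset {ds['name']!r}: samples={ds['samples']} "
                f"!= n_samples_to_issue={issued}"
            )
    return warnings


# Hand-built dict mirroring the relevant parts of the YAML above.
config = {
    "type": "offline",
    "datasets": [
        {"name": "cnn_dailymail::llama3_8b_sglang", "samples": 13368},
    ],
    "settings": {"runtime": {"n_samples_to_issue": 13368}},
}
print(check_offline_sample_counts(config))  # []
```

A check like this would catch the common mistake of shrinking `samples` for a quick run while leaving `n_samples_to_issue` at the full dataset size.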
**Preset module** (`@@ -20,6 +20,7 @@`, `@@ -48,3 +49,28 @@`):

```python
from inference_endpoint.dataset_manager.transforms import (
    AddStaticColumns,
    Harmonize,
    Transform,
    UserPromptFormatter,
)

# ... existing llama3_8b preset unchanged ...


def llama3_8b_sglang(
    stream: bool = True,
    max_new_tokens: int = 128,
    temperature: float = 0.0,
    top_p: float = 1.0,
    top_k: int = 1,
    tokenizer_name: str = "meta-llama/Llama-3.1-8B-Instruct",
) -> list[Transform]:
    return [
        # Step 1: Format the prompt from "article".
        UserPromptFormatter(
            user_prompt_format=(
                f"Summarize the following news article in {max_new_tokens} tokens. "
                "Please output the summary only, without any other text."
                "\n\nArticle:\n{article}\n\nSummary:"
            ),
            output_column="prompt",
        ),
        # Step 2: Tokenize the raw prompt via Harmonize in plain mode.
        Harmonize(
            tokenizer_name=tokenizer_name,
            prompt_column="prompt",
            tokenized_column="input_tokens",
            harmonized_column=None,
            mode="plain",
        ),
    ]
```
**Collaborator** commented on lines +69 to +76:

> Can you clarify why we need Harmonize with `mode="plain"` here?

**Author** (Collaborator) replied:

> So here we just want to use the Harmonizer to generate the tokenized inputs. I could also add a new transform, say a Tokenizer transform, to do just that (generating tokenized inputs), but I only wanted to refactor existing implementations wherever possible. If this sounds more straightforward, I can leave the Harmonizer as is and instead add the tokenizing transform.
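The standalone tokenizing transform the author mentions as an alternative could look roughly like this. The `Tokenize` class, its column names, and the callable-encoder design are all assumptions for illustration, not repo code (a real version would likely wrap a Hugging Face tokenizer):

```python
from typing import Any, Callable


class Tokenize:
    """Hypothetical standalone transform: tokenize a prompt column into an
    input-tokens column, skipping rows that are already tokenized."""

    def __init__(
        self,
        encode: Callable[[str], list[int]],
        prompt_column: str = "prompt",
        tokenized_column: str = "input_tokens",
    ):
        self.encode = encode
        self.prompt_column = prompt_column
        self.tokenized_column = tokenized_column

    def process_row(self, row: dict[str, Any]) -> dict[str, Any]:
        # Same pre-tokenized guard as Harmonize: never overwrite tokens
        # produced by an earlier transform in a fused pipeline.
        if row.get(self.tokenized_column) is not None:
            return row
        row[self.tokenized_column] = self.encode(row[self.prompt_column])
        return row


# Toy character-code encoder standing in for a real tokenizer.
toy_encode = lambda text: [ord(c) for c in text]
row = Tokenize(toy_encode).process_row({"prompt": "Hi"})
print(row["input_tokens"])  # [72, 105]
```

Such a transform would keep Harmonize single-purpose (Harmony rendering) while letting each preset pick its own tokenizer, which is the trade-off the thread is weighing.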
**Harmonize transform** (`@@ -137,6 +137,7 @@`, `@@ -149,10 +150,14 @@`, `@@ -175,7 +180,19 @@`):

```python
    def __init__(
        self,
        # ... existing parameters ...
        prompt_column: str = "prompt",
        tokenized_column: str = "input_tokens",
        harmonized_column: str | None = "harmonized_prompt",
        mode: str = "harmony",
    ):
        """Initialize the Harmonize transform.

        Args:
            ...
            tokenized_column: The name of the column containing the tokenized prompt.
            harmonized_column: The name of the column containing the harmonized
                prompt. If None, the harmonized prompt will not be stored as text.
            mode: "harmony" to render a Harmony conversation; "plain" to tokenize
                the raw prompt.
        """
        self.prompt_column = prompt_column
        self.tokenized_column = tokenized_column
        self.harmonized_column = harmonized_column
        self.mode = mode
        if self.mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {self.mode}")

        self.harmonizer = Harmonizer(
            tokenizer_name=tokenizer_name,
            encoding_name=encoding_name,
            # ...
        )

    def process_row(self, row: dict[str, Any]) -> dict[str, Any]:
        """...

        Returns:
            Row dictionary with the harmonized prompt added
        """
        # Guard pre-tokenized rows: the SGLang adapter adds a default Harmonize
        # (GPT-OSS tokenizer + harmony mode). When row processors are fused, the
        # dataframe-level skip is bypassed, so without this guard the adapter's
        # Harmonize would overwrite input tokens. Alternative: remove Harmonize
        # from the adapter transforms and require each SGLang preset to add its
        # own Harmonize with the desired tokenizer/args.
        if self.tokenized_column in row and row[self.tokenized_column] is not None:
            return row

        if self.mode == "plain":
            tokens = self.harmonizer.to_tokens(row[self.prompt_column])
            row[self.tokenized_column] = tokens
        else:
            row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])

        if self.harmonized_column is not None:
            row[self.harmonized_column] = self.harmonizer.to_text(
                row[self.tokenized_column]
            )
        return row
```
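The mode dispatch and pre-tokenized guard above can be exercised in isolation with a stub harmonizer. Everything here is a stand-in for the real `Harmonizer` and `Harmonize` classes, meant only to show the control flow:

```python
class StubHarmonizer:
    """Stand-in for the real Harmonizer: calling it mimics Harmony rendering;
    to_tokens mimics plain tokenization."""

    def __call__(self, prompt: str) -> list[int]:
        return [0] + [ord(c) for c in prompt]  # fake Harmony framing token

    def to_tokens(self, prompt: str) -> list[int]:
        return [ord(c) for c in prompt]


class MiniHarmonize:
    def __init__(self, mode: str = "harmony", tokenized_column: str = "input_tokens"):
        # Reject unknown modes up front, as the real transform does.
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode}")
        self.mode = mode
        self.tokenized_column = tokenized_column
        self.harmonizer = StubHarmonizer()

    def process_row(self, row: dict) -> dict:
        # Skip rows a preset already tokenized (fused-processor guard).
        if row.get(self.tokenized_column) is not None:
            return row
        if self.mode == "plain":
            row[self.tokenized_column] = self.harmonizer.to_tokens(row["prompt"])
        else:
            row[self.tokenized_column] = self.harmonizer(row["prompt"])
        return row


plain = MiniHarmonize(mode="plain").process_row({"prompt": "ab"})
print(plain["input_tokens"])    # [97, 98]
guarded = MiniHarmonize().process_row({"prompt": "ab", "input_tokens": [1]})
print(guarded["input_tokens"])  # [1] — untouched by the adapter-level Harmonize
```

The second call demonstrates exactly the scenario the PR's guard addresses: a preset has already written `input_tokens`, so the adapter's default Harmonize leaves the row alone.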
A further review comment, on the README's slow-test note:

> The note says slow tests are excluded because Harmonize "requires transformers", but `transformers` is already a core dependency in this repo; the main reason to mark these slow is usually that they can trigger tokenizer/model downloads and be network-dependent. Consider rewording to reflect that.