[Dataset]: Iterate through benchmark dataset once #48

parfeniukink · 2024-09-06T14:44:06Z

issue link: #44

Summary:

- The `--max-requests` CLI parameter now supports the `dataset` value.
- `MaxRequestsType(ParamType)` is used as a custom `click` param type to validate the input.
- `RequestGenerator(ABC)` now includes an abstract `__len__` that returns the length of the dataset, if the generator supports it.
- Unit tests are added.
- Smoke tests are added.
- `guidellm/utils/text.py` is fixed. There is no reason to check whether the file starts with the 'http' string. This was a bug: because of the old if/else condition, a text file was never read properly.
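For readers skimming the summary, here is a rough sketch of how the `dataset` value and the abstract `__len__` could fit together. It is not the code from this PR. `parse_max_requests`, `ListRequestGenerator` and `resolve_max_requests` are hypothetical names used only for illustration, and the parsing function stands in for the validation that `MaxRequestsType` performs through `click`.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Union


def parse_max_requests(value: str) -> Union[int, str]:
    """Hypothetical stand-in for the MaxRequestsType validation.

    Accepts either the literal string 'dataset' or an integer.
    """
    if value == "dataset":
        return value
    try:
        return int(value)
    except ValueError:
        raise ValueError(
            f"expected an integer or 'dataset', got {value!r}"
        ) from None


class RequestGenerator(ABC):
    """Simplified base class: only the parts relevant to this sketch."""

    @abstractmethod
    def __iter__(self) -> Iterator[str]:
        """Yield the prompts to send."""

    @abstractmethod
    def __len__(self) -> int:
        """Return the dataset length, if the generator supports it."""


class ListRequestGenerator(RequestGenerator):
    """Hypothetical generator backed by an in-memory list of prompts."""

    def __init__(self, prompts: List[str]) -> None:
        self._prompts = list(prompts)

    def __iter__(self) -> Iterator[str]:
        return iter(self._prompts)

    def __len__(self) -> int:
        return len(self._prompts)


def resolve_max_requests(
    value: Union[int, str], generator: RequestGenerator
) -> int:
    """Turn the parsed CLI value into a concrete request count."""
    if value == "dataset":
        # One pass over the dataset: as many requests as there are items.
        return len(generator)
    return int(value)
```

With this shape, `--max-requests dataset` resolves to the number of items in the generator, while a numeric value passes through unchanged.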

parfeniukink · 2024-09-06T14:49:55Z

@philschmid Hey there. Could you install the version from the branch parfeniukink/issues/44 and tell me if it works for you?

markurtz

Looks good, minor NIT on removing typing overrides and we should add in a test case for the main pathway for this.

Two additional points on testing:

- For automation, we should add a quick test to ensure that, for the sweep case, we loop the dataset correctly and that each new benchmark starts with the first item from the dataset. This can be merged in with the previous main tests.
- For manual, please validate locally that calling with a Hugging Face dataset as well as a simple local CSV file (you can download the CSV from the files for the linked dataset) works properly:
  - Running a constant rate executes exactly as many requests as there are in the dataset, with the first and last prompts sent matching the first and last in the dataset.
  - Running a sweep with a smaller (10-prompt) CSV file cycles properly, and each benchmark run starts with the first prompt and ends with the last prompt.
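The automated check described above could look roughly like the sketch below. It makes a few assumptions: `DatasetPrompts` and `run_benchmark` are hypothetical stand-ins, not guidellm classes, and a sweep is modelled as three consecutive runs over the same generator.

```python
from typing import Iterator, List


class DatasetPrompts:
    """Hypothetical stand-in for a dataset-backed request generator."""

    def __init__(self, prompts: List[str]) -> None:
        self._prompts = list(prompts)

    def __len__(self) -> int:
        return len(self._prompts)

    def __iter__(self) -> Iterator[str]:
        # A fresh iterator per call, so every run restarts at the first prompt.
        return iter(self._prompts)


def run_benchmark(generator: DatasetPrompts) -> List[str]:
    """Pretend to send each prompt once and return them in send order."""
    return [prompt for prompt in generator]


def test_sweep_restarts_dataset() -> None:
    prompts = [f"prompt {i}" for i in range(10)]
    generator = DatasetPrompts(prompts)

    # Three consecutive runs stand in for the benchmarks of a sweep.
    for _ in range(3):
        sent = run_benchmark(generator)
        assert len(sent) == len(generator)
        assert sent[0] == prompts[0]
        assert sent[-1] == prompts[-1]
```

The same assertions cover the constant-rate case when only a single run is executed.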


philschmid · 2024-10-08T09:50:44Z

Hey Guys, any idea on when we could get this merged?

parfeniukink self-assigned this Sep 6, 2024

parfeniukink mentioned this pull request Sep 6, 2024

[Dataset]: Iterate through benchmark dataset once #44

Closed

parfeniukink requested a review from markurtz September 9, 2024 06:51

parfeniukink linked an issue Sep 9, 2024 that may be closed by this pull request

[Dataset]: Iterate through benchmark dataset once #44

Closed

markurtz requested changes Sep 11, 2024

tests/unit/test_main.py

src/guidellm/request/transformers.py (outdated)

parfeniukink force-pushed the parfeniukink/issues/44 branch from 9c077ab to 51ceaa8 Compare September 11, 2024 11:59

parfeniukink requested a review from markurtz September 11, 2024 22:05

Dmytro Parfeniuk added 7 commits September 16, 2024 10:43

✨ 'dataset' parameter introduced for the --max-requests CLI parameter

256201c

* Main CLI parameters are updated
* `MaxRequestsType(ParamType)` is used as a custom `click` param type
* Request generators are updated
* Unit tests are added
* Smoke tests are added

'dataset' as max-requests validation is added

2ff6b7b

💚 Linter is happy

50d9cc2

🔥 Removed unnecessary noqa and type: ignore

f9e27e4

🚧 WIP

d5a682c

✅ File Request Generator unit tests

4fffc9d

🔥 Removed unused file

656804f

parfeniukink force-pushed the parfeniukink/issues/44 branch from a02561f to 656804f Compare September 16, 2024 07:43

parfeniukink requested a review from sdake October 8, 2024 11:55

markurtz approved these changes Oct 8, 2024

markurtz merged commit ecf2984 into main Oct 8, 2024
9 checks passed

markurtz deleted the parfeniukink/issues/44 branch October 8, 2024 12:49
