
[Dataset]: Iterate through benchmark dataset once #48


Merged: 7 commits into main on Oct 8, 2024

Conversation

@parfeniukink (Contributor) commented on Sep 6, 2024

issue link: #44

Summary:

  • The `--max-requests` CLI parameter now supports the `dataset` value.
  • `MaxRequestsType(ParamType)` is used as a custom `click` param type to validate the input data (see the sketch after this list).
  • `RequestGenerator(ABC)` now includes an abstract `__len__` that corresponds to the length of the dataset, if it is supported.
  • Unit tests are added.
    • Smoke tests are added.
  • `guidellm/utils/text.py` is fixed. There is no reason to check whether the file path starts with the 'http' string; this was a bug, since a local text file was never read properly because of the old if/else condition.
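
Below is a minimal, hedged sketch of the two pieces named above: a `click.ParamType` that accepts either a positive integer or the literal `dataset`, and an abstract `__len__` on the request-generator base class. The class and option names come from the summary, but the bodies and validation details are assumptions for illustration, not the exact code merged in this PR.

```python
# Hedged sketch only: the validation logic is an assumption, not the merged code.
from abc import ABC, abstractmethod
from typing import Union

import click


class MaxRequestsType(click.ParamType):
    """Custom click param type for --max-requests: a positive int or 'dataset'."""

    name = "max_requests"

    def convert(self, value, param, ctx) -> Union[int, str]:
        if value == "dataset":
            # Iterate through the benchmark dataset exactly once.
            return value
        try:
            parsed = int(value)
        except (TypeError, ValueError):
            self.fail(f"{value!r} is not an integer or 'dataset'", param, ctx)
        if parsed <= 0:
            self.fail(f"{value!r} must be a positive integer", param, ctx)
        return parsed


class RequestGenerator(ABC):
    """Illustrative base class; __len__ reports the dataset size when known."""

    @abstractmethod
    def __len__(self) -> int:
        ...


@click.command()
@click.option("--max-requests", type=MaxRequestsType(), default=None)
def main(max_requests: Union[int, str, None]) -> None:
    click.echo(f"max-requests={max_requests!r}")


if __name__ == "__main__":
    main()
```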

@parfeniukink (Contributor, Author) commented

@philschmid Hey there. Could you install the version from the branch parfeniukink/issues/44 and tell me if it works for you?

@parfeniukink linked an issue on Sep 9, 2024 that may be closed by this pull request

@markurtz (Member) left a comment


Looks good. One minor nit on removing the typing overrides, and we should add a test case for the main pathway for this.

Two additional points on testing:

  • For automation, we should add a quick test to ensure, for the sweep case, that the dataset is looped correctly and that each new benchmark starts with the first item from the dataset; this can be merged in with the previous main tests (a sketch of this check is shown after this list).
  • For manual testing, please validate locally that calling with a Hugging Face dataset as well as a simple local CSV file (the CSV can be downloaded from the files for the linked dataset) works properly:
    • Running at a constant rate executes exactly as many requests as there are in the dataset, and the first and last prompts sent match the first and last in the dataset.
    • Running a sweep with a smaller (10-prompt) CSV file properly cycles, and each benchmark run starts with the same beginning prompt and ends with the end prompt.
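
As a rough illustration of the automated check in the first bullet, the pytest-style snippet below cycles a small dataset and asserts that a new pass starts from the first prompt again. The cycling helper stands in for the real request generator; guidellm's actual test and generator APIs are not shown here.

```python
# Hedged sketch of the looping check described above; itertools.cycle stands
# in for the real request generator's dataset-cycling behavior.
import itertools


def test_sweep_cycles_back_to_first_prompt():
    prompts = [f"prompt-{i}" for i in range(10)]
    generator = itertools.cycle(prompts)

    # First benchmark pass consumes the whole dataset in order.
    first_pass = [next(generator) for _ in range(len(prompts))]
    assert first_pass[0] == prompts[0]
    assert first_pass[-1] == prompts[-1]

    # The next benchmark in the sweep should start at the first item again.
    assert next(generator) == prompts[0]
```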

Dmytro Parfeniuk added 7 commits September 16, 2024 10:43
* Main CLI parameters are updated
* `MaxRequestsType(ParamType)` is used as a custom `click` param type
* Request generators are updated
* Unit tests are added
    * Smoke tests are added
@philschmid (Contributor) commented

Hey guys, any idea when we could get this merged?

@parfeniukink requested a review from sdake on October 8, 2024, 11:55
@markurtz merged commit ecf2984 into main on Oct 8, 2024
9 checks passed
@markurtz deleted the parfeniukink/issues/44 branch on October 8, 2024, 12:49