Feat/max error rate #171
Documentation diff (benchmark CLI options):

```diff
@@ -147,6 +147,8 @@ The `guidellm benchmark` command is used to run benchmarks against a generative
 - `--max-requests`: Sets the maximum number of requests for each benchmark run. If not provided, the benchmark will run until `--max-seconds` is reached or the dataset is exhausted.
+
+- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite deterministic scenarios, i.e. `rate_type` is `constant` and `--max-seconds` exists, OR `--max-requests` exists, OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, benchmarks will continue regardless of error rate.

 - `--warmup-percent`: Specifies the percentage of the benchmark to treat as a warmup phase. Requests during this phase are excluded from the final results.

 - `--cooldown-percent`: Specifies the percentage of the benchmark to treat as a cooldown phase. Requests during this phase are excluded from the final results.
```
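As an illustrative sketch of the stop condition the `--max-error-rate` flag describes (function and parameter names here are hypothetical, not guidellm's actual implementation), assuming the rate is computed over finished requests only:

```python
from typing import Optional


def should_stop(errored: int, completed: int, max_error_rate: Optional[float]) -> bool:
    """Hypothetical check: stop once the observed error rate exceeds the cap."""
    # Flag unset or not applicable: never stop on errors.
    if max_error_rate is None:
        return False
    # Incomplete (in-flight) requests are deliberately excluded from the rate.
    finished = errored + completed
    # Nothing has finished yet, so there is no meaningful rate to compare.
    if finished == 0:
        return False
    return errored / finished > max_error_rate
```

For example, 6 errors out of 100 finished requests against a 5% cap would trigger a stop, while exactly 5 out of 100 would not.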
Benchmark model diff:

```diff
@@ -90,6 +90,9 @@ class BenchmarkArgs(StandardBaseModel):
     max_duration: Optional[float] = Field(
         description="The maximum duration in seconds to run this benchmark, if any."
     )
+    max_error: Optional[float] = Field(
+        description="Maximum error rate or const after which a benchmark will stop."
+    )
     warmup_number: Optional[int] = Field(
         description=(
             "The number of requests to run for the warmup phase of this benchmark, "
@@ -213,6 +216,15 @@ class BenchmarkRunStats(StandardBaseModel):
             "it was completed."
         )
     )
+    error_rate: float = Field(
+        description=(
+            "The number of errored requests divided by the number "
+            "of successful and errored requests. "
+            "This can be higher than max_error "
+            "(if applicable) because it does not take into "
+            "account incomplete requests."
+        )
+    )


 class BenchmarkMetrics(StandardBaseModel):
```

Review discussion on `error_rate`:

> "I'm not following this point; the error rate we calculate and compare to should never include incomplete, right?"

> "Correct."

> "Isn't max_error also not looking at incomplete requests, though? How would it be higher for error_rate vs max_error, since those should be based off the same calculations, right?"
Request loader diff:

```diff
@@ -19,11 +19,16 @@
 __all__ = [
     "GenerativeRequestLoader",
     "GenerativeRequestLoaderDescription",
+    "GetInfiniteDatasetLengthError",
     "RequestLoader",
     "RequestLoaderDescription",
 ]


+class GetInfiniteDatasetLengthError(Exception):
+    pass


 class RequestLoaderDescription(StandardBaseModel):
     type_: Literal["request_loader"] = "request_loader"
@@ -120,7 +125,11 @@ def __len__(self) -> int:
     if self.iter_type == "finite":
         return self.num_unique_items()

-    raise ValueError(f"Unable to determine length of dataset: {self.data}")
+    if self.iter_type != "infinite":
+        raise ValueError(f"Invalid iter_type {self.iter_type}")
+    raise GetInfiniteDatasetLengthError(
+        f"Dataset {self.data} is infinite and thus unable to determine length"
+    )

 @property
 def description(self) -> GenerativeRequestLoaderDescription:
```

Review comment on `GetInfiniteDatasetLengthError`:

> "I think this exception and logic is no longer needed, since if we can get the length then we go through a specific logic route, and if we can't, then we fall back on the window."
Scheduler diff:

```diff
@@ -1,3 +1,4 @@
+from collections import deque
 from typing import (
     Generic,
     Literal,
@@ -16,6 +17,9 @@
 ]


+RequestStatus = Literal["success", "error"]


 class SchedulerRunInfo(StandardBaseModel):
     """
     Information about the current run of the scheduler.
@@ -46,12 +50,15 @@ class SchedulerRunInfo(StandardBaseModel):
     end_number: float
     processes: int
     strategy: SchedulingStrategy
+    last_requests_statuses: deque[RequestStatus]
+    max_error: Optional[float] = None

     created_requests: int = 0
     queued_requests: int = 0
     scheduled_requests: int = 0
     processing_requests: int = 0
     completed_requests: int = 0
     errored_requests: int = 0


 class SchedulerRequestInfo(StandardBaseModel):
```

Review comment on `last_requests_statuses`:

> "I think we can simplify the logic here so we don't need to add this in, especially since this list can become very long and it is not JSON serializable, so it would break any pydantic serialization in the event someone wants to save that state. See note later down where that logic is located."
Additional review comments:

> "We'll need these reenabled. If you ever need to push and skip pre-commit, you can pass the --no-verify flag with your git commit command."

> "Also, @sjmonson just landed a diff that might clear up some of the issues you were seeing."
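The `--no-verify` tip above can be demonstrated in a throwaway repository; `--no-verify` skips git's pre-commit and commit-msg hooks for a single commit without disabling them globally (the paths, identity, and commit message here are illustrative):

```shell
# Create a scratch repo and make one commit with hooks skipped.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "ci@example.com"   # illustrative identity
git config user.name "ci"
echo demo > file.txt
git add file.txt
git commit --no-verify -q -m "WIP: skip hooks once"
git log --oneline
```

Subsequent commits made without the flag run the hooks as usual.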