Conversation

@eyurtsev (Collaborator)

This PR adds a simple rate limiter based on a token bucket.

I would love to extend RunnableBinding with this; we just need to make
sure there's no funny async vs. threading business.

This should be sufficient for benchmarking for now.
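For readers skimming the thread, the limiter under review can be sketched roughly as follows. This is a minimal, illustrative reconstruction assembled from the diff fragments quoted below (class name and exact structure are assumptions, not the PR's actual code):

```python
import threading
import time
from typing import Optional


class TokenBucketRateLimiter:
    """Minimal token-bucket sketch; not the PR's exact implementation."""

    def __init__(
        self,
        *,
        requests_per_second: float = 1,
        tokens_per_request: int = 1,
    ) -> None:
        self.requests_per_second = requests_per_second
        self.tokens_per_request = tokens_per_request
        self.available_tokens = 0.0
        self.last: Optional[float] = None
        # Only one thread may refill/consume at a time.
        self._consume_lock = threading.Lock()

    def consume(self) -> bool:
        """Try to take tokens_per_request tokens; return True on success."""
        with self._consume_lock:
            now = time.monotonic()
            if self.last is None:
                self.last = now
            elapsed = now - self.last
            # Refill in proportion to elapsed time.
            self.available_tokens += elapsed * self.requests_per_second
            self.last = now
            # requests_per_second doubles as the maximum bucket size.
            self.available_tokens = min(
                self.available_tokens, self.requests_per_second
            )
            if self.available_tokens >= self.tokens_per_request:
                self.available_tokens -= self.tokens_per_request
                return True
            return False
```

A caller would spin on `consume()` (sleeping `check_every_n_seconds` between attempts) until it returns True.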

@eyurtsev eyurtsev requested a review from hinthornw December 12, 2023 16:31
self,
task: ToolUsageTask,
*,
model: str = "gpt-3.5-turbo-16k",
@hinthornw (Collaborator), Dec 12, 2023

Out of curiosity: why do we need a default here?

@eyurtsev (Author)

I can remove it in a separate PR -- it probably shouldn't be here.

@hinthornw (Collaborator) left a comment

Some questions/comments, mainly around docs and communication. The algorithm looks fine to me. It may be good to add tests.

self.available_tokens += elapsed * self.requests_per_second
self.last = now

self.available_tokens = min(self.available_tokens, self.requests_per_second)
@hinthornw (Collaborator)

For noobs, could we add a comment saying something like "requests_per_second is also the maximum bucket size"?

fractions of a second.
"""
if requests_per_second < 1:
raise ValueError("Rate must be at least 1 request per second")
@hinthornw (Collaborator)

For the lazy user who doesn't read the docstring, thoughts on adding a more instructional recommendation here? I would naively assume I could allow, for instance, 1/4 request per second.

self,
*,
requests_per_second: float = 1,
tokens_per_request: int = 1,
@hinthornw (Collaborator)

Given our domain, could we clarify that tokens are NOT LLM tokens? I could imagine someone thinking this is the number of ~words we can send to the LLM per request. Maybe link to the token bucket algorithm for complete novices as well.

This is especially the case given that some providers rate limit based on, e.g., prompt tokens or total tokens per minute/hour.

@hinthornw (Collaborator)

Similarly, should we call requests_per_second something like tokens_per_second to make it clearer?

self.last: Optional[time.time] = None
self.check_every_n_seconds = check_every_n_seconds

def consume(self) -> bool:
@hinthornw (Collaborator)

Should this be private?

# at a given time.
self._consume_lock = threading.Lock()
# tokens per request sets how many tokens
self.tokens_per_request = tokens_per_request
@hinthornw (Collaborator)

Do we also need to check that tokens_per_request <= requests_per_second?

Given that we are treating requests_per_second as the max bucket size, we'd never be able to consume tokens if that condition doesn't hold, right?
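To make the concern concrete, here is a small sketch of one refill-then-check step, mirroring the quoted diff that caps the bucket at requests_per_second (the helper function is illustrative, not the PR's code):

```python
def refill_then_check(
    available: float,
    elapsed: float,
    requests_per_second: float,
    tokens_per_request: float,
) -> bool:
    """One refill step of the quoted token-bucket logic, then a consume check."""
    # Refill in proportion to elapsed time.
    available += elapsed * requests_per_second
    # The bucket is capped at requests_per_second, as in the diff above.
    available = min(available, requests_per_second)
    return available >= tokens_per_request


# No matter how long we wait, available <= requests_per_second, so a
# tokens_per_request of 3 can never be satisfied when requests_per_second is 2:
print(refill_then_check(0.0, 1e9, requests_per_second=2, tokens_per_request=3))  # False
```

Under this cap, waiting longer never helps once tokens_per_request exceeds requests_per_second, which is exactly the condition questioned above.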

return input

return (
RunnableLambda(_rate_limited_passthrough).with_config({"name": "Rate Limit"})
@hinthornw (Collaborator)

You could also just make the function name and docstring prettier so that the default name is nice; the function is rendered in the metadata.

@eyurtsev eyurtsev requested a review from hinthornw December 12, 2023 19:42
@eyurtsev eyurtsev merged commit 14de11a into main Dec 13, 2023
@eyurtsev eyurtsev deleted the eugene/add_rate_limiter branch December 13, 2023 18:12