Add rate limiter #121
Conversation
    self,
    task: ToolUsageTask,
    *,
    model: str = "gpt-3.5-turbo-16k",
ooc: why do we need a default here?
I can remove in a separate PR -- probably shouldn't be here
Some questions/comments, mainly around docs and communication. The algorithm looks fine to me. It may be good to add tests.
self.available_tokens += elapsed * self.requests_per_second
self.last = now

self.available_tokens = min(self.available_tokens, self.requests_per_second)
For noobs, could we add a comment saying something like "requests_per_second is also the maximum bucket size"?
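To make the "bucket size" point concrete, here is a minimal sketch of the refill step under discussion. The class and method names are illustrative, not the PR's actual code; the point is the final `min(...)` clamp, which means `requests_per_second` doubles as the maximum bucket capacity.

```python
import time


class TokenBucket:
    """Minimal sketch of the refill logic (illustrative names)."""

    def __init__(self, requests_per_second: float = 1) -> None:
        # requests_per_second is also the maximum bucket size: tokens
        # never accumulate beyond one second's worth of requests.
        self.requests_per_second = requests_per_second
        self.available_tokens = 0.0
        self.last = None

    def refill(self) -> None:
        now = time.monotonic()
        if self.last is None:
            self.last = now
        elapsed = now - self.last
        self.available_tokens += elapsed * self.requests_per_second
        self.last = now
        # Clamp: the bucket never holds more than requests_per_second tokens,
        # no matter how long the limiter sat idle.
        self.available_tokens = min(self.available_tokens, self.requests_per_second)
```

So after any idle period, at most one second's worth of burst capacity is available.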
    fractions of a second.
    """
    if requests_per_second < 1:
        raise ValueError("Rate must be at least 1 request per second")
For the lazy user who doesn't read the docstring, thoughts on adding a more instructional recommendation here? I would naively assume I could pass 1/4 requests per second, for instance.
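One way to address this review comment would be a more instructional error message. This is a hypothetical sketch of what that could look like, with the check factored into a standalone function for illustration; the wording is a suggestion, not the PR's actual text.

```python
def check_rate(requests_per_second: float) -> None:
    """Validate the rate, with a message aimed at users who would
    naively pass a fractional rate such as 0.25 (one request every 4s)."""
    if requests_per_second < 1:
        raise ValueError(
            "Rate must be at least 1 request per second. Fractional rates "
            "below 1 (e.g. 0.25 for one request every 4 seconds) are not "
            "supported by this limiter."
        )
```

The message names the exact mistake a reader of this thread anticipated, rather than just restating the constraint.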
    self,
    *,
    requests_per_second: float = 1,
    tokens_per_request: int = 1,
Given our domain, could we clarify that tokens are NOT LLM tokens? I could imagine someone thinking this is the number of ~words we can send to the LLM per request. Maybe link to the token bucket algorithm for complete novices as well.
This is especially the case given that some providers rate limit based on, e.g., prompt tokens or total tokens per minute/hour.
Similarly, should we call requests_per_second something like tokens_per_second to make it clearer?
self.last: Optional[time.time] = None
self.check_every_n_seconds = check_every_n_seconds

def consume(self) -> bool:
Should this be private?
# at a given time.
self._consume_lock = threading.Lock()
# tokens per request sets how many tokens
self.tokens_per_request = tokens_per_request
Do we also need to check that tokens_per_request <= requests_per_second?
Given that we are treating requests_per_second as the max bucket size, we'd never consume tokens if that condition doesn't hold, right?
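The starvation scenario in this comment can be demonstrated directly. Below is an illustrative reconstruction of the consume path (not the PR's exact code): because the refill step clamps the bucket to `requests_per_second` tokens, a `tokens_per_request` larger than that can never be satisfied, so `consume()` returns `False` forever.

```python
import time


class Bucket:
    """Sketch of the consume path, to show the starvation concern."""

    def __init__(self, requests_per_second: float, tokens_per_request: int) -> None:
        self.requests_per_second = requests_per_second
        self.tokens_per_request = tokens_per_request
        self.available_tokens = 0.0
        self.last = None

    def consume(self) -> bool:
        now = time.monotonic()
        if self.last is None:
            self.last = now
        # Refill, then clamp to the max bucket size (requests_per_second).
        self.available_tokens += (now - self.last) * self.requests_per_second
        self.last = now
        self.available_tokens = min(self.available_tokens, self.requests_per_second)
        # If tokens_per_request > requests_per_second, this branch is
        # unreachable: the bucket can never hold enough tokens.
        if self.available_tokens >= self.tokens_per_request:
            self.available_tokens -= self.tokens_per_request
            return True
        return False
```

This suggests validating `tokens_per_request <= requests_per_second` in `__init__` so misconfiguration fails loudly instead of deadlocking callers.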
    return input

return (
    RunnableLambda(_rate_limited_passthrough).with_config({"name": "Rate Limit"})
You could also just make the function name and docstring prettier so the default name is nice; the function is rendered in the metadata.
This PR adds a simple rate limiter based on a token bucket.
I would love to extend RunnableBinding with this; we just need to make sure there's no funny async-vs.-threading business.
This should be sufficient for benchmarking for now.
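Pulling the reviewed pieces together, here is a self-contained sketch of a blocking token-bucket limiter. Parameter names mirror the diff (`requests_per_second`, `tokens_per_request`, `check_every_n_seconds`, `_consume_lock`); the method bodies and the `wait()` helper are a reconstruction under those assumptions, not the merged implementation.

```python
import threading
import time


class RateLimiter:
    """Token-bucket rate limiter (illustrative reconstruction)."""

    def __init__(
        self,
        *,
        requests_per_second: float = 1,
        tokens_per_request: int = 1,
        check_every_n_seconds: float = 0.1,
    ) -> None:
        if requests_per_second < 1:
            raise ValueError("Rate must be at least 1 request per second")
        self.requests_per_second = requests_per_second
        self.tokens_per_request = tokens_per_request
        self.check_every_n_seconds = check_every_n_seconds
        self.available_tokens = 0.0
        self.last = None
        # Ensure only one thread updates the bucket at a given time.
        self._consume_lock = threading.Lock()

    def _consume(self) -> bool:
        """Try to take tokens_per_request tokens; never blocks."""
        with self._consume_lock:
            now = time.monotonic()
            if self.last is None:
                self.last = now
            self.available_tokens += (now - self.last) * self.requests_per_second
            self.last = now
            # requests_per_second is also the maximum bucket size.
            self.available_tokens = min(
                self.available_tokens, self.requests_per_second
            )
            if self.available_tokens >= self.tokens_per_request:
                self.available_tokens -= self.tokens_per_request
                return True
            return False

    def wait(self) -> None:
        """Block, polling every check_every_n_seconds, until a token frees up."""
        while not self._consume():
            time.sleep(self.check_every_n_seconds)
```

A caller would invoke `limiter.wait()` immediately before each request; polling (rather than a condition variable) keeps the threading story simple, at the cost of up to `check_every_n_seconds` of extra latency per acquisition.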