Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return rate limit error according to ingestion rate strategy #3485

Merged
merged 6 commits into from
Apr 9, 2024

Conversation

ie-pham
Copy link
Contributor

@ie-pham ie-pham commented Mar 13, 2024

What this PR does: Currently, for all rate_limited errors regardless of the ingestion rate strategy, Tempo returns the local strategy limit. This causes confusion for setups that use the global strategy. This PR correctly returns the limit according to the ingestion rate strategy.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ie-pham ie-pham marked this pull request as ready for review March 13, 2024 20:59
@ie-pham ie-pham requested a review from zalegrala March 14, 2024 15:36
@ie-pham
Copy link
Contributor Author

ie-pham commented Mar 14, 2024

When running main on my local, I noticed that overrides.IngestionRateStrategy() would sometimes return defaults overrides but would also return specific tenant overrides (which doesn't have ingestion rate strategy)

When we fetch the overrides here I believe because the overrides hadn’t start up yet this function would return the defaults override which does have the ingestion rate strategy.

But then we call the same function here the overrides is up and running, using “” here actually returns a tenant specific overrides which does not contain the ingestion rate strategy.

@kvrhdn could you please double check my logic

@ie-pham ie-pham requested a review from yvrhdn March 14, 2024 18:22
@mdisibio
Copy link
Contributor

@ie-pham What do you think about including both the local and global rate limits in the error message? My concern is the cases where a tenant exceeds the local limit, but overall traffic is still below the global limit. Catching and explaining these can be tricky. The current error message with the local limit makes it more apparent, and also captures the number of distributor replicas at the time (indirectly).

@ie-pham
Copy link
Contributor Author

ie-pham commented Apr 8, 2024

It seems the IngestionRateStrategy has always been a defaults overrides. It isn't part of the distributor config at all. Not sure if we want to spend time adding it or leave it as is.

@yvrhdn
Copy link
Member

yvrhdn commented Apr 9, 2024

It seems the IngestionRateStrategy has always been a defaults overrides. It isn't part of the distributor config at all. Not sure if we want to spend time adding it or leave it as is.

I would be in favour of adding this setting to distributor config and mark the override as deprecated. It's currently really not working as you would expect from other runtime overrides.

Issues:

  • only the value in default overrides is read, while this is valid it won't do what you expect:
# overrides configmap
overrides:
  "*":
    ingestion:
      rate_strategy: global
  "my-tenant":
    ingestion:
      rate_strategy: global
    # ...
  • normally values in the runtime overrides can change at any time and Tempo will adjust accordingly, but we create the limiter during startup and then don't replace it anymore.

Deprecating the override imo would mean:

  • introduce a setting to distributor config block
  • read both values during startup but let overrides take priority
  • update docs and mark the override as deprecated
  • (Minor) we could log a warning during startup if the override is set (Config.CheckConfig)

And then at some point in the future we just remove the override completely.

Copy link
Member

@yvrhdn yvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this will be a nice usability improvement!

Regarding the config change, if we want to do it might be better to do it in a separate PR to not block this one.

@ie-pham ie-pham merged commit 93d88ff into grafana:main Apr 9, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants