Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API rate limit exceeded for user ID #613

Closed
supergeoff opened this issue Nov 23, 2022 · 10 comments · Fixed by #626
Closed

API rate limit exceeded for user ID #613

supergeoff opened this issue Nov 23, 2022 · 10 comments · Fixed by #626
Labels
Type/Bug Something isn't working

Comments

@supergeoff
Copy link

Hello,
My mattermost logs are full of API rate limit exceeded warn message.
Only for my user (I'm mattermost admin)
{"timestamp":"2022-11-22 18:05:34.007 Z","level":"warn","msg":"Failed to search for review","caller":"app/plugin_api.go:977","plugin_id":"github","userid":"XXX","github username":"username","error":"GET https://api.github.com/search/issues?q=is%3Apr+is%3Aopen+review-requested%3Ausername+archived%3Afalse+org%3Aorganization: 403 API rate limit exceeded for user ID XXX. [rate reset in 24s]","query":"is:pr is:open review-requested:username archived:false org:organization"}
Any idea on wether this is a common limitation or if it is bad setup on my end ?

I followed documentation for setup.

Regards;

@hanzei hanzei added the Triage label Nov 23, 2022
@hanzei
Copy link
Contributor

hanzei commented Nov 23, 2022

@supergeoff Do you see this warning for multiple users or for the same one all over again?

@supergeoff
Copy link
Author

Same profile/ user id over again

@DHaussermann
Copy link

DHaussermann commented Dec 5, 2022

@supergeoff We are looking into this and trying to determine if this was caused by something on the plugin side.
Can you let me know if you saw the rate limit exceeded issue again after you opened this issue or if you see this occurring in your logs still?

@supergeoff
Copy link
Author

supergeoff commented Dec 5, 2022

error stopped on the 26/11/2022 01:30:55. But since the error appeared and still as of today the view on top of topics doesn't work anymore and display 0 for all counters even if i have PR or unread messages.
Regards;

@mickmister
Copy link
Contributor

mickmister commented Dec 14, 2022

From the last link above:

To avoid hitting this limit, you should ensure your application follows the guidelines below.

  1. Make authenticated requests, or use your application's client ID and secret. Unauthenticated requests are subject to more aggressive secondary rate limiting.
  2. Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.
  3. If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.
  4. When you have been limited, use the Retry-After response header to slow down. The value of the Retry-After header will always be an integer, representing the number of seconds you should wait before making requests again. For example, Retry-After: 30 means you should wait 30 seconds before sending more requests.
  5. Requests that create content which triggers notifications, such as issues, comments and pull requests, may be further limited and will not include a Retry-After header in the response. Please create this content at a reasonable pace to avoid further limiting.

For the scenario of the "spammy" feature described below, the two points that relate to this plugin are 2 and 4.

At the moment, the webapp portion of the plugin sends 4 separate HTTP requests to fetch the counts on left hand side of the application. If the user is using the Mattermost desktop app, this is the same as having 3 browser tabs open, so there are actually 12 requests happening semi-concurrently. These requests are sent during page load, websocket reconnect, and at a 5 minute timer while connected.


In order to satisfy point 2 in the numbered items above, we can add some logic to avoid concurrent requests:

  • Have the browser fetch the sidebar numbers with 1 request, rather than 4 requests
  • Then have the plugin's server part fetch the information for each piece of data sequentially
  • Add a certain amount of jitter, something like 0-10 seconds, to delay the client's request when the counts refresh due to websocket reconnect. This helps with the case of multiple browser tabs for the same user, racing to fetch the counts at the same time. This delay seems reasonable to me since this is a passive action of refreshing that happens in the background.
  • Cache a given user's counts for a short amount of time, in case they request it again within that same amount of time like 5-10 minutes
    • Use KVSetWithExpiry to save the cache for a specific amount of time
    • If we receive a webhook event related to a given user, update their cache and send a websocket event to the client with the new counts
    • If the user clicks the "refresh" button near the counts, recalculate the counts and update the cache

For point 4:

Given the improvements detailed above, I'm not sure if we need to do anything else, but we can certainly try to adhere to the Retry-After header.

These rate limits are specific to a given user. If we receive a Retry-After response, this is related to a given user's token being used, so if we want to avoid subsequent requests to the plugin from triggering bad GitHub API behavior, we would need to store this value of Retry-After in the KV store for consumption by other goroutines or HA nodes.

I'm thinking this is overkill, as I don't think we will be hitting their API in quick succession if the proposals for point 2 are implemented.

@Kshitij-Katiyar
Copy link
Contributor

Kshitij-Katiyar commented Dec 16, 2022

@mickmister Doubts regarding point 2 of the above post:-

Do I need to implement points 1,3,4 or like these are the possible solutions and do we need to implement any one of them?

  1. For point 1, so the 4 Apis of GitHub we are calling in a single plugin API do we need to call them serially like we will call the next after getting the response of the previous one, or do we need to call them concurrently. I think calling them serially would be a better idea as GitHub has mentioned that it's always better to call the APIs serially.

  2. For point 3, the timeout we are talking about in the above post should we implement them on the client side or the server side? If we implement them on the client side then the problem of calling the plugin APIs from multiple clients will still persist. For example, If a user has opened 3 different tabs and after implementing the first point(Single PluginAPI to call 4 APIs) those 3 tabs will call the respective 3 plugin APIs which will further call 4 GitHub APIs respectively at the same time even after we added the timeout. Can you please elaborate on that?

  3. For point 4, in sub-point 2 updating their cache here means that we are calling the actual 4 GitHub APIs for the webhook event we got from the GitHub, getting the count of all 4 variables and updating the cache, or do we have to check what kind of event it is and using that we will update the count of the user count we have stored in the cache eg if we get an open PR event then we will update the open PR count by 1 and same thing for sub point 3 recalculate the counts.

    i) From my understanding we are calling these 4 APIs mainly using 3 methods:-
    a) From the component sidebar buttons (component rendering or by clicking the refresh button).
    b) If we get a WebSocket event from the backend from the webhook event we got from GitHub related to these values.
    c) A component named handleReconnect is registered for registerReconnectHandler which gets called if the web app is connected to the internet again and the same component is also called if the user has not done any activity for 1 hour.

So according to your point above for the WebSocket event part, we are updating the cache and sending these values directly in the WebSocket event and if we click on the refresh button we are refreshing the count and updating the cache again so for which case we are actually going to use the cache?. Because if we are sending the counts in the WebSocket event itself then the problem of multiple clients will be fixed as now there is no need for different clients to call these APIs and we are also updating the cache in 2 out of 3 cases. Can you please elaborate on that? Should I send the whole data in the WebSocket event like from all 4 APIs or just the count? I am not able to understand the need for a cache here. Please explain a bit about that.

Thanks for the answer above it was very helpful in understanding the issue and please feel free to correct me and ask any questions.

@mickmister
Copy link
Contributor

For point 1, so the 4 Apis of GitHub we are calling in a single plugin API do we need to call them serially like we will call the next after getting the response of the previous one, or do we need to call them concurrently. I think calling them serially would be a better idea as GitHub has mentioned that it's always better to call the APIs serially.

@Kshitij-Katiyar I was thinking we would change to frontend to fetch all of the 4 counts in one request, and the backend would serially fetch the data for each category in that request

those 3 tabs will call the respective 3 plugin APIs which will further call 4 GitHub APIs respectively at the same time even after we added the timeout.

This is hopefully avoided by the "jitter" aspect of what I'm suggesting. At runtime, we choose a random number between 0-10 and use that as the delay for that specific tab.

do we have to check what kind of event it is and using that we will update the count of the user count we have stored in the cache eg if we get an open PR event then we will update the open PR count by 1 and same thing for sub point 3 recalculate the counts.

Any event pertaining to the user could trigger an update to the notifications count, so we would want to keep that as updated as possible. I'm not sure if there's a performance concern with refreshing all counts for a given webhook event. I'd like to get @hanzei's opinion on this. This is definitely increasing the scope of what we're trying to fix here.

A component named handleReconnect is registered for registerReconnectHandler which gets called if the web app is connected to the internet again and the same component is also called if the user has not done any activity for 1 hour.

I thought this was every 5 minutes, though I can't seem to find the code that has this constant at the moment. Do you have a link to the code that waits on a timer to call the reconnect handler?

So according to your point above for the WebSocket event part, we are updating the cache and sending these values directly in the WebSocket event and if we click on the refresh button we are refreshing the count and updating the cache again so for which case we are actually going to use the cache?.

The cache is meant to defend against users requesting with multiple tabs in quick succession. Now that you point this out, I'm not sure if a cache really helpful here. It depends on how often the reconnect handler is called, which you said is once an hour, which is negligible.

Should I send the whole data in the WebSocket event like from all 4 APIs or just the count?

Might as well send all the data since it's already available.

@Kshitij-Katiyar
Copy link
Contributor

@mickmister Link for the code containing the constant for registerReconnectHandler:-

const activityTimeout = 60 * 60 * 1000; // 1 hour

@Kshitij-Katiyar
Copy link
Contributor

Kshitij-Katiyar commented Dec 19, 2022

@mickmister This is the whole approach I have thought of to tackle this issue. Please let me know your thoughts about the following:-

  1. From the frontend we will call a single API fetching the data of all four notifications count serially.
  2. Whenever we get an event from GitHub regarding changes we will our existing API to get the updated count and data for all the notification counts and dispatch a WebSocket event from the server with the updated count and data to the client (Frontend) where we will show the updated count from the data we got from catching the WebSocket event. In this case, there won't be any API call from the client as we already got the updated data and this also will fix the issue of multiple API calls from multiple clients.
  3. So the actual API call to get the updated count from the frontend will be done in mainly 2 cases:-
    i) When the user clicks on the refresh count button from the webapp.(We should definitely call the API if user intentionally clicking the refresh button)
    ii) When the user's webapp gets connected to the internet again or the user is inactive for more than 1 hour.

If we are sending the updated count from the server inside the WebSocket event only then I think there is no need to create a jitter as mentioned in point number 2 as the API call is only going to be done if the user's internet gets connected to the internet again or the user is inactive for more than 1 hour which I think is a very edge case.

Please let me know your thoughts on this.

@mickmister
Copy link
Contributor

if the user's internet gets connected to the internet again

@Kshitij-Katiyar Note that a websocket reconnect can happen for any number of reasons, including something like a hiccup in middleware between the frontend and MM server. I'm still thinking we should add the jitter for this case of multiple tabs reconnecting.

Whenever we get an event from GitHub regarding changes we will our existing API to get the updated count and data for all the notification counts and dispatch a WebSocket event from the server with the updated count and data to the client (Frontend) where we will show the updated count from the data we got from catching the WebSocket event.

I made this suggestion with the assumption that we would be using cached values for the counts. I'm not sure how much value the webhook-driven update brings by itself, in regards to rate limiting. We can add this functionality, but if we're not adding the server-side cache, then I think the "auto-update based on incoming webhook event" is unrelated to the rate limiting issue and can be done in a separate PR.

Also, about the "5 minute" thing I referenced: the webapp used to unconditionally call these reconnect handlers every 15 minutes but it has since been removed mattermost/mattermost-webapp#10386

Kshitij-Katiyar added a commit to Brightscout/mattermost-plugin-github that referenced this issue Dec 30, 2022
* [MI-2481]:Fixed issue mattermost#681 on github

* [MI-2481]:Fixed review fixes

* [MI-2481]:Fixed review fixes

* [MI-2481]:Fixed review fixes
@hanzei hanzei linked a pull request Jan 2, 2023 that will close this issue
Kshitij-Katiyar added a commit to Brightscout/mattermost-plugin-github that referenced this issue Jan 9, 2023
Kshitij-Katiyar added a commit to Brightscout/mattermost-plugin-github that referenced this issue Jan 9, 2023
@hanzei hanzei moved this from Todo to Submitted in Brightscout Plugin Maintenance Feb 3, 2023
@hanzei hanzei added the Type/Bug Something isn't working label Feb 3, 2023
Kshitij-Katiyar added a commit to Brightscout/mattermost-plugin-github that referenced this issue May 8, 2023
mickmister pushed a commit that referenced this issue Jul 19, 2023
* [MI-2481]:Fixed issue #613 on github (#12)

* [MI-2481]:Fixed issue #681 on github

* [MI-2481]:Fixed review fixes

* [MI-2481]:Fixed review fixes

* [MI-2481]:Fixed review fixes

* [MI-2547]:Fixed review fixes for Github issue 613 and PR 626 (#14)

* [MI-2547]:Fixed review fixes for Github issue 613 and PR 626

* [MI-2547]:Fixed review fixes

* [MI-2547]:Fixed review fixes

* [MI-2582]:Fixed review comments for github issue #613 (#15)

* [MM-613]:Fixed review comments

* [MI-3072]:Converted the LHS APIs into GraphQL (#32)

* [MI-3012]: Converted user PRs API into GraphQL

* [MI-3012]: Fixed review comments

* [MI-3012]:fixed log message

* [MI-3035]: Converted the get assignment API into GraphQL

* [MI-3035]:Fixed self review comments

* [MI-3035]: Fixed review comments

* [MI-3072]:Converted review requested API to graphQL

* [MI-3072]:Combined all the graphQL queries

* [MI-3072]:Fixed CI errors

* [MI-3072]:Fixed review comments

* [MI-3072]:Fixed CI errors

* [MI-3072]:Fixed review comments

* [MI-3072]:Fixed review comments

* [MM-613]:Changed namings

* [MM-613]:Fixed review comments

* [MM-613]:Fixed review comments

* [MM-613]:Fixed panic error

* [MM-613]:Fixed review comments

* [MM-613]:Fixed lint errors

* [MM-613]:Fixed review comment

* [MM-613]:Fixed review comments

* [MM-613]:Fixed review comments
@github-project-automation github-project-automation bot moved this from Submitted to Done in Brightscout Plugin Maintenance Jul 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type/Bug Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

6 participants