Description
GitHub's docs mention that long-running requests (over 10 seconds) are interrupted and return nothing. We hit such a case, and GitHub appears to return a 502 Bad Gateway HTTP status code.
In our case, the issue comments stream gave such errors.
To see the error, run `curl -v -o /dev/null "https://api.github.com/repos/bitcoin/bitcoin/issues/comments?per_page=100&sort=updated&direction=asc&since=2019-11-25T15:20:23"`. This discards the response body, which is irrelevant here, but `-v` still shows the headers and the 502 status. curl's progress meter stops at about 10 seconds, which points to a server-side timeout.
When the tap runs with multiple repos listed in its config, it fails on such an error and exits early, never finishing the remaining repos it is supposed to sync. In my testing, retrying the same query consistently gave the same 502. Lowering the `per_page` parameter did let the data through (for the query above, I had to go down to 10).
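The manual workaround above (lowering `per_page` until the request goes through) could be automated along these lines. This is a minimal Python sketch, not the tap's actual code: `ServerTimeoutError`, `fetch_with_shrinking_page` and the `fetch_page` callback are hypothetical names, and it assumes the HTTP layer raises `ServerTimeoutError` whenever GitHub answers 502.

```python
from typing import Callable, List, Tuple


class ServerTimeoutError(Exception):
    """Hypothetical error raised when GitHub answers 502 after its ~10 s limit."""


def fetch_with_shrinking_page(
    fetch_page: Callable[[int], List[dict]],
    start_per_page: int = 100,
    min_per_page: int = 1,
) -> Tuple[List[dict], int]:
    """Call fetch_page(per_page), halving per_page after each server timeout.

    Returns the records plus the per_page value that worked, so the caller
    can keep using it for the rest of the current stream/repo.
    """
    per_page = start_per_page
    while True:
        try:
            return fetch_page(per_page), per_page
        except ServerTimeoutError:
            if per_page <= min_per_page:
                raise  # even the smallest page timed out: give up
            per_page = max(min_per_page, per_page // 2)


if __name__ == "__main__":
    def fake_fetch(per_page: int) -> List[dict]:
        # Pretend anything above 10 items per page exceeds the time limit.
        if per_page > 10:
            raise ServerTimeoutError(f"502 with per_page={per_page}")
        return [{"id": i} for i in range(per_page)]

    records, used = fetch_with_shrinking_page(fake_fetch)
    print(used, len(records))  # 6 6  (tries 100, 50, 25, 12, then 6)
```

Halving is only one possible policy; starting from 100 it tries 50, 25, 12 and then 6, so it can overshoot the largest size that would have worked. Also, with page-based pagination, changing `per_page` mid-stream shifts the page boundaries, so a real implementation would likely need to restart paging for the stream or resume from the `since` timestamp of the last record it emitted.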
The tap should be able to get around such errors in a cleaner way:
- ideally, it would retry the same endpoint with a smaller `per_page` value for a while (say, until it has completed the current stream/repo)
- the current `MAX_PER_PAGE` is set to 1000, but according to the docs it should be 100 at most
- as a temporary workaround, the tap could simply skip the current repo and move on to the next one. This would leave some gaps in the data, however 🕳️
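The suggestions about capping `per_page` at 100 and skipping a repo that keeps timing out might look roughly like this. Again, this is a hypothetical Python sketch rather than the tap's API: `GITHUB_MAX_PER_PAGE`, `clamp_per_page`, `sync_all_repos`, the `sync_repo` callback and `ServerTimeoutError` are all illustrative names.

```python
import logging
from typing import Callable, Iterable, List

LOGGER = logging.getLogger("tap_github_sketch")

# GitHub's REST API accepts at most 100 items per page.
GITHUB_MAX_PER_PAGE = 100


class ServerTimeoutError(Exception):
    """Hypothetical error for a 502 returned after GitHub's ~10 s limit."""


def clamp_per_page(requested: int) -> int:
    """Keep a configured page size within GitHub's 1..100 range."""
    return max(1, min(requested, GITHUB_MAX_PER_PAGE))


def sync_all_repos(
    repos: Iterable[str],
    sync_repo: Callable[[str], None],
) -> List[str]:
    """Sync each repo in turn; on a server timeout, skip that repo and go on.

    Returns the repos that were skipped, i.e. where the data has gaps.
    """
    skipped: List[str] = []
    for repo in repos:
        try:
            sync_repo(repo)
        except ServerTimeoutError as exc:
            LOGGER.warning("Skipping %s after server timeout: %s", repo, exc)
            skipped.append(repo)
    return skipped


if __name__ == "__main__":
    synced: List[str] = []

    def fake_sync(repo: str) -> None:
        # Pretend only bitcoin/bitcoin hits the server-side timeout.
        if repo == "bitcoin/bitcoin":
            raise ServerTimeoutError("502 Bad Gateway")
        synced.append(repo)

    print(clamp_per_page(1000))  # 100
    print(sync_all_repos(["a/one", "bitcoin/bitcoin", "b/two"], fake_sync))
```

Returning the skipped repos lets the caller log or report the gaps explicitly instead of losing the data silently, and the skip could be combined with a shrinking-`per_page` retry so that skipping is only the last resort.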