Allow the tap to continue on server side timeout (error 502) #52

@laurentS

In their docs, GitHub mentions that long-running requests (>10 sec) are interrupted and return nothing. We hit such a case, and it seems that GitHub returns a 502 Bad Gateway HTTP status code.

In our case, the issue comments stream gave such errors.

To see the error, run curl -v -o /dev/null "https://api.github.com/repos/bitcoin/bitcoin/issues/comments?per_page=100&sort=updated&direction=asc&since=2019-11-25T15:20:23" (this discards the response body, which is irrelevant here, but shows the headers and the 502 error). curl helpfully shows a timer that stops at 10 seconds, confirming this is a server-side timeout.

When the tap runs with multiple repos listed in its config, it chokes on such errors and exits prematurely, never finishing the list of tasks it is supposed to do. In my testing, retrying the same query consistently led to the same result. Lowering the per_page param did get the data through (in the case above, I had to go down to 10).

The tap should be able to get around such errors in a cleaner way:

  • ideally, it would retry the same endpoint with a smaller per_page value for a while (say, until it has completed the current stream/repo)
  • the current MAX_PER_PAGE is set to 1000, but should be at most 100, according to the docs
  • as a temporary workaround, the tap could simply skip the current repo and move on to the next one. This would leave some gaps in the data, however 🕳️
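
To make the first suggestion concrete, here is a rough sketch of what "retry with a smaller per_page" could look like. It is not the tap's actual code: `iter_items`, `PER_PAGE_STEPS` and the `fetch(page, per_page)` callable are hypothetical names, and it assumes page-number pagination with results that keep the same order between requests. The fallback sizes are chosen so that each one divides the previous one, which keeps the page number recomputable after a shrink.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical fallback sizes. Each value divides the previous one, so the
# number of items already read is always a whole number of pages.
PER_PAGE_STEPS = (100, 50, 10, 5, 1)

# Hypothetical signature: fetch(page, per_page) -> (status_code, items)
Fetch = Callable[[int, int], Tuple[int, List[dict]]]


def iter_items(fetch: Fetch, steps=PER_PAGE_STEPS) -> Iterator[dict]:
    """Yield every item, shrinking per_page whenever the server answers 502."""
    step = 0    # index into steps; it only grows, so per_page only shrinks
    offset = 0  # number of items yielded so far
    while True:
        per_page = steps[step]
        page = offset // per_page + 1  # pages are 1-indexed
        status, items = fetch(page, per_page)
        if status == 502:
            if step + 1 == len(steps):
                raise RuntimeError("502 even with per_page=%d" % per_page)
            step += 1  # retry the same position with a smaller page
            continue
        if status != 200:
            raise RuntimeError("unexpected HTTP status %d" % status)
        yield from items
        offset += len(items)
        if len(items) < per_page:
            return  # short (or empty) page: nothing left to read
```

The smaller per_page is kept until the stream finishes, as suggested above. The temporary workaround would amount to catching the final `RuntimeError` in the loop over repos and moving on to the next one.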
