Add BaseGraphqlService, support [github] V4 API #3763
Conversation
```js
schema,
url,
query,
variables = {},
```
I've made `query` and `variables` top-level params here. I'm in two minds about whether it makes more sense to collect them into an object. Also note here I deliberately haven't implemented mutations. In general, calling a shields badge should read data from an upstream service, not write data to it, so we shouldn't need mutations.
```js
attrs.query = mergeQueries(
  attrs.query,
  'query { rateLimit { limit cost remaining resetAt } }'
)
```
Explicitly asking the GH API to just tell us what our remaining usage is in the body seems like an easier approach than attempting to parse the query and calculate the projected cost of each call, and less likely to result in a discrepancy: https://developer.github.com/v4/guides/resource-limitations/#calculating-nodes-in-a-call

That said, we probably do need to watch out for query cost in contributions. With the V3 API, it's pretty hard for someone to contribute a badge that smashes our usage limit, because 1 HTTP request = 1 point, but with GraphQL a single HTTP request can cost you a whole truckload of points. Not sure what the solution is for that other than to watch out for it in review.
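For illustration, here's a minimal sketch of the merge step, assuming both queries use the simple `query { ... }` form with no operation names or variables (the real `mergeQueries` implementation may well work differently, e.g. on parsed ASTs):

```javascript
// Naive sketch: combine a caller's query with the rateLimit selection so
// the API reports remaining quota in the same response. Assumes plain
// `query { ... }` strings; a robust version would merge parsed documents.
function mergeQueries(...queries) {
  const bodies = queries.map(q =>
    q
      .trim()
      .replace(/^query\s*\{/, '')
      .replace(/\}$/, '')
      .trim()
  )
  return `query { ${bodies.join(' ')} }`
}

const merged = mergeQueries(
  'query { repository(owner: "badges", name: "shields") { issues { totalCount } } }',
  'query { rateLimit { limit cost remaining resetAt } }'
)
```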
How about establishing a max number of queries, which could be overridden per service? If the response exceeds the configured number it could error out rather than rendering. That way we'll see it right in the code, and can even search for it.
On the other hand, we have a rather huge amount of GitHub quota – I want to say something like 2.2k tokens – so maybe we don't need to worry about this.
Are you thinking of that as a runtime check or a static check? Either way, I'm not against adding it, but I think it can be raised as an issue to address in a later PR.
Since we don't know the answer until the response comes back, it'd have to be a runtime check. Agreed, it can be addressed later.
Right, ok. I misunderstood and thought you were still talking about pre-parsing the query before we send it.
Throwing the exception after we've already made the query is obviously simpler, but also of more limited value (especially if it makes it through to production), in that we've already made the request we don't want to make before deciding we didn't want to make it, if you see what I mean :)

Maybe it's a useful MVP to start with, in that if we start getting exceptions in Sentry saying "lol - just rinsed our token and now we're not even going to render a badge with all the expensive data we just requested", we can at least be reactive. Bear in mind it might lull us into a false sense of security though, as we'd be unlikely to throw it in dev because we treat global tokens differently from pooled ones. My instinct is most contributors probably use a global token as opposed to a pool with one token in it (I know that's what I usually do - I think working on this PR is the first time I've set up a pool locally).
Unfortunately I think we do still actually need to work out the query cost up-front to implement it in a useful way.
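As a rough sketch of the runtime-check version being discussed, something like the following could compare the cost reported in the response against a per-service ceiling; `maxQueryCost`, `QueryTooExpensive`, and the default limit are all illustrative names, not part of the current code:

```javascript
// Hypothetical runtime guard: after the response comes back, compare the
// rateLimit cost the API reports against a configured ceiling and fail
// loudly, so an expensive query shows up in error tracking during review.
class QueryTooExpensive extends Error {}

function checkQueryCost(responseBody, maxQueryCost = 10) {
  const { cost } = responseBody.data.rateLimit
  if (cost > maxQueryCost) {
    throw new QueryTooExpensive(
      `query cost ${cost} exceeds configured maximum of ${maxQueryCost}`
    )
  }
}
```

As noted above, this only fires after the expensive request has already been made, so it is reactive rather than preventive.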
```js
function graphqlErrorHandler(errors) {
  if (errors[0].type === 'NOT_FOUND') {
    throw new NotFound({ prettyMessage: 'repo not found' })
  } else {
```
In time, it might be useful to add other cases in here, but I don't know what they are yet.
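One shape that extension could take is a switch over the error type. The `NOT_FOUND` case is from the existing handler; the other cases and the error-class stand-ins below are illustrative guesses, not confirmed GitHub V4 API values or the real shields classes:

```javascript
// Minimal stand-ins for shields' error classes, for illustration only
class NotFound extends Error {
  constructor({ prettyMessage }) {
    super(prettyMessage)
  }
}
class InvalidResponse extends Error {
  constructor({ prettyMessage }) {
    super(prettyMessage)
  }
}

// NOT_FOUND is the case handled today; additional cases could be mapped
// to user-facing messages as they come up
function graphqlErrorHandler(errors) {
  switch (errors[0].type) {
    case 'NOT_FOUND':
      throw new NotFound({ prettyMessage: 'repo not found' })
    case 'RATE_LIMITED': // hypothetical example of a future case
      throw new InvalidResponse({ prettyMessage: 'rate limit exceeded' })
    default:
      throw new InvalidResponse({ prettyMessage: errors[0].message })
  }
}
```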
Hey Chris, this is looking really good. Not an insignificant piece of work! I left a few comments though there's nothing really major here.
As for porting the service or not, one strategy might be to reserve the GraphQL endpoints for the things that really need it, and the other might be to port things over to GraphQL as much as possible.
Is there a performance difference?
```js
getRateLimitFromBody(body) {
  const parsedBody = JSON.parse(body)
```
Hmm, it would be nice to avoid parsing the body twice for all these responses. Any idea if there's a good way to do that?
Yeah, this is a bit annoying. I think we could, but it's also a fairly major change. At the moment we're essentially doing all this token work inside the call to `_request()`, which means we can deal with manipulating the request and processing the response in the same place. We can also deal with `standardTokens`, `searchTokens` and `graphqlTokens` in the same `GithubApiProvider` class. In order to avoid double-parsing, we'd need to restructure so we can still manipulate the request, but then update the token pool after we call `_parseJson()` in either `_requestJson()` or `_requestGraphql()`.
I've not really completely thought through the consequences of all this, but here's a couple of thoughts:
One way to deal with that would be to split this up so we deal with `standardTokens`/`searchTokens` by overriding the `_requestJson()` method in `GithubAuthV3Service`, and `graphqlTokens` by overriding the `_requestGraphql()` method of `GithubAuthV4Service`. Although at that point, if `GithubAuthV4Service` overrides `_requestGraphql()`, it's not really a subclass of `BaseGraphqlService` anymore.
Maybe another way would be to allow the `_requestJson()`/`_requestGraphql()`/`_requestFormat()` methods in base classes to accept an optional callback param or something that can be used to run a post-process after parsing but before we return the response.
Any further thoughts or ideas?
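The callback idea could look roughly like this, showing only the parse-and-hook step; `requestJson`, `buffer`, and `transformJson` are illustrative names, and the actual `_request()` plumbing is omitted:

```javascript
// Hypothetical sketch: the shared request helper parses the body exactly
// once and hands the parsed JSON to an optional `transformJson` hook
// before returning it. A GitHub V4 subclass could update its token pool
// inside the hook without re-parsing the body.
function requestJson({ buffer, transformJson = json => json }) {
  const json = JSON.parse(buffer) // parsed exactly once
  return transformJson(json)
}

const result = requestJson({
  buffer: '{"data":{"rateLimit":{"remaining":4999}}}',
  transformJson: json => {
    // token-pool bookkeeping would happen here
    return json
  },
})
```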
How about having `GithubAuthV4Service` rely on `BaseGraphqlService` to do the JSON parsing, but pass in an empty schema so the validation is a no-op? Then the token-related stuff can happen, and then `GithubAuthV4Service` can invoke `_validate`.
Partly related, I dislike deep inheritance because the indirection makes it hard for people unfamiliar with the code to dig through. Each layer is one more thing to keep in your head at once.
So I think it would be good to make `BaseGraphqlService` extend directly from `BaseService`. All that would take is exporting `parseJson` as its own function (or moving it to its own file).

There are a couple of places in GitHub where we invoke `_parseJson` from within a service, and those will be made cleaner by importing it directly.
In Gerrit there's also a place where we override it, though it'd be less magical and easier to understand if we used `BaseService` directly (invoking `_request`, doing the trim, then invoking parse and validate).
The refactor is a bit involved so maybe it's best to leave it for a follow-on.
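The standalone-helper part of that refactor could be sketched like this; the `InvalidResponse` class here is a minimal stand-in for the real shields error class, and the error message is illustrative:

```javascript
// Sketch: parseJson as a plain exported helper, so BaseGraphqlService
// can extend BaseService directly and services can import the helper
// instead of reaching up through a deep class hierarchy.
class InvalidResponse extends Error {
  constructor({ prettyMessage }) {
    super(prettyMessage)
  }
}

function parseJson(buffer) {
  try {
    return JSON.parse(buffer)
  } catch (err) {
    throw new InvalidResponse({ prettyMessage: 'unparseable json response' })
  }
}
```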
As a first step on this, I've moved `parseJson()` out to a helper and switched `BaseGraphqlService` to inherit from `BaseService`. That seems sensible anyway, but I've not taken it further.
Not sure I agree that overriding a method from the base class is "magical" in the context of Gerrit, though. It's what pretty much every other function we declare in a badge service class is doing - seems like a pretty consistent pattern 🤷‍♂️
> How about establishing a max number of queries, which could be overridden per service? If the response exceeds the configured number it could error out rather than rendering. That way we'll see it right in the code, and can even search for it.
>
> On the other hand, we have a rather huge amount of GitHub quota – I want to say something like 2.2k tokens – so maybe we don't need to worry about this.
I thought about this a bit already. As I see it, there are a few tradeoffs to think about here (some performance-related and some not). Predictably there is not a single clear-cut answer in all cases:
Everything you're saying makes sense. Assuming the GraphQL API is otherwise fast, since network latency is the key issue I'd guess the smaller responses will probably have a bigger impact than anything else. Could you paste what you wrote into a comment in github-auth-service.js? Or else distill it down to what you think contributors need to know?

Cheers. I'll raise issues for the additional bits of work.
This bit of work sprouted off from #3610, but I've decided to do it in its own PR because this will require some review to get right, and trying to load all this into a PR that is already a ~500 line diff is going to make that annoying. Let's get this bit right, then I'll go back, rebase #3610 onto this work and pick that up.
Main things going on here are:
- `BaseGraphqlService` - we'll probably need this in future anyway
- `GithubAuthV4Service` service for badges which query the GitHub V4 API - most of the work here was to accommodate one change in the V4 API: rate limit usage info is now returned in the response body on request. As well as supporting #3610 (move [githubtag] and [githubrelease] under /v, move variants to query params), this will allow us to address other open feature requests requiring the V4 API, e.g. #1550 (Badge request: GitHub dependency graph) and #3608 (Badge request: GitHub Package Registry, NPM variants)