Description
This is related to a bug I noticed towards the end of the semester but didn't explore further while I was working on WATcher/CATcher (I believe it exists in both WATcher and CATcher).
Recently, while reviewing #1312, I was reminded of this bug, hence this issue.
This bug is related to the issue cache and `toFetchIssues`, but to explain it, let's take a look at what `fetchIssuesGraphqlByTeam` and `fetchIssuesGraphql` from `github.service.ts` do:
    fetchIssuesGraphqlByTeam(tutorial: string, team: string, issuesFilter: RestGithubIssueFilter): Observable<Array<GithubIssue>> {
      const graphqlFilter = issuesFilter.convertToGraphqlFilter();
      return this.toFetchIssues(issuesFilter).pipe(
        filter((toFetch) => toFetch),
        mergeMap(() =>
          this.fetchGraphqlList<FetchIssuesByTeamQuery, GithubGraphqlIssue>( //...

    fetchIssuesGraphql(issuesFilter: RestGithubIssueFilter): Observable<Array<GithubIssue>> {
      const graphqlFilter = issuesFilter.convertToGraphqlFilter();
      return this.toFetchIssues(issuesFilter).pipe(
        filter((toFetch) => toFetch),
        mergeMap(() =>
          this.fetchGraphqlList<FetchIssuesQuery, GithubGraphqlIssue>( //...
What both functions do is convert the filter and then chain some operators onto the observable returned by `toFetchIssues`:
- get a GitHub REST API filter
- convert it to a GitHub GraphQL filter
- fetch all issues with the filter and update the local cache using the GitHub REST API
- filter out falsy results (so the chain only continues when there is something to fetch)
- refetch the whole thing again using the GitHub GraphQL API (erm what?)
Thus, each time these functions are called, two identical sets of GitHub issues are downloaded from GitHub's servers.
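To make the double download concrete, here is a tiny self-contained simulation of the flow above. It is only a sketch: `FakeGithub`, `DoubleFetchClient` and their methods are made-up stand-ins (plain synchronous calls instead of RxJS Observables and real HTTP), not the actual code in `github.service.ts`.

```typescript
// Hypothetical stand-ins; the real code uses RxJS Observables and the GitHub APIs.
interface SimpleIssue {
  id: number;
  title: string;
}

interface RestResult {
  etag: string;
  issues: SimpleIssue[];
}

class FakeGithub {
  issues: SimpleIssue[];
  etag: string;
  restDownloads = 0; // number of issues sent over the REST-style call
  graphqlDownloads = 0; // number of issues sent over the GraphQL-style call

  constructor(issues: SimpleIssue[], etag: string) {
    this.issues = issues;
    this.etag = etag;
  }

  // REST-style call: returns null (think "304 Not Modified") when the ETag matches.
  fetchRest(knownEtag: string | null): RestResult | null {
    if (knownEtag === this.etag) {
      return null;
    }
    this.restDownloads += this.issues.length;
    return { etag: this.etag, issues: [...this.issues] };
  }

  // GraphQL-style call: always returns the full list.
  fetchGraphql(): SimpleIssue[] {
    this.graphqlDownloads += this.issues.length;
    return [...this.issues];
  }
}

class DoubleFetchClient {
  github: FakeGithub;
  etag: string | null = null;
  restCache: SimpleIssue[] = [];

  constructor(github: FakeGithub) {
    this.github = github;
  }

  // Mirrors the steps above: REST fetch + cache update, then a GraphQL refetch.
  fetchIssues(): SimpleIssue[] | null {
    const rest = this.github.fetchRest(this.etag);
    if (rest === null) {
      return null; // nothing changed, so the GraphQL call is skipped
    }
    this.etag = rest.etag;
    this.restCache = rest.issues; // first copy of the issues
    return this.github.fetchGraphql(); // second, identical copy
  }
}

const fakeGithub = new FakeGithub([{ id: 1, title: "a" }, { id: 2, title: "b" }], "v1");
const demoClient = new DoubleFetchClient(fakeGithub);
demoClient.fetchIssues();
console.log(fakeGithub.restDownloads, fakeGithub.graphqlDownloads); // prints 2 2
console.log(demoClient.fetchIssues()); // prints null: ETag unchanged, nothing fetched
```

In this toy version, every change on the repo costs one full REST download plus one full GraphQL download of the same issues, and the client ends up holding both copies.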
We can confirm the presence of the bug by setting breakpoints in issues-cache-manager and issue-last-modified, as seen here:

So what happened and why? (This is my hypothesis and not representative of what actually happened)
Essentially, `toFetchIssues` contains code that caches GitHub issues in local storage using the GitHub REST API, then returns an observable list of GitHub issues to be used by other parts of the app.
At some point in development, it was decided to migrate to the GitHub GraphQL API instead.
However, the team likely ran into trouble detecting whether the issues in the repo had changed when using GraphQL (this is related to `304 Not Modified` and the GraphQL API). This wasn't a problem with the REST API, as the server responds with `304 Not Modified` if there are no changes to that page of the API response.
As a result, `toFetchIssues` is used today as a way to track whether any issues have been modified: it manages an issue cache with the help of the `304 Not Modified` HTTP response, which significantly reduces data transfer.
Thus, today this function just checks whether issues on the repo have been modified and, if so, triggers a GraphQL call to fetch them.
While 304 responses reduce API usage, there are still two copies of the GitHub issues (one from the GraphQL API and another from the REST API) stored within the web app at all times. On top of that, each time there are changes, two identical copies of the issues are downloaded from the servers, which is pretty inefficient.
Understanding the cache manager
The GitHub REST API responds in pages (i.e. page x contains issues aaa-bbb ...).
Each page is an HTTP call, and there is an ETag associated with that page. Think of that tag as some form of ID for the page's content.
When there is no change to the data, the content of that page stays the same. If the client sends back the ETag it already has, the server just replies `304 Not Modified` instead of resending the page (thus no additional API cost), and the ETag stays identical to before (indicating no change to the data on that page).
This means that tracking ETag changes lets the dev determine whether the data has been modified.
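Roughly, the ETag bookkeeping described above could look like the sketch below. This is not the real cache manager: `EtagPageCache`, `PageFetcher` and the fake server are hypothetical names I made up, and the fetcher is a stand-in for an HTTP GET that sends `If-None-Match` when an ETag is already known.

```typescript
// Hypothetical shapes for illustration; not the real cache manager.
interface PageResponse {
  status: 200 | 304;
  etag: string;
  body: string[] | null; // null on 304: the server sends no page content
}

// Stand-in for an HTTP GET that sends `If-None-Match: <etag>` when one is known.
type PageFetcher = (page: number, ifNoneMatch: string | null) => PageResponse;

class EtagPageCache {
  etags: Record<number, string | undefined> = {};
  pages: Record<number, string[] | undefined> = {};

  // Returns true if the page changed since the last call, false on 304.
  refresh(page: number, fetchPage: PageFetcher): boolean {
    const knownEtag = this.etags[page] ?? null;
    const res = fetchPage(page, knownEtag);
    if (res.status === 304) {
      return false; // cached copy is still valid
    }
    this.etags[page] = res.etag;
    this.pages[page] = res.body ?? [];
    return true;
  }
}

// Fake server with a single page whose ETag changes whenever its content changes.
const fakeServer = { etag: "etag-1", body: ["issue 1", "issue 2"] };
const fetchFromFakeServer: PageFetcher = (_page, ifNoneMatch) =>
  ifNoneMatch === fakeServer.etag
    ? { status: 304, etag: fakeServer.etag, body: null }
    : { status: 200, etag: fakeServer.etag, body: [...fakeServer.body] };

const pageCache = new EtagPageCache();
console.log(pageCache.refresh(1, fetchFromFakeServer)); // true: first download
console.log(pageCache.refresh(1, fetchFromFakeServer)); // false: 304, nothing changed
```

The boolean returned by `refresh` plays the same role as the `toFetch` flag in the snippets above: it only says whether something changed, not what changed.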
The cache manager here
As each node in GraphQL is dynamic, it's impractical to use `304 Not Modified` on GraphQL calls. This is also why detecting data modification is a bit harder with GraphQL, but not impossible.
Potential suggestions / solutions
The immediate idea I can think of right now is to get the cache manager to rely on GitHub's `updatedAt` field rather than HTTP response codes.
TLDR: request the last-modified date when searching issues and PRs from GitHub using GraphQL.
When receiving the GitHub results, store them together with last-modified (or perhaps `updatedAt`? Not too sure which field is the one to use here).
Last modified will be used to track changes.
In GitHub GraphQL, it's possible to return only results whose last-modified date is after a certain time.
Thus, we can just include that in our query and update the related items in the cache from the result.
Thus, for polling, it's just a GraphQL request with an empty response most of the time, which costs 1 point of API usage out of the hourly 5000 points based on GitHub's GraphQL rate limits. That is fine given our current polling rate.
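A rough sketch of what this could look like is below. Everything in it is an assumption on my part: `UpdatedAtIssueCache`, `RunIssuesQuery` and the query text are hypothetical, and the query relies on the `since` field of the `filterBy` argument for issues, so please double check it against the GitHub GraphQL schema. The query runner is injected so the merge logic can be tried without any network calls.

```typescript
interface CachedIssue {
  id: string;
  title: string;
  updatedAt: string; // ISO 8601 UTC timestamp, e.g. "2025-01-01T00:00:00Z"
}

// Assumed query shape (relies on the `since` field of the `filterBy` argument);
// double check it against the GitHub GraphQL schema before relying on it.
const ISSUES_SINCE_QUERY = `
  query IssuesSince($owner: String!, $name: String!, $since: DateTime) {
    repository(owner: $owner, name: $name) {
      issues(first: 100, filterBy: { since: $since }) {
        nodes { id title updatedAt }
      }
    }
  }`;

// Stand-in for whatever actually sends the query (hypothetical signature).
type RunIssuesQuery = (query: string, variables: { since: string | null }) => CachedIssue[];

class UpdatedAtIssueCache {
  issues: Record<string, CachedIssue | undefined> = {};
  lastUpdatedAt: string | null = null; // newest updatedAt seen so far

  // Fetches only issues updated since the last poll and merges them into the cache.
  // Returns how many cached entries were actually added or changed.
  poll(runQuery: RunIssuesQuery): number {
    const fetched = runQuery(ISSUES_SINCE_QUERY, { since: this.lastUpdatedAt });
    let changed = 0;
    for (const issue of fetched) {
      const cached = this.issues[issue.id];
      // An inclusive `since` filter can return the newest issue again; skip it.
      if (cached !== undefined && cached.updatedAt === issue.updatedAt) {
        continue;
      }
      this.issues[issue.id] = issue;
      changed += 1;
      // Same-format UTC ISO 8601 strings sort chronologically as plain strings.
      if (this.lastUpdatedAt === null || issue.updatedAt > this.lastUpdatedAt) {
        this.lastUpdatedAt = issue.updatedAt;
      }
    }
    return changed;
  }
}
```

With this shape there is only one copy of the issues in the app, and a poll that finds nothing new leaves the cache untouched. Pagination (more than 100 changed issues) and deleted or transferred issues are not handled here.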
tldr;
- Stop using REST API for issue modification tracking.
- Modify GraphQL queries to fetch only issues modified after updatedAt stored in cache.
- Ensure the cache updates correctly with only the changed issues.
I didn't do much testing, but I think the above is the gist of the issue. (Do correct me if there is any misinformation above.)
What do you guys think, @CATcher-org/2425s2?