- 
                Notifications
    You must be signed in to change notification settings 
- Fork 7
Caching the Doc get response (for homepage) #1054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…he when updated via API.
| Another simple option is expiry: The cache can live for at most a given period (T), e.g. one day, before it’s cleared/rebuilt. That way, it’s never >T stale…  On Mar 15, 2022, at 10:49, Jeffrey ***@***.***> wrote:
 
 I tried two versions of caching the /api/document GET to speed up the homepage carousel:
 (not shown) Cache the latest docs IDs and do a db get on them for subsequent requests: This one wasn't that much faster (maybe 10%; 2.6s vs 2.3s) than the existing version - keep in mind we're already indexing by created date.
 This PR: Cache the raw JSON. Of course, this one is fast (2.8s vs 0.15s), but of course misses data updates.
 Resetting the cache is tricky because it won't catch direct actions on the doc model status via the admin.
 Other options: Reduce the number of things in the carousel / introduce a search/browse; server-side render; leave it.
 Refs #909.
 You can view, comment on, or merge this pull request online at:
   #1054
 Commit Summary
 7e35c32 Cache the entire response when no param values are present. Clear cache when updated via API.
 File Changes  (2 files)
 M src/client/components/carousel.js (10)
 M src/server/routes/api/document/index.js (57)
 Patch Links:
 https://github.com/PathwayCommons/factoid/pull/1054.patch
 https://github.com/PathwayCommons/factoid/pull/1054.diff
 —
 Reply to this email directly, view it on GitHub, or unsubscribe.
 You are receiving this because you are subscribed to this thread. | 
| 
 makes sense. I guess the important case is that an author who submits sees their paper on the homepage. Messing around with the status of older docs isn't such a big deal. | 
|  | ||
| const docCache = new LRUCache({ | ||
| max: DOCUMENT_IMAGE_CACHE_SIZE | ||
| }); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we caching more than one key/result? Why use a LRU cache in particular? If we don't need a LRU cache in particular, you could use a general cache that has a TTL expiry already built in, e.g. https://www.npmjs.com/package/node-cache
Are you thinking about making this more granular to a per-ID cache? If not, could this just be a variable that can be null/non-null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right I didn't need this, only the TTL. Lemme look at that other package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way the lru-cache is way outta date (4.4.1 but latest is 7.*) should we update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we have a strong reason to update to a new major version, it's usually not worth the time it takes to verify that the update hasn't broken things. Upgrading from v4 to v7 means the API probably changed quite a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify the first comment a bit: If we want a TTL etc., then a lib like node-cache is good. If we don't want a TTL, then we can just use a simple variable that can be nulled out.
| Would this also help to speed up the search feature? That's currently just caching all the docs in the browser's memory with that search lib, right? | 
| 
 Yes, this was the motivation for taking on this issue. The only technical issue is setting the number of cached docs, i.e. the default  | 
| I'm assuming the home page has something like n=20, whereas the search has n=infinity? How big would the infinity case be in KB? | 
| 
 (This PR) GET  
 
 | 
| What do you think about the search including all docs by default instead of 100? If I were to do a search and an expected, existing doc doesn't come up because of the limit, then I wouldn't enjoy using the search. In that case, we could use the cache keys so that the carousel isn't negatively affected by the search, and both would be cached. 
 Notes: 
 | 
| 
 Worth a try! 
 I noticed the entire  | 
| Removing the  
 | 
| The infinity, all-docs case is probably less than 1MB when gzipped. That's about the size of an image, so it would be reasonable to load them all for the search -- for now anyway | 
- Cache instances of only limit set to 'Infinity'
| This is on https://test.biofactoid.org | 
| Let's just bump up the max size of the image cache so all the images should be cached in memory. That should resolve the random image reloading in the search for now | 
| Possibly improvements: The home component makes two closely staggered web service calls to retrieve documents: First to retrieve all public docs for the search (longer running) then the carousel to retrieve a recent slice. It seems that when the cache is empty, the latter is bottle-necked by the former. This has the effect of making the carousel load slow at the cost of the search. Will move to separate issue. | 
| See also: https://rethinkdb.com/docs/secondary-indexes/javascript/ 
 If we don't use the index, it will be slow | 
| Last commit managed to squeeze a bit more efficiency out of the db: 
 | 
| //TODO - Look at what's going on with the document pathway PNG caching (limit, speed).... | 
| 
 This helped a lot - the likely reason is that with the limited cache size, the search would effectively punt images that would otherwise show up on the homepage. So now everything is preloaded. I think this version works decent. | 
| 
 Nice. There may have been some cache thrashing going on | 
I tried two versions of caching the
/api/documentGET to speed up the homepage carousel:(not shown) Cache the latest docs IDs and do a db get on them for subsequent requests: This one wasn't that much faster (maybe 10%; 2.6s vs 2.3s) than the existing version - keep in mind we're already indexing by created date.
This PR: Cache the raw JSON. Of course, this one is fast (2.8s vs 0.15s), but of course misses data updates.
Resetting the cache is tricky because it won't catch direct actions on the doc model
statusvia the admin.Other options: Reduce the number of things in the carousel / introduce a search/browse; server-side render; leave it.
Refs #909.