Skip to content

Conversation

@maxkfranz
Copy link
Member

  • Use DOCUMENT_IMAGE_PLL_LIMIT to set the number of images that can be generated at once.
    • It should be as high as the hardware can support to keep things fast.
    • It's set at 1 by default. This always works, but it's not as fast as it could/should be.
  • Also add DOCUMENT_IMAGE_CACHE_SIZE to the readme. This was undocumented.

@jvwong, would you test this out? We'll also need to determine what number to use for DOCUMENT_IMAGE_PLL_LIMIT in the docker config for the production app.

- Use `DOCUMENT_IMAGE_PLL_LIMIT` to set the number of images that can be generated at once.
    - It should be as high as the hardware can support to keep things fast.
    - It's set at 1 by default.  This always works, but it's not as fast as it could/should be.
- Also add `DOCUMENT_IMAGE_CACHE_SIZE` to the readme.  This was undocumented.
@maxkfranz maxkfranz requested a review from jvwong March 14, 2023 15:32
@jvwong
Copy link
Member

jvwong commented Mar 16, 2023

@maxkfranz
I've been playing around with this locally. I think we'll need to chat about how I should test the performance, right now I'm not seeing a difference in looking at

  • Total load time for the Search page (i.e. browsable page after clicking Search)
  • Load time for the Explorer page for an individual document while the Search page is loading

@maxkfranz
Copy link
Member Author

Talk next Wed.?

@jvwong
Copy link
Member

jvwong commented Mar 22, 2023

A couple of instances to test:

Provider URL OS CPU (Cores) RAM (GB) Live
Donnelly *https://test.biofactoid.org/ Ubuntu 16.04.7 LTS Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz (4) 30 YES
Digital Ocean https://beta.biofactoid.org/ Ubuntu 20.04.2 LTS DO-Regular (4) 7 NO
  • Branch test will be automatically run here

@jvwong
Copy link
Member

jvwong commented Mar 22, 2023

I'm finding the getDoc call to be a major offender following an initial load:

const main = async () => {
try {
const doc = await getDoc(id);
const lastEditedDate = '' + doc.lastEditedDate();
const cache = imageCache.get(id);
const canUseCache = imageCache.has(id) && cache.lastEditedDate === lastEditedDate;
if( canUseCache ){
res.send(cache.img);
} else {
const cache = await fillCache(doc, lastEditedDate);
res.send(cache.img);
}

Alternative to handle imgCache staleness is:

  • Fetch the JSON results from /api/document/ (ttl is 24 hours)
  • Find the matching document and it's lastEditedDate
  • Hit the cache or regenerate as usual

So the downside is authors won't see updates to their submitted docs within 24 hours. The p-limit code is still valid on initial load, the rest of the time the entire search browse page loads almost instantly.

@jvwong
Copy link
Member

jvwong commented Mar 23, 2023

There's two major cases I've observed.

(1) Empty caches: Initial server load or reboot
Here, this PR (default config vars) does limit the image system and gives the server a chance to respond to other requests (e.g. loading a page, creating a new doc) -- eventually. This is a resource hungry period, but thankfully only happens once.

(2) Cache filled: Each subsequent homepage/search request
The major bottleneck in image fetching is checking the document last update date, which incurs database hits that gum up the server: Each image in the search, for each user loading it in their browser.

For (1) I say let's merge this in as is. For (2) I've started #1151

# Conflicts:
#	src/server/routes/api/document/index.js
@jvwong jvwong merged commit 57cd4b1 into unstable Apr 3, 2023
@jvwong jvwong deleted the imgspd branch April 3, 2023 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants