
fix(files_reminders): reduce N+1 query issue for big folders #58288

Open

salmart-dev wants to merge 1 commit into master from fix/files-reminders-increase-cache-size

Conversation

@salmart-dev (Contributor) commented Feb 12, 2026

Summary

Increases the CappedMemoryCache size for ReminderService to 30k.

Context

OCA\FilesReminders\Dav\PropFindPlugin preloads reminder information for an entire collection during PROPFIND. ReminderService has a cacheFolder method which uses a CappedMemoryCache as its cache. The problem is that the default size for this cache is 512, meaning that if a directory has more than 512 files inside, we start getting cache misses and querying the DB.

What's worse is that once the limit is exceeded, the eviction always voids the pre-cached data and causes N+1 queries:

  1. preloadCollection starts caching reminders for the files in a directory
  2. once the limit of the CappedMemoryCache is reached, information that was cached earlier is discarded
  3. when the PROPFIND is serialized into XML, the first file is no longer in the cache
  4. the code queries the DB and caches the value, pushing another previously cached value out
  5. this continues, making the pre-caching useless 🤯
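The cascade above can be reproduced with a toy model. This is a Python sketch, not the real code: the actual cache is Nextcloud's PHP CappedMemoryCache; the evict-oldest-on-insert behavior and the default size of 512 are taken from the description above.

```python
class CappedCache:
    """Toy model: a capped cache that evicts the oldest entry when full."""
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.data = {}  # Python dicts preserve insertion order

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            oldest = next(iter(self.data))  # first key ever inserted
            del self.data[oldest]
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

cache = CappedCache(capacity=512)
n_files = 1000

# "preloadCollection": cache reminder info for every file in the folder
for i in range(n_files):
    cache.set(f"file{i}", [])

# XML serialization then reads the files back in the same order
misses = 0
for i in range(n_files):
    if cache.get(f"file{i}") is None:
        misses += 1                  # would be a DB query
        cache.set(f"file{i}", [])    # re-caching evicts yet another entry
print(misses)  # 1000: every single read misses, the preload bought nothing
```

Because each re-cache evicts the next entry the serializer is about to ask for, every one of the 1000 reads misses, which is exactly the N+1 behavior described.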

Checklist

Signed-off-by: Salvatore Martire <4652631+salmart-dev@users.noreply.github.com>
@salmart-dev salmart-dev self-assigned this Feb 12, 2026
@salmart-dev salmart-dev added this to the Nextcloud 33.0.1 milestone Feb 12, 2026
@salmart-dev (Contributor, Author) commented:

/backport to stable33 please

@salmart-dev (Contributor, Author) commented:

/backport to stable32 please

@salmart-dev (Contributor, Author) commented:

/backport to stable31 please

@salmart-dev salmart-dev marked this pull request as ready for review February 12, 2026 16:54
@salmart-dev salmart-dev requested a review from a team as a code owner February 12, 2026 16:54
@salmart-dev salmart-dev requested review from Altahrim, come-nc, icewind1991 and leftybournes and removed request for a team February 12, 2026 16:54
@Altahrim (Collaborator) commented:

I am wondering if it makes sense for CappedMemoryCache to discard old entries…
It's less likely, but if a directory has more than 30k entries we'll hit the same issue.

I am also a bit afraid that 30k entries in the cache would use a lot of memory.

@salmart-dev (Contributor, Author) commented:

> I am wondering if it makes sense for CappedMemoryCache to discard old entries… It's less likely, but if a directory has more than 30k entries we'll hit the same issue.
>
> I am also a bit afraid that 30k entries in the cache would use a lot of memory.

That is what the current code is already doing. We insert by key, and while inserting we call reset() on the array, which moves the internal pointer to the first element inserted (so the oldest), and then unset() it.

@Altahrim (Collaborator) commented:

Sorry, it wasn't clear ^^

Maybe CappedMemoryCache should just stop discarding entries. When it's full, it serves the current entries and stops adding new ones.

@salmart-dev (Contributor, Author) commented:

> Sorry, it wasn't clear ^^
>
> Maybe CappedMemoryCache should just stop discarding entries. When it's full, it serves the current entries and stops adding new ones.

But that still means executing N-512 queries to load data for directories with N files, no?
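A quick toy model of that variant illustrates the point (Python sketch; the capacity and the stop-adding-when-full behavior are as described above, everything else is made up):

```python
class CappedCacheNoEvict:
    """Toy model of the proposed variant: when full, keep the existing
    entries and silently drop new inserts instead of evicting."""
    def __init__(self, capacity=512):
        self.capacity = capacity
        self.data = {}

    def set(self, key, value):
        # only add while there is room; existing keys may still be updated
        if key in self.data or len(self.data) < self.capacity:
            self.data[key] = value

    def get(self, key):
        return self.data.get(key)

cache = CappedCacheNoEvict(capacity=512)
n = 1000
for i in range(n):                       # preload the whole folder
    cache.set(f"file{i}", [])
misses = sum(cache.get(f"file{i}") is None for i in range(n))
print(misses)  # 488: the first 512 preloaded entries survive, the rest miss
```

So the preloaded entries do survive, but every file past the cap still costs one DB query each.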

I'm thinking of two things:

  1. Cache the fact that a directory has been preloaded, and stop caching keys for files with no reminders. This would cut down the number of items in the cache, although it can still be exhausted if a folder has many files with reminders.
  2. Detect in the plugin that the capped memory cache is full and pause the caching, keep the cursor open, and let the plugin detect when it's time to read another batch of results. This requires the plugin to be the one managing the cache, though, which is not currently the case: the reminder service is the one caching right now.
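Idea 1 could look roughly like this. Python sketch only: the function and variable names (load_reminders_for_folder, get_reminders, etc.) are made up for illustration and are not the real ReminderService API.

```python
# Hypothetical sketch of idea 1: remember which folders were fully preloaded,
# and only cache files that actually have reminders.
preloaded_folders = set()
reminder_cache = {}  # file_id -> list of reminders

def load_reminders_for_folder(folder_id, db_rows):
    """db_rows: (file_id, reminder) pairs returned by one folder-wide query."""
    for file_id, reminder in db_rows:
        reminder_cache.setdefault(file_id, []).append(reminder)
    preloaded_folders.add(folder_id)

def get_reminders(folder_id, file_id, query_db):
    if file_id in reminder_cache:
        return reminder_cache[file_id]
    if folder_id in preloaded_folders:
        return []                  # folder was preloaded and this file had no
                                   # reminders: answer from the marker, no DB hit
    return query_db(file_id)       # folder not preloaded: per-file fallback

calls = []
def query_db(file_id):
    calls.append(file_id)
    return []

load_reminders_for_folder("folderA", [(1, "r1"), (3, "r2")])
print(get_reminders("folderA", 2, query_db), calls)  # [] []  (no DB call)
```

With this shape, the cache holds one entry per file *with* reminders plus one marker per folder, instead of one entry per file.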

@come-nc (Contributor) left a comment


Yeah, it’s the issue with the preloading strategy: it means we need to keep the complete list of entries in memory, which may not always fit, or will eat up a lot of memory.

I would feel better if we could change the logic to instead use generators and stream each node through the plugins, one at a time. But I’m not sure that’s possible or easy with the sabre/dav architecture, and it’s not great either, because it would mean doing a DB request for each node, which we do not want. So I’m not sure what the perfect solution is.
We’d need that, but with each plugin opening and keeping its own DB request to fetch data one by one or something (or batch by batch, more realistically).
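Something like this, as a toy sketch (Python; the "cursor" here is a plain iterator standing in for a real open DB result set, and the batch size is arbitrary):

```python
def reminder_stream(rows, batch_size=2):
    """Yield reminder rows batch by batch, keeping the underlying
    iterator (standing in for a DB cursor) open between batches."""
    it = iter(rows)
    while True:
        batch = []
        for _ in range(batch_size):
            try:
                batch.append(next(it))
            except StopIteration:
                break
        if not batch:
            return
        yield batch

# Plugin-side consumption: pull one batch at a time as nodes stream through,
# so at most batch_size rows are held in memory at once.
rows = [("f1", "r1"), ("f2", "r2"), ("f3", "r3")]
batches = list(reminder_stream(rows, batch_size=2))
print(len(batches))  # 2 batches, of sizes 2 and 1
```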

@salmart-dev (Contributor, Author) commented:

> Yeah, it’s the issue with the preloading strategy: it means we need to keep the complete list of entries in memory, which may not always fit, or will eat up a lot of memory.
>
> I would feel better if we could change the logic to instead use generators and stream each node through the plugins, one at a time. But I’m not sure that’s possible or easy with the sabre/dav architecture, and it’s not great either, because it would mean doing a DB request for each node, which we do not want. So I’m not sure what the perfect solution is. We’d need that, but with each plugin opening and keeping its own DB request to fetch data one by one or something (or batch by batch, more realistically).

I think doing this in a streaming fashion for all results is not possible, as we'd depend on the order of access: the DB query and the code would have to use the same scan order for nodes so that the properties are read in order from the DB. I think going by batches would be better; I'm just not sure how performant turning all the queries into WHERE IN queries would be, though.
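For reference, the batching itself is mechanical; the open question is only the DB performance of the resulting queries. A sketch (Python; the table and column names in the SQL are hypothetical, not the real schema):

```python
def chunked(ids, size):
    """Split a list of ids into consecutive batches of at most `size`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

def build_batch_queries(file_ids, batch_size=500):
    """Turn one huge unbounded IN () lookup into several bounded ones.
    Returns (sql, params) pairs; table/column names are illustrative."""
    queries = []
    for batch in chunked(file_ids, batch_size):
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT file_id, due_date FROM reminders WHERE file_id IN ({placeholders})"
        queries.append((sql, batch))
    return queries

qs = build_batch_queries(list(range(1200)), batch_size=500)
print(len(qs))  # 3 batches: 500 + 500 + 200 ids
```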

I have a sabre/dav branch that allows getting the list of nodes that will need their properties loaded before the propfind event. I could see if I can add batching and use the same mechanism to preload per batch. If we get that into sabre, then we can try this on our server. Or... we make a branch and test it on perftesting.
