Files search is very slow - SQL query very complex esp. when collectives, talk installed #44596
Description
⚠️ This issue respects the following points: ⚠️
- This is a bug, not a question or a configuration/webserver/proxy issue.
- This issue is not already reported on Github OR Nextcloud Community Forum (I've searched it).
- Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
- I agree to follow Nextcloud's Code of Conduct.
Bug description
I have a server with 465348 lines in the oc_filecache
table. Searching for a file using unified search, the request takes about 1.5-2s to return a response. Upon investigation, the underlying SQL query (I am on postgresql) takes around 1-1.5s. I feel like there is room for optimisation here.
Below are my findings (scroll down for original sql queries and process of getting to a conclusion):
Update: You can view a nice overview of the perpared statement ahd which parts take the longest here: http://tatiyants.com/pev/#/plans/plan_1711947023189
It's clear that the searches used for collectives/talk/deck are the heaviest, taking up ~50% of the execution time (2% execution time for each of the ~25 searches in my case, adding up to ~50%). This is confirmed by removing all the corresponding parts of the SQL query. Just using the first part reduces the time by half or 687ms.
explain analyze SELECT "file"."fileid", "storage", "path", "path_hash", "file"."parent", "file"."name", "mimetype", "mimepart", "size", "mtime", "storage_mtime", "encrypted", "etag", "file"."permissions", "checksum", "unencrypted_size", "meta"."json" AS "meta_json", "meta"."sync_token" AS "meta_sync_token" FROM "oc_filecache" "file" LEFT JOIN "oc_files_metadata" "meta" ON "file"."fileid" = "meta"."file_id" WHERE ("file"."name" ILIKE '%so%') AND ((("storage" = '4') AND (("path" = 'files') OR ("path" LIKE 'files/%'))));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4869.62..60019.75 rows=878 width=738) (actual time=60.518..686.015 rows=11185 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop Left Join (cost=3869.62..58931.95 rows=366 width=738) (actual time=20.113..586.505 rows=3728 loops=3)
-> Parallel Bitmap Heap Scan on oc_filecache file (cost=3869.33..57153.65 rows=366 width=233) (actual time=19.982..562.403 rows=3728 loops=3)
Recheck Cond: (storage = '4'::bigint)
Filter: (((name)::text ~~* '%so%'::text) AND (((path)::text = 'files'::text) OR ((path)::text ~~ 'files/%'::text)))
Rows Removed by Filter: 104122
Heap Blocks: exact=9659
-> Bitmap Index Scan on fs_storage_mimepart (cost=0.00..3869.11 rows=325958 width=0) (actual time=50.535..50.536 rows=345536 loops=1)
Index Cond: (storage = '4'::bigint)
-> Index Scan using files_meta_fileid on oc_files_metadata meta (cost=0.29..4.86 rows=1 width=513) (actual time=0.005..0.005 rows=0 loops=11185)
Index Cond: (file_id = file.fileid)
Planning Time: 1.275 ms
Execution Time: 687.419 ms
Possible solutions:
I've dug a bit deeper, and perhaps (somehow) in lib/private/Files/Node/Folder.php we can filter out the caches that are for apps that have their own provider (or just hard code it to collectives, talk etc). It seems messy, and I don't know how to do it (I don't really understand ICache) but perhaps it's doable. Another way might be to allow getCachesAndMountPointsForSearch
to filter out certain mount points / caches. E.g. $searchHelper->getCachesAndMountPointsForSearch($this->root, $this->path, $exclude->exclude, $limitToHome)
[$caches, $mountByMountPoint] = $searchHelper->getCachesAndMountPointsForSearch($this->root, $this->path, $limitToHome);
// filter out collectives and talk caches - they have their own search providers and only slow down the file search if we add them here
$caches = array_filter($caches, function (ICache $cache) {
// !!!PSEUDO CODE!!!
return !in_array($cache->getStorage()->getId(), ['collectives', 'talk']);
});
Raw sql queries
Using postgresql's slow query log, here is an example of one query, for which I have subsequently run ANALYZE on it:
LOG: duration: 1443.320 ms execute <unnamed>: SELECT "file"."fileid", "storage", "path", "path_hash", "file"."parent", "file"."name", "mimetype", "mimepart", "size", "mtime", "storage_mtime", "encrypted", "etag", "file"."permissions", "checksum", "unencrypted_size", "meta"."json" AS "meta_json", "meta"."sync_token" AS "meta_sync_token" FROM "oc_filecache" "file" LEFT JOIN "oc_files_metadata" "meta" ON "file"."fileid" = "meta"."file_id" WHERE ("file"."name" ILIKE $1) AND ((("storage" = $2) AND (("path" = $3) OR ("path" LIKE $4))) OR (("storage" = $5) AND (("path" = $6) OR ("path" LIKE $7))) OR (("storage" = $8) AND (("path" = $9) OR ("path" LIKE $10))) OR (("storage" = $11) AND (("path" = $12) OR ("path" LIKE $13))) OR (("storage" = $14) AND (("path" = $15) OR ("path" LIKE $16))) OR (("storage" = $17) AND (("path" = $18) OR ("path" LIKE $19))) OR (("storage" = $20) AND (("path" = $21) OR ("path" LIKE $22))) OR (("storage" = $23) AND (("path" = $24) OR ("path" LIKE $25))) OR (("storage" = $26) AND (("path" = $27) OR ("path" LIKE $28))) OR (("storage" = $29) AND (("path" = $30) OR ("path" LIKE $31))) OR (("storage" = $32) AND (("path" = $33) OR ("path" LIKE $34))) OR (("storage" = $35) AND (("path" = $36) OR ("path" LIKE $37))) OR (("storage" = $38) AND (("path" = $39) OR ("path" LIKE $40))) OR (("storage" = $41) AND (("path" = $42) OR ("path" LIKE $43))) OR (("storage" = $44) AND (("path" = $45) OR ("path" LIKE $46))) OR (("storage" = $47) AND (("path" = $48) OR ("path" LIKE $49))) OR (("storage" = $50) AND (("path" = $51) OR ("path" LIKE $52))) OR (("storage" = $53) AND (("path" = $54) OR ("path" LIKE $55))) OR (("storage" = $56) AND (("path" = $57) OR ("path" LIKE $58))) OR (("storage" = $59) AND (("path" = $60) OR ("path" LIKE $61))) OR (("storage" = $62) AND (("path" = $63) OR ("path" LIKE $64))) OR (("storage" = $65) AND (("path" = $66) OR ("path" LIKE $67))) OR (("storage" = $68) AND (("path" = $69) OR ("path" LIKE $70))) OR (("storage" = $71) AND (("path" = $72) OR ("path" LIKE $73))) OR (("storage" = $74) AND (("path" = $75) OR ("path" LIKE $76))) OR (("storage" = $77) AND (("path" = $78) OR ("path" LIKE $79))) OR (("storage" = $80) AND ("path" = $81)) OR (("storage" = $82) AND ("path" = $83)) OR (("storage" = $84) AND ("path" = $85)) OR (("storage" = $86) AND ("path" = $87)) OR (("storage" = $88) AND ("path" = $89)) OR (("storage" = $90) AND ("path" = $91)) OR (("storage" = $92) AND ("path" = $93)) OR (("storage" = $94) AND ("path" = $95)) OR (("storage" = $96) AND ("path" = $97)) OR (("storage" = $98) AND ("path" = $99)) OR (("storage" = $100) AND ("path" = $101)) OR (("storage" = $102) AND ("path" = $103)) OR (("storage" = $104) AND ("path" = $105)) OR (("storage" = $106) AND ("path" = $107)) OR (("storage" = $108) AND ("path" = $109)) OR (("storage" = $110) AND ("path" = $111)) OR (("storage" = $112) AND ("path" = $113)) OR (("storage" = $114) AND ("path" = $115)) OR (("storage" = $116) AND ("path" = $117)) OR (("storage" = $118) AND ("path" = $119)) OR (("storage" = $120) AND ("path" = $121)) OR (("storage" = $122) AND ("path" = $123)) OR (("storage" = $124) AND ("path" = $125)) OR (("storage" = $126) AND ("path" = $127)) OR (("storage" = $128) AND ("path" = $129)) OR (("storage" = $130) AND ("path" = $131)) OR (("storage" = $132) AND ("path" = $133)) OR (("storage" = $134) AND ("path" = $135)) OR (("storage" = $136) AND ("path" = $137)) OR (("storage" = $138) AND ("path" = $139)) OR (("storage" = $140) AND ("path" = $141)) OR (("storage" = $142) AND ("path" = $143)) OR (("storage" = $144) AND ("path" = $145)) OR (("storage" = $146) AND ("path" = $147)) OR (("storage" = $148) AND ("path" = $149)) OR (("storage" = $150) AND ("path" = $151)) OR (("storage" = $152) AND ("path" = $153)) OR (("storage" = $154) AND ("path" = $155)) OR (("storage" = $156) AND ("path" = $157)) OR (("storage" = $158) AND ("path" = $159)) OR (("storage" = $160) AND ("path" = $161)) OR (("storage" = $162) AND ("path" = $163)) OR (("storage" = $164) AND ("path" = $165)) OR (("storage" = $166) AND ("path" = $167)) OR (("storage" = $168) AND ("path" = $169)) OR (("storage" = $170) AND ("path" = $171)) OR (("storage" = $172) AND ("path" = $173))) ORDER BY "mtime" + $174 desc LIMIT 5
Analyzed (with args partially obfuscated):
Limit (cost=312527.91..312528.49 rows=5 width=746) (actual time=1078.179..1091.077 rows=5 loops=1)
-> Gather Merge (cost=312527.91..312615.41 rows=750 width=746) (actual time=952.836..965.730 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=311527.88..311528.82 rows=375 width=746) (actual time=902.232..902.235 rows=4 loops=3)
Sort Key: ((file.mtime + '0'::bigint)) DESC
Sort Method: top-N heapsort Memory: 33kB
Worker 0: Sort Method: top-N heapsort Memory: 28kB
Worker 1: Sort Method: top-N heapsort Memory: 30kB
-> Nested Loop Left Join (cost=0.29..311521.65 rows=375 width=746) (actual time=49.220..897.684 rows=3726 loops=3)
-> Parallel Seq Scan on oc_filecache file (cost=0.00..309711.86 rows=375 width=233) (actual time=49.114..874.850 rows=3726 loops=3)
Filter: (((name)::text ~~* '%so%'::text) AND (((storage = '4'::bigint) AND (((path)::text = 'files'::text) OR ((path)::text ~~ 'files/%'::text))) OR
((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/3'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/3/%'::text))) OR ((storage =
'178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/4'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/4/%'::text))) OR ((storage = '178'::bigin
t) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/5'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/5/%'::text))) OR ((storage = '178'::bigint) AND (((pa
th)::text = 'appdata_ocfhweyb1d2t/collectives/7'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/7/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text =
'appdata_ocfhweyb1d2t/collectives/6'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/6/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocf
hweyb1d2t/collectives/8'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/8/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/co
llectives/12'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/12/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/
18'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/18/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/19'::text)
OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/19/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/16'::text) OR ((path
)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/16/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/22'::text) OR ((path)::text ~~
'appdata\_ocfhweyb1d2t/collectives/22/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/26'::text) OR ((path)::text ~~ 'appdata\
_ocfhweyb1d2t/collectives/26/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/27'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1
d2t/collectives/27/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/25'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collec
tives/25/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/10'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/10/%
'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/30'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/30/%'::text)))
OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/36'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/36/%'::text))) OR ((stor
age = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/38'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/38/%'::text))) OR ((storage = '178
'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/39'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/39/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/29'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/29/%'::text))) OR
((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/43'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/43/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/44'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/44/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text = 'appdata_ocfhweyb1d2t/collectives/42'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/42/%'::text))) OR ((storage = '178'::bigint) AND (((path)::text =
'appdata_ocfhweyb1d2t/collectives/45'::text) OR ((path)::text ~~ 'appdata\_ocfhweyb1d2t/collectives/45/%'::text))) OR ((storage = '185'::bigint) AND (((path)::text = 'files/My Share'::text) OR ((path)::text ~~ 'files/My Share/%'::text))) OR ((storage = '206'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '210'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '210'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.PNG'::text)) OR ((storage = '206'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '206'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '211'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.m..png'::text)) OR ((storage = '211'::bigint) AND ((path)::text = 'files/
Talk/my_file_redacted.m..png'::text)) OR ((storage = '211'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.JPG'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.php'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '233'::bigint)
AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '233'::bigint) AND ((path)::text = 'files/Talk/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.jpg'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text))
OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint)
AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.php'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text)) OR ((storage = '171'::bigint) AND ((path)::text = 'files/Deck/my_file_redacted.png'::text))))
Rows Removed by Filter: 465348
-> Index Scan using files_meta_fileid on oc_files_metadata meta (cost=0.29..4.82 rows=1 width=513) (actual time=0.005..0.005 rows=0 loops=11179)
Index Cond: (file_id = file.fileid)
Planning Time: 4.460 ms
JIT:
Functions: 391
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 54.835 ms, Inlining 0.000 ms, Optimization 26.228 ms, Emission 527.024 ms, Total 608.087 ms
Execution Time: 1758.145 ms
Steps to reproduce
- Have a lot of files on your NC install (500k+ ideal)
- Enable slow query logging in postgres (or MySQL I guess) - set the slow query time to anything above 500ms
- Run a search in the unified search
- View slow log file
Expected behavior
Search should be faster
Installation method
Community VM appliance
Nextcloud Server version
28
Operating system
Debian/Ubuntu
PHP engine version
PHP 8.2
Web server
Apache (supported)
Database engine version
PostgreSQL
Is this bug present after an update or on a fresh install?
Fresh Nextcloud Server install
Are you using the Nextcloud Server Encryption module?
None
What user-backends are you using?
- Default user-backend (database)
- LDAP/ Active Directory
- SSO - SAML
- Other
Configuration report
No response
List of activated Apps
No response
Nextcloud Signing status
No response
Nextcloud Logs
No response
Additional info
No response
Activity