Skip to content

[Bug]: File Search poor performance with lots of files because of index workaround #38243

Open
@pjft

Description

⚠️ This issue respects the following points: ⚠️

  • This is a bug, not a question or a configuration/webserver/proxy issue.
  • This issue is not already reported on Github (I've searched it).
  • Nextcloud Server is up to date. See Maintenance and Release Schedule for supported versions.
  • Nextcloud Server is running on 64bit capable CPU, PHP and OS.
  • I agree to follow Nextcloud's Code of Conduct.

Bug description

When you search for a file in a NextCloud instance where the user has a lot of them (800k+), performance is atrocious. Might be related to having multiple storages, though not sure of it.

Alas, for some reason, this can be traced back to a deliberate decision that isn't clear to me though it must have a really good reason for it, hence me wanting to raise the question to make sure that the reason behind it still holds.

I'm showing the results of the current query that Nextcloud runs (and the MySQL explanation), and then the same for the changed query I put together. For the sake of intellectual correctness, I ran all queries after restarting the MySQL service, so that there would be no cache at the time.

Fixed Query: 0.428 secs

MariaDB [nextcloud]>  SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` desc LIMIT 6;

<...>

6 rows in set (0.428 sec)

Current Query in Nextcloud: 7.401 secs (it adds a +'0' to the order by condition and MySQL doesn't use the mtime index for it).

MariaDB [nextcloud]>  SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` +'0' desc LIMIT 6;

<...>

6 rows in set (7.401 sec)

Explanation for the fixed query: uses mtime index key

MariaDB [nextcloud]> explain SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` desc LIMIT 6;
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys                                                                                       | key      | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
|    1 | SIMPLE      | file  | index | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_mtime | 8       | NULL |    6 | Using where |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+----------+---------+------+------+-------------+
1 row in set (0.004 sec)

Explanation for the current query: uses fs_storage_path_prefix

MariaDB [nextcloud]> explain SELECT `file`.`fileid`, `storage`, `path`, `path_hash`, `file`.`parent`, `file`.`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, `etag`, `permissions`, `checksum`, `unencrypted_size` FROM `oc_filecache` `file` WHERE (`file`.`name` COLLATE utf8mb4_general_ci LIKE '%mario%') AND (((`storage` = 38) AND ((`path` = 'files') OR (`path` LIKE 'files/%'))) OR (`storage` = 30) OR (`storage` = 31) OR (`storage` = 32) OR (`storage` = 35) OR (`storage` = 36)) ORDER BY `mtime` +'0' desc LIMIT 6;
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
| id   | select_type | table | type  | possible_keys                                                                                       | key                    | key_len | ref  | rows   | Extra                                              |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
|    1 | SIMPLE      | file  | range | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_storage_path_prefix | 267     | NULL | 814718 | Using index condition; Using where; Using filesort |
+------+-------------+-------+-------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------+--------+----------------------------------------------------+
1 row in set (0.002 sec)

Also, for context, total number of files in instance (there's only a single user): 862k.

MariaDB [nextcloud]> select count(fileid) from oc_filecache;
+---------------+
| count(fileid) |
+---------------+
|        862541 |
+---------------+
1 row in set (0.477 sec)

Now, the thing is that there seems to be a very deliberate effort to force MySQL to use the index with fs_storage_path_prefix instead of mtime . However, for whatever reason, in this specific context, it just makes the search query very slow. I have multiple storage points (5, I think) and lots of files. The order by clause is what causes the slow-down - removing the order by clause works as well, but I imagine there is an intention (for the sake of usability) to show the most recent files first. The mtime index helps, but I struggle to understand why we wouldn't want to use it and in what scenarios does the other index perform better.

This change was introduced one year ago in #29300 and then #32074 by @icewind1991 . I'd love to learn more so that I can understand whether it makes sense for me to submit a pull request to revert it (as maybe the underlying context might have changed since and that workaround might no longer be necessary?), or should I just revert the change on my end alone (which I have done and seems to work as intended so far).

This issue seems to be similar to #35776 and others related to search performance, but it clearly seems there are many others in the actual team who are more knowledgeable about the topic, so would love to hear their guidance.

Regards, and keep up the great work.

Steps to reproduce

  1. Search for a file on the file search box, on a large repository.
  2. Results take at time over 10 seconds to come back.

Expected behavior

File search results to return shortly.

Installation method

Community Manual installation with Archive

Nextcloud Server version

25

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

Apache (supported)

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 22.2.3 to 23.0.1)

Are you using the Nextcloud Server Encryption module?

None

What user-backends are you using?

  • Default user-backend (database)
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Configuration report

{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "<redacted>"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "25.0.3.2",
        "overwrite.cli.url": "<redacted>",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "default_phone_region": "pt",
        "defaultapp": "files",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "twofactor_enforced": "true",
        "twofactor_enforced_groups": [],
        "twofactor_enforced_excluded_groups": [],
        "mail_smtpmode": "sendmail",
        "mail_smtpauthtype": "LOGIN",
        "mail_sendmailmode": "pipe",
        "mail_smtpsecure": "ssl",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpauth": 1,
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "maintenance": false,
        "memcache.local": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "theme": "",
        "loglevel": 2,
        "updater.release.channel": "stable"
    }
}

List of activated Apps

Enabled:
  - activity: 2.17.0
  - bruteforcesettings: 2.5.0
  - calendar: 4.3.3
  - circles: 25.0.0
  - cloud_federation_api: 1.8.0
  - comments: 1.15.0
  - contacts: 5.2.0
  - contactsinteraction: 1.6.0
  - dashboard: 7.5.0
  - dav: 1.24.0
  - federatedfilesharing: 1.15.0
  - federation: 1.15.0
  - files: 1.20.1
  - files_external: 1.17.0
  - files_pdfviewer: 2.6.0
  - files_rightclick: 1.4.0
  - files_sharing: 1.17.0
  - files_trashbin: 1.15.0
  - files_versions: 1.18.0
  - firstrunwizard: 2.14.0
  - logreader: 2.10.0
  - lookup_server_connector: 1.13.0
  - nextcloud_announcements: 1.14.0
  - notifications: 2.13.1
  - oauth2: 1.13.0
  - onlyoffice: 7.8.0
  - password_policy: 1.15.0
  - photos: 2.0.1
  - previewgenerator: 5.2.4
  - privacy: 1.9.0
  - provisioning_api: 1.15.0
  - related_resources: 1.0.4
  - serverinfo: 1.15.0
  - settings: 1.7.0
  - sharebymail: 1.15.0
  - support: 1.8.0
  - survey_client: 1.13.0
  - systemtags: 1.15.0
  - tasks: 0.14.5
  - text: 3.6.0
  - theming: 2.0.1
  - twofactor_backupcodes: 1.14.0
  - twofactor_nextcloud_notification: 3.6.0
  - twofactor_totp: 7.0.0
  - updatenotification: 1.15.0
  - user_status: 1.5.0
  - viewer: 1.9.0
  - weather_status: 1.5.0
  - workflow_script: 1.10.0
  - workflowengine: 2.7.0
Disabled:
  - admin_audit
  - encryption
  - extract: 1.3.5
  - files_accesscontrol: 1.15.1
  - music: 1.8.3
  - printer: 0.0.3
  - recommendations: 0.6.0
  - suspicious_login
  - twofactor_gateway: 0.20.0
  - user_ldap

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

No response

Additional info

No response

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions