Path deduplication in oc_filecache #42182

Open

Description

How to use GitHub

  • Please use the 👍 reaction to show that you are interested in the same feature.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Is your feature request related to a problem?
My small-to-medium Nextcloud instance (~80 GB) has a 300 MB database, and the oc_filecache table alone accounts for more than 60% of it.
I think there is potential to reduce the database size, which might also improve performance on large instances.

Describe the solution you'd like
While working on #41321, I noticed that oc_filecache stores the full (internal) path for each file and, in addition, the file name.
Deduplicating paths in the database could shrink the table considerably.

This could be achieved by creating a table oc_directories (or oc_paths) that stores each directory path once and maps it to an id.
oc_filecache would then store this id instead of the raw string, and the full path could be rebuilt by joining oc_filecache with oc_directories and combining the directory path with the file name.
The full directory path would then exist only in the oc_directories table, so a directory containing a very large number of files would add little to the table size.
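To make the idea concrete, here is a minimal sketch using Python's sqlite3 module and a heavily simplified schema. The column dir_id, the sample rows, and the helper names build_demo_db and full_paths are assumptions for illustration only; the real oc_filecache has many more columns and this is not how Nextcloud implements anything today.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the proposed (simplified) layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical table: each directory path is stored exactly once.
        CREATE TABLE oc_directories (
            dir_id INTEGER PRIMARY KEY,
            path   TEXT NOT NULL UNIQUE
        );
        -- Simplified filecache: a short id replaces the repeated path string.
        CREATE TABLE oc_filecache (
            fileid INTEGER PRIMARY KEY,
            dir_id INTEGER NOT NULL REFERENCES oc_directories(dir_id),
            name   TEXT NOT NULL
        );
        """
    )
    conn.execute(
        "INSERT INTO oc_directories (dir_id, path) VALUES (1, 'files/Photos/2023')"
    )
    conn.executemany(
        "INSERT INTO oc_filecache (fileid, dir_id, name) VALUES (?, ?, ?)",
        [(10, 1, "a.jpg"), (11, 1, "b.jpg")],
    )
    return conn


def full_paths(conn: sqlite3.Connection) -> list[str]:
    """Rebuild full paths by joining files with their directory."""
    rows = conn.execute(
        """
        SELECT CASE
                   WHEN d.path = '' THEN f.name       -- file directly in the root
                   ELSE d.path || '/' || f.name
               END
        FROM oc_filecache AS f
        JOIN oc_directories AS d ON d.dir_id = f.dir_id
        ORDER BY f.fileid
        """
    )
    return [row[0] for row in rows]
```

In this sketch, the string 'files/Photos/2023' is stored once no matter how many files the directory holds; each file row only carries the integer dir_id and its own name.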

Describe alternatives you've considered

  • A minor fix could be to at least drop the file name column, since the name can easily be derived from the path using basename.
  • The table already contains the column parent, which holds the file_id of the parent directory. Resolving that recursively until parent = file_id could already replace the path column. However, I'm not sure how that would affect performance for very deeply nested directories, and I feel the solution described above might be a reasonable compromise.
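Both alternatives can be sketched in a few lines of Python with sqlite3. This is only an illustration under assumptions: the table is reduced to the three columns fileid, parent and name, the root is modelled as a row whose parent equals its own fileid (as described above) with an empty name, and the helpers name_from_path and path_from_parents are hypothetical.

```python
import posixpath
import sqlite3


def name_from_path(path: str) -> str:
    """Alternative 1: derive the file name from the stored path."""
    return posixpath.basename(path)


def build_demo_db() -> sqlite3.Connection:
    """Simplified filecache holding only fileid, parent and name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oc_filecache ("
        "fileid INTEGER PRIMARY KEY, parent INTEGER NOT NULL, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO oc_filecache (fileid, parent, name) VALUES (?, ?, ?)",
        [
            (1, 1, ""),        # assumed root: parent points to itself
            (2, 1, "files"),
            (3, 2, "Photos"),
            (4, 3, "a.jpg"),
        ],
    )
    return conn


def path_from_parents(conn: sqlite3.Connection, fileid: int) -> str:
    """Alternative 2: rebuild the path by walking the parent chain."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(fileid, parent, name, depth) AS (
            SELECT fileid, parent, name, 0
            FROM oc_filecache
            WHERE fileid = ?
            UNION ALL
            SELECT p.fileid, p.parent, p.name, c.depth + 1
            FROM oc_filecache AS p
            JOIN chain AS c ON p.fileid = c.parent
            WHERE c.parent != c.fileid   -- stop once the root is reached
        )
        SELECT name FROM chain ORDER BY depth DESC
        """,
        (fileid,),
    ).fetchall()
    # The root contributes an empty name, which is skipped when joining.
    return "/".join(name for (name,) in rows if name)
```

The recursive query performs one step per directory level, so its cost grows with nesting depth. That is the performance concern raised in the second bullet, and it does not arise with a single join against a directory table.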

Additional context
I can help implement this, but would appreciate a few pointers if there is anything I should take into account.

I'm also not sure whether third-party apps use the filecache. If they do, this change would have to be rolled out in a major release, since it could break those apps.
