Description
Describe the feature you would like to see added to OpenZFS
Inspired by discussion on issue #13392, I would like to see a "lightweight" deduplication mode optimised for copying operations.
The basic idea is that instead of attempting to build a full dedup table for one or more entire datasets (which can require a lot of RAM), the "lightweight" table would only track blocks that have been recently read, with some options for tuning the amount of dedup data held in memory using this method.
Depending on how well this "lightweight" dedup table is tuned, ZFS should be able to eliminate a large amount of the duplication that results from copying within the pool, as any data that is written out shortly after being read (i.e. copied) should be detectable using this more limited table.
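To make the idea more concrete, here is a minimal, userland-only sketch of the kind of structure I have in mind: a small fixed-size table keyed by block checksum, populated by the read path and consulted by the write path. Every name here (`rrt_*`, the entry layout) is invented for illustration and is not existing OpenZFS code; locking, collision handling beyond direct mapping, and the real I/O pipeline are deliberately ignored.

```c
/*
 * Conceptual sketch only: a tiny "recently read" table keyed by block
 * checksum. All names (rrt_*) are hypothetical, not OpenZFS code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RRT_BUCKETS	4096	/* power of two; caps the memory footprint */
#define CKSUM_WORDS	4	/* 256-bit checksum as four 64-bit words */

typedef struct rrt_entry {
	uint64_t	re_cksum[CKSUM_WORDS];	/* checksum of a recently read block */
	uint64_t	re_block_id;		/* where the existing copy already lives */
	int		re_valid;
} rrt_entry_t;

static rrt_entry_t rrt_table[RRT_BUCKETS];

static unsigned
rrt_bucket(const uint64_t *cksum)
{
	return ((unsigned)(cksum[0] & (RRT_BUCKETS - 1)));
}

/* Read path: remember that a block with this checksum was just read. */
static void
rrt_note_read(const uint64_t *cksum, uint64_t block_id)
{
	rrt_entry_t *re = &rrt_table[rrt_bucket(cksum)];

	/* Direct-mapped: newer reads simply overwrite older entries. */
	memcpy(re->re_cksum, cksum, sizeof (re->re_cksum));
	re->re_block_id = block_id;
	re->re_valid = 1;
}

/*
 * Write path: if the outgoing block matches something recently read,
 * return the existing block so the write can become a reference instead
 * of a new allocation; -1 means "no match, write normally".
 */
static int64_t
rrt_match_write(const uint64_t *cksum)
{
	rrt_entry_t *re = &rrt_table[rrt_bucket(cksum)];

	if (re->re_valid &&
	    memcmp(re->re_cksum, cksum, sizeof (re->re_cksum)) == 0)
		return ((int64_t)re->re_block_id);
	return (-1);
}

int
main(void)
{
	uint64_t cksum[CKSUM_WORDS] = { 0x1234, 5, 6, 7 };

	rrt_note_read(cksum, 42);	/* block 42 was just read */
	printf("write matches block %lld\n",
	    (long long)rrt_match_write(cksum));	/* prints 42 */
	return (0);
}
```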
How will this feature improve OpenZFS?
In most cases full deduplication is overkill, requiring a large amount of RAM to be practical. However, for many use cases the primary source of duplication is copying within or between datasets, along with typical read/write activity where files are written back out with only some parts changed.
By focusing dedup on these internal-copying use cases, it should be possible to use dedup with a much smaller memory footprint. For pools where internal copying is the main source of duplication, this ought to achieve much the same benefits as full deduplication: fast (metadata-only) copying/moving of files within the same or related (same encryption root, similar settings) datasets, and reduced capacity usage.
As with full deduplication, this should also benefit partial copies, i.e. cases where only some of a file's records are copied while others are discarded, replaced or added. The main limitation of this "lightweight" deduplication is that data which has already left the smaller table cannot be deduplicated, e.g. files that are opened but not written out until some time later, or files imported from an external source. However, if these are minority cases for a pool, there will still be a benefit from enabling the feature.
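To put rough numbers on the memory claim, here is a back-of-envelope comparison; every constant in it (pool size, ARC size, record size, per-entry cost) is an assumption chosen for illustration, not a measured OpenZFS figure.

```c
/*
 * Back-of-envelope comparison only; all constants are assumptions
 * for illustration, not measured OpenZFS figures.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	const uint64_t pool_bytes = 10ULL << 40;	/* assume a 10 TiB pool */
	const uint64_t arc_bytes = 16ULL << 30;		/* assume a 16 GiB ARC */
	const uint64_t recordsize = 128ULL << 10;	/* assume 128 KiB records */
	const uint64_t entry_bytes = 320;		/* assumed cost per dedup entry */

	uint64_t full_entries = pool_bytes / recordsize;	/* every block in the pool */
	uint64_t lite_entries = arc_bytes / recordsize;		/* only recently read blocks */

	printf("full dedup table:       ~%llu MiB\n",
	    (unsigned long long)(full_entries * entry_bytes >> 20));
	printf("ARC-sized 'lite' table: ~%llu MiB\n",
	    (unsigned long long)(lite_entries * entry_bytes >> 20));
	return (0);
}
```

With these assumptions the full table works out to roughly 25,600 MiB (~25 GiB), while a table covering only what fits in the ARC comes to around 40 MiB.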
Additional context
Firstly, there's almost certainly a better name for what I'm describing, but "lightweight" deduplication is the best I've come up with so far, since the idea is "deduplication of the most recently read files", i.e. deduplication of short-term copying/editing. I also quite like "temporal deduplication" because it sounds extra fancy.
The actual tuning of this dedup table is a little tricky, but I've thought of three options so far (a rough sketch of how their tunables might fit together follows the list):
- Tie the dedup table to the ARC/L2ARC: Since the ARC/L2ARC already covers recently accessed data, my current preferred solution is to effectively generate a dedup table for the contents of the ARC (and possibly the L2ARC) only. Unless a user is running an unusually large ARC/L2ARC, this table will still be much smaller than a full dedup table, and all we really need is a way to look up ARC/L2ARC contents by hash. In this case the only setting required is one on the receiving dataset to indicate whether deduplication should be used where possible (e.g. `deduplication=lightweight`?); otherwise the dedup table is ignored and a copy occurs as normal. A possible secondary setting on the pool could determine whether the L2ARC is included?
- Deduplicate open (and recently opened) read streams: Here the dedup table scales with the amount of read activity (potential copying) on the enabled dataset(s), with a setting to determine how quickly space is reclaimed once a read stream is closed. Reclamation shouldn't happen too quickly, or copies may be missed, but the longer cleanup takes, the larger the table can grow. This would require the same target-dataset setting as the option above, plus a source-dataset option to enable dedup of reads, and possibly a per-dataset setting to limit the amount of "recently open" data. While this method should work, it would also be the most complicated to configure and to control the size of.
- Maximum size: The simplest option, requiring only a global setting for the size limit of the "lightweight" dedup table, shared across all enabled datasets on the system (if any). This leaves effectiveness up to the user: the larger the maximum size, the more duplication can be caught, at the cost of more memory.
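Purely as an illustration of how the knobs from these three options might fit together (a per-dataset `deduplication=lightweight` property, the pool-wide L2ARC toggle from the first option, and the global size cap from the third), here is a hedged sketch; all of the names (`zfs_lwdedup_*`, `DEDUP_LIGHTWEIGHT`) are invented for this issue and are not existing OpenZFS tunables or properties.

```c
/*
 * Illustration only: how the proposed tuning knobs might compose.
 * All names here are invented for this sketch, not existing OpenZFS
 * tunables or dataset properties.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-dataset values for a deduplication=... property. */
typedef enum { DEDUP_OFF, DEDUP_ON, DEDUP_LIGHTWEIGHT } dedup_prop_t;

/* Option 1: pool-wide toggle for whether L2ARC-resident blocks count. */
static bool zfs_lwdedup_include_l2arc = false;

/* Option 3: global cap on the in-memory table, shared by all datasets. */
static uint64_t zfs_lwdedup_max_bytes = 64ULL << 20;	/* 64 MiB default */

/* Rough assumed per-entry cost: checksum, block location, bookkeeping. */
#define LWDEDUP_ENTRY_BYTES	64

/* How many recently read blocks the global cap lets us remember. */
static uint64_t
lwdedup_max_entries(void)
{
	return (zfs_lwdedup_max_bytes / LWDEDUP_ENTRY_BYTES);
}

/* Gate: should a write to this dataset consult the lightweight table? */
static bool
lwdedup_enabled(dedup_prop_t ds_prop, bool match_is_l2arc_only)
{
	if (ds_prop != DEDUP_LIGHTWEIGHT)
		return (false);
	if (match_is_l2arc_only && !zfs_lwdedup_include_l2arc)
		return (false);
	return (true);
}

int
main(void)
{
	printf("a %llu MiB cap tracks roughly %llu recently read blocks\n",
	    (unsigned long long)(zfs_lwdedup_max_bytes >> 20),
	    (unsigned long long)lwdedup_max_entries());
	printf("deduplication=lightweight, match found in ARC: %s\n",
	    lwdedup_enabled(DEDUP_LIGHTWEIGHT, false) ?
	    "reuse existing block" : "write normally");
	return (0);
}
```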
In all three cases, any dedup devices in the pool could be used as normal for storing the actual table(s), though this would mainly be of benefit if those devices aren't large enough to hold full dedup tables for whole datasets.