cmdline: introduce --xattr as shortcut for --xattr-write, --xattr-read & --write-unfinished
sahib committed Jan 26, 2019
1 parent a9a39dc commit 85c2b71
Showing 3 changed files with 33 additions and 6 deletions.
26 changes: 21 additions & 5 deletions docs/rmlint.1.rst
@@ -514,23 +514,39 @@ Caching
*NOTE:* In ``--replay`` mode, a new ``.json`` file will be written to
``rmlint.replay.json`` in order to avoid overwriting ``rmlint.json``.

:``-C --xattr``:

Shortcut for ``--xattr-write``, ``--xattr-write``, ``--write-unfinished``.

jim-collier Jan 26, 2019

Fantastic. But don't you mean --xattr-read for the first argument? (Rather than two --xattr-write in a row.)

This will write a checksum and a timestamp to the extended attributes of each
file that rmlint hashed. This speeds up subsequent runs on the same data set.
Please note that not all filesystems may support extended attributes and you
need write support to use this feature.

See the individual options below for more details and some examples.

:``--xattr-read`` / ``--xattr-write`` / ``--xattr-clear``:

Read or write cached checksums from the extended file attributes.
This feature can be used to speed up consecutive runs.

**CAUTION:** This could potentially lead to false positives if file contents are
-somehow modified without changing the file mtime.
+somehow modified without changing the file modification time. rmlint uses the mtime
+to determine the modification timestamp if a checksum is outdated.

jim-collier Jan 26, 2019

A useful addition to this caution might be something along the lines of:

"But if rmlint is doing a deduplication run, and false matches are submitted to the Linux kernel extent-same ioctl, then the kernel will do a byte-for-byte check itself before deduplicating any blocks. (And skip any blocks that aren't identical.) Thus, at least in the case of Btrfs (and presumably XFS), a false positive match still won't result in data loss."


**NOTE:** Many tools do not support extended file attributes properly,
resulting in a loss of the information when copying the file or editing it.
Also, this is a linux specific feature that works not on all filesystems and
only if you have write permissions to the file.

**NOTE:** You can specify ``--xattr-write`` and ``--xattr-read`` at the same time.
This will read from existing checksums at the start of the run and update all hashed
files at the end.

Usage example::

-$ rmlint large_file_cluster/ -U --xattr-write # first run.
-$ rmlint large_file_cluster/ --xattr-read # second run.
+$ rmlint large_file_cluster/ -U --xattr-write # first run should be slow.
+$ rmlint large_file_cluster/ --xattr-read # second run should be faster.
+
+# Or do the same in just one run:
+$ rmlint large_file_cluster/ --xattr

:``-U --write-unfinished``:
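
The CAUTION in the documentation hunk above rests on comparing a stored timestamp with the file's current mtime. A minimal sketch of such a check follows. The `CachedChecksum` struct and `cached_checksum_is_fresh` function are hypothetical names for illustration, and treating any mtime mismatch as stale is an assumption; rmlint's actual xattr layout and comparison rule may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical cache record; not rmlint's real xattr format. */
typedef struct {
    const char *hex_digest; /* checksum stored alongside the file, or NULL */
    time_t mtime_at_hash;   /* file mtime recorded when the digest was written */
} CachedChecksum;

/* Returns true when the cached digest may be reused instead of re-hashing.
 * Assumption: any difference between the recorded and current mtime
 * marks the entry as outdated. */
bool cached_checksum_is_fresh(const CachedChecksum *entry, time_t current_mtime) {
    assert(entry != NULL);
    if (entry->hex_digest == NULL) {
        return false; /* nothing cached yet */
    }
    return entry->mtime_at_hash == current_mtime;
}
```

A check of this kind cannot notice content that changed while the mtime stayed the same, which is the false-positive case the CAUTION warns about.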

11 changes: 11 additions & 0 deletions lib/cmdline.c
@@ -352,6 +352,16 @@ static gboolean rm_cmd_parse_limit_sizes(_UNUSED const char *option_name,
}
}

static gboolean rm_cmd_parse_xattr(_UNUSED const char *option_name,
                                   _UNUSED const gchar *_,
                                   RmSession *session,
                                   _UNUSED GError **error) {
    session->cfg->write_cksum_to_xattr = true;
    session->cfg->read_cksum_from_xattr= true;
    session->cfg->write_unfinished = true;
    return true;
}

static GLogLevelFlags VERBOSITY_TO_LOG_LEVEL[] = {[0] = G_LOG_LEVEL_CRITICAL,
[1] = G_LOG_LEVEL_ERROR,
[2] = G_LOG_LEVEL_WARNING,
@@ -1283,6 +1293,7 @@ bool rm_cmd_parse_args(int argc, char **argv, RmSession *session) {
{"newer-than-stamp" , 'n' , 0 , G_OPTION_ARG_CALLBACK , FUNC(timestamp_file) , _("Newer than stamp file") , "PATH"} ,
{"newer-than" , 'N' , 0 , G_OPTION_ARG_CALLBACK , FUNC(timestamp) , _("Newer than timestamp") , "STAMP"} ,
{"config" , 'c' , 0 , G_OPTION_ARG_CALLBACK , FUNC(config) , _("Configure a formatter") , "FMT:K[=V]"} ,
{"xattr" , 'C' , 0 , G_OPTION_ARG_CALLBACK , FUNC(xattr) , _("Enable xattr based caching") , ""} ,

/* Non-trivial switches */
{"progress" , 'g' , EMPTY , G_OPTION_ARG_CALLBACK , FUNC(progress) , _("Enable progressbar") , NULL} ,
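
The new callback in this hunk turns on three configuration fields at once. The sketch below shows the same idea without GLib, so it can be compiled on its own. The `Config` struct and the `apply_flag` helper are hypothetical stand-ins for illustration only; they are not part of rmlint, which registers its options through the GOption table shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three config fields touched by the commit. */
typedef struct {
    bool write_cksum_to_xattr;
    bool read_cksum_from_xattr;
    bool write_unfinished;
} Config;

/* Applies one command-line flag to cfg.
 * Returns false for flags this sketch does not know about. */
bool apply_flag(Config *cfg, const char *flag) {
    assert(cfg != NULL && flag != NULL);
    if (strcmp(flag, "--xattr-write") == 0) {
        cfg->write_cksum_to_xattr = true;
    } else if (strcmp(flag, "--xattr-read") == 0) {
        cfg->read_cksum_from_xattr = true;
    } else if (strcmp(flag, "--write-unfinished") == 0 || strcmp(flag, "-U") == 0) {
        cfg->write_unfinished = true;
    } else if (strcmp(flag, "--xattr") == 0 || strcmp(flag, "-C") == 0) {
        /* The shortcut enables all three settings in one step. */
        cfg->write_cksum_to_xattr = true;
        cfg->read_cksum_from_xattr = true;
        cfg->write_unfinished = true;
    } else {
        return false;
    }
    return true;
}
```

With this helper, passing `--xattr` alone leaves the struct in the same state as passing `--xattr-write`, `--xattr-read` and `--write-unfinished` one after another.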
2 changes: 1 addition & 1 deletion lib/hasher.c
@@ -216,7 +216,7 @@ static gboolean rm_hasher_buffered_read(RmHasher *hasher, GThreadPool *hashpipe,
rm_log_error_line("Unexpected EOF in rm_hasher_buffered_read");
break;
} else if(bytes_read == 0) {
-rm_log_error_line(_("Something went wrong reading %s; expected %li bytes, "
+rm_log_warning_line(_("Something went wrong reading %s; expected %li bytes, "
"got %li; ignoring"),
path, (long int)bytes_to_read,
(long int)*bytes_actually_read);