Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Arch |
| Distribution Version | NA/Rolling |
| Linux Kernel | 5.11.13 |
| Architecture | amd64 |
| ZFS Version | 2.0.4 |
| SPL Version | 2.0.4 |
Describe the problem you're observing
The `normalization` property can be set so that two file names representing the same logical Unicode characters cannot coexist, regardless of the code points used (e.g. U+00E9 versus U+0065 U+0301), and I like to use it to enforce this. Setting `normalization` implies `utf8only=on`, which is fine.
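To make the two encodings concrete, here is a small sketch that prints the UTF-8 bytes of both forms of "é". The helper name `hex_bytes` is made up for this illustration; it only assumes a POSIX shell with `printf`, `od` and `tr`.

```shell
# Hypothetical helper: print the bytes of a string as lowercase hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# U+00E9 (precomposed) and U+0065 U+0301 (e + combining acute accent),
# written as octal escapes so this works in any POSIX shell.
precomposed=$(printf '\303\251')
decomposed=$(printf 'e\314\201')

hex_bytes "$precomposed"; echo   # prints c3a9
hex_bytes "$decomposed"; echo    # prints 65cc81
```

Both strings render as the same character but are different byte sequences, which is the kind of collision `normalization` is meant to catch.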
Still, the realities of file name storage on foreign systems, both historic and present, mean that you may come across Zip or Tar files containing file names that are outside the ASCII character set and are not valid UTF-8. It is occasionally useful to extract these on a ZFS file system with `utf8only=off`, so that you can massage the file names into valid UTF-8 and then move them over to a `utf8only` file system.
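For the "massage the file names" step, one rough way to spot names that would be rejected is to round-trip them through `iconv`. This is only a sketch: `is_valid_utf8` is a hypothetical helper, and it assumes an `iconv` that exits non-zero on invalid input (glibc's does).

```shell
# Hypothetical helper: succeed only if the argument is valid UTF-8.
# Assumes iconv returns a non-zero status on an invalid byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example: the single byte 0xFF (octal 377) is never valid UTF-8.
bad_name=$(printf '\377')
if is_valid_utf8 "$bad_name"; then
    echo "name is valid UTF-8"
else
    echo "name is not valid UTF-8"
fi
```

This checks validity only; it does not say anything about which normalization form a valid name is in.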
`zfs create -o utf8only=off pool/wildwest` might seem like a reasonable command to accomplish this. However, if the parent dataset (`pool` in this example) has `normalization` set to any value other than `none`, `zfs` effectively ignores the request for `utf8only=off`: the new dataset inherits `normalization` and ends up with `utf8only=on`. There is no warning or other indication that this has happened until you try to store a non-UTF-8 file name.
Only `zfs create -o utf8only=off -o normalization=none` guarantees a file system with the intended properties.
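Since nothing is reported at creation time, a check right after `zfs create` is one way to catch the mismatch early. The sketch below is not part of ZFS: `check_utf8only_off` is a hypothetical helper that reads the effective value with `zfs get -H -o value`, and it accepts an optional second argument so it can be dry-run on a machine without ZFS.

```shell
# Hypothetical helper: verify that a dataset really ended up with utf8only=off.
# The value is read via `zfs get` unless one is supplied as the second argument.
check_utf8only_off() {
    dataset=$1
    value=${2:-$(zfs get -H -o value utf8only "$dataset")}
    if [ "$value" = "off" ]; then
        echo "ok: $dataset has utf8only=off"
    else
        echo "warning: $dataset has utf8only=$value" >&2
        return 1
    fi
}

# Usage (assumes the pool/norm dataset from the reproduction steps exists):
#   zfs create -o utf8only=off -o normalization=none pool/norm/nonutf
#   check_utf8only_off pool/norm/nonutf
```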
Describe how to reproduce the problem
# zfs create -o normalization=formD pool/norm
# zfs create -o utf8only=off pool/norm/nonutf
# touch /pool/norm/nonutf/$'\377'
touch: cannot touch /pool/norm/nonutf/$'\377': Invalid or incomplete multibyte or wide character