Filename Unicode normalisation #1261

@takerukoushirou

Initial Checklist

  • I understand this is a bug report and questions should be posted in the Community Discussions
  • I searched issues and couldn’t find anything (or linked relevant results below)

Bug Description

Filenames are currently, as far as I can see, treated verbatim. This can lead to a situation that is probably confusing for most users:

Files uploaded from a system that enforces, for example, NFD (e.g. drag 'n' drop from Finder on macOS) retain their decomposed form if the filesystem on the server treats filenames verbatim (which, AFAIK, is the default for most filesystems on Linux).
If the filename is then edited via the web UI, the new name will be in NFC instead. This change is invisible to the user.

If such a file is then downloaded and uploaded again from a system that enforces a different normalisation form, the name no longer matches the existing file because the byte sequence differs, and the modified file is uploaded as a new file instead. The user now has two files with seemingly the same name rather than a new version of the existing file.
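To illustrate the mismatch, here is a minimal Python sketch (not OpenCloud code) that builds the same visible name in both forms and compares them. The variable names are made up for illustration; only the standard-library `unicodedata` module is used.

```python
import unicodedata

# The name as it might be typed in the web UI (precomposed form, NFC).
name_nfc = unicodedata.normalize("NFC", "Test file with umlaut \u00c4.txt")
# The same name as a system enforcing NFD would hand it back on upload.
name_nfd = unicodedata.normalize("NFD", name_nfc)

# A verbatim comparison fails even though both render identically.
print(name_nfc == name_nfd)  # False

# The UTF-8 byte sequences differ in length: 28 bytes vs. 29 bytes.
print(len(name_nfc.encode("utf-8")), len(name_nfd.encode("utf-8")))

# After normalising both sides to one form, the names compare equal.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)  # True
```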

Although technically correct, this is rather unexpected behaviour for the user.

This is probably only relevant for uploads, which should perhaps match filenames after normalising their Unicode form (and convert the uploaded filename to the normalisation form of the matching file that already exists on the server).
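The matching idea could look roughly like the following Python sketch. It is only an assumption about how such a lookup might work, not how OpenCloud handles uploads; `find_existing_name` and `resolve_upload_name` are hypothetical helpers, and the list of stored names stands in for a directory listing.

```python
import unicodedata
from typing import Iterable, Optional


def find_existing_name(uploaded: str, existing: Iterable[str]) -> Optional[str]:
    """Return the stored name that equals `uploaded` after NFC normalisation.

    Hypothetical helper: both sides are normalised only for the comparison,
    the stored names themselves are left untouched.
    """
    key = unicodedata.normalize("NFC", uploaded)
    for name in existing:
        if unicodedata.normalize("NFC", name) == key:
            return name
    return None


def resolve_upload_name(uploaded: str, existing: Iterable[str]) -> str:
    """Reuse the stored spelling if a normalised match exists, else keep the upload's."""
    match = find_existing_name(uploaded, existing)
    return match if match is not None else uploaded


# Example: the server holds the NFC name, the upload arrives in NFD.
stored = ["Test file with umlaut \u00c4.txt"]
uploaded = "Test file with umlaut A\u0308.txt"
print(resolve_upload_name(uploaded, stored) == stored[0])  # True
```

Returning the stored spelling means the upload would land on the existing file, at which point the usual "replace or keep both" prompt could be shown.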

Reproduction Steps

  1. Create a text file in the web UI and use e.g. an umlaut in the name.
  2. Download the file to a system that uses a different, fixed Unicode normalisation form, e.g. macOS with NFD.
  3. Modify the file contents and drop the file in the browser. The file will be uploaded as a separate file that has the same name, just with a different Unicode normalisation form applied.

In this example I used Safari on macOS, with OpenCloud running on a Linux server with ZFS.

# ls -Ahl
total 33K
-rw------- 1 opencloud opencloud 26 Sep 24 19:10 'Test file with umlaut Ä.txt'
-rw-r--r-- 1 opencloud opencloud 18 Sep 24 19:10 'Test file with umlaut Ä.txt'

The filename created in the web UI using NFC:

hexdump -c
0000000   T   e   s   t       f   i   l   e       w   i   t   h       u
0000010   m   l   a   u   t     303 204   .   t   x   t  \n
000001d

The filename uploaded through Finder after being downloaded, which is now in NFD:

hexdump -c
0000000   T   e   s   t       f   i   l   e       w   i   t   h       u
0000010   m   l   a   u   t       A 314 210   .   t   x   t  \n
000001e
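For readers unfamiliar with the octal notation of `hexdump -c`, this small Python sketch decodes the bytes that differ between the two dumps and prints the resulting code points. The byte values are taken from the dumps above; the variable names are illustrative.

```python
import unicodedata

# "303 204" from the first dump (octal): one precomposed code point.
nfc_bytes = bytes([0o303, 0o204])
# "A 314 210" from the second dump: base letter plus combining mark.
nfd_bytes = b"A" + bytes([0o314, 0o210])

for label, raw in (("NFC", nfc_bytes), ("NFD", nfd_bytes)):
    text = raw.decode("utf-8")
    # List each code point with its official Unicode character name.
    print(label, [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text])
```

The first sequence decodes to U+00C4 (LATIN CAPITAL LETTER A WITH DIAERESIS), the second to U+0041 followed by U+0308 (COMBINING DIAERESIS), which is why the second dump is one byte longer.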

Expected Outcome

The user should not have to care about whether files are in NFC or NFD. If a filename is the same when normalised to either form, it should be treated as equal and the web interface should ask the user if the newly uploaded file should replace the existing file.

Actual Outcome

If filenames are equal from a user's perspective but differ in Unicode normalisation form, a file that is uploaded again can end up as a new file instead of a new version of the existing file.

Status

Done
