
Check for duplicates may return non-duplicates (sys_file.storage, case insensitive DB, ...) #2

Closed
@sypets


At first glance, I would consider the duplicate check harmful if one is not careful.

There are two ways non-duplicates can be detected as duplicates:

  1. sys_file.storage is not checked. If more than one storage is in use, the same identifier can exist in each storage.
  2. The DB compares case-insensitively, so files such as /myfile.jpg and /MYFILE.jpg are detected as duplicates. This depends on the DB collation, e.g. utf8mb4_general_ci is case insensitive.

What collation does TYPO3 use by default? In v11.5 it is utf8mb4_unicode_ci, which is also case insensitive:

public/typo3/sysext/install/Classes/Controller/InstallerController.php: 'collate' => 'utf8mb4_unicode_ci',

https://docs.typo3.org/c/typo3/cms-core/main/en-us/Changelog/9.5/Feature-80398-Utf8mb4OnMysqlByDefaultForNewInstances.html

Reproduce

Try adding the following examples as files and check whether they are detected as duplicates.

Examples:

| sys_file.storage | sys_file.identifier | file |
| --- | --- | --- |
| 1 (fileadmin) | /dir1/abc.jpg | fileadmin/dir1/abc.jpg |
| 2 (media) | /dir1/abc.jpg | media/dir1/abc.jpg |
| 1 (fileadmin) | /dir1/ABC.jpg | fileadmin/dir1/ABC.jpg |
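
To illustrate the problem without a TYPO3 installation, here is a minimal sketch in Python using an in-memory SQLite database. The table is a hypothetical, stripped-down stand-in for sys_file, and SQLite's NOCASE collation is only an approximation of a MySQL `_ci` collation. The query is an assumption about what a naive duplicate check might look like, not the extension's actual query.

```python
import sqlite3

# Hypothetical, stripped-down stand-in for sys_file.
# COLLATE NOCASE approximates a case-insensitive MySQL collation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sys_file ("
    " uid INTEGER PRIMARY KEY,"
    " storage INTEGER NOT NULL,"
    " identifier TEXT COLLATE NOCASE NOT NULL)"
)
conn.executemany(
    "INSERT INTO sys_file (storage, identifier) VALUES (?, ?)",
    [
        (1, "/dir1/abc.jpg"),  # fileadmin/dir1/abc.jpg
        (2, "/dir1/abc.jpg"),  # media/dir1/abc.jpg
        (1, "/dir1/ABC.jpg"),  # fileadmin/dir1/ABC.jpg
    ],
)

# Naive check (assumed): group by identifier only, ignoring storage and case.
naive_duplicates = conn.execute(
    "SELECT COUNT(*) FROM sys_file GROUP BY identifier HAVING COUNT(*) > 1"
).fetchall()

print(naive_duplicates)
```

Under these assumptions, all three rows fall into a single group, although they are three different files.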

Recommendation:

  • always check sys_file.storage as well (for a true duplicate, identifier and storage must both be identical)
  • improve the DB query, e.g. by also checking that the following are identical: identifier_hash, sha1 (identifier_hash should suffice, I think)
  • or do a binary comparison of the field
  • or additionally compare the strings in PHP
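
A stricter check along these lines could look like the following sketch. It again uses Python with an in-memory SQLite database as a stand-in, so the table layout and the row with uid 4 are hypothetical. `COLLATE BINARY` is SQLite syntax; on MySQL/MariaDB the analogous step would be a binary comparison or a `_bin` collation such as utf8mb4_bin.

```python
import sqlite3

# Hypothetical stand-in for sys_file with a case-insensitive identifier column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sys_file ("
    " uid INTEGER PRIMARY KEY,"
    " storage INTEGER NOT NULL,"
    " identifier TEXT COLLATE NOCASE NOT NULL)"
)
conn.executemany(
    "INSERT INTO sys_file (uid, storage, identifier) VALUES (?, ?, ?)",
    [
        (1, 1, "/dir1/abc.jpg"),
        (2, 2, "/dir1/abc.jpg"),  # same identifier, different storage
        (3, 1, "/dir1/ABC.jpg"),  # differs only in case
        (4, 1, "/dir1/abc.jpg"),  # hypothetical genuine duplicate of uid 1
    ],
)

# Group by storage AND by the byte-exact identifier.
strict_duplicates = conn.execute(
    "SELECT storage, MIN(identifier), COUNT(*) FROM sys_file"
    " GROUP BY storage, identifier COLLATE BINARY"
    " HAVING COUNT(*) > 1"
).fetchall()

print(strict_duplicates)
```

Only the rows with uid 1 and uid 4 are grouped together here; the file in the other storage and the file that differs only in case are no longer reported.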
