This repository was archived by the owner on May 17, 2024. It is now read-only.

Do not detect MD5s as UUIDs, and preserve UUID casing for UUID PKs #813

Merged
merged 5 commits into from
Dec 30, 2023



@nolar nolar commented Dec 27, 2023

Comparing MD5s as UUIDs does not work anyway: the values are sliced and compared improperly, because our code always renders UUIDs as abcdabcd-abcd-abcd-abcd-abcdabcdabcd (dashed and lowercased), while the actual values stored in MD5 (i.e. string) PKs can be uppercased and are typically non-dashed (e.g. ABCDABCDABCDABCDABCDABCDABCDABCD). As a result, all such MD5 PKs go into one pseudo-UUID range, usually the first one (because in ASCII and UTF-8, uppercase letters sort before lowercase letters).

The root cause is that Python's UUID can parse even such values:

In [3]: from hashlib import md5
In [8]: s=md5(b'hello').hexdigest()

In [9]: s
Out[9]: '5d41402abc4b2a76b9719d911017c592'

In [10]: from uuid import uuid4, UUID
In [11]: UUID(s)
Out[11]: UUID('5d41402a-bc4b-2a76-b971-9d911017c592')
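To make the mismatch concrete, here is a small sketch using only the standard library. It assumes an uppercased, non-dashed MD5 as the stored PK value and compares it, as a plain string, against the lowercase dashed form that `str(UUID(...))` produces. The bound `a0000000-...` is an illustrative value, not one taken from the actual slicing code.

```python
from hashlib import md5
from uuid import UUID

# Assumed stored PK value: an uppercased, non-dashed MD5 hex digest.
stored = md5(b"hello").hexdigest().upper()

# Python happily parses it and renders it lowercased and dashed.
rendered = str(UUID(stored))

print(stored)    # 5D41402ABC4B2A76B9719D911017C592
print(rendered)  # 5d41402a-bc4b-2a76-b971-9d911017c592

# The rendered form no longer equals what is stored in the table.
assert rendered != stored

# Illustrative lowercase range bound (hypothetical, for demonstration only).
bound = "a0000000-0000-0000-0000-000000000000"

# Every uppercase hex letter sorts before any lowercase letter, so keys
# starting with A..F all land below a bound that starts with "a".
assert "ABCDABCDABCDABCDABCDABCDABCDABCD" < bound
assert "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF" < bound
```

So string comparison between stored values and rendered bounds does not follow the numeric order of the underlying 128-bit values.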

This PR excludes MD5s and other UUID-like textual PKs from UUID detection.
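For illustration only, one way such an exclusion could look is a round-trip check: accept a textual value as a UUID only if it is already in the dashed 8-4-4-4-12 form and in a single case. The helper name `is_canonical_uuid_text` is hypothetical, and this is a sketch of the idea, not necessarily how the code in this PR does it.

```python
from uuid import UUID


def is_canonical_uuid_text(value: str) -> bool:
    """Hypothetical check: True only for dashed UUID text in one consistent case."""
    try:
        parsed = UUID(value)
    except ValueError:
        # Not parseable as a UUID at all.
        return False
    canonical = str(parsed)  # always lowercase and dashed
    # Accept the canonical form or its fully uppercased variant; reject
    # non-dashed hex (e.g. MD5 digests), braces, URN prefixes, mixed case.
    return value in (canonical, canonical.upper())


print(is_canonical_uuid_text("5d41402a-bc4b-2a76-b971-9d911017c592"))  # True
print(is_canonical_uuid_text("5d41402abc4b2a76b9719d911017c592"))      # False
```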

As an extra change (separate commits), this PR also preserves the information on how the database presents the UUIDs — either lowercased or uppercased, and renders the actual sliced UUID values accordingly. This does not matter for native UUIDs (stored & compared as numbers), but does matter for UUIDs stored and/or compared as strings (at least from one side of the diff).
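A rough sketch of the casing idea, with hypothetical helper names (`detect_uppercase`, `render_uuid`) that are not taken from the codebase: infer the casing from sampled values as the database returns them, then render slice bounds in that same casing.

```python
from uuid import UUID


def detect_uppercase(samples: list[str]) -> bool:
    """Hypothetical: treat the column as uppercased if any sample has an uppercase hex letter."""
    return any(ch in "ABCDEF" for sample in samples for ch in sample)


def render_uuid(value: UUID, uppercase: bool) -> str:
    """Render a UUID in the casing the database itself uses."""
    text = str(value)  # str(UUID) is always lowercase and dashed
    return text.upper() if uppercase else text


bound = UUID("5d41402a-bc4b-2a76-b971-9d911017c592")

# Assumed samples, as one database might present them.
samples = ["ABCDABCD-ABCD-ABCD-ABCD-ABCDABCDABCD"]

print(render_uuid(bound, detect_uppercase(samples)))
# 5D41402A-BC4B-2A76-B971-9D911017C592
```

When the two sides of a diff present UUIDs differently, the same bound would be rendered once per side, each with that side's detected casing.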


nolar commented Dec 27, 2023

FIXED. In an unrelated discussion, this came up: the two sides should be lowercased or uppercased independently, based on each side's samples. However, we currently slice by the PK ranges of one side and propagate that side's values to the other one. The casing of the "other" side must be preserved.


@dagadbm dagadbm left a comment


To unblock if needed, but this needs a proper review.

@nolar nolar force-pushed the uuid-misdetection branch 2 times, most recently from 136e605 to 2114ede Compare December 30, 2023 14:12
Sergey Vasilyev added 4 commits December 30, 2023 19:49
It fails the comparison anyway — because of casing & dashes not fitting into alphanumeric ranges/slices.
…e when slicing

Otherwise, it uses the same PK values, e.g. `ArithUUID` from the side A, and then pushes them to side B, where improper rendering can lead to improper slicing.
@nolar nolar force-pushed the uuid-misdetection branch from 2114ede to 9a99030 Compare December 30, 2023 18:49
@nolar nolar merged commit 8f55fb4 into master Dec 30, 2023
@nolar nolar deleted the uuid-misdetection branch December 30, 2023 19:52