-
Notifications
You must be signed in to change notification settings - Fork 14
Ensure that 'unicodes' is a list #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
benkiel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just unsure if fontTools is giving an already sorted list, if so, sorted is fine. If it's not, the order of the set can have meaning, so doing list might be better here.
Lib/extractor/formats/opentype.py
Outdated
| destinationGlyph.verticalOrigin = tsb + yMax | ||
| # unicodes | ||
| destinationGlyph.unicodes = reversedMapping.get(glyphName, []) | ||
| destinationGlyph.unicodes = sorted(reversedMapping.get(glyphName, [])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without looking into how font tools returns the set; sorted appears to be dangerous, as the default unicode for a glyph may not the the first one in a sort; do you know offhand how fontTools returns things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorted(..)always returns list- fontTools returns a
setin the CMAP and a set is unordered
but the implementation is here maybe to strict or demanding: I would rather opt for setting unicodes as any kind of iterator object and not specify it needs to be a specific type. See how defcon implements -> this <-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@derwind Please change sorted to list and this is will be ready for merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review. I've just committed modification via your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set is by definition unordered, meaning that its order is literally random, more precisely dependent on the random hash seed for the current Python invocation; if you invoke python again, the order of that set will be different, because the hash seed is different each time by default. The non determinism may be a problem for your use case, as you may obtain different results at each invokation, which is why @derwind suggested to sort the set.
Unlike in defcon or robofab's API, in cmap, there's no notion of a primary unicode codepoint (as in the first of the list), because cmap maps from codepoint (key) to glyph names (or indices actually), not the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, correction. Only str and bytes are "salted" with a random hash seed, not integers like in our case.
https://docs.python.org/3/reference/datamodel.html#object.__hash__
in CPython the integer's hash is the number itself, except for very big numbers (well beyond unicode max) or for -1 (which is special case as it means an error in CPython API, so it's hash collide with -2).
In any case, relying on any particular order for a set is wrong, the hash values and thus the order of sets of ints are implementation details. glyph.unicode => unicodes[0] is just wrong and should be deprecated for good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the explanations. I used sorted out of my usual habit, but I guess it wasn't very good. The only essential part is that I wanted to use list type for unicodes instead of set type, because defcon, ufoLib2, etc. seem to use list for unicodes, so I wanted to be consistent if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, extractor should be setting unicodes to a list, because that's what the Glyph class expects. I am saying that sorted(s) is arguably better than list(s), because the set has no intrinsic/stable order one can/should rely on. At least with sorted you know what you're getting. Anyway I'll leave to Ben and Frederik to decide, I'm not the maintainer.
The
Glyph.unicodesof the latest defcon and ufoLib2 are expected to be of list type. However,ttFont['cmap'].buildReversed()returns a set, so we need to convert it to a list.