Skip to content

Conversation

@derwind
Copy link
Contributor

@derwind derwind commented Dec 1, 2021

The Glyph.unicodes of the latest defcon and ufoLib2 are expected to be of list type. However, ttFont['cmap'].buildReversed() returns a set, so we need to convert it to a list.

Copy link
Member

@benkiel benkiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just unsure if fontTools is giving an already sorted list, if so, sorted is fine. If it's not, the order of the set can have meaning, so doing list might be better here.

destinationGlyph.verticalOrigin = tsb + yMax
# unicodes
destinationGlyph.unicodes = reversedMapping.get(glyphName, [])
destinationGlyph.unicodes = sorted(reversedMapping.get(glyphName, []))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without looking into how font tools returns the set; sorted appears to be dangerous, as the default unicode for a glyph may not the the first one in a sort; do you know offhand how fontTools returns things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • sorted(..) always returns list
  • fontTools returns a set in the CMAP and a set is unordered

but the implementation is here maybe to strict or demanding: I would rather opt for setting unicodes as any kind of iterator object and not specify it needs to be a specific type. See how defcon implements -> this <-

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derwind Please change sorted to list and this is will be ready for merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benkiel @typemytype

Thank you for your review. I've just committed modification via your suggestion.

Copy link
Member

@anthrotype anthrotype Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set is by definition unordered, meaning that its order is literally random, more precisely dependent on the random hash seed for the current Python invocation; if you invoke python again, the order of that set will be different, because the hash seed is different each time by default. The non determinism may be a problem for your use case, as you may obtain different results at each invokation, which is why @derwind suggested to sort the set.
Unlike in defcon or robofab's API, in cmap, there's no notion of a primary unicode codepoint (as in the first of the list), because cmap maps from codepoint (key) to glyph names (or indices actually), not the other way around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, correction. Only str and bytes are "salted" with a random hash seed, not integers like in our case.
https://docs.python.org/3/reference/datamodel.html#object.__hash__

in CPython the integer's hash is the number itself, except for very big numbers (well beyond unicode max) or for -1 (which is special case as it means an error in CPython API, so it's hash collide with -2).

In any case, relying on any particular order for a set is wrong, the hash values and thus the order of sets of ints are implementation details. glyph.unicode => unicodes[0] is just wrong and should be deprecated for good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anthrotype

Thank you for all the explanations. I used sorted out of my usual habit, but I guess it wasn't very good. The only essential part is that I wanted to use list type for unicodes instead of set type, because defcon, ufoLib2, etc. seem to use list for unicodes, so I wanted to be consistent if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, extractor should be setting unicodes to a list, because that's what the Glyph class expects. I am saying that sorted(s) is arguably better than list(s), because the set has no intrinsic/stable order one can/should rely on. At least with sorted you know what you're getting. Anyway I'll leave to Ben and Frederik to decide, I'm not the maintainer.

@benkiel benkiel merged commit e695785 into robotools:master Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants