cmap encoding selection: unicodeEncoding vs. microsoftUCS4Encoding #44

I'm dumping the glyphs from HanaMinB.ttf, where most of the characters are above U+FFFF. The font is available at:

https://osdn.net/frs/redir.php?m=pumath&f=%2Fhanazono-font%2F64385%2Fhanazono-20160201.zip

Enclosed please find the output of
      ttfdump -t cmap HanaMinB.ttf

According to the ttfdump output, this ttf file contains 4 cmap
subtables, covering the 4 encodings defined in truetype.go:

    unicodeEncoding         = 0x00000003 // PID = 0 (Unicode), PSID = 3 (Unicode 2.0)
    microsoftSymbolEncoding = 0x00030000 // PID = 3 (Microsoft), PSID = 0 (Symbol)
    microsoftUCS2Encoding   = 0x00030001 // PID = 3 (Microsoft), PSID = 1 (UCS-2)
    microsoftUCS4Encoding   = 0x0003000a // PID = 3 (Microsoft), PSID = 10 (UCS-4)
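
For readers unfamiliar with these values: each constant appears to pack the 16-bit platform ID (PID) into the upper half of a 32-bit word and the platform-specific encoding ID (PSID) into the lower half, which is what reading the first four bytes of a cmap encoding record as one big-endian integer produces. A minimal sketch of that packing follows. The helper names `packEncoding` and `unpackEncoding` are hypothetical and are not part of truetype.go.

```go
package main

import "fmt"

// packEncoding combines a platform ID and a platform-specific encoding ID
// into a single 32-bit key. Hypothetical helper, for illustration only.
func packEncoding(pid, psid uint16) uint32 {
	return uint32(pid)<<16 | uint32(psid)
}

// unpackEncoding splits a 32-bit key back into its PID and PSID halves.
func unpackEncoding(key uint32) (pid, psid uint16) {
	return uint16(key >> 16), uint16(key)
}

func main() {
	key := packEncoding(3, 10) // Microsoft platform, UCS-4 encoding
	pid, psid := unpackEncoding(key)
	fmt.Printf("0x%08x PID=%d PSID=%d\n", key, pid, psid)
	// prints: 0x0003000a PID=3 PSID=10
}
```

This is also why the quoted code can recover the platform ID with `pidPsid>>16`.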

And the current code selects the first one (unicodeEncoding):

    pidPsid := u32(table, offset)
    // We prefer the Unicode cmap encoding. Failing to find that, we fall
    // back onto the Microsoft cmap encoding.
    if pidPsid == unicodeEncoding {
        bestOffset, bestPID, ok = offset, pidPsid>>16, true
        break
    } else if pidPsid == microsoftSymbolEncoding ||
        pidPsid == microsoftUCS2Encoding ||
        pidPsid == microsoftUCS4Encoding {
        bestOffset, bestPID, ok = offset, pidPsid>>16, true
        // We don't break out of the for loop, so that Unicode can override Microsoft.
    }

With that selection, none of the characters above U+FFFF are available.

Should we prefer microsoftUCS4Encoding to the 16-bit-only unicodeEncoding?
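
One way to express that preference is to rank the candidate encodings and keep the highest-ranked subtable seen, instead of breaking on the first Unicode match. The sketch below is not the library's code. The `record` type, `encodingRank`, and `pickSubtable` are hypothetical names, the offsets in `main` are made up, and putting UCS-4 first is only one possible order. A more thorough fix might also look at each subtable's format, since a Unicode-platform subtable is not necessarily 16-bit only.

```go
package main

import "fmt"

// Constants copied from the list quoted above.
const (
	unicodeEncoding         = 0x00000003
	microsoftSymbolEncoding = 0x00030000
	microsoftUCS2Encoding   = 0x00030001
	microsoftUCS4Encoding   = 0x0003000a
)

// record is a hypothetical stand-in for one cmap encoding record.
type record struct {
	pidPsid uint32
	offset  int
}

// encodingRank returns a higher number for a more preferred encoding,
// and 0 for an encoding that is not recognized.
func encodingRank(pidPsid uint32) int {
	switch pidPsid {
	case microsoftUCS4Encoding:
		return 4
	case unicodeEncoding:
		return 3
	case microsoftUCS2Encoding:
		return 2
	case microsoftSymbolEncoding:
		return 1
	}
	return 0
}

// pickSubtable scans every record and keeps the best-ranked one.
func pickSubtable(records []record) (offset int, pid uint32, ok bool) {
	bestRank := 0
	for _, r := range records {
		if rank := encodingRank(r.pidPsid); rank > bestRank {
			bestRank, offset, pid, ok = rank, r.offset, r.pidPsid>>16, true
		}
	}
	return offset, pid, ok
}

func main() {
	// Made-up offsets, only to exercise the selection order.
	records := []record{
		{unicodeEncoding, 36},
		{microsoftUCS2Encoding, 36},
		{microsoftUCS4Encoding, 900},
	}
	offset, pid, ok := pickSubtable(records)
	fmt.Println(offset, pid, ok)
}
```

With this ordering the UCS-4 subtable wins whenever it is present, and fonts that lack one still fall back to the Unicode or other Microsoft subtables as before.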

[HanaMinB.ttf-dump-cmap.txt](https://github.com/golang/freetype/files/662057/HanaMinB.ttf-dump-cmap.txt)

