This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Use HarfBuzz for text shaping #7528

Open
ChrisLoer opened this issue Dec 22, 2016 · 7 comments
Labels: Core (the cross-platform C++ core, aka mbgl), feature, text rendering

Comments

@ChrisLoer
Contributor

ChrisLoer commented Dec 22, 2016

Addressing complex text rendering requires us to be able to do complex text shaping on the client side. This issue tracks our evaluation of HarfBuzz as a potential cross-platform solution. Desiderata include:

  • Label text processing should be done on the client side
    • Avoid sending sensitive customer data upstream
    • Avoid load on upstream servers
  • As little font data as possible should be sent to client
    • Unnecessary font data could dramatically slow page load time
    • We want to avoid legal concerns about "redistributing" fonts
  • Support as wide a range of fonts as possible
    • OpenType
    • TrueType
    • AAT (maybe assume AAT fonts aren't portable enough?)
    • Graphite?
    • Bitmapped fonts (probably not?)
  • Uniform solution across all our supported platforms

HarfBuzz was originally targeted at supporting OpenType fonts. It currently supports Apple Advanced Typography (AAT) features by passing through to Core Text on macOS/iOS.

To use HarfBuzz, we would modify node-fontnik to provide an interface for requesting glyphs based on font-specific glyph IDs (instead of the current interface based on Unicode code points). We would also have to send down to the client a minimal set of tables for the desired font in order to perform shaping.
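To illustrate why a glyph-ID-based interface is needed, here is a toy sketch in Python. It is not HarfBuzz or node-fontnik code: the glyph IDs, the `CMAP` and `LIGATURES` tables, and both function names are made up for illustration. It shows that shaping can produce a glyph (here an "fi" ligature) that no Unicode code point maps to, so a code-point-keyed request could never ask for it.

```python
# Toy illustration only: the glyph IDs and the ligature rule are invented,
# not read from a real font.

# cmap-style mapping: Unicode code point -> glyph ID
CMAP = {ord("f"): 10, ord("i"): 11, ord("n"): 12}

# GSUB-style ligature rule: a sequence of glyph IDs -> one replacement glyph.
# Glyph 57 (an "fi" ligature) has no code point of its own in this toy cmap.
LIGATURES = {(10, 11): 57}


def shape(text):
    """Map text to glyph IDs, then apply ligature substitutions greedily."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def glyphs_unreachable_by_code_point(glyph_ids):
    """Glyph IDs that a code-point-keyed glyph request could never name."""
    reachable = set(CMAP.values())
    return [g for g in glyph_ids if g not in reachable]
```

With this toy data, `shape("fin")` yields the ligature glyph followed by the glyph for "n", and `glyphs_unreachable_by_code_point` flags the ligature as something only a glyph-ID request can fetch.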

HarfBuzz font data requirements

I performed a quick audit of the HarfBuzz code to determine which data tables need to be passed to the client in order to successfully perform shaping.

Necessary tables

| Table | Purpose |
| --- | --- |
| cmap | Code point to glyph ID |
| head | Font header |
| hhea | Header for horizontal metrics |
| hmtx | Horizontal metrics |
| GDEF | Auxiliary glyph definitions (composition rules, etc.) [OpenType] |
| GPOS | Glyph repositioning rules. [OpenType] |
| GSUB | Glyph substitution rules. [OpenType] |
| maxp | Memory allocation information to help processor. Should be small, probably necessary. |
| OS/2 | Like the Microsoft version of hhea. If included, may be necessary? |
| kern | Kerning information. [TrueType only?] |
| vhea | Header for vertical metrics |
| vmtx | Vertical metrics |

Unnecessary tables

| Table | Purpose |
| --- | --- |
| glyf | Glyphs (usually biggest) |
| loca | Pointers to glyph descriptions. Should be able to strip? |
| name | Font naming information. |
| post | PostScript support |
| JSTF | Justification rules (expand or contract text with glyph substitutions/insertions). Probably can strip. [OpenType] |
| MATH | Typesetting for mathematical formulae. Probably can strip? [OpenType] |

Tables conditional on Core Text support

| Table | Purpose |
| --- | --- |
| mort | Old version of morx |
| morx | Finite automata for glyph substitution and repositioning rules |
| ... | It looks like other AAT tables get passed through to Core Text in this case without HarfBuzz knowing anything about them |

Unsupported tables

My assumption is that tables not listed above are not supported by HarfBuzz and are safe to strip. I verified that the following (somewhat common) tables are not loaded by HarfBuzz:

VORG, BASE, cvt, fpgm, prep, CFF (PostScript glyphs), LTSH, acnt, bdat, EBDT, gasp, cvar, Zapf
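The keep/strip lists above could be applied mechanically by reading a font's table directory. The following is a minimal Python sketch, assuming a plain sfnt (TTF/OTF) file rather than a TrueType collection or WOFF container. The function names and the `keep_aat` flag are hypothetical, and the flag only covers mort and morx, not the other AAT tables that are passed through to Core Text.

```python
import struct

# Table tags taken from the lists in this issue. OS/2 is written with its
# on-disk capitalization; tags are 4 bytes, space-padded (e.g. "cvt ").
NEEDED = {"cmap", "head", "hhea", "hmtx", "GDEF", "GPOS", "GSUB",
          "maxp", "OS/2", "kern", "vhea", "vmtx"}
AAT_TABLES = {"mort", "morx"}  # only relevant when Core Text does the shaping


def read_table_directory(font_bytes):
    """Return {tag: (offset, length)} from an sfnt (TTF/OTF) table directory."""
    _version, num_tables = struct.unpack_from(">IH", font_bytes, 0)
    tables = {}
    for i in range(num_tables):
        # Each 16-byte record: tag, checksum, offset, length (big-endian).
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i)
        tables[tag.decode("latin-1").rstrip()] = (offset, length)
    return tables


def classify_tables(font_bytes, keep_aat=False):
    """Split table tags into (kept, stripped) lists, plus total bytes kept."""
    wanted = NEEDED | (AAT_TABLES if keep_aat else set())
    kept, stripped, kept_bytes = [], [], 0
    for tag, (_offset, length) in read_table_directory(font_bytes).items():
        if tag in wanted:
            kept.append(tag)
            kept_bytes += length
        else:
            stripped.append(tag)
    return sorted(kept), sorted(stripped), kept_bytes
```

Running `classify_tables` over a font gives a quick estimate of how much table data would still need to be sent to the client, before any compression.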

Needs investigation

Fonts with a 'Silf' table, supported by Graphite

cc @jfirebaugh @tmpsantos @1ec5

@tmpsantos
Contributor

tmpsantos commented Dec 22, 2016

@ChrisLoer do you see us going down a similar path on JS, using something like an Emscripten-compiled version of HarfBuzz?

@ChrisLoer
Contributor Author

@tmpsantos Yes, I think so -- or at least it's the most full-featured solution I'm aware of. It will depend in part on whether we're willing to accept the bundle-size impact of loading all of an Emscripten-compiled HarfBuzz. One of the things on my to-do list is to make a relatively stripped-down Emscripten build of HarfBuzz to see how large it ends up being.

@ChrisLoer
Contributor Author

I've been experimenting with stripping out the "unnecessary tables" listed above in Arial Unicode MS. The base font is 24.4 MB in TTF format. Stripping out most of glyf, loca, and post gets it to a 359KB TTF, and HarfBuzz can still perform shaping using the stubbed out font.
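For reference, a stripping step of this kind might look roughly like the Python sketch below. It is a simplification and not the tool used for the numbers above: it drops the named tables entirely rather than stubbing out "most of" them, it does not update `head.checkSumAdjustment` (which becomes stale once the file changes), and whether HarfBuzz accepts the result for a given font is an assumption that would need checking. The function name `strip_tables` is hypothetical.

```python
import struct


def strip_tables(font_bytes, drop=("glyf", "loca", "post")):
    """Rebuild an sfnt file without the tables named in `drop`."""
    version, num_tables = struct.unpack_from(">IH", font_bytes, 0)
    records = []
    for i in range(num_tables):
        tag, checksum, offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i)
        if tag.decode("latin-1").rstrip() not in drop:
            records.append((tag, checksum, font_bytes[offset:offset + length]))

    n = len(records)
    # Binary-search helper fields required by the sfnt header.
    entry_selector = max(n.bit_length() - 1, 0)
    search_range = (1 << entry_selector) * 16 if n else 0
    range_shift = n * 16 - search_range
    header = struct.pack(">IHHHH", version, n, search_range,
                         entry_selector, range_shift)

    directory, body = b"", b""
    data_start = 12 + 16 * n
    # Assumes the input records were sorted by tag, as the format requires,
    # so keeping source order keeps the output sorted too.
    for tag, checksum, data in records:
        directory += struct.pack(">4sIII", tag, checksum,
                                 data_start + len(body), len(data))
        body += data + b"\0" * (-len(data) % 4)  # pad to a 4-byte boundary
    return header + directory + body
```

Per-table checksums are carried over unchanged because the table contents themselves are not modified; only the offsets and the header's table count and search fields are recomputed.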

The 359KB font compresses down to 103KB with gzip, or 72KB using WOFF2. WOFF2 uses the Brotli compression algorithm, which may give some improvement over gzip, but it also has compression optimizations specifically designed for the hmtx table. On the GL-JS side, the cost of implementing WOFF2 decompression will almost certainly overwhelm the savings.
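The trade-off can be put in numbers with a small Python sketch. The helper names are hypothetical; `gzip_size` uses the standard-library `gzip` module and could be run on a stripped font's bytes, while the arithmetic below uses only the sizes quoted in this comment. Brotli and WOFF2 are not in the Python standard library, so they are not measured here.

```python
import gzip


def gzip_size(data, level=9):
    """Size in bytes of `data` after gzip compression at the given level."""
    return len(gzip.compress(data, compresslevel=level))


def compression_report(raw_size, compressed_size):
    """Return (size saved, compressed size as a fraction of the original)."""
    return raw_size - compressed_size, compressed_size / raw_size


# Figures quoted above, in KB: 359 stripped, 103 with gzip, 72 with WOFF2.
gzip_saved, gzip_ratio = compression_report(359, 103)
woff2_saved, woff2_ratio = compression_report(359, 72)

# Extra saving of WOFF2 over gzip for this one font, in KB.
extra_woff2_saving_kb = 103 - 72
```

For Arial Unicode MS, gzip already brings the stripped font to roughly 29% of its size and WOFF2 to roughly 20%. The extra 31KB that WOFF2 saves is the margin against which the size of any WOFF2 decoder shipped with GL-JS would have to be weighed.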

Using language-specific fonts, the sizes get even smaller. For instance, Noto Sans Devanagari is only 145KB, and strips down to 60KB with glyphs removed.

kkaefer added the Core label (the cross-platform C++ core, aka mbgl) on May 9, 2017
@behdad

behdad commented Jul 26, 2017

I'm also very interested to see how an Emscripten-compiled HarfBuzz performs, and to help trim down the size as needed.

stale bot added the archived label (archived because of inactivity) on Nov 5, 2018
@mdakram

mdakram commented Nov 15, 2018

Is this still under consideration?

stale bot removed the archived label (archived because of inactivity) on Nov 15, 2018
@ChrisLoer
Contributor Author

This is still something we'd like to do, but unfortunately it's a large project that hasn't come to the top of our backlog, and I can't predict when it will other than to say "not in the next six months".

stale bot added the archived label (archived because of inactivity) on May 14, 2019
@stale

stale bot commented May 14, 2019

This issue has been automatically detected as stale because it has not had recent activity and will be archived. Thank you for your contributions.


6 participants