2022-11-14-why-ir.md

The Case for a Font Compiler IR

Motivation

The Oxidize project aims to rebuild basic font infrastructure in Rust, and then to build tools on top of it that support font compilation, subsetting, instancing, shaping, hinting, and rasterization.

For the font compiler in particular, there is a question of whether to take the expedient path of binding Norad (UFO parser) directly to write-fonts (OpenType font encoder), or to invest the resources necessary to build out a principled intermediate representation.

An intermediate representation, or IR, is a standard component in compiler architecture. It is a data structure that sits between the front- and back-ends of a compiler and is generally suitable for analysis and transformation. One or more IRs are present in nearly all production compilers in use today. There are a few primary reasons for this:

  1. Provides a (mostly) idealized model
    • Can generally ignore idiosyncrasies and ossified constraints imposed by source and target formats
      • When these can’t be completely ignored, expose through extensions and annotations in IR
    • Dependencies in the model are generally made explicit and tracked, so transformations can be applied without breaking invariants
  2. Decouples the front- and back-ends of a compiler
    • Allows these components to be developed independently, with architectures and designs focused on their immediate needs
    • Enables support for M source and N target formats with M+N rather than M×N implementations
      • Opens the door to experimentation with new formats without having to rebuild the world from scratch
  3. Promotes collaboration and code reuse
    • Analysis and transform passes operate directly on the IR so they can be implemented once and be immediately usable with all supported source and target formats
    • Due to both 1) and 2), developers can contribute without being experts in arcane specifications
  4. Produces potentially reusable artifacts
    • Substructures of an IR can be cached to accelerate future compilations
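The decoupling described in points 2 and 3 can be illustrated with a minimal sketch. Everything here is hypothetical and invented for illustration: `FontIr`, the `Frontend` and `Backend` traits, the toy formats, and the `ensure_notdef_first` pass are not part of Norad, write-fonts, or any existing crate. The point is only that a pass written against the IR is reusable by every frontend/backend pair, so adding a format means writing one new trait implementation.

```rust
/// Hypothetical, deliberately tiny IR: just units-per-em and glyph order.
#[derive(Debug, Clone, PartialEq)]
pub struct FontIr {
    pub units_per_em: u16,
    pub glyph_names: Vec<String>,
}

/// A frontend turns one source format into IR (one impl per source format).
pub trait Frontend {
    fn parse(&self, source: &str) -> Result<FontIr, String>;
}

/// A backend turns IR into one target format (one impl per target format).
pub trait Backend {
    fn emit(&self, ir: &FontIr) -> Vec<u8>;
}

/// A transform pass written once against the IR: it moves (or inserts)
/// ".notdef" to glyph index 0, regardless of where the IR came from.
pub fn ensure_notdef_first(ir: &mut FontIr) {
    ir.glyph_names.retain(|n| n.as_str() != ".notdef");
    ir.glyph_names.insert(0, ".notdef".to_string());
}

/// The driver only sees the traits, so M frontends and N backends
/// combine without M×N bespoke converters.
pub fn compile(
    front: &dyn Frontend,
    back: &dyn Backend,
    source: &str,
) -> Result<Vec<u8>, String> {
    let mut ir = front.parse(source)?;
    ensure_notdef_first(&mut ir);
    Ok(back.emit(&ir))
}

/// Toy source format (an assumption): whitespace-separated glyph names.
pub struct ToySource;

impl Frontend for ToySource {
    fn parse(&self, source: &str) -> Result<FontIr, String> {
        let glyph_names: Vec<String> =
            source.split_whitespace().map(String::from).collect();
        if glyph_names.is_empty() {
            return Err("source contains no glyphs".to_string());
        }
        Ok(FontIr { units_per_em: 1000, glyph_names })
    }
}

/// Toy target format (an assumption): comma-joined glyph names as bytes.
pub struct ToyTarget;

impl Backend for ToyTarget {
    fn emit(&self, ir: &FontIr) -> Vec<u8> {
        ir.glyph_names.join(",").into_bytes()
    }
}
```

Calling `compile(&ToySource, &ToyTarget, "a b")` runs the toy source through the shared pass and the toy backend. A real IR would of course carry far more than a glyph list.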

Design considerations

While it’s too early to commit to any detailed architecture or design, it is clear that there are a few elements that are important to consider:

  • Variable first!
    • Support both multiple-master and delta-based variation models while providing robust conversions between them
      • Generation of deltas from masters should be supported within the IR rather than punted to the lower-level OpenType crate
  • Color as a first-class citizen
  • No hard dependencies on source formats (UFO, Glyphs) or target formats (OpenType). Lowering passes should live in separate crates (e.g. ufo2fontir).
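As a rough illustration of the masters-to-deltas conversion mentioned above, the sketch below handles only the simplest case: a single axis with a default master at normalized location 0 and one other master at normalized location 1. The `Point` type and both function names are hypothetical. A real variation model with several axes and intermediate masters is considerably more involved and is not shown.

```rust
/// Hypothetical outline point in font units.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Convert a (default master, axis-maximum master) pair into per-point
/// deltas. Assumes a single axis, with the second master sitting at
/// normalized location 1.0 and both masters being point-compatible.
pub fn deltas_from_masters(
    default: &[Point],
    master: &[Point],
) -> Result<Vec<Point>, String> {
    if default.len() != master.len() {
        return Err(format!(
            "incompatible masters: {} vs {} points",
            default.len(),
            master.len()
        ));
    }
    Ok(default
        .iter()
        .zip(master)
        .map(|(d, m)| Point { x: m.x - d.x, y: m.y - d.y })
        .collect())
}

/// Convert back: apply the deltas scaled by a normalized axis location
/// `t` in [0, 1]. At t = 1.0 this recovers the original master.
pub fn instantiate(default: &[Point], deltas: &[Point], t: f64) -> Vec<Point> {
    default
        .iter()
        .zip(deltas)
        .map(|(d, dl)| Point { x: d.x + t * dl.x, y: d.y + t * dl.y })
        .collect()
}
```

Keeping both directions next to each other in the IR layer is what makes a round trip between the two models checkable: instantiating at the master's own location should reproduce that master.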

Potential complications

The history of OpenType has led to various differences between platforms and applications that necessitate breaking the “ideal” model in various places:

  • macOS previously required specific entries in the naming table to establish proper family linking and display the desired names in font selection UIs. This is no longer the case.
  • Line metrics may be calculated differently depending on whether a platform reads them from the horizontal header (hhea) table or from the OS/2 and Windows metrics (OS/2) table. This information can be specified in most font editors and must be carried accurately through the compilation pipeline.
  • Any other cases where OpenType has inconsistent semantics or requires odd structures?

The recommendation is for the IR to support arbitrary extensions in the form of annotations, in the same way that LLVM has extensions for target-specific intrinsics. For example, the LLVM type system is extended to model the x86_fp80 80-bit floating-point type, with associated intrinsics (such as llvm.sin.f80) to support it. We could do something similar by attaching an OS/2 annotation to our font metrics structure to capture the overridable typographic metrics.
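One possible shape for such an annotation is sketched below. All names (`Annotation`, `LineMetrics`, `os2_typo_metrics`) are hypothetical and chosen only for illustration. The three overridable fields correspond to the OS/2 table's sTypoAscender, sTypoDescender, and sTypoLineGap. The idea is that the "ideal" metrics live in the core structure, while a target-specific override rides along as an annotation that only an OpenType backend needs to understand.

```rust
/// Hypothetical extension point: target-specific data attached to IR nodes.
#[derive(Debug, Clone, PartialEq)]
pub enum Annotation {
    /// Overrides for the OS/2 typographic metrics
    /// (sTypoAscender, sTypoDescender, sTypoLineGap).
    Os2 {
        typo_ascender: i16,
        typo_descender: i16,
        typo_line_gap: i16,
    },
    /// Catch-all for extensions that a given backend may not recognize.
    Custom { key: String, value: String },
}

/// Hypothetical idealized line metrics, independent of any target table.
#[derive(Debug, Clone, PartialEq)]
pub struct LineMetrics {
    pub ascender: i16,
    pub descender: i16,
    pub line_gap: i16,
    pub annotations: Vec<Annotation>,
}

impl LineMetrics {
    /// What an OpenType backend might write into OS/2: the first OS/2
    /// annotation if one is present, otherwise the idealized values.
    /// Returns (ascender, descender, line gap).
    pub fn os2_typo_metrics(&self) -> (i16, i16, i16) {
        self.annotations
            .iter()
            .find_map(|a| match a {
                Annotation::Os2 {
                    typo_ascender,
                    typo_descender,
                    typo_line_gap,
                } => Some((*typo_ascender, *typo_descender, *typo_line_gap)),
                // Annotations this backend does not understand are skipped.
                Annotation::Custom { .. } => None,
            })
            .unwrap_or((self.ascender, self.descender, self.line_gap))
    }
}
```

Backends that do not know about an annotation can ignore it, which keeps the core model free of OpenType-specific fields.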