
Conversation

@cmyr
Member

@cmyr cmyr commented Jan 19, 2022

Rendered

This is a proposal for a low-level crate that defines read-only and mutable versions of the structures found in the various font tables. There was an earlier draft of this shared with @rsheeter but I've almost completely rewritten it since then, and I think I have a much better sense of the problem space.

Let me know if there's anything in here that is unclear, and of course if there is any additional feedback. Thanks to @simoncozens and @dfrg, this write-up is mostly just me smushing together and explaining existing work from pinot and fonttools-rs.

@cmyr cmyr force-pushed the font-table-types branch from b795fb4 to a5b14f7 Compare January 19, 2022 19:41
The particulars of the design will be determined through this exploration, but a
few core ideas are clear. For the read-only case, the design is largely borrowed
directly from [pinot][], with the main difference being the use of macros for
code generation, an approach that has been successful for [fonttools-rs][].
Contributor

QQ, what is unique to the fonttools-rs codegen? I'm under the impression macros are a widely used and well known capability in Rust?

Member Author

Not unique per se, but it is a demonstration that using macros to generate code for these sorts of data structures (which have various weird features like the offset tables and heterogeneous collections) should work.
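To make that concrete, here is a toy illustration of the macro approach, not the proposed API: a declarative macro takes a field list and expands it into a read-only view type whose accessors read big-endian scalars at fixed offsets. All names here (`table!`, `MaxpHeader`, `read_u16`) are made up for this sketch.

```rust
// Bounds-checked big-endian u16 read at an arbitrary offset.
fn read_u16(data: &[u8], offset: usize) -> Option<u16> {
    data.get(offset..offset + 2)
        .map(|b| u16::from_be_bytes([b[0], b[1]]))
}

// A declarative macro that generates a borrowed view struct with one
// lazy accessor per u16 field, at consecutive 2-byte offsets.
macro_rules! table {
    ($name:ident { $($field:ident),* $(,)? }) => {
        struct $name<'a> { data: &'a [u8] }
        impl<'a> $name<'a> {
            fn new(data: &'a [u8]) -> Self { Self { data } }
            table!(@accessors 0; $($field),*);
        }
    };
    (@accessors $off:expr; $field:ident $(, $rest:ident)*) => {
        fn $field(&self) -> Option<u16> { read_u16(self.data, $off) }
        table!(@accessors $off + 2; $($rest),*);
    };
    (@accessors $off:expr;) => {};
}

// A toy 'maxp'-like header: each u16 field gets an accessor.
table!(MaxpHeader { version_hi, version_lo, num_glyphs });

fn main() {
    let bytes = [0x00, 0x01, 0x00, 0x00, 0x01, 0x02];
    let maxp = MaxpHeader::new(&bytes);
    assert_eq!(maxp.version_hi(), Some(1));
    assert_eq!(maxp.num_glyphs(), Some(0x0102));
}
```

The real tables need much more than this (offsets, formats, counts), but the toy shows the shape: the macro input reads like the spec's record definition, and the generated code does lazy, checked reads against the raw bytes.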

Contributor

I think the most important thing to learn from fonttools-rs/otspec is the situations where macro-based parsing hasn't been successful.

We all agree that the easy cases are easy. It's the hard cases that stress the techniques. OpenType subtables are often badly behaved: many tables are not self-contained and require data from other tables to correctly parse them (maxp/loca/glyf being one example); others have internal dependencies in how they are parsed. ValueRecords show both of these dysfunctions.

It would be very valuable to look at the bits of fonttools-rs which don't use macros, and then compare those with how harfbuzz parses those subtables.

pub segment_maps: Vec<SegmentMapsOwned>,
}

struct SegmentMapsOwned(pub Vec<AxisValueMap>);
Contributor

Is the "Owned" suffix a normal Rust pattern or something unique here? Are there other examples in the Rust ecosystem we might refer to and learn from?

Member

It's not unheard of, but I think it's more common when you're retrofitting owned types onto a reference-first library, which is sort of the case here. The other idiomatic approach would be SegmentMaps for the owned type and SegmentMapsRef (SegmentMapsMut) for the borrowed type(s).

Member Author

I mention above that this name needs bikeshedding. I don't like Owned, because it really isn't accurate; this is a different representation. My preference might be to call this version Avar, and call the read-only version something like AvarView, but I don't love that either, and I suspect there's something better out there; for the purpose of this example I didn't want to suddenly change the name of the type we had already introduced.
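For concreteness, a minimal sketch of what that naming split might look like, with `Avar` as the owned representation and `AvarView` as the read-only view over raw bytes. The names are just the candidates floated above, and the owned field layout is simplified.

```rust
// One entry of an avar segment map (F2Dot14 values as raw i16 here).
struct AxisValueMap { from_coord: i16, to_coord: i16 }

/// Owned representation: plain Rust data, freely mutable.
struct Avar { segment_maps: Vec<Vec<AxisValueMap>> }

/// Read-only view: borrows the table bytes, parses on access.
struct AvarView<'a> { data: &'a [u8] }

impl<'a> AvarView<'a> {
    fn new(data: &'a [u8]) -> Self { Self { data } }
    // avar header: majorVersion, minorVersion, reserved, axisCount,
    // so axisCount lives at byte offset 6.
    fn axis_count(&self) -> Option<u16> {
        self.data.get(6..8).map(|b| u16::from_be_bytes([b[0], b[1]]))
    }
}

fn main() {
    let bytes = [0, 1, 0, 0, 0, 0, 0, 2];
    assert_eq!(AvarView::new(&bytes).axis_count(), Some(2));
    let owned = Avar {
        segment_maps: vec![vec![AxisValueMap { from_coord: -16384, to_coord: -16384 }]],
    };
    assert_eq!(owned.segment_maps[0][0].from_coord, -16384);
}
```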

of each table; my hunch is that this will end up being a much simpler API in the
general case. In most cases, the user will be using one of these two sets of
types, and in the advanced cases (e.g. server-side subsetting, instancing
variable fonts) it still shouldn't be that complex.
Contributor

Can we elaborate on how server-side subsetting would work, e.g. how would wanting to efficiently produce many subsets from the same read-only instance of a large font, potentially concurrently, play out?

Member Author

To do this justice I would need to have a better understanding of how subsetting is currently implemented. I think this would be a useful exercise, though, and would help me better understand some of the possible constraints. If there's any public documentation someone could point me to, that would be helpful.

Contributor

The Faster Horse design doc is a good overview. Scaling primarily with the number of glyphs in the output is particularly key; anything that scales significantly with the number of glyphs in the input will fare poorly for CJK, and that is a critical use case for us.

https://github.com/harfbuzz/harfbuzz/blob/main/docs/repacker.md covers the ugliest part.

@garretrieger any other key refs?

Member Author

In addition to the implementation, it would also be useful to have a sense of how this is deployed. For instance it might make sense to have some kind of long-lived 'SubsettingContext' that handles multiple requests, and reuses allocations, or it might make sense to use an arena allocator. If we wanted to support cases like this it might be worth thinking about that now?

@rsheeter rsheeter requested a review from behdad January 19, 2022 21:47
Contributor

@rsheeter rsheeter left a comment

Looks good to me, and ofc we can continue to accept changes as subsequent PRs. Let's leave a few days for others to review before merging.

@madig

madig commented Jan 19, 2022

I think using builders would be fine. fontTools has a builder module at https://github.com/fonttools/fonttools/blob/main/Lib/fontTools/fontBuilder.py and some tables like glyf and loca have to be generated, as noted, in tandem. ufo2ft can be roughly seen as a collection of builders, too (see https://github.com/googlefonts/ufo2ft/blob/32fa83c379402c8ac97f34a47f2d13edf11f87ab/Lib/ufo2ft/outlineCompiler.py#L118-L162). Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do. That'd help with making prototypes. I mean, you can also have Rust data types and builders.

I'd love to see the performance diff (if any!) when compiling a large bunch of UFOs like in https://github.com/madig/noto-amalgamated. I keep hearing the words "data-oriented programming" in my head but wouldn't know what that would look like in this case. Maybe it's not even relevant if 65k glyphs is the most you'll have to deal with!

representations of the data. In practice I don't mind this too much. One way to
think about this is that these data structures are basically just another form
of builder, that happen to be reusable and that maintain some intermediate
state.
Contributor

We have this distinction already between fonttools-rs ("structures you want to manipulate") and otspec ("structures you want to serialize/deserialize") and it seems to work nicely. With this plan, the shaping engine can operate on "otspec"-level structures, and the font compiler/subsetter/etc. can continue to operate on "fonttools-rs"-level structures.

Member Author

yea, I think that's a good model; we would be combining those two approaches in a way where all the underlying scalar types were shared between both versions, and there would be easy conversion back and forth.


- what tables have the weirdest edge cases / pose the biggest potential
challenges, either for reading or writing? For instance serializing `loca` may
depend on the contents of `glyf`?
Contributor

As mentioned above, scanning the fonttools-rs code for subtables which don't use macros will show these up pretty quickly. (You can probably find them easily by grepping the comments for swearwords.)

All the variation tables require the axis count from fvar. Anchor subtables are internally format-switching; ValueRecords are a nightmare, they're format-switching with the formats stored externally, sometimes two subtables upstream; MarkBasePosFormat1 relies on an external count; things like cvt need to know their own length; hmtx needs to communicate with hhea and similar for the vertical equivalent.
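The fvar dependency mentioned above can be sketched as an explicit parse context: a variation region's byte length depends on the axis count, so bytes alone are not enough to parse it. The names (`ParseCtx`, `parse_region`) are illustrative, not from any of the libraries discussed.

```rust
/// Context that must be threaded in from elsewhere in the font;
/// here, the axis count read from fvar.
struct ParseCtx { axis_count: u16 }

/// A VariationRegion is one (start, peak, end) triple of F2Dot14
/// values per axis: 6 bytes per axis. Its length is unknowable
/// without the fvar context.
fn region_len(ctx: &ParseCtx) -> usize {
    ctx.axis_count as usize * 6
}

/// Parse (here: just bounds-check and slice) one region.
fn parse_region<'a>(data: &'a [u8], ctx: &ParseCtx) -> Option<&'a [u8]> {
    data.get(..region_len(ctx))
}

fn main() {
    let ctx = ParseCtx { axis_count: 2 };
    let data = [0u8; 16];
    // 2 axes * 6 bytes = 12 bytes per region; 16 available, so this parses.
    assert_eq!(parse_region(&data, &ctx).unwrap().len(), 12);
}
```

Any design for these tables will need some equivalent of `ParseCtx`, whether it's an explicit argument, a field on the parent view, or a graph-walking context.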

Member Author

Awesome, I was hoping for a list like this; I do think it points to a very good place to start.

@behdad

behdad commented Jan 20, 2022

Reading this discussion brings a lot of déjà vu feelings, given that I've thought about this for a good 15+ years. I volunteered to Rod to make a presentation on how HarfBuzz handles these issues, where it fails, what pitfalls there are, etc. I can do it next week. I think a two-hour slot would be adequate. Your list of questions at the end makes for a nice agenda.

@cmyr
Member Author

cmyr commented Jan 20, 2022

@behdad that would be invaluable.

@dfrg
Member

dfrg commented Jan 20, 2022

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

I’d love to see actual examples of this. My suspicion is that patching of this kind is generally not done in bulk or at high frequency like compilation and subsetting so is less sensitive to performance. I would also lean toward the builder model if this holds true.

@behdad

behdad commented Jan 20, 2022

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

POD means then you have to compile the whole table, which has the offset-overflow NP-hard problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.

@behdad

behdad commented Jan 20, 2022

BTW, I feel like allsorts is not being mentioned / studied enough here.
https://github.com/yeslogic/allsorts
https://github.com/yeslogic/fathom/blob/main/fathom/examples/formats/opentype.txt

@rsheeter
Contributor

BTW, I feel like allsorts is not being mentioned / studied enough here.

+1, it's one of the most unique attacks on the problem from the little I know of it.

I think maybe there is a missing step 0: review existing related implementations and write some notes about each.

@madig

madig commented Jan 20, 2022

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

I’d love to see actual examples of this. My suspicion is that patching of this kind is generally not done in bulk or at high frequency like compilation and subsetting so is less sensitive to performance. I would also lean toward the builder model if this holds true.

Honestly, the scripts for direct binary font manipulation are things that run quickly enough in Python so far. Even if every single change in a Rust library would recompute the entire table, it would probably still be faster. Think adding or changing entries in the name table, taking out the kern feature from the DFLT script for a Deva font to stop Adobe apps from applying kerning twice, maybe adding the overlap bit to stuff in the glyf table, messing with vertical metrics in OS/2 and hhea, adding in a STAT table (and modifying the name table in the process), compressing to WOFF(2), that kind of stuff. For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

POD means then you have to compile the whole table, which has the offset-overflow NP-hard problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.

Mh, yes. I suppose what I want to say is: it would be nice if you could take e.g. the name table, easily add something to it or change it, and then write it back. Maybe with a builder pattern where you could start with given data and then go NameTableBuilder::from(data).addName(...) or something, but maybe that can easily be built on top of a more low-level read/write API without POD stuff.
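Something like that builder idea could look roughly like this in Rust. The types and methods (`NameTableBuilder`, `add_name`, `build`) are hypothetical, and `build` here just returns sorted records in place of real binary serialization.

```rust
struct NameRecord { name_id: u16, value: String }

struct NameTableBuilder { records: Vec<NameRecord> }

impl NameTableBuilder {
    /// In a real library this would decompile existing table bytes;
    /// here we just start empty for illustration.
    fn new() -> Self { Self { records: Vec::new() } }

    /// Chainable mutation: add one name record.
    fn add_name(mut self, name_id: u16, value: &str) -> Self {
        self.records.push(NameRecord { name_id, value: value.to_string() });
        self
    }

    /// Recompiles the whole table; returning records sorted by nameID
    /// stands in for real serialization to bytes.
    fn build(mut self) -> Vec<NameRecord> {
        self.records.sort_by_key(|r| r.name_id);
        self.records
    }
}

fn main() {
    let table = NameTableBuilder::new()
        .add_name(4, "MyFont Regular") // full name
        .add_name(1, "MyFont")         // family name
        .build();
    assert_eq!(table[0].name_id, 1);
    assert_eq!(table[0].value, "MyFont");
}
```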

@dfrg
Member

dfrg commented Jan 20, 2022

BTW, I feel like allsorts is not being mentioned / studied enough here.

+1, it's one of the most unique attacks on the problem from the little I know of it.

I think maybe there is a missing step 0: review existing related implementations and write some notes about each.

+1 for more comprehensive notes but I can give a quick rundown based on some familiarity with these projects.

Internally, ttf-parser is very similar to pinot with the exception that it favors sequential reads over using a lot of magic numbers as offsets. The public API is higher level and more similar to swash. It doesn’t provide direct access to parsed table data and doesn’t do any writing.

Allsorts does writing, but, notably, does not appear to subset the layout tables, nor does it seem capable of writing them aside from copying the blobs. The parsing is less ad hoc than ttf-parser and pinot, but it is not zero copy. This project is really interesting as a shaper, but less so IMO as a reference for parsing/compiling.

@rsheeter
Contributor

For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

Would recommend. Our experience is that it's (more than) two orders of magnitude faster for (less than) an order of magnitude less resources.

@simoncozens
Contributor

simoncozens commented Jan 20, 2022

Oh: and even those projects which subset, like allsorts, last I checked did not do a thing about variable fonts. Some of the ItemVariationStore and TupleVariationStore subtables are extremely gnarly. (fonttools-rs both reads and writes them.)

@dfrg
Member

dfrg commented Jan 20, 2022

Thanks for the examples, Nikolaus. It seems like the smaller, more targeted transforms like flipping bits in the glyf table and disabling features (without removing them) would be good candidates for a supplemental library and probably shouldn’t influence overall design.

@madig

madig commented Jan 20, 2022

For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.

Would recommend. Our experience is that it's (more than) two orders of magnitude faster for (less than) an order of magnitude less resources.

Yes, but I ran into crashers and CLI issues whenever I tried it a while back (all reported IIRC). Haven't checked recently, though.

supplemental library

Yes, I think that's fine and fontTools will continue to exist anyway :) I'm more interested in compiling my fonts on all my 16 cores simultaneously in under a second and inside the L1 cache.

@cmyr
Member Author

cmyr commented Jan 21, 2022

Plain old data structures may be nicer for spot-manipulation (insert or change something in the name table, add or remove a feature from the feature list in GPOS), but maybe rebuilding the whole table is ok, too. Maybe I can rummage through our scripts to see what kind of manipulation we often do.

POD means then you have to compile the whole table, which has the offset-overflow NP-hard problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.

This sounds like a very important point. There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

@rsheeter
Contributor

Yes, but I ran into crashers and CLI issues whenever I tried it a while back

If that is still true, particularly the crashes, @garretrieger will likely be interested :)

@simoncozens
Contributor

simoncozens commented Jan 21, 2022

There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

The opposite, I think. Consider the case where you're adding a glyph to the coverage of a positioning table. You'd better decompile and recompile the whole table, because otherwise, if that coverage table was previously shared with another lookup, you just changed the effect of that lookup too. And if it wasn't shared before, it may be sharable now.

@garretrieger
Contributor

Yes, but I ran into crashers and CLI issues whenever I tried it a while back

If that is still true, particularly the crashes, @garretrieger will likely be interested :)

The crashes you reported have all been fixed, and the overflow resolution has been significantly improved since then as well. As an added bonus, the updated resolver can now pack complex GSUB/GPOS tables with lots of extension lookups smaller than fonttools can.

@dfrg
Member

dfrg commented Jan 21, 2022

I’d like to propose that we make the read functionality core only, feature gate the writing and only take on the alloc dependency when that feature is enabled. This gives us a statically checked constraint that read is zero copy, smaller code size for consumer only scenarios, and simplifies compilation to WASM. I thought I’d bring it up early because trying to retrofit this can be painful.

@rsheeter
Contributor

read functionality core only, feature gate the writing

+1 though there is a limit to how much pain this is worth. Hopefully as long as it's introduced early not too much.

simplifies compilation to WASM

Is there a specific reason the writing side seems likely to kerplode in WASM? Wanting to write, maybe to subset for PDF, in WASM seems pretty reasonable.

@rsheeter
Contributor

@cmyr I think we could merge this whenever you feel ready. Comments are more refinement than massive change and we can continue to take PRs to update as we have new ideas or learn more.

@dfrg
Member

dfrg commented Jan 24, 2022

Is there a specific reason the writing side seems likely to kerplode in WASM? Wanting to write, maybe to subset for PDF, in WASM seems pretty reasonable.

Oh, I was unclear. Writing is perfectly fine and I think would be quite valuable if available as WASM. I'm trying to specifically avoid the std dependency which is what carries potential complications. The story here is getting better all the time, so it may not be an issue.

@cmyr
Member Author

cmyr commented Jan 24, 2022

I’d like to propose that we make the read functionality core only, feature gate the writing and only take on the alloc dependency when that feature is enabled. This gives us a statically checked constraint that read is zero copy, smaller code size for consumer only scenarios, and simplifies compilation to WASM. I thought I’d bring it up early because trying to retrofit this can be painful.

I think this makes sense, and I think trying to support the parse-only case as efficiently as possible is a good goal.

Also: I'm personally finding that the term 'zero-copy' doesn't exactly sit right for me, for this project, and I want to make sure I'm not misunderstanding anything. To me, to be zero-copy, all access to data in the font would have to be the equivalent of pointer reads (indexing into a Rust slice being functionally equivalent), but because of endianness we necessarily need to copy individual bytes in order to generate the scalars. So we do zero allocation, and we support random access (you can grab an element out of the middle of an array), but we always need to copy at the leaves of the tree. Does this sound right to other people?
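One way to picture that leaf-copy point is a raw big-endian wrapper type (the name `BigEndianU16` is illustrative, not a proposed type): random access into the font never allocates, but every scalar read copies its two bytes in order to swap them.

```rust
/// Two raw big-endian bytes, exactly as they sit in the font file.
#[repr(transparent)]
struct BigEndianU16([u8; 2]);

impl BigEndianU16 {
    /// The unavoidable "copy at the leaf": two bytes move into a
    /// register and get swapped; nothing else is touched or allocated.
    fn get(&self) -> u16 { u16::from_be_bytes(self.0) }
}

fn main() {
    let raw = BigEndianU16([0x01, 0x02]);
    assert_eq!(raw.get(), 0x0102);
}
```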

@cmyr I think we could merge this whenever you feel ready. Comments are more refininement than massive change and we can continue to take PRs to update as we have new ideas or learn more.

Sounds good, I'll do one more pass to incorporate any remaining feedback tomorrow but then I think this is a reasonable checkpoint.

@dfrg
Member

dfrg commented Jan 24, 2022

Also: I'm personally finding that the term 'zero-copy ' doesn't exactly sit right for me, for this project, and I want to make sure I'm not misunderstanding anything.

In principle, I agree with you. I believe the term originates in avoiding copying buffers between user and kernel space, which is pretty far removed from what we're talking about. For our purposes, I think it's reasonable to define zero copy as "all scalar value loads generate a mov+bswap (AArch64: LDR+REV) sequence", which is the best we can do, but I'm happy to use more precise terminology if that's the general sentiment here.

By this definition, pinot (and swash) are not zero copy as they check bounds for almost every field access. We can definitely do better.

@cmyr
Member Author

cmyr commented Jan 24, 2022

That's a good goal, and we should try and think of ways we might achieve it without sacrificing safety. At the very least we could have unsafe _unchecked variants of various accessor functions, and require the consumer to do bounds checking up front. More ambitiously, I wonder if we might be able to use the type system to encode the validity of bounds in some cases? I'm thinking of some sort of token type that would be required to make certain method calls, and acquiring that token would involve checking some upper bound. This feels like it might be beyond the capacity of Rust's type system at the moment.

Certainly wherever possible we should encourage the use of iteration-based APIs, since these can trivially perform a single bounds check up front that is valid for all the individual accesses.
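A sketch of that iteration pattern, with made-up names: one bounds check up front covers the whole array, and the returned iterator needs no further per-element checks.

```rust
/// Iterate `count` big-endian u16 values starting at the front of `data`.
/// The single `get` call is the only bounds check for the whole array.
fn u16_array(data: &[u8], count: usize) -> Option<impl Iterator<Item = u16> + '_> {
    // One up-front check covering all `count` elements.
    let bytes = data.get(..count * 2)?;
    // chunks_exact walks the already-validated region.
    Some(bytes.chunks_exact(2).map(|b| u16::from_be_bytes([b[0], b[1]])))
}

fn main() {
    let data = [0x00, 0x01, 0x00, 0x02, 0x00, 0x03];
    let vals: Vec<u16> = u16_array(&data, 3).unwrap().collect();
    assert_eq!(vals, vec![1, 2, 3]);
    // Asking for more elements than the buffer holds fails up front.
    assert!(u16_array(&data, 4).is_none());
}
```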

cmyr and others added 2 commits January 24, 2022 13:43
Co-authored-by: Chad Brokaw <cbrokaw@gmail.com>
@rsheeter
Contributor

I think it's reasonable to define zero copy as "all scalar value loads generate a mov+bswap (AArch64: LDR+REV) sequence" which is the best we can do

+1 to crisply defining what we mean by zerocopy. @dfrg I propose you PR a definitions section, even if for now it's just zerocopy, to either the root README or a new md.

@behdad

behdad commented Jan 24, 2022

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

@behdad

behdad commented Jan 24, 2022

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

With that definition, neither endianness conversion nor bounds-checking violate zerocopyness.

@dfrg
Member

dfrg commented Jan 24, 2022

That's a good goal, and we should try and think of ways we might achieve it without sacrificing safety.

I gave up on the HarfBuzz style design when originally working on swash due to the safety issues, but I have some ideas about how to maintain safety now...

At the very least we could have unsafe _unchecked variants of various accessor functions, and require the consumer to do bounds checking up front. More ambitiously, I wonder if we might be able to use the type system to encode the validity of bounds in some cases? I'm thinking of some sort of token type that would be required to make certain method calls, and acquiring that token would involve checking some upper bound. This feels like it might be beyond the capacity of Rust's type system at the moment.

It's probably not completely representable in the type system, but I do see this pattern as being the solution. We'll need to carry some sort of context for dealing with offsets when walking a graph anyway and I believe this can also act as a safety token in some cases. For the most part, we should only need to bounds check when traversing an edge.
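A rough sketch of that token idea (speculative, not a worked-out design): a `Bounds` value can only be minted by a successful length check, so methods that rely on that check can demand one as proof it happened.

```rust
/// Proof that `data` holds at least the validated length; it cannot be
/// constructed except through `check`.
struct Bounds<'a> { data: &'a [u8] }

impl<'a> Bounds<'a> {
    /// The one place the length is validated; this is where an edge
    /// traversal would mint the token.
    fn check(data: &'a [u8], len: usize) -> Option<Self> {
        if data.len() >= len { Some(Self { data }) } else { None }
    }

    /// Reads within the validated region. This sketch still uses safe
    /// indexing (so an out-of-range offset panics rather than being UB);
    /// a real design would tie offsets to the token too.
    fn u16_at(&self, offset: usize) -> u16 {
        u16::from_be_bytes([self.data[offset], self.data[offset + 1]])
    }
}

fn main() {
    let data = [0x12, 0x34, 0x56, 0x78];
    let token = Bounds::check(&data, 4).expect("validated");
    assert_eq!(token.u16_at(2), 0x5678);
}
```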

Certainly wherever possible we should encourage the use of iteration-based APIs, since these can trivially perform a single bounds check up front that is valid for all the individual accesses.

Agree completely on this.

@dfrg
Member

dfrg commented Jan 24, 2022

+1 to crisply defining what we mean by zerocopy. @dfrg I propose you PR a definitions section, even if for now it's just zerocopy, to either the root README or a new md.

Will do.

To me zerocopy simply means that if a font object is n bytes, we consume o(n) bytes to read from it. That is, we don't consume O(n) or more to process it.

I like this as a more general definition that gives us some breathing room with respect to bounds checking. I think this makes sense as the hard constraint, with the load+swap case as desirable when possible.

@cmyr
Member Author

cmyr commented Jan 25, 2022

Okay, I'll merge this shortly.

Some last notes:

There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)

The opposite, I think. Consider the case where you're adding a glyph to the coverage of a positioning table. You'd better decompile and recompile the whole table, because otherwise, if that coverage table was previously shared with another lookup, you just changed the effect of that lookup too. And if it wasn't shared before, it may be sharable now.

I've been thinking about this a bunch, and I think these are both good points. In the first case, actual manual access to everything seems important, and then the second case feels more like a traditional compilation problem. Doing only the second approach means the first is impossible, but doing the first doesn't preclude also supporting the second.

also @dfrg just as a fun aside, the following two functions generate the same asm on x86 & aarch64:

pub unsafe fn manual_read_u16(buf: &[u8], offset: usize) -> u16 {
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return (*buf.get_unchecked(offset) as u16) << 8 | *buf.get_unchecked(offset + 1) as u16;

    // read_unaligned rather than a plain deref: the font bytes are not
    // guaranteed to be 2-byte aligned, and an unaligned deref is UB.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    (buf.as_ptr().add(offset) as *const u16).read_unaligned().swap_bytes()
}

pub unsafe fn boring_read_u16(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes(buf.get_unchecked(offset..offset + 2).try_into().unwrap())
}

@behdad

behdad commented Jan 25, 2022

also @dfrg just as a fun aside, the following two functions generate the same asm on x86 & aarch64:

pub unsafe fn manual_read_u16(buf: &[u8], offset: usize) -> u16 {
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return (*buf.get_unchecked(offset) as u16) << 8 | *buf.get_unchecked(offset + 1) as u16;

    // read_unaligned rather than a plain deref: the font bytes are not
    // guaranteed to be 2-byte aligned, and an unaligned deref is UB.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    (buf.as_ptr().add(offset) as *const u16).read_unaligned().swap_bytes()
}

pub unsafe fn boring_read_u16(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes(buf.get_unchecked(offset..offset + 2).try_into().unwrap())
}

Yeah, we also found that compilers are smart enough to take care of such optimizations.

@cmyr cmyr merged commit d670cb6 into main Jan 25, 2022
@cmyr cmyr deleted the font-table-types branch January 25, 2022 16:11