Font table types #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
> The particulars of the design will be determined through this exploration, but a
> few core ideas are clear. For the read-only case, the design is largely borrowed
> directly from [pinot][], with the main difference being using macros for code
> generation, and that approach has been successful for [fonttools-rs][].
QQ, what is unique to the fonttools-rs codegen? I'm under the impression macros are a widely used and well known capability in Rust?
Not unique per se, but it is a demonstration that using macros to generate code for these sorts of data structures (which have various weird features like the offset tables and heterogeneous collections) should work.
I think the most important thing to learn from fonttools-rs/otspec is the situations where macro-based parsing hasn't been successful.
We all agree that the easy cases are easy. It's the hard cases that stress the techniques. OpenType subtables are often badly behaved: many tables are not self-contained and require data from other tables to correctly parse them (maxp/loca/glyf being one example); others have internal dependencies in how they are parsed. ValueRecords show both of these dysfunctions.
It would be very valuable to look at the bits of fonttools-rs which don't use macros, and then compare those with how harfbuzz parses those subtables.
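To make the "easy cases are easy" point concrete, here is a minimal sketch (all names are hypothetical, not a proposed crate API) of how a declarative macro can generate big-endian field accessors over a raw byte slice, in the spirit of the pinot/fonttools-rs approach:

```rust
// Hypothetical sketch, not the proposed crate API: a declarative macro that
// generates bounds-checked big-endian u16 field accessors over a byte slice.
macro_rules! table {
    ($name:ident { $($field:ident @ $off:expr,)* }) => {
        pub struct $name<'a>(pub &'a [u8]);
        impl<'a> $name<'a> {
            $(
                // Each accessor bounds-checks and byte-swaps on read.
                pub fn $field(&self) -> Option<u16> {
                    let bytes = self.0.get($off..$off + 2)?;
                    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
                }
            )*
        }
    };
}

// The easy case: a fixed-layout header (maxp version 0.5 stores numGlyphs
// at byte offset 4).
table!(MaxpHeader {
    num_glyphs @ 4,
});

fn main() {
    let data = [0x00, 0x00, 0x50, 0x00, 0x01, 0x23];
    let maxp = MaxpHeader(&data);
    assert_eq!(maxp.num_glyphs(), Some(0x0123));
}
```

The hard cases raised in this thread (external counts, format switching, cross-table dependencies) are exactly what this simple fixed-offset scheme cannot express without extra machinery.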
> pub segment_maps: Vec<SegmentMapsOwned>,
> }
>
> struct SegmentMapsOwned(pub Vec<AxisValueMap>);
Is the "Owned" suffix a normal Rust pattern or something unique here? Are there other examples in the Rust ecosystem we might refer to and learn from?
It's not unheard of, but I think it's more common when you're retrofitting owned types onto a reference-first library, which is sort of the case here. The other idiomatic approach would be SegmentMaps for the owned type and SegmentMapsRef (SegmentMapsMut) for the borrowed type(s).
I mention above that this name needs bikeshedding. I don't like Owned, because it really isn't accurate; this is a different representation. My preference might be to call this version Avar, and call the read-only version something like AvarView, but I don't love that either, and I suspect there's something better out there; for the purpose of this example I didn't want to suddenly change the name of the type we had already introduced.
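For illustration, here is a minimal sketch of the two-representation pattern being discussed: a borrowed view over raw big-endian bytes alongside an owned, editable type, with cheap conversion between them. All names are placeholders, since the naming is explicitly unsettled.

```rust
// All names are placeholders; the naming is explicitly unsettled.
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct AxisValueMap {
    pub from_coordinate: i16, // F2Dot14 in the spec; raw i16 here for brevity
    pub to_coordinate: i16,
}

/// Owned, mutable representation: what you edit and compile.
pub struct SegmentMaps(pub Vec<AxisValueMap>);

/// Borrowed view over big-endian bytes: what you read when consuming a font.
pub struct SegmentMapsView<'a>(pub &'a [u8]);

impl<'a> SegmentMapsView<'a> {
    pub fn axis_value_maps(&self) -> impl Iterator<Item = AxisValueMap> + 'a {
        self.0.chunks_exact(4).map(|c| AxisValueMap {
            from_coordinate: i16::from_be_bytes([c[0], c[1]]),
            to_coordinate: i16::from_be_bytes([c[2], c[3]]),
        })
    }

    /// Conversion from the view to the owned representation.
    pub fn to_owned_maps(&self) -> SegmentMaps {
        SegmentMaps(self.axis_value_maps().collect())
    }
}

fn main() {
    let data = [0x00, 0x01, 0xFF, 0xFF];
    let owned = SegmentMapsView(&data).to_owned_maps();
    assert_eq!(owned.0.len(), 1);
    assert_eq!(owned.0[0].from_coordinate, 1);
    assert_eq!(owned.0[0].to_coordinate, -1);
}
```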
> of each table; my hunch is that this will end up being a much simpler API in the
> general case. In most cases, the user will be using one of these two sets of
> types, and in the advanced cases (e.g. server-side subsetting, instancing
> variable fonts) it still shouldn't be that complex.
Can we elaborate on how server-side subsetting would work, e.g. how would wanting to efficiently produce many subsets from the same readonly instance of a large font, potentially concurrently, play out?
To do this justice I would need to have a better understanding of how subsetting is currently implemented. I think this would be a useful exercise, though, and would help me better understand some of the possible constraints. If there's any public documentation someone could point me to, that would be helpful.
The Faster Horse design doc is a good overview. Scaling primarily based on the number of glyphs in the output is particularly key; anything that scales significantly with the number of glyphs in the input will fare poorly for CJK, and that is a critical use case for us.
https://github.com/harfbuzz/harfbuzz/blob/main/docs/repacker.md covers the ugliest part.
@garretrieger any other key refs?
In addition to the implementation, it would also be useful to have a sense of how this is deployed. For instance it might make sense to have some kind of long-lived 'SubsettingContext' that handles multiple requests, and reuses allocations, or it might make sense to use an arena allocator. If we wanted to support cases like this it might be worth thinking about that now?
rsheeter left a comment
Looks good to me, and ofc we can continue to accept changes as subsequent PRs. Let's leave a few days for others to review before merging.
I think using builders would be fine. fontTools has a builder module at https://github.com/fonttools/fonttools/blob/main/Lib/fontTools/fontBuilder.py and some tables like

I'd love to see the performance diff (if any!) when compiling a large bunch of UFOs like in https://github.com/madig/noto-amalgamated. I keep hearing the words "data-oriented programming" in my head but wouldn't know what it would look like in this case. Maybe it's not even relevant if 65k glyphs is the most you'll have to deal with!
> representations of the data. In practice I don't mind this too much. One way to
> think about this is that these data structures are basically just another form
> of builder, that happen to be reusable and that maintain some intermediate
> state.
We have this distinction already between fonttools-rs ("structures you want to manipulate") and otspec ("structures you want to serialize/deserialize") and it seems to work nicely. With this plan, the shaping engine can operate on "otspec"-level structures, and the font compiler/subsetter/etc. can continue to operate on "fonttools-rs"-level structures.
yea, I think that's a good model; we would be combining those two approaches in a way where all the underlying scalar types were shared between both versions, and there would be easy conversion back and forth.
> - what tables have the weirdest edge cases / pose the biggest potential
>   challenges, either for reading or writing? For instance serializing `loca` may
>   depend on contents of `glyf`?
As mentioned above, scanning the fonttools-rs code for subtables which don't use macros will show these up pretty quickly. (You can probably find them easily by grepping the comments for swearwords.)
All the variation tables require the axis count from fvar. Anchor subtables are internally format-switching; ValueRecords are a nightmare, they're format-switching with the formats stored externally, sometimes two subtables upstream; MarkBasePosFormat1 relies on an external count; things like cvt need to know their own length; hmtx needs to communicate with hhea and similar for the vertical equivalent.
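One way to sketch the external-dependency problem is to thread a parsing context through explicitly, as in this hypothetical example (illustrative types only, not a proposed API) where a variation tuple needs the axis count from fvar:

```rust
// Illustrative types only, not a proposed API.
struct ParseContext {
    // Supplied by the caller after parsing fvar; the subtable itself does
    // not record how many axes there are.
    axis_count: u16,
}

// A peak tuple is axis_count F2Dot14 values (raw i16 here for brevity);
// its length is unknowable without the context.
fn parse_peak_tuple(data: &[u8], ctx: &ParseContext) -> Option<Vec<i16>> {
    let len = ctx.axis_count as usize * 2;
    let bytes = data.get(..len)?;
    Some(
        bytes
            .chunks_exact(2)
            .map(|c| i16::from_be_bytes([c[0], c[1]]))
            .collect(),
    )
}

fn main() {
    let ctx = ParseContext { axis_count: 2 };
    let data = [0x40, 0x00, 0xC0, 0x00];
    // With axis_count = 2 we read exactly two values; with a different
    // context the same bytes would parse differently (or fail).
    assert_eq!(parse_peak_tuple(&data, &ctx), Some(vec![0x4000, -0x4000]));
    assert_eq!(parse_peak_tuple(&data, &ParseContext { axis_count: 3 }), None);
}
```

ValueRecords and hmtx/hhea would need richer context than a single count, but the shape of the solution is similar: the dependency becomes an explicit parameter rather than something a macro can infer from the table layout alone.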
Awesome, I was hoping for a list like this; I do think this points to a very good place to start.
Reading this discussion brings a lot of deja vu feelings, given that I've thought about this for a good 15+ years. I volunteered to Rod to make a presentation of how HarfBuzz handles these issues, where it fails, what pitfalls there are, etc. I can do it next week. I think a two-hour slot would be adequate. Your list of questions at the end makes for a nice agenda.
@behdad that would be invaluable.
I’d love to see actual examples of this. My suspicion is that patching of this kind is generally not done in bulk or at high frequency like compilation and subsetting, so it is less sensitive to performance. I would also lean toward the builder model if this holds true.
POD means you then have to compile the whole table, which has the offset-overflow NP-hard problem. You lose everything as soon as you hit that problem. You can't "simply" move back and forth between the two representations.
BTW, I feel like allsorts is not being mentioned / studied enough here.
Also RustyBuzz and ttf-parser:
+1, it's one of the most unique attacks on the problem from the little I know of it. I think maybe there is a missing step 0: review existing related implementations and write some notes about each.
Honestly, the scripts for direct binary font manipulation are things that run quickly enough in Python so far. Even if every single change in a Rust library would recompute the entire table, it would probably still be faster. Think adding or changing entries in the name table, taking out the kern feature from the DFLT script for a Deva font to stop Adobe apps from applying kerning twice, maybe adding the overlap bit to stuff in the glyf table, messing with vertical metrics in OS/2 and hhea, adding in a STAT table (and modifying the name table in the process), compressing to WOFF(2), that kind of stuff. For subsetting, we still use pyftsubset; one day I want to switch over to HarfBuzz.
Mh, yes. I suppose what I want to say is, it would be nice if you could take e.g. the name table, easily add something to it or change it, and then write it back. Maybe with a builder pattern where you could start with a given data and then go
+1 for more comprehensive notes, but I can give a quick rundown based on some familiarity with these projects. Internally, ttf-parser is very similar to pinot, with the exception that it favors sequential reads over using a lot of magic numbers as offsets. The public API is higher level and more similar to swash. It doesn’t provide direct access to parsed table data and doesn’t do any writing. Allsorts does writing, but, notably, does not appear to subset the layout tables, nor does it seem capable of writing them aside from copying the blobs. The parsing is less ad hoc than ttf-parser and pinot, but it is not zero copy. This project is really interesting as a shaper, but less so IMO as a reference for parsing/compiling.
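A rough sketch of the "sequential reads" style attributed to ttf-parser above, as opposed to indexing with offset constants (illustrative cursor type only; ttf-parser's internals differ in detail):

```rust
// Illustrative cursor type only; ttf-parser's internals differ in detail.
struct Cursor<'a> {
    data: &'a [u8],
}

impl<'a> Cursor<'a> {
    // Read the next big-endian u16 and advance past it.
    fn read_u16(&mut self) -> Option<u16> {
        if self.data.len() < 2 {
            return None;
        }
        let v = u16::from_be_bytes([self.data[0], self.data[1]]);
        self.data = &self.data[2..];
        Some(v)
    }
}

fn main() {
    // e.g. a table that begins with majorVersion, minorVersion.
    let mut cur = Cursor { data: &[0x00, 0x01, 0x00, 0x00] };
    assert_eq!(cur.read_u16(), Some(1)); // majorVersion
    assert_eq!(cur.read_u16(), Some(0)); // minorVersion
}
```

The trade-off: field offsets never appear as magic numbers, but you can only reach a field by reading everything before it, which matters for random access into large arrays.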
Would recommend. Our experience is that it's (more than) two orders of magnitude faster for (less than) an order of magnitude less resources.
Oh: and even those projects which subset, like allsorts, last I checked did not do a thing about variable fonts. Some of the ItemVariationStore and TupleVariationStore subtables are extremely gnarly. (fonttools-rs both reads and writes them.)
Thanks for the examples, Nikolaus. It seems like the smaller, more targeted transforms like flipping bits in the glyf table and disabling features (without removing them) would be good candidates for a supplemental library and probably shouldn’t influence overall design.
Yes, but I ran into crashers and CLI issues whenever I tried it a while back (all reported IIRC). Haven't checked recently, though.
Yes, I think that's fine and fontTools will continue to exist anyway :) I'm more interested in compiling my fonts on all my 16 cores simultaneously in under a second and inside the L1 cache.
This sounds like a very important point. There are various other similar possible optimizations we won't be able to do if we're going the POD route (doing tricky things like using a single coverage table for multiple subtables?)
If that is still true, particularly the crashes, @garretrieger will likely be interested :)
The opposite, I think. Consider the case where you're adding a glyph to the coverage of a positioning table. You'd better decompile and recompile the whole table, because otherwise, if that coverage table was previously shared with another lookup, you just changed the effect of that lookup too. And if it wasn't shared before, it may be sharable now.
The crashes you reported have all been fixed, and the overflow resolution has been significantly improved since then as well. As an added bonus, the updated resolver can now pack complex GSUB/GPOS tables with lots of extension lookups smaller than fonttools can.
I’d like to propose that we make the read functionality core only, feature-gate the writing, and only take on the alloc dependency when that feature is enabled. This gives us a statically checked constraint that read is zero copy, smaller code size for consumer-only scenarios, and simplifies compilation to WASM. I thought I’d bring it up early because trying to retrofit this can be painful.
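A sketch of how that gating might look, with hypothetical feature names (the actual crate layout is undecided):

```rust
// Hypothetical feature names; Cargo.toml would declare something like:
//
//     [features]
//     default = ["write"]
//     write = []   # enables the owned/mutable types and pulls in `alloc`
//
// and lib.rs would then gate everything write-related:

#![cfg_attr(not(feature = "write"), no_std)]

#[cfg(feature = "write")]
extern crate alloc;

/// Read-side view: zero allocation, always available, works under no_std.
pub struct Avar<'a>(pub &'a [u8]);

/// Write-side type: only compiled when the `write` feature is on, so
/// "reading never allocates" becomes a statically checked property.
#[cfg(feature = "write")]
pub struct AvarOwned {
    pub segment_maps: alloc::vec::Vec<u8>,
}
```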
+1 though there is a limit to how much pain this is worth. Hopefully as long as it's introduced early not too much.
Is there a specific reason the writing side seems likely to kerplode in WASM? Wanting to write, maybe to subset for PDF, in WASM seems pretty reasonable.
@cmyr I think we could merge this whenever you feel ready. Comments are more refinement than massive change and we can continue to take PRs to update as we have new ideas or learn more.
Oh, I was unclear. Writing is perfectly fine and I think would be quite valuable if available as WASM. I'm trying to specifically avoid the
I think this makes sense, and I think trying to support the parse-only case as efficiently as possible is a good goal. Also: I'm personally finding that the term 'zero-copy' doesn't exactly sit right for me, for this project, and I want to make sure I'm not misunderstanding anything. To me, to be zero-copy, all access to data in the font would have to be the equivalent of pointer reads (indexing into a Rust slice being functionally equivalent), but because of endianness we necessarily need to copy individual bytes in order to generate the scalars. So we do zero allocation, and we support random access (you can grab an element out of the middle of an array), but we do always need to copy at the leaves of the tree. Does this sound right to other people?
Sounds good, I'll do one more pass to incorporate any remaining feedback tomorrow but then I think this is a reasonable checkpoint.
In principle, I agree with you. I believe the term originates in avoiding copying buffers between user and kernel space, which is pretty far removed from what we're talking about. For our purposes, I think it's reasonable to define zero copy as "all scalar value loads generate a mov+bswap (AArch64: LDR+REV) sequence" which is the best we can do, but I'm happy with using more precise terminology if that's the general sentiment here. By this definition, pinot (and swash) are not zero copy as they check bounds for almost every field access. We can definitely do better.
That's a good goal, and we should try and think of ways we might achieve it without sacrificing safety. At the very least we could have unsafe

Certainly wherever possible we should encourage the use of iteration-based APIs, since these can trivially perform a single bounds check up front that is valid for all the individual accesses.
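A minimal sketch of that iteration idea (illustrative only): validate the whole array once in the constructor, so each subsequent element read operates on a slice already known to be in bounds.

```rust
// Illustrative only. The constructor performs the single bounds check;
// after that, the slice is exactly `count * 2` bytes long.
pub struct U16Iter<'a> {
    data: &'a [u8],
}

impl<'a> U16Iter<'a> {
    pub fn new(data: &'a [u8], count: usize) -> Option<Self> {
        // The one up-front check: do `count` records fit?
        let data = data.get(..count.checked_mul(2)?)?;
        Some(U16Iter { data })
    }
}

impl<'a> Iterator for U16Iter<'a> {
    type Item = u16;
    fn next(&mut self) -> Option<u16> {
        // This length test always agrees with the up-front check, so the
        // optimizer can typically elide the per-element branch.
        if self.data.len() < 2 {
            return None;
        }
        let v = u16::from_be_bytes([self.data[0], self.data[1]]);
        self.data = &self.data[2..];
        Some(v)
    }
}

fn main() {
    let data = [0x00, 0x0A, 0x00, 0x0B, 0xFF];
    let glyphs: Vec<u16> = U16Iter::new(&data, 2).unwrap().collect();
    assert_eq!(glyphs, vec![10, 11]);
    assert!(U16Iter::new(&data, 3).is_none()); // 3 records don't fit in 5 bytes
}
```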
+1 to crisply defining what we mean by zerocopy. @dfrg I propose you PR a definitions section, even if for now it's just zerocopy, to either the root README or a new md.
To me zerocopy simply means that if a font object is
With that definition, neither endianness conversion nor bounds-checking violates zerocopyness.
I gave up on the HarfBuzz style design when originally working on swash due to the safety issues, but I have some ideas about how to maintain safety now...
It's probably not completely representable in the type system, but I do see this pattern as being the solution. We'll need to carry some sort of context for dealing with offsets when walking a graph anyway and I believe this can also act as a safety token in some cases. For the most part, we should only need to bounds check when traversing an edge.
Agree completely on this.
Will do.
I like this as a more general definition that gives us some breathing room wrt bounds checking. I think this makes sense as the hard constraint, with the load+swap case as desirable when possible.
Okay, I'll merge this shortly. Some last notes:
I've been thinking about this a bunch, and I think these are both good points. In the first case, actual manual access to everything seems important, and then the second case feels more like a traditional compilation problem. Doing only the second approach means the first is impossible, but doing the first doesn't preclude also supporting the second.

also @dfrg just as a fun aside, the following two functions generate the same asm on x86 & aarch64:

```rust
pub unsafe fn manual_read_u16(buf: &[u8], offset: usize) -> u16 {
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    return (*buf.get_unchecked(offset) as u16) << 8 | *buf.get_unchecked(offset + 1) as u16;
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    (*(buf.as_ptr().add(offset) as *const u16)).swap_bytes()
}

pub unsafe fn boring_read_u16(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes(buf.get_unchecked(offset..offset + 2).try_into().unwrap())
}
```
Yeah, we also found that compilers are smart enough to take care of such optimizations.
Rendered
This is a proposal for a low-level crate to define read-only and mutable versions of all the various structures defined in the font tables. There was an earlier draft of this shared with @rsheeter, but I've almost completely rewritten it since then, and I think I have a much better sense of the problem space.
Let me know if there's anything in here that is unclear, and of course if there is any additional feedback. Thanks to @simoncozens and @dfrg, this write-up is mostly just me smushing together and explaining existing work from pinot and fonttools-rs.