Skip to content

Implement lazy font loading, enable embedded font extraction #146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 15, 2022

Conversation

WeebDataHoarder
Copy link
Contributor

Contains one commit (7a69b7f) from #145.

These changes extract the embedded font, and add the option to enable lazy font loading (where font resources are mapped, but not downloaded until libass tries to use them).

Additionally, fallbackFont option can be used to override the default fallback font that gets used. This way, users can specify their preferred font, including CJK ones (for example Noto Sans CJK .ttc).

Enabling lazy font loading is done via lazyFontLoading set to true, but it defaults to false.
This is necessary given some requirements from WASM to not fetch the whole file, specifically:

  • HEAD and Range request support
  • Access-Control-Expose-Headers: Accept-Ranges, Content-Length, Content-Encoding (at least)
  • No Content-Encoding set.
    If these conditions match, HEAD request will be issued and file will be accessed via Ranges as-needed. Otherwise content will be fetched as a whole file.

Additionally, this PR enables embedded font extraction support on libass side. This is split on a separate commit.

@WeebDataHoarder
Copy link
Contributor Author

WeebDataHoarder commented Jun 22, 2022

How the output directory looks like now:

-rw-r--r-- 1 root root  46K jun 23 10:58 COPYRIGHT
-rw-r--r-- 1 root root 143K jun 23 10:58 default.woff2
-rw-r--r-- 1 root root  75K jun 23 10:58 subtitles-octopus.js
-rw-r--r-- 1 root root 394K jun 23 10:51 subtitles-octopus-worker.js
-rw-r--r-- 1 root root 4,5M jun 23 10:52 subtitles-octopus-worker-legacy.js
-rwxr-xr-x 1 root root 2,2M jun 23 10:51 subtitles-octopus-worker.wasm

@WeebDataHoarder WeebDataHoarder force-pushed the lazy-font-loading branch 2 times, most recently from 74469ce to 60f4fd3 Compare June 22, 2022 09:24
@WeebDataHoarder
Copy link
Contributor Author

WeebDataHoarder commented Jun 22, 2022

Removal of --no-heap-copy is a rebase conflict, will force-push

EDIT: yep, conflict was from rebasing this onto #145, now fixed

WeebDataHoarder added a commit to WeebDataHoarder/JavascriptSubtitlesOctopus that referenced this pull request Jun 23, 2022
@WeebDataHoarder
Copy link
Contributor Author

Additional comment for --embed-file: since 3.1.3 (maybe an update could be worth it) it goes directly in memory, without efficiency loss. This is not an issue as mentioned on #145 (comment) given the few bytes that are being embedded.

@WeebDataHoarder WeebDataHoarder force-pushed the lazy-font-loading branch 4 times, most recently from 0c940b7 to cc230cb Compare June 26, 2022 15:53
@TheOneric
Copy link
Member

Commenting only on the embedded fonts part:

The js side makes a copy of the three handles at initialisation, see here:

self.ass_track = self.octObj.track;
self.ass_library = self.octObj.ass_library;
self.ass_renderer = self.octObj.ass_renderer;

I would guess those need to be updated when reiniting the handles, but it isn't immediately obvious to me where this happens.

Is the library handle, JSO is using internally, exposed to users somehow? There’s octObj.ass_library, but I'm not sure if JSO-users have access to that. If it is, some users might already be adding memory fonts or setting ass_set_extract_fonts manually. .. that is, assuming oct_add_font etc are even available to JSO users.

@WeebDataHoarder
Copy link
Contributor Author

As far as I know it's only exposed in the worker, probably for debugging reasons.

There are various spots where ass_track gets re-initialized in post-worker.js, setTrack() for example. ass_renderer currently does not get re-initialized.

Most of the oct_add_font or other stuff, is only available to direct users of the worker.js (not jso module interface).

I can update the code so these values get updated properly, or rid of them elsewhere? Even if desired, you can always inspect/grab these handles via self.octObj.ass_library in worker code / debugger.

@TheOneric
Copy link
Member

TheOneric commented Jun 29, 2022

My line of thought was that if JSO users can get the internal library the change to also clear memory fonts (and reinit all handles) needs to be documented. If no one has access to it, its an internal change and the existing high level description of "allows to change the track" or similar is sufficient for the affected API functions.

There are various spots where ass_track gets re-initialized in post-worker.js, setTrack() for example.

I might have missed something, but grepping through the code it seems like those values are actually only getting assigned to and never used? (If that’s the case: why do they even exist?)

Most of the oct_add_font or other stuff, is only available to direct users of the worker.js (not jso module interface).

Sorry, can you explain howsuch a direct use for a JSO user might look like, so that oct_* are available? I'm trying to understand if there is any more conflict potential and how the scopes here are or are not separated.

Both oct_* and octObj are mentioned in 4.0.0’s release notes and appear to have been added in #64; CC: @TFSThiagoBR98.

@WeebDataHoarder
Copy link
Contributor Author

You haven't missed something :)

The only way I see to use these APIs are by code ran from the worker side specific, and only relevant to forks, which could already add such API. To be able to use that you need to instantiate the .wasm yourself, or be within the worker scope.

There is a kind of direct way to execute code, and that is via onCustomMessage / custom worker events. Sadly these are placeholders that are not set by default (and have no method to set the runner function from user code). You would still need worker modifications to use this.

Don't know what to tell you about the intent of those methods/functions/variables being exposed, but I don't see any way to use them at the moment without modifications.

I believe this topic should probably be a discussion elsewhere, for now I'll update the code to refresh these variables.

@TheOneric
Copy link
Member

Thanks, if there's no way an user can modify the library and renderer states or add memory fonts, then there’s nothing more to worry about regarding embedded fonts. Eventhough, the actually diff ended up quite concise, the interaction with the rest of JSO and the effects aren't, so how about a more verbose commit message similar to this? (also avoiding overly long lines):

Use embedded fonts

ASS subtitles can embedd fonts with a custom encoding in
their [Fonts] section. For historic reasons ass_set_extract_fonts
defaults to disabled, so we need to explicitly opt-in.
There’s currently also an issue, with embedded fonts outliving
their track, which can lead to indefinitely growing memory consumption
if not all memory fonts are cleared on track reinit.

However, ass_clear_fonts can only be safely called if the renderer
also has been released first. At this point it is simpler to just
reinit the whole library-renderer-track triplet and library inits
are not that costly anyway.

JSO never uses (non-embedded) memory fonts itself and does not expose
any way to add them so this does not constitute a user-visible change.
Even oct_add_font cannot be used by consumers of upstream JSO binaries.

Note: the JS pointers of the libass handles are updated in this patch,
      but it appears they are actually never used.

If you can update the commit message and move the embedded font commit before the lazy loading one, I'm going to merge the embedded font commit in ~24h if rcombs or TFSThiagoBR98 don't object.


The question of how to use the interface and dead code warrants a dedicated thread indeed. Only if there would have been some way to add memory fonts there had been direct relevance to the change at hand. (I just noticed createTrackMem also seems to be unused...)

@WeebDataHoarder
Copy link
Contributor Author

I have reworded the embedded fonts commit, and moved it before lazy font loading.


On the topic of dead code, I have cleaned up a few of those and refactored a few methods to de-duplicate some code.
Might open a PR just for dead-code/feature elimination with those changes to start with.

@TheOneric
Copy link
Member

Merged embedded font commit.

@TheOneric TheOneric force-pushed the lazy-font-loading branch from 9d797b0 to c9c8a01 Compare July 3, 2022 16:25
@TheOneric TheOneric changed the title Implement lazy font loading, enable embedded font extraction Implement lazy font loading, ~~enable embedded font extraction~~ Jul 3, 2022
@TheOneric TheOneric changed the title Implement lazy font loading, ~~enable embedded font extraction~~ Implement lazy font loading, enable embedded font extraction Jul 3, 2022
@TheOneric TheOneric requested a review from rcombs October 7, 2022 21:03
Copy link
Member

@rcombs rcombs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one header change; otherwise LGTM.

Extracts default.woff2 from binary, embeds fonts.conf,
removes .data files on output.
Use --memory-init-file=0 to remove .mem files on output.
Specify fallbackFont to override default.woff2.
If lazyFileLoading is set to true, use FS.createLazyFile(). This is
off by default, as it depends on correct HTTP headers sent back.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants