feat(dictionary): cache tokens in IndexedDB #844

ShanaryS · 2025-12-15T17:19:09Z

This eliminates the runtime dependency of Anki (though if it's available, it will be used to automatically keep cache in sync). This gives a massive speedup since we no longer need to query Anki as well as Yomitan for the Anki cards (especially sentence cards). Support for local tokens is not exposed to the user in this PR.

The IndexedDB is accessed through DictionaryProvider (works the same as SettingsProvider).
There is 3 stores used:
- meta: Use for tracking the current and previous builds to know if settings changed or build is in process
- tokens: Stores each word with their lemmas status, etc
- ankiCard: Anki tokens derive their status from this store as we need to track other things like suspended status
Cache is only re-built when a card/note modification time changes which covers review, suspend, edit, or added changes.
- During playback we still use the existing method to detect new cards as checking modified times for all cards frequently is not desirable, Anki does not allow querying cards > modifiedAt. This means we will only trigger a cache build once for a given card during a playback session (though subsequent card triggers will update everything). This covers the main case of collecting a token but tracking more is too inefficient.
Builds are guarded by a buildId so concurrent builds are prevented even in case of shutdowns with no cleanup. Build will expire when completed or if more than 5 minutes has passed since the last update (every few seconds).
Builds can happen while it's being queried during playback, updated tokens will be sent (if build was not manually triggered) so that they can be recolored.
On settings change for a track, it will be cleared next build before being re-populated. If a build is interrupted before completion, it will resume where it left off (and any changes since).
Triggers were added so that when an Anki card is exported/updated from SubtitlePlayer, it triggers a cache build immediately. This already existed if it was initiated from the extension (such as keybind).
The settings tab will display helper text if the user changes settings that will require a cache rebuild as well as build progress (build status messages are exposed to translations).
Reading annotations can be enabled independent of coloring, Always and Never won't trigger cache builds for that track (unless coloring is enabled).
Added ability to restrict tracks per deck in case multiple unrelated decks share the same note type.
Disabling Anki is achieved by keeping the fields empty which will clear that track from the db on the next build. Disabled tracks are kept in the db indefinitely.
Yomitan will require the version with the optimizations in place for tokenize, this significantly improves things and should not be a problem since this feature is unreleased and extensions should auto update.
When settings are updated, SubtitleColoring will now only reset the cache on settings that affect it, rather than all. It will now also clear the richText that's currently being displayed if the user turns off the features as we rely on it being enabled to update it.
We will now color more intelligently. The first request will only color 10 events, then subsequent requests can be up to 100. They will only be triggered once the user is about to need another batch, rather than single events on each new showing subtitle.
Local tokens are not per track unlike Anki, this makes it easier for users to switch between tracks and have local tokens behave consistently. If per track local tokens are desired, it can be trivially added as an opt in later.
Priority for tokens is Local > Anki word > Anki sentence
Added the unlimitedStorage permission which should keep the DB persisted, preventing users from losing local tokens.
Did not add offline resume, that is not requiring Anki to finish building if it was interrupted. This would only matter for first time builds.
The limitations that still exist are inherent to this problem domain:
- Homographs can't be dealt with: bat (animal) and bat (baseball) cannot be separated (obviously just a fundamental limitation with written language). Would need AI for context which is a non-starter.
- Words that are false friends, same spelling but different meaning across languages. Not worth addressing but mitigated by filtering by decks.

killergerbah

Hey really appreciate the patience. I'm still trying to read and understand everything, so I've left some high level feedback for now. Let me know what you think.

common/app/components/App.tsx

common/app/components/Player.tsx

common/components/DictionarySettingsTab.tsx

common/dictionary-db/dictionary-db.ts

client/src/components/WebsiteApp.tsx

common/app/components/Player.tsx

common/components/DictionarySettingsTab.tsx

common/dictionary-db/dictionary-db.ts

common/subtitle-coloring/subtitle-coloring.ts

common/settings/settings.ts

extension/src/handlers/dictionary/dictionary-handler.ts

common/subtitle-coloring/subtitle-coloring.ts

common/dictionary-db/dictionary-provider.ts

killergerbah

Hey thanks again for the patience. I did another pass and mostly made suggestions to improve readability. I know some of this is probably just stylistic difference, but there's a lot of code with complex logic which I think can be made more readable with low cost. Mostly worried about needing to understand this code when making changes months from now.

common/components/DictionarySettingsTab.tsx

killergerbah · 2025-12-31T00:21:25Z

common/dictionary-db/dictionary-db.ts

+                    .map((track) => `#${track + 1}`)
+                    .join(', '),
+            })}`;
+            msg += ` | ${t('settings.dictionaryBuildModifiedCards', { numCards: numUpdatedCards.toLocaleString('en-US') })}`;


Can localization of the progress strings be confined to the dictionary settings form instead? This would require you to model the progress/errors via a data structure but I think this is probably not so difficult as there's a limited number of types of status updates.

While the refactor might be kind of annoying, keeping the front-end and logical concerns separate as a matter of principle will will give us more flexibility to render the progress in different ways on the settings form, without a dependency on this code.

That very flexibility was what I was trying to avoid. Some messages only makes sense to be displayed in certain states and constructing that seems like it would be complicated. Also some messages don't have translations if it's a random error. We also would need to handle translations in SubtitleColoring as it can log errors whenever tokens updates.

I still think it's worth doing though. I'm thinking of returning an object instead where the keys correspond to the translation key and values to the translation params. As well as a separate field for unspecified errors.

If you provide both English string + the translation key, you could use the English string for logging in SubtitleColoring. The translation key could be used for display on the settings form.

But instead of translation key, maybe provide an an enum? That will give us some decoupling of the progress/error type from the loc file.

I can't get a design that I'm happy with so I think it's best if you make the changes.

Pushed, let me know if anything looks weird to you

It looks good, the only issue I noticed is that the elapsed time changes when a build is done and the button is pressed. I think this lies in how the UI is calculating it but it's not a big deal.

I also re-review the rest of the PR as well. Earlier your comment about dictionaryProvider was accurate, we could use DictionaryStorage directly. I'm not sure if it's worth updating though. I could see it being useful for statistics.

common/dictionary-db/dictionary-db.ts

ShanaryS self-assigned this Dec 15, 2025

ShanaryS added the enhancement New feature or request label Dec 15, 2025

ShanaryS force-pushed the dictionary-db branch 2 times, most recently from 842fba2 to 78bad00 Compare December 17, 2025 04:36

feat(dictionary): cache tokens in IndexedDB

ef0283a

ShanaryS force-pushed the dictionary-db branch from 78bad00 to ef0283a Compare December 17, 2025 23:03

killergerbah reviewed Dec 21, 2025

View reviewed changes

ShanaryS added 2 commits December 23, 2025 21:34

use anki decks in addition to anki fields

f707163

use DictionaryProvider

7e89989

killergerbah reviewed Dec 27, 2025

View reviewed changes

ShanaryS force-pushed the dictionary-db branch 6 times, most recently from 372ab82 to c07bd96 Compare December 30, 2025 04:19

update DictionaryProvider subscription api and cleanup code

fae3e2c

ShanaryS force-pushed the dictionary-db branch from c07bd96 to fae3e2c Compare December 30, 2025 15:33

killergerbah reviewed Dec 31, 2025

View reviewed changes

ShanaryS and others added 2 commits December 31, 2025 12:23

remove _updateSuspendedCards() and cleanup code

3386bdc

Anki cache build state updates are structured

f0a4dbb

Uh oh!

feat(dictionary): cache tokens in IndexedDB #844

Are you sure you want to change the base?

feat(dictionary): cache tokens in IndexedDB #844

Uh oh!

Conversation

ShanaryS commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

killergerbah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

killergerbah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

killergerbah Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ShanaryS Dec 31, 2025

Choose a reason for hiding this comment

Uh oh!

killergerbah Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

killergerbah Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ShanaryS Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

killergerbah Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

ShanaryS Jan 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ShanaryS commented Dec 15, 2025 •

edited

Loading

killergerbah Dec 31, 2025 •

edited

Loading

killergerbah Jan 1, 2026 •

edited

Loading