Skip to content

Conversation

@ShanaryS
Copy link
Collaborator

@ShanaryS ShanaryS commented Dec 15, 2025

This eliminates the runtime dependency of Anki (though if it's available, it will be used to automatically keep cache in sync). This gives a massive speedup since we no longer need to query Anki as well as Yomitan for the Anki cards (especially sentence cards). Support for local tokens is not exposed to the user in this PR.

  • The IndexedDB is accessed through DictionaryProvider (works the same as SettingsProvider).
  • There is 3 stores used:
    • meta: Use for tracking the current and previous builds to know if settings changed or build is in process
    • tokens: Stores each word with their lemmas status, etc
    • ankiCard: Anki tokens derive their status from this store as we need to track other things like suspended status
  • Cache is only re-built when a card/note modification time changes which covers review, suspend, edit, or added changes.
    • During playback we still use the existing method to detect new cards as checking modified times for all cards frequently is not desirable, Anki does not allow querying cards > modifiedAt. This means we will only trigger a cache build once for a given card during a playback session (though subsequent card triggers will update everything). This covers the main case of collecting a token but tracking more is too inefficient.
  • Builds are guarded by a buildId so concurrent builds are prevented even in case of shutdowns with no cleanup. Build will expire when completed or if more than 5 minutes has passed since the last update (every few seconds).
  • Builds can happen while it's being queried during playback, updated tokens will be sent (if build was not manually triggered) so that they can be recolored.
  • On settings change for a track, it will be cleared next build before being re-populated. If a build is interrupted before completion, it will resume where it left off (and any changes since).
  • Triggers were added so that when an Anki card is exported/updated from SubtitlePlayer, it triggers a cache build immediately. This already existed if it was initiated from the extension (such as keybind).
  • The settings tab will display helper text if the user changes settings that will require a cache rebuild as well as build progress (build status messages are exposed to translations).
  • Reading annotations can be enabled independent of coloring, Always and Never won't trigger cache builds for that track (unless coloring is enabled).
  • Added ability to restrict tracks per deck in case multiple unrelated decks share the same note type.
  • Disabling Anki is achieved by keeping the fields empty which will clear that track from the db on the next build. Disabled tracks are kept in the db indefinitely.
  • Yomitan will require the version with the optimizations in place for tokenize, this significantly improves things and should not be a problem since this feature is unreleased and extensions should auto update.
  • When settings are updated, SubtitleColoring will now only reset the cache on settings that affect it, rather than all. It will now also clear the richText that's currently being displayed if the user turns off the features as we rely on it being enabled to update it.
  • We will now color more intelligently. The first request will only color 10 events, then subsequent requests can be up to 100. They will only be triggered once the user is about to need another batch, rather than single events on each new showing subtitle.
  • Local tokens are not per track unlike Anki, this makes it easier for users to switch between tracks and have local tokens behave consistently. If per track local tokens are desired, it can be trivially added as an opt in later.
  • Priority for tokens is Local > Anki word > Anki sentence
  • Added the unlimitedStorage permission which should keep the DB persisted, preventing users from losing local tokens.
  • Did not add offline resume, that is not requiring Anki to finish building if it was interrupted. This would only matter for first time builds.
  • The limitations that still exist are inherent to this problem domain:
    • Homographs can't be dealt with: bat (animal) and bat (baseball) cannot be separated (obviously just a fundamental limitation with written language). Would need AI for context which is a non-starter.
    • Words that are false friends, same spelling but different meaning across languages. Not worth addressing but mitigated by filtering by decks.

@ShanaryS ShanaryS self-assigned this Dec 15, 2025
@ShanaryS ShanaryS added the enhancement New feature or request label Dec 15, 2025
@ShanaryS ShanaryS force-pushed the dictionary-db branch 2 times, most recently from 842fba2 to 78bad00 Compare December 17, 2025 04:36
Copy link
Owner

@killergerbah killergerbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey really appreciate the patience. I'm still trying to read and understand everything, so I've left some high level feedback for now. Let me know what you think.

@ShanaryS ShanaryS force-pushed the dictionary-db branch 6 times, most recently from 372ab82 to c07bd96 Compare December 30, 2025 04:19
Copy link
Owner

@killergerbah killergerbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey thanks again for the patience. I did another pass and mostly made suggestions to improve readability. I know some of this is probably just stylistic difference, but there's a lot of code with complex logic which I think can be made more readable with low cost. Mostly worried about needing to understand this code when making changes months from now.

.map((track) => `#${track + 1}`)
.join(', '),
})}`;
msg += ` | ${t('settings.dictionaryBuildModifiedCards', { numCards: numUpdatedCards.toLocaleString('en-US') })}`;
Copy link
Owner

@killergerbah killergerbah Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can localization of the progress strings be confined to the dictionary settings form instead? This would require you to model the progress/errors via a data structure but I think this is probably not so difficult as there's a limited number of types of status updates.

While the refactor might be kind of annoying, keeping the front-end and logical concerns separate as a matter of principle will will give us more flexibility to render the progress in different ways on the settings form, without a dependency on this code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That very flexibility was what I was trying to avoid. Some messages only makes sense to be displayed in certain states and constructing that seems like it would be complicated. Also some messages don't have translations if it's a random error. We also would need to handle translations in SubtitleColoring as it can log errors whenever tokens updates.

I still think it's worth doing though. I'm thinking of returning an object instead where the keys correspond to the translation key and values to the translation params. As well as a separate field for unspecified errors.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you provide both English string + the translation key, you could use the English string for logging in SubtitleColoring. The translation key could be used for display on the settings form.

Copy link
Owner

@killergerbah killergerbah Jan 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But instead of translation key, maybe provide an an enum? That will give us some decoupling of the progress/error type from the loc file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't get a design that I'm happy with so I think it's best if you make the changes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed, let me know if anything looks weird to you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, the only issue I noticed is that the elapsed time changes when a build is done and the button is pressed. I think this lies in how the UI is calculating it but it's not a big deal.

I also re-review the rest of the PR as well. Earlier your comment about dictionaryProvider was accurate, we could use DictionaryStorage directly. I'm not sure if it's worth updating though. I could see it being useful for statistics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants