Deduplicate Transliterator VarTables across directions #3646
Labels
A-performance
Area: Performance (CPU, Memory)
C-transliterator
Component: transliterator
C-unicode
Component: Props, sets, tries
S-medium
Size: Less than a week (larger bug fix or enhancement)
T-enhancement
Type: Nice-to-have but not required
Existing data struct draft: #3627
Transform rule source files can specify a direction attribute, and in the case of direction: "both", such source files A-B.xml define both forward (A-B) and backward (B-A) transliterators. Bidirectional rule files can have rules that affect only A-B, only B-A, or both A-B and B-A.

How should we create data for bidirectional sources?
Discussed with @skius, @sffc, @Manishearth, @eggrobin, @younies:
Proposal: Data structs store only the forward direction, so a bidirectional source gets compiled into two data structs (and any shared data is duplicated). Datagen requires the user to name the exact transliterator they want, including its direction, through ID syntax rather than through “forward”/“backward” notation; nothing else is included, apart from transitive dependencies. A future enhancement to avoid VarTable duplication is to split the VarTable out into a separate data key and refer to it from the transliterator data structs. The runtime likewise accepts the direction only implicitly, through the BCP 47 ID.
LGTM: @Manishearth @younies @eggrobin @sffc @skius
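For illustration, here is a rough sketch of the proposal. Every type and function name below (`RuleDirection`, `SourceRule`, `BidirectionalSource`, `TransliteratorData`, `compile_for_id`, `example_source`) is hypothetical and is not the ICU4X API or the draft in #3627. Rules and the VarTable are reduced to plain strings, and the IDs "A-B" and "B-A" are the placeholders used above. It only shows how one bidirectional source could yield two forward-only data structs, each holding its own copy of the VarTable.

```rust
/// Which direction(s) a rule in a source file applies to (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RuleDirection {
    Forward,
    Backward,
    Both,
}

/// One conversion rule as written in the source: `left` and `right` are the
/// two sides of the arrow. Real rules are richer; strings keep the sketch small.
#[derive(Clone, Debug)]
struct SourceRule {
    left: String,
    right: String,
    direction: RuleDirection,
}

/// A parsed source file with direction "both" (hypothetical).
#[derive(Clone, Debug)]
struct BidirectionalSource {
    forward_id: String,
    backward_id: String,
    var_table: Vec<String>,
    rules: Vec<SourceRule>,
}

/// A compiled data struct. It has no direction field: it is always "forward"
/// for the ID it was compiled for.
#[derive(Clone, Debug, PartialEq, Eq)]
struct TransliteratorData {
    var_table: Vec<String>,
    rules: Vec<(String, String)>,
}

/// Compile the data struct for one explicitly requested ID.
/// Returns `None` if the source does not define that ID.
fn compile_for_id(source: &BidirectionalSource, id: &str) -> Option<TransliteratorData> {
    let backward = if id == source.forward_id {
        false
    } else if id == source.backward_id {
        true
    } else {
        return None;
    };
    let rules = source
        .rules
        .iter()
        .filter_map(|rule| match (rule.direction, backward) {
            // Forward-capable rules are stored as written.
            (RuleDirection::Forward | RuleDirection::Both, false) => {
                Some((rule.left.clone(), rule.right.clone()))
            }
            // Backward-capable rules are stored with their sides swapped,
            // so the resulting struct can again be applied "forward".
            (RuleDirection::Backward | RuleDirection::Both, true) => {
                Some((rule.right.clone(), rule.left.clone()))
            }
            // The rule does not apply to the requested direction.
            _ => None,
        })
        .collect();
    // The VarTable is cloned into every struct: this is the duplication
    // that a separate VarTable data key would avoid.
    Some(TransliteratorData {
        var_table: source.var_table.clone(),
        rules,
    })
}

/// A tiny made-up source: one rule for both directions, one forward-only,
/// one backward-only.
fn example_source() -> BidirectionalSource {
    let rule = |left: &str, right: &str, direction| SourceRule {
        left: left.to_string(),
        right: right.to_string(),
        direction,
    };
    BidirectionalSource {
        forward_id: "A-B".to_string(),
        backward_id: "B-A".to_string(),
        var_table: vec!["[aeiou]".to_string()],
        rules: vec![
            rule("a", "b", RuleDirection::Both),
            rule("c", "d", RuleDirection::Forward),
            rule("e", "f", RuleDirection::Backward),
        ],
    }
}
```

With this model, `compile_for_id(&example_source(), "A-B")` and `compile_for_id(&example_source(), "B-A")` produce two independent structs with identical `var_table` contents, and any other ID yields `None`, which mirrors datagen only including what was explicitly requested.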
@robertbastian thoughts?