[ENH] Add Dynamic Alphabet Sizes for SFA #2844

patrickzib · 2025-05-26T12:29:18Z

This PR introduces the concept of dynamic alphabet sizes to SFA.

The alphabet size is used as a budget that is distributed over all coefficients to maximize the tightness of the lower bound. Symbols are assigned in proportion to each coefficient's variance using one of three strategies:

Linear-proportional to variance
Sqrt-proportional to variance
Log2-proportional to variance

Illustration

Example with Alphabet Sizes [4, 4, 2, 2] and variance-based feature selection:

Example

For example, with a word length of 4 and 4 symbols per coefficient, the budget is 16 = 4 * 4 symbols:

Prior to this PR, the alphabet size had to be the same for each coefficient: [a-d, a-d, a-d, a-d] = [4, 4, 4, 4] = 16
Now, the number of symbols is assigned based on importance: [a-h, a-d, a-b, a-b] = [8, 4, 2, 2] = 16
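
To make the budget idea concrete, here is a minimal sketch of the linear strategy. It assumes that per-coefficient sizes are obtained by scaling the base alphabet size with `variance / mean(variance)`; the function name, the rounding step and the minimum of 2 symbols are illustrative assumptions, not the PR's implementation.

```python
def allocate_alphabet_sizes(variance, alphabet_size):
    """Distribute len(variance) * alphabet_size symbols by relative variance.

    Hypothetical helper: rounding and the floor of 2 symbols are assumptions,
    so the budget is only met exactly when no rounding is needed.
    """
    mean_var = sum(variance) / len(variance)
    # A scale with mean 1 keeps the average alphabet size at `alphabet_size`.
    scale = [v / mean_var for v in variance]
    return [max(2, round(alphabet_size * s)) for s in scale]


# Made-up variances for a word length of 4 and a base alphabet size of 4.
sizes = allocate_alphabet_sizes([8.0, 4.0, 2.0, 2.0], alphabet_size=4)
print(sizes, sum(sizes))  # [8, 4, 2, 2] 16
```

Because the scale has mean 1, the total stays at word length times alphabet size whenever the scaled values are already integers; with arbitrary variances the rounding can push the sum slightly above or below the budget.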

CD-Diagram for (average) alphabet-size 64

Experiments

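This kind of assignment is most beneficial for small alphabet sizes. The TLB (tightness of lower bound) results below, where larger is better, show large improvements for average alphabet sizes of 2 to 8: with 2 symbols, SFA+Linear reaches 48.474 compared to 37.515 for plain SFA.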

| Average Symbols | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
|---|---|---|---|---|---|---|---|---|
| SFA | 37.515 | 56.694 | 69.425 | 77.726 | 82.2309 | 85.6476 | 86.8577 | 87.5971 |
| SFA+Linear | 48.474 | 63.373 | 72.769 | 79.669 | 83.8591 | 86.0971 | 87.1459 | 87.6656 |
| SFA+Log | 46.017 | 60.966 | 72.265 | 79.352 | 83.8492 | 85.9773 | 87.075 | 87.628 |
| SFA+Sqrt | 44.958 | 60.841 | 71.268 | 79.280 | 83.6312 | 86.0426 | 87.1275 | 87.6555 |
| iSAX | 28.025 | 43.014 | 54.823 | 62.948 | 69.5433 | 75.366 | 78.3346 | 80.1139 |
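
The size of the improvement can be read off the table with a few lines of arithmetic. The sketch below copies only the SFA and SFA+Linear rows and prints the absolute TLB gain per alphabet size; the variable names are illustrative.

```python
# TLB values copied from the SFA and SFA+Linear rows of the results table.
symbols = [2, 4, 8, 16, 32, 64, 128, 256]
tlb_sfa = [37.515, 56.694, 69.425, 77.726, 82.2309, 85.6476, 86.8577, 87.5971]
tlb_linear = [48.474, 63.373, 72.769, 79.669, 83.8591, 86.0971, 87.1459, 87.6656]

# Absolute gain of the linear allocation over fixed-size SFA.
gains = [round(lin - base, 4) for lin, base in zip(tlb_linear, tlb_sfa)]
for s, g in zip(symbols, gains):
    print(f"{s:>3} symbols: +{g:.3f}")
```

The gain shrinks steadily, from about 10.96 at 2 symbols to about 0.07 at 256 symbols.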

aeon-actions-bot · 2025-05-26T12:29:42Z

Thank you for contributing to `aeon`

I have added the following labels to this PR based on the title: [ $\color{#FEF1BE}{\textsf{enhancement}}$ ].
I would have added the following labels to this PR based on the changes made: [ $\color{#5209C9}{\textsf{distances}}$, $\color{#41A8F6}{\textsf{transformations}}$ ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

Run pre-commit checks for all files
Run mypy typecheck tests
Run all pytest tests and configurations
Run all notebook example tests
Run numba-disabled codecov tests
Stop automatic pre-commit fixes (always disabled for drafts)
Disable numba cache loading
Push an empty commit to re-run CI checks

baraline

Minor comments only; I didn't pick up anything major. Otherwise LGTM.

baraline · 2025-07-03T05:08:17Z

aeon/distances/tests/test_symbolic_mindist.py

+    X_test = zscore(X_test.squeeze(), axis=1)
+    histogram_type = "equi-width"
+
+    # print("Testing")


Leftover commented-out print statement.

baraline · 2025-07-03T05:10:13Z

aeon/transformations/collection/dictionary_based/_sfa_fast.py

+alphabet_allocation_methods = {
+    "linear_scale",
+    "log_scale",
+    "sqrt_scale",
+}


Ideally, the tests would import this set and iterate over it, so that they automatically cover any methods added in the future.
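
A sketch of what this could look like. To stay self-contained, the set is redefined here as a stand-in; in the real test module it would be imported instead (the import path in the comment is inferred from the file path above and is an assumption). `run_transform` is a hypothetical placeholder for fitting SFA with a given method.

```python
# Stand-in for the set defined in _sfa_fast.py. In the actual test module it
# would be imported rather than copied, e.g. (assumed import path):
# from aeon.transformations.collection.dictionary_based._sfa_fast import (
#     alphabet_allocation_methods,
# )
alphabet_allocation_methods = {"linear_scale", "log_scale", "sqrt_scale"}


def run_transform(method):
    """Hypothetical placeholder for fitting SFA with the given method."""
    if method not in alphabet_allocation_methods:
        raise ValueError(f"unknown allocation method: {method}")
    return method


def test_all_allocation_methods():
    # Iterate over the shared set instead of a hard-coded copy, so a newly
    # added method is exercised without editing the test.
    tested = [run_transform(m) for m in sorted(alphabet_allocation_methods)]
    assert set(tested) == alphabet_allocation_methods
    return tested
```

With pytest, `@pytest.mark.parametrize("method", sorted(alphabet_allocation_methods))` would turn each method into its own test case; sorting keeps the test IDs stable because sets have no fixed order.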

baraline · 2025-07-03T05:13:09Z

aeon/transformations/collection/dictionary_based/_sfa_fast.py

+                normed_scale = variance / variance.mean()
+            elif self.alphabet_allocation_method == "log_scale":
+                variance = np.log2((self.dft_variance[self.support]) + 1)
+                normed_scale = variance / variance.mean()


Minor, but you could compute `normed_scale` once after the if/elif branches, since every branch does it.
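
One way the suggested restructuring could look, written as a standalone function rather than the method in the PR. Only the `log_scale` branch is taken from the diff; the `linear_scale` and `sqrt_scale` branches, the function name and the error handling are assumptions based on the strategy names.

```python
import numpy as np


def normed_scale(variance, method):
    """Transform the variances per method, then normalise once at the end."""
    variance = np.asarray(variance, dtype=float)
    if method == "linear_scale":
        scaled = variance  # assumed: variance used as-is
    elif method == "log_scale":
        scaled = np.log2(variance + 1)  # as in the diff
    elif method == "sqrt_scale":
        scaled = np.sqrt(variance)  # assumed from the strategy name
    else:
        raise ValueError(f"unknown allocation method: {method}")
    # Shared by every branch, so it lives after the if/elif chain.
    return scaled / scaled.mean()


print(normed_scale([8.0, 4.0, 2.0, 2.0], "linear_scale"))
```

Whatever the transform, the result has mean 1, which is what lets it act as a per-coefficient multiplier on a fixed budget.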

patrickzib added 2 commits May 26, 2025 14:07

add dynamic alphabet allocation

002a750

update default

21dffb7

patrickzib requested a review from baraline May 26, 2025 12:29

patrickzib self-assigned this May 26, 2025

patrickzib requested review from MatthewMiddlehurst, chrisholder and TonyBagnall as code owners May 26, 2025 12:29

patrickzib added the similarity search Similarity search package label May 26, 2025

aeon-actions-bot bot added the enhancement New feature, improvement request or other non-bug code enhancement label May 26, 2025

patrickzib added 9 commits May 26, 2025 14:36

fix test for availability of variable

57c5b77

bugfix

bdffc39

bugfix

0ee4ec9

extend test case and fix sfafast

884b4b7

change logic

ebc6321

refactor code

598a0de

refactor code

aaa7412

change default

4204182

tidy up code

a02b5f6

patrickzib added the distances Distances package label Jun 4, 2025

patrickzib added 6 commits June 4, 2025 13:11

Merge branch 'main' into PS/SFA

b1dc4dd

updates

3e9c1ea

Merge branch 'main' into PS/SFA

5271826

Merge branch 'main' into PS/SFA

634b756

Merge remote-tracking branch 'origin/PS/SFA' into PS/SFA

44ce111

refactor

0e8cf70

baraline reviewed Jul 3, 2025
