BallTree Periodicity decompose a signal in order to find its principal period.
The interesting part here relies on the fact it does not use harmonics to do so and therefore,
is resilient to time shift.
The idea is to use BallTree to 'fuzzy index' portions of the signal into clusters.
NB: those portions are defined by local extremas of the signal.
We can then use Hidden Markov Model to find the most likely sequence of cluster which is the principal period.
The distance metric
In order to cluster together parts of the signal, we need to define a distance between 2 curve portion.
Here a and b.
Then, we align the 2 segments to the left and to the right and compute the 2 integrals of the difference.
The average of both will provide the similarity metric.
Note that the segments might not have the same length;
In that case, the non overlaping portion of the smaller segment is considered as being 0.
As a result, a minimized distance would be equivalent to having the same shape on those 2 curve portions.
Clustering
Using the distance defined above,
We can now apply BallTree to get a tag for each portion of the curve.
This tag sequence is finally parsed by a HMM and the most likely state will be a fix point in the principal period.
Final result
Integral computation
In practice the integral is approximated. At the moment we just sum 10 points on the segment.
This approximation is valid because segments have maximum one variation by construction.
However if someone know how to shift an interpolation on the time axis, it would be a great contribution.
Handling extremas
In practice the algorithm run 2 times: on minimas and on maximas.
The cluster are then joined by common component since they have same index, it is valid because:
- We do not capture constant portions.
- The interpolation is continuous.
We can also note it introduce robustness to amplitude shift.
The results are quite good, allowing to detect periodicity on really noisy signals.
But not only the principal period also patterns in the signal.
We can note the importance of the tolerance parameter used for fuzzy indexing as well and the interpolation.
A future update could allow usage of an external interpolation.