Python implementation of SAX (Symbolic Aggregate Approximation) for time series data
- Converts time series data into a symbolic representation in which the distance between symbolic words lower-bounds the (Euclidean) distance between the original series
- The symbolic representation can be viewed as a low-dimensional (aggregate) representation of the time series
- Symbol-based algorithms, such as suffix trees and Markov chains, can then be used to analyze time series
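A minimal pure-Python sketch of the SAX transform and its MINDIST lower bound, written for illustration rather than taken from pysax (the function names are hypothetical, not the package's API). It assumes z-normalization with the population standard deviation, equiprobable breakpoints of N(0, 1), and a series length divisible by the word length w.

```python
import math
from statistics import NormalDist


def znorm(series, eps=1e-8):
    """Z-normalize (population std); a near-constant series is only mean-centered."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        return [x - mean for x in series]
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-size frames."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is divisible by w")
    size = n // w
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(w)]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, w, alphabet_size):
    """Map a series to a SAX word of length w over the letters 'a', 'b', ..."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for value in paa(znorm(series), w):
        idx = sum(1 for c in cuts if value >= c)  # region index of this PAA value
        letters.append(chr(ord("a") + idx))
    return "".join(letters)


def mindist(word1, word2, n, alphabet_size):
    """MINDIST: lower-bounds the Euclidean distance of the z-normalized series of length n."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ch1, ch2 in zip(word1, word2):
        lo, hi = sorted((ord(ch1) - ord("a"), ord(ch2) - ord("a")))
        if hi - lo > 1:  # equal or adjacent letters contribute 0
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(n / len(word1)) * math.sqrt(total)
```

For example, `sax(list(range(1, 9)), 4, 4)` returns `'abcd'`. Because MINDIST never exceeds the true distance, a search can discard candidates using only their SAX words without missing true matches.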
- paper
- website
- jMotif application
- tutorial
- R package
- Another Python implementation
- GrammarViz
- GrammarViz GitHub
- GrammarViz VSM GitHub
- jMotif GitHub
- SAX makes certain assumptions about time series data, such as (1) locally Gaussian values, (2) a fixed sampling frequency, and (3) real-valued signals. We want to explore approaches for data that do not meet these assumptions
- We want a vector representation of time series pieces, similar to representing words as vectors (Google's word2vec)
- We need a fast parallel implementation
examples
- Sequitur will be used as the context-free grammar extractor for SAX-transformed data
- The mined rules will be used for outlier/motif detection
- We wrap the C++ implementation for use from Python; this is just a quick workaround for now
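To illustrate what grammar "rules" over SAX words look like, here is a small grammar inducer. It is not Sequitur: it is a naive offline Re-Pair-style compressor (repeatedly replace the most frequent adjacent pair that occurs at least twice with a new rule), swapped in because Sequitur's online digram-uniqueness and rule-utility bookkeeping does not fit in a short sketch. All function names are hypothetical.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair, scanning left to right."""
    counts = Counter()
    last_end = {}  # index of the second symbol of the last counted occurrence
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if last_end.get(pair, -1) < i:
            counts[pair] += 1
            last_end[pair] = i + 1
    return counts


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair with symbol, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def induce_grammar(words):
    """Re-Pair-style induction: returns (start sequence, rules).

    Terminals are the input words (assumed to be strings, so they never collide
    with the int rule ids 1, 2, ... used as nonterminals).
    """
    seq = list(words)
    rules = {}
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no pair repeats, so a new rule would not compress anything
            break
        symbol = len(rules) + 1
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol into the terminal words it derives (its subsequence)."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return [symbol]
```

On `list("abcabcabc")` this yields the rules `1 -> a b` and `2 -> 1 c` with start sequence `[2, 2, 2]`; `expand` maps each rule back to the words it covers. The pair recounting makes it quadratic overall, which is fine for a demo but not for long series.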
- Three papers listed on the GrammarViz website
- Sequitur site
- Another Python Sequitur implementation
- A Java implementation can be found in the grammarviz2 code
- Download the C++ code from http://sequitur.info/latest/sequitur.tgz
- Put the sequitur code from the uncompressed folder in a convenient place
- Use the pysequitur package and pass the path to sequitur as a constructor parameter
- The wrapper is meant to make Sequitur easy to use with pysax
- As far as we understand, the C++ implementation treats each rule terminal as a single character, whereas pysax produces multi-character words, so the words must first be mapped to single characters. This may change as our understanding of the code improves
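A minimal sketch of that word-to-character mapping, assuming (not verified against the C++ code) that each character is one terminal and that printable, non-whitespace ASCII characters are safe to use. The function names and the 0x21 to 0x7E range are hypothetical choices; the inverse map lets rules found over characters be translated back into SAX words.

```python
def encode_words(words, first_code=0x21, last_code=0x7E):
    """Map each distinct SAX word to one printable ASCII character.

    Assumption: whitespace and control characters are avoided in case the
    C++ Sequitur treats them specially; the code range is a hypothetical choice.
    """
    mapping = {}
    chars = []
    for word in words:
        if word not in mapping:
            code = first_code + len(mapping)
            if code > last_code:
                raise ValueError("too many distinct words for the character range")
            mapping[word] = chr(code)
        chars.append(mapping[word])
    return "".join(chars), mapping


def decode_chars(text, mapping):
    """Invert encode_words: turn a string of terminal characters back into words."""
    inverse = {ch: word for word, ch in mapping.items()}
    return [inverse[ch] for ch in text]
```

With 94 usable characters this only covers small alphabets and word lengths; larger vocabularies would need a different encoding.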
examples
- Implements the idea from "Time series anomaly discovery with grammar-based compression": using grammar analysis for time series outlier detection
- Main steps:
  a. SAX-symbolize the time series
  b. apply numerosity reduction
  c. induce a grammar with Sequitur
  d. map the rules back to subsequences
  e. mine the patterns
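The sketch below covers steps b, d and e under stated assumptions: numerosity reduction over the sliding-window SAX words, then a rule density curve (for each time point, the number of rule occurrences whose subsequence covers it), which, as we read the paper, is one way to mine anomalies: points covered by few or no rules are candidates. It takes the SAX words and the rule occurrences (first/last reduced-token index, e.g. from step c) as inputs; how an occurrence maps to a time interval is our assumption, and all names are hypothetical.

```python
def numerosity_reduction(words):
    """Keep a SAX word only if it differs from the previous one; return (words, offsets)."""
    kept, offsets = [], []
    for i, word in enumerate(words):
        if not kept or word != kept[-1]:
            kept.append(word)
            offsets.append(i)  # start position of the window that produced this word
    return kept, offsets


def rule_intervals(occurrences, offsets, window, n):
    """Map rule occurrences (first, last reduced-token index) to time intervals.

    Assumption: an occurrence covering reduced tokens j..k spans from offsets[j]
    to the end of the window starting at offsets[k] (end is exclusive).
    """
    intervals = []
    for j, k in occurrences:
        start = offsets[j]
        end = min(offsets[k] + window, n)
        intervals.append((start, end))
    return intervals


def rule_density(intervals, n):
    """Count how many rule intervals cover each time point."""
    density = [0] * n
    for start, end in intervals:
        for t in range(start, end):
            density[t] += 1
    return density


def anomaly_candidates(density):
    """Time points with the lowest rule density."""
    low = min(density)
    return [t for t, d in enumerate(density) if d == low]
```

For instance, the words `["ab", "ab", "ba", "ba", "ba", "cc", "ab"]` reduce to `["ab", "ba", "cc", "ab"]` with offsets `[0, 2, 5, 6]`.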
examples