
Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances

Generalizable Lower Bounding (GLB) is a framework for developing tight and efficient lower bounds (LBs) for elastic measures in time series similarity search. GLB variants outperform state-of-the-art LBs for all elastic measures.

Figure: Comparison of GLB against existing LBs on pruning-power speedup over the 128 UCR datasets. Blue dots above the diagonal indicate datasets where GLB wins.

This package includes implementations of the GLB variants as well as the state-of-the-art lower bounds used as baselines to showcase the performance of GLB. The primary purposes of this package are to:

  • open-source the code for the experiments in our paper to facilitate replication of the results
  • provide our implementation of existing LBs and GLB to facilitate further research and application in the community.

Publication

"Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances", Proceedings of the VLDB Endowment (PVLDB 2023) Journal, Volume 16, pages 2019‑2032

John Paparrizos*, Charlie Kaize Wu*, Aaron Elmore, Christos Faloutsos, and Michael J. Franklin

Dataset

Our experiments are performed on 128 datasets from the UCR archive, which are included in this package (UCR2018-New). The UCR archive includes datasets from various domains and is the largest public collection of labeled time series datasets. Datasets are normalized and split into training and test sets. For datasets with varying lengths or missing values, we use pre-processed versions that apply standardized resampling and interpolation to fix these issues.
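For reference, here is a minimal sketch of loading one split of a UCR-format dataset. The tab-separated layout with the class label in the first column matches the archive's format, but the helper name and the explicit z-normalization step are our own (the archived series are typically normalized already):

import numpy as np

# Hypothetical helper: reads a UCR-format split where each row is
# "label \t v1 \t v2 \t ...", one time series per row.
def load_ucr_split(path):
    data = np.loadtxt(path, delimiter='\t')
    labels, series = data[:, 0], data[:, 1:]
    # z-normalize each series to zero mean and unit variance
    mean = series.mean(axis=1, keepdims=True)
    std = series.std(axis=1, keepdims=True)
    return (series - mean) / std, labels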

The Experiment File

In our evaluation, we tested the performance of 23 LBs on 128 datasets. More specifically, we have three sets of experiments: (1) evaluation of each LB when used alone, which applies to all LBs; (2) evaluation of lower bounds running in cascades (see the table below for details), which applies to LB_Improved, LB_New, and GLB; (3) a breakdown analysis of GLB_DTW to evaluate the contribution of its three components.

The results of all three sets of experiments can be obtained by running the Exp.py script with different parameters. When run in the terminal, Exp.py takes two parameters: the first selects the LB to test, and the second sets the number of datasets to test on. To replicate all our results, use the following command, where all selects all LBs and full selects all datasets.

python Exp.py all full

To run a selected LB on a subset of datasets, for example LB_Keogh on the first 40 datasets only, use the following command:

python Exp.py LB_Keogh 40

See the next section for a comprehensive list of the LBs implemented and evaluated.

Lower Bounds Implemented

| Elastic Measure | Lower Bound | Note |
| --- | --- | --- |
| Dynamic Time Warping (DTW) | LB_Keogh | |
| DTW | GLB_DTW | GLB variant for DTW |
| DTW | LB_Kim | |
| DTW | LB_New | |
| DTW | LB_Improved | |
| Longest Common Subsequence (LCSS) | LB_Keogh_LCSS | |
| LCSS | GLB_LCSS | GLB variant for LCSS |
| Edit Distance with Real Penalty (ERP) | LB_Kim_ERP | |
| ERP | LB_Keogh_ERP | |
| ERP | GLB_ERP | GLB variant for ERP |
| ERP | LB_ERP | |
| Move-Split-Merge (MSM) | LB_MSM | |
| MSM | GLB_MSM | GLB variant for MSM |
| Sequence Weighted Alignment (SWALE) | GLB_SWALE | GLB variant for SWALE |
| Edit Distance on Real Sequences (EDR) | GLB_EDR | GLB variant for EDR |
| Time Warp Edit Distance (TWED) | LB_TWED | |
| TWED | GLB_TWED | GLB variant for TWED |
| DTW | Cas_Keogh_GLB | LB_Keogh and GLB_DTW running in a cascade |
| DTW | Cas_Keogh_New | LB_Keogh and LB_New running in a cascade |
| DTW | Cas_Keogh_Improved | LB_Keogh and LB_Improved running in a cascade |
| DTW | Breakdown_QueryData | GLB with query and data dependencies but without boundary dependency |
| DTW | Breakdown_QueryOnly | GLB with query dependency but without data or boundary dependency |
| DTW | Breakdown_QueryBoundary | GLB with query and boundary dependency but without data dependency |
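The Cas_* entries chain two bounds: the cheaper bound runs first, and the tighter but costlier bound runs only on candidates that the first bound fails to prune. Below is a minimal sketch of this cascade idea, assuming the lb_keogh and glb_dtw functions from LB.py (the package's actual cascade code may be organized differently):

import LB

def cascade_prune(x, y, w, best_so_far):
    # Stage 1: cheap bound. If it already exceeds the best distance
    # found so far, the candidate is pruned without further work.
    if LB.lb_keogh(x, y, w = w) >= best_so_far:
        return True
    # Stage 2: tighter but more expensive bound, only for survivors.
    return LB.glb_dtw(x, y, w = w) >= best_so_far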

Replicating Our Results

  1. Download the package to your machine.
  2. Go to the directory and install the dependencies by running the following commands in a terminal:
> cd GLB
> pip install -r requirements.txt
  3. Run the experiment file using Python:
> python Exp.py all full
  4. The experiments will run automatically and write the results to an Excel spreadsheet in the current directory.

Note: Running the entire set of experiments takes a long time.

We ran our experiments on a Linux server with the following configuration: dual Intel(R) Xeon(R) Silver 4116 (12 cores with 2-way SMT), 2.10 GHz, 196 GB RAM. The server ran Ubuntu Linux 18.04.3 LTS (64-bit) and used Python 3.7.5, Numba 0.53.1, and the GCC 8.4.0 compiler. It took us more than 7 days to finish all experiments.

Given limited time, partial results can be obtained by changing Line 54 of the experiment file to reduce the number of datasets to test on.

See the following figure for how to interpret the results:

(Figure: results_interpretation)

Use LBs implemented in this library

To experiment with the LBs and elastic measures implemented in this package, simply import the two scripts and run the corresponding elastic measure/LB as shown in the example below, where we compute the DTW distance between two arbitrary time series and use lb_keogh and glb_dtw to compute lower bounds of the DTW distance.

Note that Elastic Measures in ElasticMeasures.py and LBs implemented in LB.py expect a time series to be formatted as a 1D numpy array.

import numpy as np
import ElasticMeasures
import LB

ts_x = np.array([3, 4, 8, 4, 15, 10, 3, 14])
ts_y = np.array([4, 8, 12, 19, 23, 7, 2, 9])

dtw_dist = ElasticMeasures.dtw(ts_x, ts_y, w = 2)
lb_keogh_dist = LB.lb_keogh(ts_x, ts_y, w = 2)
glb_dtw_dist = LB.glb_dtw(ts_x, ts_y, w = 2)

print("DTW Distance: ", dtw_dist)
print("LB_Keogh Distance: ", lb_keogh_dist)
print("GLB_DTW Distance: ", glb_dtw_dist)

Output:

DTW Distance:  11.874342087037917
LB_Keogh Distance:  5.916079783099616
GLB_DTW Distance:  10.344080432788601
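
Both values are valid lower bounds, i.e., they never exceed the exact DTW distance, and GLB_DTW is the tighter of the two here. Continuing the example above, this can be checked directly:

assert lb_keogh_dist <= dtw_dist
assert glb_dtw_dist <= dtw_dist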

Use Elastic Measures and LBs in 1-NN Classification

Naive 1-NN Search
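
The class below illustrates the basic lower-bound pruning pattern: for each query, compute the lower bound against every candidate, visit candidates in ascending lower-bound order, and compute the exact elastic distance only when the lower bound does not already exceed the best distance found so far.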

import numpy as np
from ElasticMeasures import dtw
from LB import glb_dtw

class Bounded1NN(object):

    def __init__(self, metric, lb = True, constraint = None, w = 1, epsilon = 0.2, m = 0, g = 0.3, c = 0.5, lamb = 1, nu = 0.0001, timesx = None, timesy = None, p = 5, r = 1):
        self.metric = metric
        self.lb = lb
        self.w = w
        self.epsilon = epsilon
        self.constraint = constraint

    def fit(self, X, Xlabel):
        self.X = X
        self.Xlabel = Xlabel

    def predict(self, Y):
        pruned = 0
        test_class = np.zeros(Y.shape[0])
        window = self.w * Y.shape[1]

        if self.lb == True:
            for idx_y, y in enumerate(Y):
                best_so_far = float('inf')
                lb_list = np.zeros(self.X.shape[0])

                # Compute the lower bound between the query and every candidate.
                for idx_x, x in enumerate(self.X):
                    lb_list[idx_x] = glb_dtw(x, y, window)

                # Visit candidates in ascending lower-bound order so that
                # best_so_far tightens as early as possible.
                ordering = np.argsort(lb_list)
                self.X = self.X[ordering]
                self.Xlabel = self.Xlabel[ordering]
                lb_list = lb_list[ordering]

                for idx_x, x in enumerate(self.X):
                    lb_dist = lb_list[idx_x]

                    if lb_dist < best_so_far:
                        # Cannot prune: compute the exact elastic distance.
                        if self.metric == "GLB_DTW":
                            actual_dist = dtw(x, y, window)

                        if actual_dist < best_so_far:
                            best_so_far = actual_dist
                            test_class[idx_y] = self.Xlabel[idx_x]
                    else:
                        # The bound already rules this candidate out; the
                        # exact distance computation is skipped.
                        pruned += 1

        pruning_power = pruned / (Y.shape[0] * self.X.shape[0])

        return test_class, pruning_power
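
Visiting candidates in ascending lower-bound order tightens best_so_far early, so more of the remaining candidates can be pruned. The returned pruning_power is the fraction of query-candidate pairs for which the exact distance computation was skipped.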

Use Pre-processing to accelerate 1-NN
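
This variant moves the envelope construction out of the inner loop: the upper and lower envelopes of all training and test series are computed once per window size, so each per-pair lower-bound call can reuse them. A sketch of the envelope construction appears after the class.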

import numpy as np
from ElasticMeasures import dtw
from LB import glb_dtw
# make_envelopes and matlab_round are helper functions assumed to be
# available from the package.

class Bounded1NN(object):

    def __init__(self, metric, lb = True, constraint = None, w = 1, epsilon = 0.2, m = 0, g = 0.3, c = 0.5, lamb = 1, nu = 0.0001, timesx = None, timesy = None, p = 5, r = 1):
        self.metric = metric
        self.lb = lb
        self.w = w
        self.epsilon = epsilon
        self.constraint = constraint

    def fit(self, X, Xlabel):
        self.X = X
        self.Xlabel = Xlabel

    def predict(self, Y):
        pruned = 0
        test_class = np.zeros(Y.shape[0])
        window = self.w * Y.shape[1]
        window = matlab_round(window)

        if self.lb == True:
            # Precompute the envelopes of all candidate and query series once,
            # instead of rebuilding them for every (query, candidate) pair.
            self.XUE, self.XLE = make_envelopes(self.X, window)
            self.YUE, self.YLE = make_envelopes(Y, window)
            for idx_y, y in enumerate(Y):
                best_so_far = float('inf')
                lb_list = np.zeros(self.X.shape[0])

                # Compute the lower bound against every candidate using the
                # precomputed envelopes.
                for idx_x, x in enumerate(self.X):
                    lb_list[idx_x] = glb_dtw(x, y, self.YUE[idx_y], self.YLE[idx_y], self.XUE[idx_x], self.XLE[idx_x])

                ordering = np.argsort(lb_list)
                self.X = self.X[ordering]
                self.Xlabel = self.Xlabel[ordering]
                lb_list = lb_list[ordering]

                # Keep the envelopes aligned with the reordered candidates.
                self.XUE = self.XUE[ordering]
                self.XLE = self.XLE[ordering]

                for idx_x, x in enumerate(self.X):
                    lb_dist = lb_list[idx_x]

                    if lb_dist < best_so_far:
                        if self.metric == "GLB_DTW":
                            actual_dist = dtw(x, y, window, self.constraint)

                        if actual_dist < best_so_far:
                            best_so_far = actual_dist
                            test_class[idx_y] = self.Xlabel[idx_x]
                    else:
                        pruned += 1

        pruning_power = pruned / (Y.shape[0] * self.X.shape[0])

        return test_class, pruning_power
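
For reference, here is a minimal sketch of what an envelope-construction helper like make_envelopes might compute: the Keogh-style running maximum and minimum over a window of half-width w. The package's actual implementation and signature may differ.

import numpy as np

def make_envelopes(series_matrix, w):
    # series_matrix: 2D array, one time series per row.
    # upper[k, i] / lower[k, i] are the max / min of series k
    # over the window [i - w, i + w].
    n, m = series_matrix.shape
    upper = np.empty((n, m))
    lower = np.empty((n, m))
    for k in range(n):
        for i in range(m):
            lo, hi = max(0, i - w), min(m, i + w + 1)
            upper[k, i] = series_matrix[k, lo:hi].max()
            lower[k, i] = series_matrix[k, lo:hi].min()
    return upper, lower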

Use LBs for 1-NN Classification

>>> from LB import Bounded1NN
>>> model = Bounded1NN(metric = 'lcss', lb = True)
>>> model.fit(Coffee_train_X, Coffee_train_y)
>>> predicted_label, pruning_power = model.predict(Coffee_test_X)
>>> print('predicted_label: ', predicted_label)

predicted_label:  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
