Open
66 commits
8749097
Updated TICC.py
davidhallac May 2, 2017
a6e951c
added the paper link to the read me
sagarvare Jun 10, 2017
2c038b4
TICC paper added
sagarvare Jun 10, 2017
b16d607
Fixed bug
davidhallac Jun 19, 2017
fd90497
Merge branch 'master' of github.com:davidhallac/CrossTimeCov
davidhallac Jun 19, 2017
e6890b1
Debugging
davidhallac Jun 19, 2017
eef6312
Debugging TICC Solver
davidhallac Jun 22, 2017
6fd474b
Changed Beta in default TICC
davidhallac Jun 22, 2017
7c49074
Additional info about the car.py in readme
sagarvare Aug 3, 2017
52786f0
Tweaks to get examples working
dstuck Sep 1, 2017
cef85c1
Merge pull request #5 from evidation-health/master
davidhallac Sep 1, 2017
a3cfd8e
Debugging
Oct 10, 2017
fd208d3
creating src dir and decomposing
scoutsaachi Oct 20, 2017
87b2e9c
fixing imports
scoutsaachi Oct 20, 2017
1673cd6
merge admm and graphvx
scoutsaachi Oct 20, 2017
1fdb6cc
add back solver
scoutsaachi Oct 20, 2017
1a2f080
removed snapvx from all things
scoutsaachi Oct 27, 2017
435fd63
convert indentation to spaces
scoutsaachi Oct 27, 2017
6677804
fix import bugs
scoutsaachi Oct 27, 2017
436997c
adding concurrency
scoutsaachi Nov 3, 2017
a95fd32
Merge branch 'master' of https://github.com/scoutsaachi/TICC
scoutsaachi Nov 3, 2017
9dab642
move print statement for optimization
scoutsaachi Nov 3, 2017
e1620f9
make admm solver callable
scoutsaachi Nov 3, 2017
041183a
change reassignment
scoutsaachi Nov 7, 2017
5588f33
moving assignment after smoothening
scoutsaachi Nov 7, 2017
85cd3f3
sample script
scoutsaachi Nov 11, 2017
12edc1b
Merge pull request #10 from scoutsaachi/master
davidhallac Nov 11, 2017
08a4563
Modified README
davidhallac Nov 12, 2017
0b9ec5e
switching to python3
scoutsaachi Nov 14, 2017
d4cf4de
Removed cvxpy dependency
davidhallac Feb 18, 2018
e7dcd29
Merge remote-tracking branch 'upstream/master'
scoutsaachi Mar 2, 2018
27d2c33
Bug fixes
davidhallac Mar 5, 2018
1aedeab
Merge remote-tracking branch 'upstream/master'
scoutsaachi Mar 8, 2018
cd79cda
added BIC method
scoutsaachi Mar 8, 2018
6525184
added beta bic
scoutsaachi Mar 9, 2018
c942e0f
single BIC
scoutsaachi Mar 15, 2018
dd29a04
Merge pull request #16 from scoutsaachi/orig_ticc
davidhallac Mar 15, 2018
bc7ce09
lasting git commit issues
scoutsaachi Mar 22, 2018
ee0802d
Closed Pool
davidhallac Mar 23, 2018
1379c01
changed TICC to a class with `predict` method.
mohataher Mar 28, 2018
d747e24
Added simple unit test
davidhallac Mar 30, 2018
cf92833
Merge branch 'master' of https://github.com/davidhallac/TICC
scoutsaachi Apr 5, 2018
ebf6238
change to python3
scoutsaachi Apr 5, 2018
fec802f
Merge pull request #27 from scoutsaachi/master
davidhallac Apr 5, 2018
20e1eed
had accidentally made TICC sequential in change to python3
scoutsaachi Apr 11, 2018
8218645
change convergence to be before point reassignment, change point reas…
scoutsaachi Apr 13, 2018
8ffaa5b
Merge pull request #29 from scoutsaachi/master
davidhallac Apr 13, 2018
95648a6
Re-seed the initialization every time TICC is called
davidhallac Apr 16, 2018
f9a3c07
Merge branch 'master' into master
davidhallac Apr 16, 2018
84603c2
Merge pull request #21 from mohataher/master
davidhallac Apr 16, 2018
4f26b41
Updated readme
davidhallac Apr 17, 2018
ab867c9
Merge branch 'master' of github.com:davidhallac/CrossTimeCov
davidhallac Apr 17, 2018
98c79bb
Updated readme
davidhallac Apr 17, 2018
890797a
Shining up README and adding full reference
RasmusFonseca Apr 30, 2018
549ee03
Update README.md
davidhallac Apr 30, 2018
d386665
Merge pull request #32 from RasmusFonseca/master
davidhallac Apr 30, 2018
49dce13
update UnitTest.py so it runs; remove redundant TICC init args
JesseKolb May 1, 2018
504fd64
Merge pull request #33 from JesseKolb/UnitTestUpdate
davidhallac May 1, 2018
52f98da
add predict method for new data & test batch/streaming using method
JesseKolb May 4, 2018
5788a14
Merge pull request #35 from JesseKolb/predict_new_data
davidhallac May 4, 2018
8afc301
Fixed Eta bug
davidhallac May 15, 2018
6f2a151
Create LICENSE
davidhallac Aug 17, 2018
097b274
Merge pull request #47 from davidhallac/add-license-1
davidhallac Aug 17, 2018
d953812
Biased covariance
Heusdens97 Jun 7, 2020
f6dfad0
remove print
Heusdens97 Jun 7, 2020
85d45d1
Merge pull request #66 from Heusdens97/master
davidhallac Jun 14, 2020
Binary file added .DS_Store
Binary file not shown.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,2 +1,6 @@
/**/*.pyc


.idea/

Results.txt
25 changes: 25 additions & 0 deletions LICENSE
@@ -0,0 +1,25 @@
BSD 2-Clause License

Copyright (c) 2017-2018, David Hallac, Sagar Vare, Saachi Jain, and Others
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
237 changes: 21 additions & 216 deletions README.md
@@ -1,235 +1,40 @@
# TICC
TICC is a python solver for efficiently segmenting and clustering a multivariate time series. For implementation details refer to the paper.
TICC is a python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regularization parameter `lambda` and smoothness parameter `beta`, the window size `w` and the number of clusters `k`. TICC breaks the T timestamps into segments where each segment belongs to one of the `k` clusters. The total number of segments is affected by the smoothness parameter `beta`. It does so by running an EM algorithm where TICC alternately assigns points to clusters using a dynamic programming algorithm and updates the cluster parameters by solving a Toeplitz Inverse Covariance Estimation problem.

----
The TICC method takes as input a T-by-n data matrix, a regularization parameter "lambda" and smoothness parameter "beta", the window size "w" and the number of clusters "k". TICC breaks the T timestamps into segments where each segment belongs to one of the "k" clusters. The total number of segments is defined by the smoothness parameter "beta". It does so by running an EM algorithm where TICC alternately assigns points to clusters using a DP algorithm and updates the cluster parameters by solving a Toeplitz Inverse Covariance Estimation problem. The details can be found in the paper.
For details about the method and implementation see the paper [1].
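The assignment step described above can be pictured as a shortest-path problem: each timestamp pays a per-cluster cost, and changing cluster between consecutive timestamps pays `beta`. The following is a minimal sketch under that assumption, not the repository's implementation. The function name `assign_with_switch_penalty` and the cost matrix are hypothetical; in TICC the costs would come from the likelihood of each point under each cluster's model.

```python
import numpy as np


def assign_with_switch_penalty(costs, beta):
    """Return the label path minimizing total cost plus beta per cluster switch.

    costs: (T, k) array; costs[t, j] is the (hypothetical) cost of placing
    timestamp t in cluster j. beta is assumed to be non-negative.
    """
    costs = np.asarray(costs, dtype=float)
    T, k = costs.shape
    best = np.empty((T, k))             # best[t, j]: min cost of a path ending in j at t
    prev = np.zeros((T, k), dtype=int)  # back-pointers for path recovery
    best[0] = costs[0]
    for t in range(1, T):
        # The cheapest previous cluster is the only one worth switching from.
        cheapest = int(np.argmin(best[t - 1]))
        for j in range(k):
            stay = best[t - 1, j]
            switch = best[t - 1, cheapest] + beta
            if stay <= switch:
                best[t, j] = stay + costs[t, j]
                prev[t, j] = j
            else:
                best[t, j] = switch + costs[t, j]
                prev[t, j] = cheapest
    # Walk the back-pointers from the cheapest final state.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(best[-1]))
    for t in range(T - 1, 0, -1):
        path[t - 1] = prev[t, path[t]]
    return path
```

With `beta = 0` every timestamp simply takes its cheapest cluster; as `beta` grows, the path switches less often, which is why the number of segments depends on `beta`.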

Download & Setup
======================
## Download & Setup
Download the source code, by running in the terminal:
```
git clone https://github.com/davidhallac/TICC.git
```
Files
======================
The TICC package has the following important files:
```
TICC.py
```
Runs an instance of TICC algorithm.

**Parameters**

lambda_parameter : the lambda regularization parameter as described in the paper

beta : the beta parameter controlling the smoothness of the output as described in the paper

number_of_cluster: the number of clusters 'k' that the time stamps are clustered into

window_size : the size of the sliding window

prefix_string : the location of the output files

threshold : used for generating the cross time plots. Not used in the TICC algorithm

input_file : Location of the data file of size T-by-n.

maxIters : maximum iteration of the TICC algorithm


**Returns**

saves a .csv file for each of the cluster inverse covariances

saves a .csv file with list of the assignments for each of the timestamps to the 'k' clusters

prints the binary accuracy, if the correct method for computing the confusion matrix is specified

----

```
car.py
```
Runs an instance of TICC algorithm on the car example (case-study), as described in the paper. The parameters are the same as the TICC example.

**Parameters**

lambda_parameter : the lambda regularization parameter as described in the paper

beta : the beta parameter controlling the smoothness of the output as described in the paper

number_of_cluster: the number of clusters 'k' that the time stamps are clustered into

window_size : the size of the sliding window

prefix_string : the location of the output files

threshold : used for generating the cross time plots. Not used in the TICC algorithm

input_file : Location of the data file of size T-by-n.

maxIters : maximum iteration of the TICC algorithm

**Returns**

saves a .csv file for each of the cluster inverse covariances

saves a .csv file with list of the assignments for each of the timestamps to the 'k' clusters

saves a .csv file with the locations information

saves a .csv file with the color information for each of the time stamps

----

```
network_accuracy.py
```
Runs an instance of TICC algorithm on the T-by-n data matrix as described in the paper. Used for generating the network accuracy table as shown in the paper. The parameters are the same as the TICC example.

**Parameters**

lambda_parameter : the lambda regularization parameter as described in the paper

beta : the beta parameter controlling the smoothness of the output as described in the paper

number_of_cluster: the number of clusters 'k' that the time stamps are clustered into

window_size : the size of the sliding window

prefix_string : the location of the output files

threshold : used for generating the cross time plots. Not used in the TICC algorithm

input_file : Location of the data file of size T-by-n.

maxIters : maximum iteration of the TICC algorithm

**Returns**

saves a .csv file for each of the cluster inverse covariances

saves a .csv file with list of the assignments for each of the timestamps to the 'k' clusters

prints the network F1 scores for each of the clusters, assuming the "true" networks are stored as specified in the file.
## Using TICC
The `TICC`-constructor takes the following parameters:

----
```
generate_synthetic_data.py
```
Generates data using the methodology described in the paper. The data is generated from 'k' number of clusters. The 'T' time stamps are broken down into segments, and the segment lengths of the corresponding clusters should be mentioned in the 'break_points' array and 'seg_ids' list, respectively. So length of segment 'i' = break_points[i+1] - break_points[i].
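The segment-length rule above can be checked with a few lines of NumPy. This is only an illustration: the `break_points` and `seg_ids` values are made up, and the sketch assumes the array starts at 0 so that `break_points[i+1] - break_points[i]` applies to every segment. The actual script may store the end points differently.

```python
import numpy as np

# Hypothetical end points for three segments over T = 600 timestamps;
# a leading 0 is assumed so that every segment has a start and an end.
break_points = np.array([0, 200, 400, 600])
# Hypothetical cluster id of each segment.
seg_ids = [0, 1, 0]

# Length of segment i = break_points[i+1] - break_points[i]
segment_lengths = np.diff(break_points)

# Expand to one cluster label per timestamp.
labels = np.repeat(seg_ids, segment_lengths)
```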

**Parameters**

window_size : the size of the sliding window

number_of_sensors : The dimension 'n' of the output T-by-n data matrix.

sparsity_inv_matrix: sparsity of the MRF for each of the clusters. The sparsity of the inverse covariance matrix of each cluster.

rand_seed : The random seed used for generating random numbers

number_of_cluster: the number of clusters 'k' that the time stamps are generated from

cluster_ids : The corresponding cluster ids from which the segments are generated.

break_points : The end point of the segments. So length of segment 'i' = break_points[i+1] - break_points[i]

save_inverse_covariances : Boolean. Flag indicating if the computed inverse covariances for each of the clusters should be
saved as "Inverse Covariance cluster = cluster#.csv"

out_file_name : The file name where the .csv data matrix should be stored.

**Returns**
* `window_size`: the size of the sliding window
* `number_of_clusters`: the number of underlying clusters 'k'
* `lambda_parameter`: sparsity of the Markov Random Field (MRF) for each of the clusters. The sparsity of the inverse covariance matrix of each cluster.
* `beta`: The switching penalty used in the TICC algorithm. Same as the beta parameter described in the paper.
* `maxIters`: the maximum iterations of the TICC algorithm before convergence. Default value is 100.
* `threshold`: convergence threshold
* `write_out_file`: Boolean. Flag indicating if the computed inverse covariances for each of the clusters should be saved.
* `prefix_string`: Location of the folder to which you want to save the outputs.
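To make the role of `window_size` concrete, the sketch below stacks each run of `w` consecutive rows of a T-by-n matrix into a single vector of length `n*w`, which is a common way to form a sliding window over a multivariate series. It illustrates the general technique and is not copied from the repository; `stack_windows` is a hypothetical helper.

```python
import numpy as np


def stack_windows(data, window_size):
    """Stack consecutive rows of a (T, n) matrix into (T - w + 1, n * w) windows."""
    data = np.asarray(data, dtype=float)
    T, n = data.shape
    num_windows = T - window_size + 1
    # Row t of the result is rows t .. t + w - 1 of the input, flattened.
    return np.stack(
        [data[t:t + window_size].ravel() for t in range(num_windows)]
    )
```

For a 4-by-2 input and `window_size = 2`, the result has 3 rows of length 4, one per window position.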

saves a .csv file with data matrix of shape T-by-n

saves a .csv file for each of the inverse covariances of each cluster if the save_inverse_covariances flag is True.
The `TICC.fit(input_file)`-function runs the TICC algorithm on a specific dataset to learn the model parameters.

----
```
scalability_test.py
```
Runs an instance of the scalability test. Prints out the time required for each step: E-step (DP algorithm) and M-step (Optimization using Toeplitz Graphical Lasso).

**Parameters**

number_of_cluster: the number of clusters 'k' that the time stamps are clustered into

window_size : the size of the sliding window

input_file : Location of the data file of size T-by-n.

maxIters : maximum iteration of the TICC algorithm

**Output**

prints out the time taken for each of the steps in TICC algorithm. This function was used to generate the scalability plot in the paper.

----
```
TICC_solver.py
```
Solver for the TICC algorithm. Contains all the important functions. The solve function within the file can run an instance of the TICC algorithm. The details of the solve function are as below:
* `input_file`: Location of the data matrix of size T-by-n.

**Parameters**
It returns an array of cluster assignments for each time point, together with a dictionary whose keys are the `cluster_id` (from `0` to `k-1`) and whose values are the cluster MRFs.

window_size : the size of the sliding window

maxIters : the maximum iterations of the TICC algorithm before convergence. Default value is 100.
## Example Usage

lambda_parameter: sparsity of the MRF for each of the clusters. The sparsity of the inverse covariance matrix of each cluster.

beta: The switching penalty used in the TICC algorithm. Same as the beta parameter described in the paper.

number_of_clusters: the number of clusters 'k' that the time stamps are generated from

threshold: the threshold parameter used in visualization. Not a part of the TICC algorithm.

input_file: Location of the Data matrix of size T-by-n.

prefix_string: Location of the folder to which you want to save the outputs.

write_out_file : Boolean. Flag indicating if the computed inverse covariances for each of the clusters should be
saved as "Inverse Covariance cluster = cluster#.csv"

**Returns**

returns an array of cluster assignments for each time point.

returns a dictionary with keys being the cluster_id (from 0 to k-1) and the values being the cluster MRFs.

----

Example Usage
======================

Generating the data. If you already have a data matrix, skip this step. To generate data as described in the paper, use generate_synthetic_data.py. Change the break_points and seg_ids parameters to define the temporal pattern of the time series you want to generate. Use sparsity_inv_matrix to define the sparsity of the MRF of each cluster. Also set window_size and number_of_sensors appropriately for your application. Then run the following command:

```
python generate_synthetic_data.py
```
Next use the TICC.py file for running an instance of the TICC algorithm on the data matrix. The TICC.py method should be initialized with the following parameters : smoothness parameter 'beta', sparsity regularization 'lambda', window size, maximum Iterations before convergence, number of clusters, location of the input and output file. After updating this in the TICC.py file, run the following:

```
python TICC.py
```
For generating the network accuracy plots, use the network_accuracy.py file. Add the same parameters as above in the network_accuracy.py file and additionally save the true inverse covariances as "Inverse Covariance cluster = 'cluster#'.csv" in the same directory as the network_accuracy.py file. Next run:
```
python network_accuracy.py
```
For running a scalability experiment, use the scalability_test.py file. Set the parameters within the file same as the TICC.py file, and run the following command:
```
python scalability_test.py
```

To use the solver on your own data, call it as shown below. Enter the parameters as described in the paper. Use the returned cluster_assignment and the dictionary of cluster_MRFs as needed in your application.
```
import TICC_solver as TICC
# solve returns the per-timestamp assignments and a dict of cluster MRFs
(cluster_assignment, cluster_MRFs) = TICC.solve(
    window_size=10, number_of_clusters=5, lambda_parameter=11e-2, beta=400,
    maxIters=100, threshold=2e-5, write_out_file=False,
    input_file="data.csv", prefix_string="output_folder/")
```
See `example.py`.


References
==========
(Add the reference to the paper.)
## References
[1] D. Hallac, S. Vare, S. Boyd, and J. Leskovec. [Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data](http://stanford.edu/~hallac/TICC.pdf). Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 215--223.