TICC fails with clusters with only one observation #66

Heusdens97 · 2020-06-07T15:27:47Z

Hi David

I used your algorithm for anomaly detection on airplane data as part of my master's thesis, and I ran into the following potential problem.

If the number of clusters is too high, TICC can fail because some clusters contain only one observation. TICC computes each cluster's covariance matrix with the unbiased covariance formula (N-1 in the denominator), where N is the number of observations in the cluster.
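
Written out for a cluster whose observations are x_1, ..., x_N (notation introduced here for illustration), the mean and the unbiased covariance estimator are shown below. With N = 1 the mean equals the only observation, so the sum is the zero matrix and the denominator is 0, which gives the undefined ratio 0/0 (NaN in floating point).

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\hat{\Sigma}_{\text{unbiased}} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{\top}
```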

For a cluster with a single observation, N-1 is 0, so the computation divides by zero, produces NaN values, and the algorithm fails. Clusters with only one observation are not typical, but they might be interesting for anomaly detection. Moreover, because TICC is based on the EM algorithm and iterates, clusters of size one can also appear temporarily during the iterations. Hence, it would be useful if TICC could handle clusters of size one.
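
A small illustration of the failure mode, written with plain NumPy so that the divisor is explicit. It does not reproduce TICC's own code; the helper `covariance` and its `ddof` argument (0 for the biased N divisor, 1 for the unbiased N-1 divisor) are hypothetical names for this sketch.

```python
import numpy as np


def covariance(members: np.ndarray, ddof: int) -> np.ndarray:
    """Covariance of one cluster (rows = observations, columns = features).

    ddof=1 divides by N - 1 (unbiased); ddof=0 divides by N (biased).
    """
    n = members.shape[0]
    centered = members - members.mean(axis=0)
    scatter = centered.T @ centered  # sum of outer products of the deviations
    # Silence NumPy's floating-point warning so the 0/0 result appears as NaN quietly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return scatter / (n - ddof)


# A cluster that ended up with a single observation (1 row, 3 features).
single = np.array([[0.5, -1.2, 3.0]])

print(covariance(single, ddof=1))  # 3x3 matrix of NaN: zero scatter divided by 1 - 1 = 0
print(covariance(single, ddof=0))  # 3x3 matrix of zeros: zero scatter divided by 1
```

With a single observation the scatter matrix is all zeros, so the biased version is simply the zero matrix, while the unbiased version is undefined.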

I therefore added an option to choose between the unbiased covariance and the biased covariance (which divides by N).
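
A minimal sketch of such an option, assuming the per-cluster covariances are computed with NumPy's `np.cov`, whose `bias` parameter switches the divisor between N - 1 and N. The function `cluster_covariances` and its `biased` argument are hypothetical names for illustration, not necessarily what this PR adds to TICC.

```python
import warnings

import numpy as np


def cluster_covariances(data, labels, biased=False):
    """Return {cluster_id: covariance matrix} for every cluster in `labels`.

    data:   (n_observations, n_features) array.
    labels: cluster id of each observation.
    biased: True divides by N (defined for any N >= 1); False divides by
            N - 1, which gives NaN for a cluster with one observation.
    """
    covs = {}
    for k in np.unique(labels):
        members = data[labels == k]
        # np.cov treats rows as variables by default, so pass features x observations.
        covs[int(k)] = np.cov(members.T, bias=biased)
    return covs


rng = np.random.default_rng(0)
data = rng.normal(size=(7, 2))
labels = np.array([0, 0, 0, 1, 1, 1, 2])  # cluster 2 holds a single observation

with warnings.catch_warnings():
    # NumPy emits RuntimeWarnings (zero degrees of freedom, divide by zero) for cluster 2.
    warnings.simplefilter("ignore", RuntimeWarning)
    unbiased = cluster_covariances(data, labels, biased=False)
biased = cluster_covariances(data, labels, biased=True)

print(np.isnan(unbiased[2]).any())  # True
print(np.isnan(biased[2]).any())    # False
```

Only the singleton cluster is affected; for the other clusters both settings return finite matrices that differ by a constant factor.
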
I also added some tests that illustrate the failure and show that both options result in the same cluster assignment. I also took a closer look at the biased and unbiased covariance matrices: the differences between them are mostly very small. In one of my experiments, the differences were of the order of 10e-2 or smaller.
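
One way to see why the two matrices stay close: for a cluster with N >= 2 observations x_1, ..., x_N and mean x̄ (notation introduced here for illustration), the two estimators differ only by a scalar factor. Every entry of the difference is therefore 1/N of the corresponding unbiased entry and shrinks as the cluster grows. This is a general property of the estimators, not a measured result.

```latex
\hat{\Sigma}_{\text{unbiased}} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{\top},
\qquad
\hat{\Sigma}_{\text{biased}} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^{\top}
  = \frac{N-1}{N}\,\hat{\Sigma}_{\text{unbiased}},
\qquad
\hat{\Sigma}_{\text{unbiased}} - \hat{\Sigma}_{\text{biased}} = \frac{1}{N}\,\hat{\Sigma}_{\text{unbiased}}
```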

Another option would be to use the biased covariance only when a cluster has a single observation, but I leave this up to you.
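
The alternative suggested here could look like the following sketch, again assuming NumPy's `np.cov`. The function name `cluster_covariance_fallback` is hypothetical. It keeps the unbiased estimator for every cluster with at least two observations and switches to the biased one only for singletons.

```python
import numpy as np


def cluster_covariance_fallback(members):
    """Covariance of one cluster (rows = observations, columns = features).

    Uses the unbiased estimator (divide by N - 1) whenever N >= 2 and falls
    back to the biased estimator (divide by N) only when N == 1, where the
    unbiased formula would divide by zero.
    """
    members = np.atleast_2d(members)  # a 1-D input is treated as one observation
    use_biased = members.shape[0] < 2
    # np.cov treats rows as variables by default, so pass features x observations.
    return np.cov(members.T, bias=use_biased)


single = np.array([[1.0, 2.0, 3.0]])
pair = np.array([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])

print(cluster_covariance_fallback(single))  # 3x3 zeros, no NaN
print(cluster_covariance_fallback(pair))    # unbiased covariance of the two rows
```

The trade-off is that results for clusters with two or more observations are unchanged, while a singleton cluster gets a zero covariance matrix instead of NaN.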

Kind regards
Jordy Heusdens

Heusdens97 added 2 commits June 7, 2020 17:12

Biased covariance · d953812

remove print · f6dfad0

davidhallac merged commit 85d45d1 into davidhallac:master Jun 14, 2020
