Add E-Divisive Mean algorithm #22

sfc-gh-mfruth · 2022-09-13T09:01:34Z

This patch adds the E-Divisive Mean algorithm that was defined in this paper:

Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345.

Additionally, the attribute min_cluster_size is added to the e_divisive call. This attribute allows to define the minimum amount of data points for each cluster. This attribute is also used in the reference implementation of the E-Divisive Mean algorithm: https://github.com/cran/ecp/blob/56d4d0ede1b361fd23010b75e5b0c5438de6cf7d/R/e_divisive.R#L42

If min_cluster_size == None: The old algorithm is used
If min_cluster_size >= 2: The new (patched) algorithm is used

To calculate a single change point, the current implementation has a runtime of O(T) (T = length of time series). The E-Divisive Mean algorithm has a runtime of O(T^2).

The following example should illustrate for which groups the Q-value is calculated:

time_series = [1, 2, 3, 4, 5, 6]
# X = 'left' cluster; Y = 'right' cluster
# Current implementation:
X = []
Y = [1, 2, 3, 4, 5, 6]

X = [1]
Y = [2, 3, 4, 5, 6]

X = [1, 2]
Y = [3, 4, 5, 6]

# ...

X = [1, 2, 3, 4, 5]
Y = [6]

####################################

# New/patched algorithm (E-Divisive Mean)
X = [1, 2]
Y = [3, 4]

X = [1, 2]
Y = [3, 4, 5]

X = [1, 2]
Y = [3, 4, 5, 6]

X = [1, 2, 3]
Y = [4, 5]

X = [1, 2, 3]
Y = [4, 5, 6]

X = [1, 2, 3, 4]
Y = [5, 6]

This patch adds the E-Divisive Mean algorithm that was defined in this paper: Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345. Additionally, the attribute `min_cluster_size` is added to the `e_divisive` call. This attribute allows to define the minimum amount of data points for each cluster. This attribute is also used in the reference implementation of the E-Divisive Mean algorithm: https://github.com/cran/ecp/blob/56d4d0ede1b361fd23010b75e5b0c5438de6cf7d/R/e_divisive.R#L42 - If `min_cluster_size` == None: The old algorithm is used - If `min_cluster_size` >= 2: The new (patched) algorithm is used To calculate a single change point, the current implementation has a runtime of O(T) (T = length of time series). The E-Divisive Mean algorithm has a runtime of O(T^2). The following example should illustrate for which groups the Q-value is calculated: ``` time_series = [1, 2, 3, 4, 5, 6] X = [] Y = [1, 2, 3, 4, 5, 6] X = [1] Y = [2, 3, 4, 5, 6] X = [1, 2] Y = [3, 4, 5, 6] X = [1, 2, 3, 4, 5] Y = [6] X = [1, 2] Y = [3, 4] X = [1, 2] Y = [3, 4, 5] X = [1, 2] Y = [3, 4, 5, 6] X = [1, 2, 3] Y = [4, 5] X = [1, 2, 3] Y = [4, 5, 6] X = [1, 2, 3, 4] Y = [5, 6] ```

michaelfruth · 2023-08-03T07:20:41Z

I would be happy to revise the PR for a merge, e.g. add tests or outsource the new method, etc. Are there any changes you want?

sfc-gh-mfruth requested review from a team and anjani-bhat and removed request for a team September 13, 2022 09:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add E-Divisive Mean algorithm #22

Add E-Divisive Mean algorithm #22

Uh oh!

sfc-gh-mfruth commented Sep 13, 2022

Uh oh!

michaelfruth commented Aug 3, 2023

Uh oh!

Uh oh!

Add E-Divisive Mean algorithm #22

Are you sure you want to change the base?

Add E-Divisive Mean algorithm #22

Uh oh!

Conversation

sfc-gh-mfruth commented Sep 13, 2022

Uh oh!

michaelfruth commented Aug 3, 2023

Uh oh!

Uh oh!