Skip to content

Add E-Divisive Mean algorithm #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sfc-gh-mfruth
Copy link

This patch adds the E-Divisive Mean algorithm that was defined in this paper:

Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345.

Additionally, the attribute min_cluster_size is added to the e_divisive call. This attribute allows to define the minimum amount of data points for each cluster. This attribute is also used in the reference implementation of the E-Divisive Mean algorithm: https://github.com/cran/ecp/blob/56d4d0ede1b361fd23010b75e5b0c5438de6cf7d/R/e_divisive.R#L42

  • If min_cluster_size == None: The old algorithm is used
  • If min_cluster_size >= 2: The new (patched) algorithm is used

To calculate a single change point, the current implementation has a runtime of O(T) (T = length of time series). The E-Divisive Mean algorithm has a runtime of O(T^2).

The following example should illustrate for which groups the Q-value is calculated:

time_series = [1, 2, 3, 4, 5, 6]
# X = 'left' cluster; Y = 'right' cluster
# Current implementation:
X = []
Y = [1, 2, 3, 4, 5, 6]

X = [1]
Y = [2, 3, 4, 5, 6]

X = [1, 2]
Y = [3, 4, 5, 6]

# ...

X = [1, 2, 3, 4, 5]
Y = [6]

####################################

# New/patched algorithm (E-Divisive Mean)
X = [1, 2]
Y = [3, 4]

X = [1, 2]
Y = [3, 4, 5]

X = [1, 2]
Y = [3, 4, 5, 6]

X = [1, 2, 3]
Y = [4, 5]

X = [1, 2, 3]
Y = [4, 5, 6]

X = [1, 2, 3, 4]
Y = [5, 6]

This patch adds the E-Divisive Mean algorithm that was defined in this paper:

Matteson, D.S. and James, N.A., 2014. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), pp.334-345.

Additionally, the attribute `min_cluster_size` is added to the `e_divisive` call. This attribute allows to define the minimum amount of data points for each cluster. This attribute is also used in the reference implementation of the E-Divisive Mean algorithm: https://github.com/cran/ecp/blob/56d4d0ede1b361fd23010b75e5b0c5438de6cf7d/R/e_divisive.R#L42

- If `min_cluster_size` == None: The old algorithm is used
- If `min_cluster_size` >= 2: The new (patched) algorithm is used

To calculate a single change point, the current implementation has a
runtime of O(T) (T = length of time series). The E-Divisive Mean algorithm
has a runtime of O(T^2).

The following example should illustrate for which groups the Q-value is
calculated:

```
time_series = [1, 2, 3, 4, 5, 6]

X = []
Y = [1, 2, 3, 4, 5, 6]

X = [1]
Y = [2, 3, 4, 5, 6]

X = [1, 2]
Y = [3, 4, 5, 6]

X = [1, 2, 3, 4, 5]
Y = [6]

X = [1, 2]
Y = [3, 4]

X = [1, 2]
Y = [3, 4, 5]

X = [1, 2]
Y = [3, 4, 5, 6]

X = [1, 2, 3]
Y = [4, 5]

X = [1, 2, 3]
Y = [4, 5, 6]

X = [1, 2, 3, 4]
Y = [5, 6]
```
@sfc-gh-mfruth sfc-gh-mfruth requested review from a team and anjani-bhat and removed request for a team September 13, 2022 09:01
@michaelfruth
Copy link

I would be happy to revise the PR for a merge, e.g. add tests or outsource the new method, etc. Are there any changes you want?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants