Getting different results on the same data set #17

Open
alooferyj opened this issue Oct 13, 2023 · 1 comment


@alooferyj

Hi,

Am I supposed to get different clustering results from each run on the same data set? I was running the following code:

```python
import numpy as np
from kshape.core import KShapeClusteringCPU

# Generated once, so every run below clusters exactly the same data.
univariate_ts_datasets = np.expand_dims(np.random.rand(200, 60), axis=2)
num_clusters = 3

# CPU Model
for j in range(5):
    ksc = KShapeClusteringCPU(num_clusters, centroid_init='zero', max_iter=100, n_jobs=-1)
    ksc.fit(univariate_ts_datasets)

    labels = ksc.labels_  # or ksc.predict(univariate_ts_datasets)
    cluster_centroids = ksc.centroids_
    print(labels)
```

My understanding is that there's nothing random in the algorithm since I set the centroid init to zero. But I get different results; e.g., sometimes the first and second time series belong to the same cluster and sometimes they don't.
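Cluster IDs are arbitrary, so two runs can print different `labels` arrays even when they group the series identically. A minimal sketch for telling a real difference apart from a renumbering: the helper `same_partition` is hypothetical (not part of kshape) and compares which pairs of series share a cluster, the same test as "do the first and second series end up together?", applied to every pair.

```python
import numpy as np

def same_partition(labels_a, labels_b):
    """Return True if two label arrays describe the same grouping,
    ignoring how the cluster IDs happen to be numbered."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    if a.shape != b.shape:
        return False
    # Co-membership matrices: entry (i, j) is True when series i and j share a cluster.
    co_a = a[:, None] == a[None, :]
    co_b = b[:, None] == b[None, :]
    return bool(np.array_equal(co_a, co_b))

# Same grouping, different numbering: same partition.
print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))  # True
# Series 0 and 1 are split apart in the second run: different partition.
print(same_partition([0, 0, 1, 2], [0, 1, 1, 2]))  # False
```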

Thanks,
James

@HaojunLi
Collaborator

Hi James,
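k-Shape does indeed have randomness. In your code, you set `centroid_init` to `'zero'`, but k-Shape still randomly assigns the time series to clusters as its initialization. Please check here:
[Screenshot: the k-Shape initialization step that assigns time series to clusters at random]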
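A minimal sketch of why `centroid_init='zero'` alone does not make the result repeatable, assuming the initial assignment draws one cluster ID per series uniformly at random. The helper `random_initial_labels` is hypothetical and only illustrates the idea; it is not kshape's code.

```python
import numpy as np

def random_initial_labels(n_series, n_clusters, rng):
    # Hypothetical helper: draw one cluster ID per series, uniformly from
    # 0..n_clusters-1. kshape's internal call may differ.
    return rng.integers(0, n_clusters, size=n_series)

# Unseeded generators: the starting partition changes from run to run.
a = random_initial_labels(200, 3, np.random.default_rng())
b = random_initial_labels(200, 3, np.random.default_rng())
print(np.array_equal(a, b))

# Same seed: the starting partition is identical on every run.
c = random_initial_labels(200, 3, np.random.default_rng(42))
d = random_initial_labels(200, 3, np.random.default_rng(42))
print(np.array_equal(c, d))  # True
```

With every centroid starting at zero, the first centroid update is computed from these random groups, so a different starting partition can steer the iterations to a different final clustering.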
Also, when a cluster becomes empty during the iterations, k-Shape randomly assigns a time series to that empty cluster. If you want deterministic results, you need to set a random seed. I hope this helps.
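To show how a single seed removes both sources of randomness described above, here is a toy loop. It is a simplified stand-in, not k-Shape: Euclidean distance and mean centroids replace k-Shape's shape-based distance and shape extraction, and `toy_cluster` is a hypothetical name. It keeps the same two random steps (random initial assignment, random refill of an empty cluster) and routes both through one seeded generator.

```python
import numpy as np

def toy_cluster(X, k, max_iter=100, seed=None):
    """Toy k-means-style loop on a 2-D array X of shape (n_series, length).
    Not k-Shape: only the two random steps are modelled on it."""
    rng = np.random.default_rng(seed)  # every random draw goes through this generator
    n = X.shape[0]
    labels = rng.integers(0, k, size=n)     # random initial assignment
    centroids = np.zeros((k, X.shape[1]))   # analogous to centroid_init='zero'
    for _ in range(max_iter):
        old = labels
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                # Empty cluster: use one randomly chosen series as its centroid.
                centroids[j] = X[rng.integers(n)]
            else:
                centroids[j] = members.mean(axis=0)
        # Reassign every series to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        if np.array_equal(labels, old):
            break
    return labels

X = np.random.default_rng(0).random((200, 60))
run1 = toy_cluster(X, 3, seed=7)
run2 = toy_cluster(X, 3, seed=7)
print(np.array_equal(run1, run2))  # True
```

With the actual library, the analogous step would presumably be seeding NumPy's global generator (e.g. `np.random.seed(0)`) immediately before each `ksc.fit(...)` call. That assumes kshape draws its random numbers from NumPy's global state, which this thread does not confirm.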
Best,
Haojun
