WEASEL Implementation #77
Hi,

Thanks for your kind words.

Regarding your first question, the limit on the word size is indeed arbitrary. However, this limit is already much higher than what is used in practice: the WEASEL paper uses a word size of about 4-6 and an alphabet size of 4.

Let l be the word size and c the alphabet size. For a given window size, the number of possible words is c^l, and you do this for every window size, which means that the maximum number of extracted features is n_window_sizes * c^l (before the chi-squared feature selection).
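For a quick sense of scale, here is a rough back-of-the-envelope check (a sketch, assuming the pyts defaults word_size=4 and n_bins=4 with the five default window sizes):

# Back-of-the-envelope upper bound on the number of features
# (assumes pyts defaults: word_size=4, n_bins=4, five window sizes)
word_size, n_bins, n_window_sizes = 4, 4, 5
max_features = n_window_sizes * n_bins ** word_size
print(max_features)  # 1280 possible words before the chi-squared feature selection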
Obviously not all the possible words will actually be present, but if you increase the number of possible words, you increase the number of unique words, and the frequency of each word gets lower. So you get words that are more specific to each time series, and if the words are too specific, you are just overfitting the training set. Note that WEASEL is fitted on the training set and applied to both the training and test sets (you learn the vocabulary on the training set and compute the frequency of each word for each time series in the training and test sets).

Regarding your second question, a fitted WEASEL transformer can be serialized with pickle or joblib (there is no plain-string or JSON format, though). The transformation is performed independently for each window size, so you can also fit one instance per window size, save each fitted instance, and concatenate their outputs, which gives you something close to a warm start when you want to add window sizes later.
Minimal working example:

from joblib import dump
import numpy as np
from pyts.datasets import load_gunpoint
from pyts.transformation import WEASEL
# Load the GunPoint dataset
X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)
# Define the parameters
common_params = {'sparse': False, 'chi2_threshold': 1.2}
window_sizes = [0.1, 0.3, 0.5, 0.7, 0.9]
# Compute the transformation all in once
weasel = WEASEL(**common_params, window_sizes=window_sizes)
X_train_weasel_1 = weasel.fit_transform(X_train, y_train)
X_test_weasel_1 = weasel.transform(X_test)
# Compute the transformation for each window independently
X_train_weasel_2, X_test_weasel_2 = [], []
for window_size in window_sizes:
    weasel_ind = WEASEL(**common_params, window_sizes=[window_size])
    X_train_weasel_2.append(weasel_ind.fit_transform(X_train, y_train))
    X_test_weasel_2.append(weasel_ind.transform(X_test))
    # You can save the estimator (or both arrays if you prefer)
    dump(weasel_ind, f'weasel_{window_size}.joblib')
X_train_weasel_2 = np.concatenate(X_train_weasel_2, axis=1)
X_test_weasel_2 = np.concatenate(X_test_weasel_2, axis=1)
# Check that the arrays are identical
np.testing.assert_equal(X_train_weasel_1, X_train_weasel_2)
np.testing.assert_equal(X_test_weasel_1, X_test_weasel_2)

Hope this helps you a bit.
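As a follow-up on the serialization point, a minimal sketch of reloading one of the saved estimators and reusing it (assuming the weasel_0.5.joblib file written by the loop above):

from joblib import load

# Reload a previously fitted estimator and transform new data with it
weasel_loaded = load('weasel_0.5.joblib')
X_test_weasel_loaded = weasel_loaded.transform(X_test)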
Dear @johannfaouzi,

First, thanks for your implementation and for the recent review paper. My question is somewhat relevant to this discussion. As far as I understand, in principle, one could use different parameters (e.g. the word size or the alphabet size) for each window size. Also, do you think it makes sense to have a finer quantisation for larger windows? Could you share your intuition on this?

Thanks!
Hi,
Using different parameters for each window size is possible in principle. With the current version of the code, you would have to do it "manually" by indeed creating several independent instances of WEASEL (one per window size, each with its own parameters) and concatenating the resulting arrays, as in the example above.
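For illustration, a rough sketch of this manual approach with a different word_size per window size (the window sizes and word sizes below are purely illustrative choices, not recommendations):

import numpy as np
from pyts.datasets import load_gunpoint
from pyts.transformation import WEASEL

X_train, X_test, y_train, y_test = load_gunpoint(return_X_y=True)

# Hypothetical mapping: longer windows get longer words
word_size_per_window = {0.3: 3, 0.6: 4, 0.9: 5}

X_train_parts, X_test_parts = [], []
for window_size, word_size in word_size_per_window.items():
    weasel = WEASEL(word_size=word_size, window_sizes=[window_size], sparse=False)
    X_train_parts.append(weasel.fit_transform(X_train, y_train))
    X_test_parts.append(weasel.transform(X_test))

X_train_new = np.concatenate(X_train_parts, axis=1)
X_test_new = np.concatenate(X_test_parts, axis=1)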
Regarding a finer quantization for larger windows: well, it depends. Having larger sliding windows implies having a lower number of windows for each time series (and thus fewer samples). Finer quantization usually makes sense when you have more samples, because you can then afford a better estimation of the quantiles. Moreover, the number of possible words grows quickly with the alphabet size (and exponentially with the word size): for any window size, the theoretical maximum number of features is c^l, e.g. 4^4 = 256 for an alphabet size of 4 and a word size of 4, but already 8^4 = 4096 for an alphabet size of 8. So you want this number to stay small compared to the number of windows you actually extract.
Otherwise you will get features that are extremely sparse (some words being possibly present in only one time series), which you don't want: if the words for the time series in the test set are never present in the training set, you will probably get really bad predictive performance. In my opinion, this is very related to overfitting/underfitting: you want to extract features that are actually useful (not too specific to each time series, which means overfitting, and not too generic, which means underfitting). I'm not perfectly sure of the literature, but in most cases, I think that the authors set the alphabet size to 4.

Hope this helps you a bit.
I don't have a bug or issue to report, just some questions. First, the word-bank size for the Weasel transformation is limited to 26. Apparently this is because there are 26 letters in the English alphabet, but this seems like a rather arbitrary limitation. Why was it designed this way? Secondly, is it possible to serialize a Weasel transformer? Suppose I have a large time series that takes a while to model: shouldn't there be a warm-start process? It would be best if it were serializable to str or JSON, but even pickle would be nice. If it is, let me know!
Thanks for the great package!
Andrew