Dimensionality Reduction (approximation) along columns/time axis? #8
Comments
Hi legout, My convention is that each row is a time series, which means that the time axis is the second axis. For instance, if you're familiar with scikit-learn, you can think of the timestamps as the features of your data. Best regards,
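This shape convention can be illustrated with plain NumPy (a minimal sketch of the PAA idea, not pyts's actual implementation):

```python
import numpy as np

# pyts convention: X has shape (n_samples, n_timestamps),
# i.e. each ROW is one time series and the timestamps play
# the role of the features.
X = np.array([[0., 1., 2., 3., 4., 5.],
              [5., 4., 3., 2., 1., 0.]])

# PAA-style approximation: average over non-overlapping windows
# ALONG AXIS 1 (the time axis), so each series is reduced independently.
window_size = 2
n_samples, n_timestamps = X.shape
X_paa = X.reshape(n_samples, n_timestamps // window_size, window_size).mean(axis=2)
print(X_paa)
# [[0.5 2.5 4.5]
#  [4.5 2.5 0.5]]
```

Each row of the output is the piecewise mean of the corresponding input series, which is exactly the reduction along the time axis that the question is about.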
Hi Johann, my mistake was to think of every timestamp as a new sample and every feature as a different measure (e.g. temperature and pressure). But this is only true if there is also one label/output at each timestamp (or multiple labels/outputs).** However, if I want to map one (multivariate) time series to one label/output (or multiple labels/outputs), every timestamp is a feature. Btw, do you plan to implement WEASEL+MUSE in pyts? Best regards,

**That was the case for me in every previous project.
Hi legout, Multivariate time series are currently not supported in pyts. Adding specific algorithms for multivariate time series would definitely be a great idea. However, pyts is not under very active development at the moment and I can't make any promise on a release date for such algorithms. My on-the-fly thought for classifying multivariate time series would be to fit a classifier for each dimension and then use a voting classifier to predict a single label. The issue is that you lose the dependencies between the dimensions, though. You could also reduce the number of dimensions and use a single classifier, but that may be a bad idea if the time series in the different dimensions are really different from each other. Best regards,
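The per-dimension voting idea can be sketched as follows (a minimal NumPy illustration with a toy 1-nearest-neighbour classifier; all names are hypothetical, this is not pyts code):

```python
import numpy as np

def nn1_predict(X_train, y_train, X_test):
    """1-nearest-neighbour prediction for one dimension (Euclidean distance)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

def vote(predictions):
    """Majority vote across the per-dimension predictions."""
    preds = np.stack(predictions)  # shape (n_dims, n_test)
    return np.array([np.bincount(preds[:, i]).argmax()
                     for i in range(preds.shape[1])])

rng = np.random.default_rng(0)
# X has shape (n_samples, n_dims, n_timestamps)
X_train = rng.normal(size=(10, 3, 20))
y_train = np.array([0, 1] * 5)
# Test samples are tiny perturbations of the first four training samples.
X_test = X_train[:4] + 0.01 * rng.normal(size=(4, 3, 20))

# Fit/predict one classifier per dimension, then vote.
per_dim = [nn1_predict(X_train[:, d, :], y_train, X_test[:, d, :])
           for d in range(X_train.shape[1])]
print(vote(per_dim))  # should recover the labels of the perturbed samples
```

As noted above, the vote discards any dependency between the dimensions: each classifier only ever sees its own dimension.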
It would be great to add support for multivariate time series, like
Tools for multivariate time series are provided in the pyts.multivariate module. The literature on multivariate time series classification is quite shallow (probably due to the lack of datasets for a very long time). Nonetheless, if you consider each feature of a multivariate time series independently, you can use the utility classes pyts.multivariate.transformation.MultivariateTransformer and pyts.multivariate.classification.MultivariateClassifier to apply a univariate time series algorithm to each feature of a multivariate time series dataset independently. Hope this helps you a little.
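The per-feature idea behind MultivariateTransformer can be sketched in plain NumPy (the helper functions below are hypothetical illustrations, not the pyts API):

```python
import numpy as np

def apply_per_feature(transform, X):
    """Apply a univariate transform to each feature independently.

    X has shape (n_samples, n_features, n_timestamps); the transformed
    features are concatenated along the last axis, mirroring the idea
    behind pyts.multivariate.transformation.MultivariateTransformer.
    """
    return np.concatenate(
        [transform(X[:, f, :]) for f in range(X.shape[1])], axis=1
    )

def paa(X, window_size=2):
    """Toy univariate transform: mean over non-overlapping windows."""
    n_samples, n_timestamps = X.shape
    return X.reshape(n_samples, n_timestamps // window_size,
                     window_size).mean(axis=2)

# 2 samples, 3 features, 4 timestamps
X = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
Xt = apply_per_feature(lambda x: paa(x, 2), X)
print(Xt.shape)  # (2, 6): 3 features x 2 PAA segments each
```

Any univariate transformer with the (n_samples, n_timestamps) convention can be slotted in place of the toy PAA here.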
Really great news!
Do you mean time series with categorical values? I don't think that I have ever seen any algorithm in the time series classification literature that can deal with that. Maybe Markov chains would be more suited for such features. I think a few other Python packages like tslearn and sktime can also deal with multivariate time series. |
They do not handle the case where the data is a mixture of continuous and categorical variables at each time sample. You can also apply one of the kernel methods and choose an appropriate kernel that can handle the categorical features... I think the ARD kernel is one example, but I forget the details. You can see what the popular Bayesian hyperparameter optimization packages do in this case. From the kernel cookbook (https://www.cs.toronto.edu/~duvenaud/cookbook/): "Then, simply put a product of SE kernels on those dimensions. This is the same as putting one SE ARD kernel on all of them. The lengthscale hyperparameter will now encode whether, when that coding is active, the rest of the function changes. If you notice that the estimated lengthscales for your categorical variables are short, your model is saying that it's not sharing any information between data of different categories." There is even code.
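The quoted advice can be made concrete with a small sketch of a squared-exponential ARD kernel evaluated on one-hot-coded categories (an assumed formulation for illustration; names are hypothetical):

```python
import numpy as np

def se_ard_kernel(X1, X2, lengthscales):
    """Squared-exponential ARD kernel with one lengthscale per dimension.

    k(x, x') = exp(-0.5 * sum_d (x_d - x'_d)^2 / l_d^2)
    """
    l = np.asarray(lengthscales, dtype=float)
    diff = (X1[:, None, :] - X2[None, :, :]) / l
    return np.exp(-0.5 * (diff ** 2).sum(-1))

# Two points that agree on the continuous feature (first dimension)
# but differ in the one-hot coding of a categorical feature.
x_a = np.array([[0.3, 1.0, 0.0]])  # category A
x_b = np.array([[0.3, 0.0, 1.0]])  # category B

# Short lengthscales on the one-hot dimensions: (almost) no information
# is shared between the categories, as the cookbook quote describes.
print(se_ard_kernel(x_a, x_b, [1.0, 0.1, 0.1]))   # ~ 0
# Long lengthscales: the categories are effectively pooled together.
print(se_ard_kernel(x_a, x_b, [1.0, 10.0, 10.0]))  # ~ 1
```

The learned lengthscales thus act as a soft switch between fitting the categories separately and sharing data across them.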
I'm a bit annoyed by the lack of literature on this topic, but at the moment there is no real way to deal with categorical time series in pyts. I will consider adding a
Hi,
I wonder why the approximation functions PAA and DFT are applied to the rows. In my opinion, based on what I found in the papers and the dissertation of Patrick Schäfer, they should be applied to the columns (along the time axis). Am I wrong?
For example the code below returns an error:
However, what I've been expecting is the following:
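A hedged sketch of the kind of mismatch described here (the original snippets are not shown, and this uses a hand-rolled PAA-like reduction rather than pyts code): if the data is laid out with one row per timestamp, it has to be transposed before applying the row-wise approximation.

```python
import numpy as np

def paa(X, window_size):
    """PAA-like reduction over axis 1, assuming X has shape
    (n_samples, n_timestamps) -- the pyts convention."""
    n_samples, n_timestamps = X.shape
    return X.reshape(n_samples, n_timestamps // window_size,
                     window_size).mean(axis=2)

# Data stored "the other way round": one COLUMN per time series and
# one row per timestamp -- the layout described in the question.
X_by_time = np.arange(12, dtype=float).reshape(6, 2)

# Applying paa directly would average ACROSS the two series at each
# window of rows; transposing first gives the intended reduction of
# each series along its time axis.
X_paa = paa(X_by_time.T, window_size=3)
print(X_paa)
# [[2. 8.]
#  [3. 9.]]
```

With the transpose, each output row is the windowed mean of one series over time, which matches the column-wise (time-axis) reduction the question expects.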
Regards,
legout