This repository lists open-access functional datasets from different fields of application. We only collect data that can be used for cluster analysis. The main objective is to make it easier to compare against existing clustering methods for functional data and to evaluate new ones. A recent comprehensive review of clustering methods for functional data is available here. Our team is actively developing functional data clustering methods tailored to various data types and application domains. The software tools we have developed can be accessed here.
Where the linked data need further processing, a processed copy can be found in the Data folder. (This is an ongoing project, and progress is slow because of the contributor's other commitments.)
Name | Available at | Field | Task | Size | Length | Missing Value |
---|---|---|---|---|---|---|
ARC_Mobile | Publisher | Health | Clustering | 125 | 30/40 | Yes |
ArrowHead | UEA & UCR Time Series Classification Repository | Computer Vision | Classification | 211 | 251 | No |
BirdChicken | UEA & UCR Time Series Classification Repository | Computer Vision | Classification | 40 | 512 | No |
BTH_PM25 | Publisher | Environment | Clustering | 73 | 48 | Yes |
China_PM25 | Publisher | Environment | Clustering | 338 | 731 | Yes |
DiatomSizeReduction | UEA & UCR Time Series Classification Repository | Bioinformatics | Classification | 322 | 345 | No |
ECG200 | UEA & UCR Time Series Classification Repository | ECG | Classification | 200 | 96 | No |
FaceFour | UEA & UCR Time Series Classification Repository | Computer Vision | Classification | 112 | 350 | No |
Flour | R (cfda) | Food | Classification | 115 | 241 | No |
Fungi | UEA & UCR ... | Bioinformatics | Classification | 204 | 201 | No |
GunPoint | UEA & UCR Time Series Classification Repository | Motion | Classification | 200 | 150 | No |
Meat | UEA & UCR Time Series Classification Repository | Food | Classification | 120 | 448 | No |
Plane | UEA & UCR ... | Shape | Classification | 210 | 144 | No |
Phoneme | e-Book (ElemStatLearn) | Speech | Classification | 4K+ | 256 | No |
Strawberry | UEA & UCR Time Series Classification Repository | Food | Classification | 983 | 235 | No |
Symbols | UEA & UCR Time Series Classification Repository | Computer Vision | Classification | 1K+ | 398 | No |
Tecator | CMU StatLib | Food | Classification | 240 | 100 | No |

Name | Available at | Field | Task | Size | Length | Dimension |
---|---|---|---|---|---|---|
BasicMotions | UEA & UCR Time Series Classification Repository | Motion | Classification | 80 | 100 | 6 |
Blink | UEA & UCR ... | EEG | Classification | 950 | 510 | 4 |
ECG_Arrhythmia | Publisher | ECG | Classification | 10K+ | 5000 | 12 |
EEG_Full | UCI Machine Learning Repository | EEG | Classification | 122 | 256 | 64 |
Epilepsy | UEA & UCR ... | Motion | Classification | 275 | 207 | 3 |
ERing | UEA & UCR ... | Gesture | Classification | 300 | 65 | 4 |
EyesOpenShut | UEA & UCR ... | EEG | Classification | 98 | 128 | 14 |
FingerMovements | UEA & UCR ... | EEG | Classification | 416 | 50 | 28 |
Japanese_Vowels | UCI Machine Learning Repository | Speech | Classification | 640 | 29 | 12 |
UWaveGestureLibrary | UEA & UCR ... | Gesture | Classification | 4K+ | 315 | 3 |
We provide a Python generator for manifold-valued functional data. It can simulate five families of trajectories:
- Hypersphere (unit sphere trajectories)
- Hyperbolic (Poincaré ball model)
- Swiss roll (Swiss-roll curves, up to 3D)
- Lorenz (chaotic attractor, up to 3D)
- Pendulum (simple pendulum dynamics, up to 3D)
Each dataset is a collection of multi-dimensional functions that evolve along a specified manifold or dynamical system. The generator script lives in the `Data/Manifold/` directory as `manifold_valued_data_generator.py`. You can import it or run it directly. This generator was used in our NeurIPS work to evaluate FAEclust.
Outputs & Shapes
- `X.shape = (n_samples, n_features, n_steps)`: multivariate time series laid out as `[sample, feature, time]`.
- `y.shape = (n_samples,)`: integer labels (`0 … n_clusters-1`) for cluster/dynamics identity.
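To make the `[sample, feature, time]` layout concrete, here is a minimal NumPy sketch. It uses a placeholder array rather than real generator output, and the variable names (`first_curve`, `X_time_major`, and so on) are illustrative only.

```python
import numpy as np

# Placeholder data with the documented layout [sample, feature, time].
n_samples, n_features, n_steps = 4, 3, 50
X = np.zeros((n_samples, n_features, n_steps))
y = np.array([0, 1, 0, 1])  # one integer label per function

# One multivariate function: all coordinates over time.
first_curve = X[0]                # shape (n_features, n_steps)

# One coordinate of every function.
first_coord = X[:, 0, :]          # shape (n_samples, n_steps)

# Some libraries expect [sample, time, feature]; swap the last two axes.
X_time_major = np.transpose(X, (0, 2, 1))

# All functions that belong to cluster 0.
cluster0 = X[y == 0]              # shape (2, n_features, n_steps)
```

If a downstream method expects a different axis order, the transpose above is usually all that is needed; the labels in `y` are unaffected.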
Key Parameters
- `n_samples`: number of time series (functions) to generate.
- `n_features`: dimensionality per time step (e.g., 2D, 3D coordinates).
- `n_steps`: length (number of time points) in each trajectory.
- `n_clusters`: number of distinct clusters/dynamics per dataset.
- `base_noise` (optional): small perturbations; useful for realism.
- `seed` (optional): random seed for reproducibility.
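The sketch below illustrates how these parameters could interact for the hypersphere family. It is a simplified stand-in written for this README, not the code in `manifold_valued_data_generator.py`: the function name `simulate_sphere_trajectories`, the great-circle construction, and the default values are all assumptions. Each cluster is given its own random great circle, and each function is a noisy, randomly phase-shifted walk around that circle, projected back onto the unit sphere.

```python
import numpy as np

def simulate_sphere_trajectories(n_samples=60, n_features=3, n_steps=100,
                                 n_clusters=3, base_noise=0.01, seed=0):
    """Hypothetical stand-in for a unit-sphere generator (requires n_features >= 2)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, n_steps)

    # One great circle per cluster, spanned by two orthonormal directions.
    planes = []
    for _ in range(n_clusters):
        q, _r = np.linalg.qr(rng.normal(size=(n_features, 2)))
        planes.append(q)  # columns of q are orthonormal

    y = rng.integers(0, n_clusters, size=n_samples)
    X = np.empty((n_samples, n_features, n_steps))
    for i in range(n_samples):
        u, v = planes[y[i]].T
        phase = rng.uniform(0.0, 2.0 * np.pi)
        curve = np.outer(u, np.cos(t + phase)) + np.outer(v, np.sin(t + phase))
        # Perturb, then project each time point back onto the unit sphere.
        curve += base_noise * rng.normal(size=curve.shape)
        curve /= np.linalg.norm(curve, axis=0, keepdims=True)
        X[i] = curve
    return X, y

X, y = simulate_sphere_trajectories(n_samples=10, n_features=3,
                                    n_steps=50, n_clusters=2, seed=1)
print(X.shape, y.shape)
```

Fixing `seed` makes the output reproducible, and raising `base_noise` blurs the separation between clusters, which is one way to vary the difficulty of a clustering benchmark.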
To change the size of a dataset (e.g., more functions), edit the corresponding tuple in `specs`; no other code changes are needed.