FAEclust is the first autoencoder architecture designed specifically for clustering multi-dimensional functional data. In FAEclust, we employ univariate functions as weights instead of integral kernels:
- Unlike traditional regression models, where integral kernels are used for functional predictors to capture full dependence, our encoder is designed to learn latent representations. Consequently, the univariate functions serve as a coordinate system in the Hilbert space.
- Using univariate functions not only reduces computational cost but also improves interpretability. In particular, the shape of the learned functional weights reveals which regions of the input functional data contribute most to the construction of the embedded representations.
Both the encoder and decoder are universal approximators. We introduce a shape-informed clustering objective that is robust to phase variation in functional data, and we develop a path-following homotopy algorithm with complexity O(n log n) to obtain the optimal clustering of the latent representations. If you use this package in your research, please cite our paper:
```bibtex
@inproceedings{SVS2025NeurIPS,
  author    = {Samuel Singh and Shirley Coyle and Mimi Zhang},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  editor    = { },
  pages     = { },
  publisher = {Curran Associates, Inc.},
  title     = {Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders},
  volume    = { },
  year      = {2025}
}
```
FAEclust alternates between two update phases during training:
- Network Update: In the backward phase, we update the network parameters by minimizing a unified objective function (Loss) that incorporates both the network training objective (Penalized Reconstruction Loss) and the clustering regularization (Clustering Loss); see the sketch below.
- Cluster Update: In the forward phase, we update the learned latent representations, which necessitates a concurrent update of the clustering results.
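For intuition, here is a minimal sketch of how the penalty parameters enter the unified objective. The variable names are hypothetical and do not mirror the package's internal code:

```python
def unified_loss(recon_loss, orth_penalty, rough_penalty, clust_loss,
                 lambda_e, lambda_d, lambda_c):
    """Conceptual form of the unified objective minimized in the backward phase.

    recon_loss    -- reconstruction error of the autoencoder
    orth_penalty  -- orthogonality penalty on the encoder functional weights
    rough_penalty -- roughness penalty on the encoder functional weights and biases
    clust_loss    -- convex clustering loss on the latent representations
    """
    penalized_reconstruction = (recon_loss
                                + lambda_e * orth_penalty
                                + lambda_d * rough_penalty)
    return penalized_reconstruction + lambda_c * clust_loss
```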
The modular pipeline for FAEclust has the following structure:
- Similarity: Compute pairwise (elastic) distances with `TimeSeriesDistance()`, identify the optimal number of nearest neighbors ($m$ in the paper) via `NearestNeighborsOpt()`, and finally compute the pairwise similarity measures.
- Smoothing: Convert raw curves into basis functions and expansion coefficients via `smoothing_features()`.
- FAE network: Configure and train the functional network via `FunctionalAutoencoder()`.
- Convex Clustering: Cluster analysis of the latent representations, where the clustering objective function is convex.
`Smoothing` is a utility class that implements the smoothing step. It supports B-spline, Fourier series, and Wavelet-based smoothing. This class is used internally by `smoothing_features()`.
```python
Smoothing(
    dis_p,          # number of grid points for evaluating functions
    fit,            # basis function type: 'bspline', 'fourier', or Wavelet name
    n,              # for Fourier: number of harmonics (2n+1 basis functions)
    smoothing_str,  # initial smoothing parameter for B-splines if terms is not given
    terms,          # number of basis terms/knots to use (if None, auto-optimize)
    wavelet_level,  # Wavelet decomposition level (for Wavelet fits) if terms is not given
    data=None,      # input data of shape (n_samples, n_features, n_timesteps)
)
```
**Parameters:**
- `dis_p` (int): Number of grid points for evaluating functions. Default: `300`.
- `fit` (str): The type of basis expansion to use for smoothing. Options are the same as in `smoothing_features()`: `'bspline'`, `'fourier'`, or a Wavelet name (e.g., `'db4'`). Default: `'bspline'`.
- `n` (int): Applicable if `fit='fourier'`. Specifies the number of Fourier harmonics to include; the total number of Fourier basis functions is $2n + 1$ (the constant term, $n$ cosine terms, and $n$ sine terms). If `n=None`, the smoothing is adaptive and Generalized Cross-Validation (GCV) is used to find the optimal number of harmonics. Default: `None`.
- `smoothing_str` (float): Smoothing parameter for B-spline fitting. It is passed to the spline fitting routine (`scipy.interpolate.splrep()`) to control the trade-off between smoothness and fidelity: higher values yield smoother curves (more regularization), while `s=0` interpolates through all points. If `terms` (the number of knots/basis functions) is not specified for B-splines, this parameter is internally optimized via GCV. Default: `0.3`.
- `terms` (int or None): Applicable if `fit='bspline'`. The number of basis functions or knots to use. If `terms=None`, the smoothing is adaptive and the class uses GCV to find the optimal fit and the corresponding number of terms. Default: `None`.
- `wavelet_level` (int): The level of decomposition for Wavelet smoothing. Higher levels capture coarser structures. If `terms=None` and `fit` is a Wavelet name, the code attempts to find an optimal level via GCV. Default: `5`.
- `data` (np.ndarray, shape (n_samples, n_features, n_timesteps)): The raw sample paths to smooth.
**Attributes:**
- `coeffs` (np.ndarray, shape (n_samples, n_features, m_basis)): Array of basis expansion coefficients.
- `fn_s` (list of callables): The smoothed functions evaluated/defined on the time grid.
- `smoothing_basis` (list of callables): The basis functions used for smoothing the raw sample paths.
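A minimal usage sketch. The import path is hypothetical, and we assume the attributes are populated once the object is constructed with `data`:

```python
import numpy as np

from faeclust import Smoothing  # hypothetical import path

# 50 two-dimensional sample paths observed at 100 time points (toy data)
X = np.random.rand(50, 2, 100)

smoother = Smoothing(
    dis_p=300,          # evaluate the smoothed functions on 300 grid points
    fit='bspline',      # B-spline basis expansion
    n=None,             # only used for Fourier smoothing
    smoothing_str=0.3,  # initial B-spline smoothing parameter
    terms=None,         # let GCV pick the number of basis terms
    wavelet_level=5,    # only used for Wavelet smoothing
    data=X,
)

coeffs = smoother.coeffs           # (n_samples, n_features, m_basis)
basis = smoother.smoothing_basis   # list of basis-function callables
```

In the full pipeline you would normally call `smoothing_features()`, which wraps this class.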
`TimeSeriesDistance` computes the pairwise distance matrix for the raw sample paths.
```python
TimeSeriesDistance(
    X,       # raw sample paths of shape (n_samples, n_features, n_timesteps)
    metric,  # distance metric to use ('fastdtw' or 'elastic')
    n_jobs,  # number of parallel jobs for computation
)
```
**Parameters:**
- `X` (np.ndarray, shape (n_samples, n_features, n_timesteps)): An array containing the raw multi-dimensional functional data. Functional data should be standardized before distance computation to ensure comparability.
- `metric` (str): Metric to use for distance computation. Options include:
  - `'fastdtw'`: Distance measure using the dynamic time warping method.
  - `'elastic'` (default): The elastic distance metric.
- `n_jobs` (int): Number of parallel jobs for computation. Default: `-1`.
**Methods:**
- `compute_distances(self)`: Compute the pairwise distance matrix.
  - Returns: `dist_matrix` (np.ndarray, shape (n_samples, n_samples)): The pairwise distance matrix.
- `plot_extremes(self)`: Visualise the most similar and the most distinct pair of samples.
  - Returns: `None`.
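A usage sketch, continuing from the `Smoothing` example above (import path hypothetical):

```python
from faeclust import TimeSeriesDistance  # hypothetical import path

tsd = TimeSeriesDistance(
    X=X,               # standardized paths, shape (n_samples, n_features, n_timesteps)
    metric='elastic',  # or 'fastdtw'
    n_jobs=-1,         # use all available cores
)
dist_matrix = tsd.compute_distances()  # (n_samples, n_samples)
tsd.plot_extremes()                    # plot the most similar and most distinct pairs
```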
`NearestNeighborsOpt` is a utility class for determining the optimal number of nearest neighbors ($m$) used in the pairwise similarity measure of the clustering objective function. Given a pairwise distance matrix, it examines how the structure of the k-nearest-neighbor graph evolves as k varies, using two complementary criteria: the distance "knee" and graph connectivity.
```python
NearestNeighborsOpt(
    dist_matrix,  # pairwise distance matrix of shape (n_samples, n_samples)
)
```
**Parameters:**
- `dist_matrix` (np.ndarray, shape (n_samples, n_samples)): Distance matrix returned by `TimeSeriesDistance()`.
**Methods:**
- `estimate_optimal_m(method='avg_distance', max_m=None)`: Select the optimal number of nearest neighbors.
  - `method` (str): Method to use for neighbourhood optimization. Options include:
    - `'avg_distance'` (default): Find the $m$ at which the average $m$-th neighbor distance exhibits the largest jump ("knee").
    - `'connectivity'`: Find the smallest $m$ at which the k-NN graph is fully connected.
  - `max_m` (int): Maximum number of neighbors to consider. Default: `n_samples - 1`.
  - Returns: `m` (int): The optimal number of nearest neighbors.
- `get_nearest_neighbors(opt_m=m)`: Construct the adjacency list (neighbor index list) for each data point given the neighborhood value `m`.
  - `m` (int): Estimated using `estimate_optimal_m()`.
  - Returns: `neighbors_dict` (dict): A dictionary mapping each data point to its `m` nearest neighbors.
- `compute_similarity(neighbors_dict)`: Calculate the pairwise similarities from the distance matrix and the k-NN graph.
  - `neighbors_dict` (dict): The neighbor dictionary returned by `get_nearest_neighbors()`.
  - Returns: `sim_matrix` (np.ndarray, shape (n_samples, n_samples)): The pairwise similarity matrix.
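The three methods chain naturally after the distance step. A sketch (import path hypothetical):

```python
from faeclust import NearestNeighborsOpt  # hypothetical import path

nn_opt = NearestNeighborsOpt(dist_matrix=dist_matrix)
m = nn_opt.estimate_optimal_m(method='avg_distance')    # knee of the average m-th neighbor distance
neighbors_dict = nn_opt.get_nearest_neighbors(opt_m=m)  # m-nearest-neighbor adjacency list
sim_matrix = nn_opt.compute_similarity(neighbors_dict)  # (n_samples, n_samples)
```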
FAEclust is a deep learning framework for clustering multivariate functional data. It integrates three key components: (1) functional data smoothing via basis function expansion (e.g. B-splines, Fourier series, the Wavelet family) to provide a smooth representation of each sample path, (2) a functional autoencoder consisting of an encoder for learning complex relationships among the features and a decoder for flexibly reconstructing intricate functional patterns, and (3) a shape-informed convex clustering algorithm that automatically determines the optimal number of clusters. The `FunctionalAutoencoder` class is designed to handle these steps end-to-end, while providing various hyperparameters to tailor the model to different datasets.
```python
FunctionalAutoencoder(
    p,                # number of component random functions (dimensions)
    layers,           # list specifying encoder/decoder layer widths
    l_basis,          # number of basis functions for encoder functional weights
    m_basis,          # number of basis functions for smoothing the sample paths
    basis_smoothing,  # list of basis functions used for smoothing (e.g. Fourier basis)
    basis_input,      # list of basis functions for encoder functional weights (e.g. B-spline basis)
    lambda_e,         # penalty parameter for the orthogonality regularization on encoder functional weights
    lambda_d,         # penalty parameter for the roughness regularization on encoder functional weights and biases
    lambda_c,         # penalty parameter for the clustering loss in the integrated objective function
    t,                # time grid (array of length T) over which the smoothed functions are evaluated
    sim_matrix,       # pairwise similarity matrix (n x n) in the clustering objective function
)
```
**Parameters:**
- `p` (int): Number of component random functions (dimensions). For example, `p=2` for two-dimensional functional data.
- `layers` (list of int): Architecture specification for the autoencoder's layers. The list includes the width of each encoder layer, the latent dimension, and the width of each decoder layer, in the format `[q1, q2, ..., s, ..., Q2, Q1, Z1, Z2]`, where:
  - `q1` is the number of nodes in the encoder's first hidden layer.
  - `q2, ..., s` are the sizes of the successive hidden layers in the encoder, with `s` being the final latent dimension (the size of the bottleneck vector).
  - `..., Q2` are the sizes of the dense layers in the decoder (mirroring the encoder's dense layers).
  - `Q1, Z1, Z2` are the sizes of the last three layers of the decoder, which output functions.
- `l_basis` (int): Functional weights and biases are represented as linear combinations of basis functions; `l_basis` is the number of basis functions for the functional weights in the encoder.
- `m_basis` (int): Number of basis functions used for converting the raw sample paths into smooth functions.
- `basis_smoothing` (list of callables): A list of `m_basis` basis functions (evaluated on the time grid `t`) for smoothing the raw sample paths. Each basis function is a callable `basis_smoothing(x: array_like[T, p]) -> np.ndarray[T, p]`. The provided utility `smoothing_features()` can generate this list along with the expansion coefficients.
- `basis_input` (list of callables): A list of `l_basis` basis functions (evaluated on the time grid `t`) for representing the functional weights in the encoder. Each basis function is a callable `basis_input(x: array_like[T, p]) -> np.ndarray[T, p]`.
- `lambda_e` (float): Penalty parameter for the orthogonality regularization on encoder functional weights. It controls the amount of regularization, encouraging within-component functional weights to be orthogonal.
- `lambda_d` (float): Penalty parameter for the roughness regularization on encoder functional weights and biases. It controls the smoothness of the encoder functional weights and biases.
- `lambda_c` (float): Penalty parameter for the clustering loss in the integrated objective function. A higher `lambda_c` places more emphasis on forming well-separated clusters in the latent space (at the potential cost of reconstruction accuracy).
- `t` (array-like of shape (T,)): Time grid over which the input functions, functional weights, functional biases, and output functions are evaluated/defined.
- `sim_matrix` (np.ndarray of shape (n_samples, n_samples)): The pairwise similarity matrix among the $N$ sample paths, a term in the clustering objective function. The similarity matrix is essentially a weighted $m$-nearest-neighbor graph, and `NearestNeighborsOpt()` will construct the graph with the optimal $m$ value.
**Methods:**
- `model_summary(self)`: Produce a summary of the trained model.
  - Returns: `None`.
- `train(self, coeffs, epochs, learning_rate, batch_size, neighbors_dict, sim_matrix)`: Model training via mini-batch gradient descent with momentum.
  - `coeffs` (np.ndarray, shape (n_samples, n_features, m_basis)): Basis expansion coefficients of the smoothed sample paths.
  - `epochs` (int): Number of training epochs (full passes over the dataset). Default: `100`.
  - `learning_rate` (float): Learning rate of the training algorithm. A smaller value might be used if the loss oscillates or diverges, whereas a larger value could speed up convergence if the loss is stable. Default: `1e-3`.
  - `batch_size` (int): The mini-batch size. Default: `16`.
  - `neighbors_dict` (dict): A dictionary mapping each data point to its $m$ nearest neighbors, obtained from the `get_nearest_neighbors()` method.
  - `sim_matrix` (np.ndarray of shape (n_samples, n_samples)): The pairwise similarity matrix among the $N$ sample paths.
- `predict(coeffs, batch_size)`: Return the embedded data and the cluster labels of the functional data.
  - Returns:
    - `S` (np.ndarray, shape (n_samples, s)): Array of the embedded data.
    - `labels` (np.ndarray, shape (n_samples,)): Functional-data cluster labels.
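Putting the constructor and methods together, a configuration sketch might look as follows. The import path and penalty values are illustrative; `coeffs`, `basis_smoothing`, `sim_matrix`, and `neighbors_dict` come from the earlier steps, while the construction of `basis_input` and the time grid `t` is assumed and not shown:

```python
from faeclust import FunctionalAutoencoder  # hypothetical import path

fae = FunctionalAutoencoder(
    p=2,                               # two-dimensional functional data
    layers=[16, 8, 4, 8, 16, 16, 16],  # encoder 16 -> 8, latent s = 4, decoder 8 -> 16 -> 16 -> 16
    l_basis=len(basis_input),          # basis for the encoder functional weights (assumed given)
    m_basis=coeffs.shape[-1],          # basis used to smooth the sample paths
    basis_smoothing=basis_smoothing,
    basis_input=basis_input,
    lambda_e=1.0,                      # illustrative penalty values; tune per dataset
    lambda_d=0.1,
    lambda_c=1.0,
    t=t,                               # time grid shared by all functional objects (assumed given)
    sim_matrix=sim_matrix,
)
fae.train(coeffs, epochs=100, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)
fae.model_summary()
S, labels = fae.predict(coeffs, batch_size=16)  # assuming an (S, labels) tuple return
```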
`ConvexClustering` is the path-following homotopy algorithm that produces a hierarchy of clusters and determines the optimal number of clusters via an internal validation metric.
```python
ConvexClustering(
    X,               # embedded data of shape (n_samples, s)
    neighbors_dict,  # adjacency list from get_nearest_neighbors()
    sim_matrix,      # the pairwise similarity matrix
    verbose,         # whether to print out the merging process and the Silhouette scores
)
```
**Parameters:**
- `X` (np.ndarray, shape (n_samples, s)): The embedded data in the latent space.
- `neighbors_dict` (dict): A dictionary mapping each data point to its $m$ nearest neighbors.
- `sim_matrix` (np.ndarray, shape (n_samples, n_samples)): The pairwise similarity matrix.
- `verbose` (bool, optional): If `True`, the algorithm will print out the hierarchy of clusters and the corresponding Silhouette scores. Default: `False`.
**Methods:**
- `fit(self)`: Perform clustering on the embedded data.
  - Returns: `cluster_labels` (np.ndarray, shape (n_samples,)): The (optimal) cluster labels.
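`predict()` already runs this step as part of the pipeline, but the class can also be applied directly to any embedding. A sketch (import path hypothetical):

```python
from faeclust import ConvexClustering  # hypothetical import path

cc = ConvexClustering(
    X=S,                            # embedded data, shape (n_samples, s)
    neighbors_dict=neighbors_dict,  # from get_nearest_neighbors()
    sim_matrix=sim_matrix,
    verbose=True,                   # print the merge hierarchy and Silhouette scores
)
cluster_labels = cc.fit()           # optimal labels per the internal validation metric
```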
Below is an example of using FAEclust on a simulated dataset (the pendulum dataset in the paper).
We compute the similarity matrix using the elastic distance (`metric='elastic'`). After smoothing the data with B-spline basis functions, we configure a functional autoencoder with an encoder-decoder architecture (16 → 8 → 4 (latent) → 8 → 16 → 16 → 16, with functional layers of size 16 at both ends) and train the model for 150 epochs. The labels returned by `predict()` are the cluster assignments found by the path-following homotopy algorithm. We print the Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI) to evaluate how well the predicted clusters match the true clusters (`y_true`).
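The workflow just described might be coded as follows. This is a minimal sketch, not the package's confirmed API: the import paths, the return signature of `smoothing_features()`, and the construction of `basis_input` are assumptions, and `X` and `y_true` denote the pendulum sample paths and true labels:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

from faeclust import (TimeSeriesDistance, NearestNeighborsOpt,   # hypothetical import path
                      smoothing_features, FunctionalAutoencoder)

# X: pendulum paths, shape (n_samples, n_features, n_timesteps); y_true: true labels.

# 1. Similarity: elastic distances, optimal m, weighted m-NN similarity graph
dist_matrix = TimeSeriesDistance(X=X, metric='elastic', n_jobs=-1).compute_distances()
nn_opt = NearestNeighborsOpt(dist_matrix=dist_matrix)
neighbors_dict = nn_opt.get_nearest_neighbors(opt_m=nn_opt.estimate_optimal_m())
sim_matrix = nn_opt.compute_similarity(neighbors_dict)

# 2. Smoothing with a B-spline basis (return signature assumed)
coeffs, fn_s, basis_smoothing = smoothing_features(X, fit='bspline')

# basis_input: list of l_basis callables for the encoder weights (construction not shown)
t = np.linspace(0.0, 1.0, X.shape[2])  # illustrative time grid

# 3. FAE network: encoder 16 -> 8, latent 4, decoder 8 -> 16 -> 16 -> 16
fae = FunctionalAutoencoder(
    p=X.shape[1], layers=[16, 8, 4, 8, 16, 16, 16],
    l_basis=len(basis_input), m_basis=coeffs.shape[-1],
    basis_smoothing=basis_smoothing, basis_input=basis_input,
    lambda_e=1.0, lambda_d=0.1, lambda_c=1.0,  # illustrative penalty values
    t=t, sim_matrix=sim_matrix,
)
fae.train(coeffs, epochs=150, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)

# 4. Clustering in the latent space and external validation
S, labels = fae.predict(coeffs, batch_size=16)
print("ARI:", adjusted_rand_score(y_true, labels))
print("AMI:", adjusted_mutual_info_score(y_true, labels))
```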
In this example, we demonstrate FAEclust on the "Plane" dataset using a similar workflow: similarity computation, smoothing, and clustering via the functional autoencoder.
We compute the similarity matrix using FastDTW (`metric='fastdtw'`). After building the similarity matrix, we smooth each sample path with B-spline basis functions. The functional autoencoder is configured with the architecture `[32, 16, 8, 16, 32, 32, 32]`, which yields a latent representation of dimension 8. After 100 epochs of training, we obtain the final cluster labels.
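Under the same assumptions as the previous sketch, the Plane configuration differs only in the metric, architecture, and epoch count:

```python
# Plane example: FastDTW distances, a wider network, latent dimension s = 8 (sketch)
dist_matrix = TimeSeriesDistance(X=X_plane, metric='fastdtw', n_jobs=-1).compute_distances()
nn_opt = NearestNeighborsOpt(dist_matrix=dist_matrix)
neighbors_dict = nn_opt.get_nearest_neighbors(opt_m=nn_opt.estimate_optimal_m())
sim_matrix = nn_opt.compute_similarity(neighbors_dict)

coeffs, fn_s, basis_smoothing = smoothing_features(X_plane, fit='bspline')  # signature assumed

fae = FunctionalAutoencoder(
    p=X_plane.shape[1], layers=[32, 16, 8, 16, 32, 32, 32],
    l_basis=len(basis_input), m_basis=coeffs.shape[-1],
    basis_smoothing=basis_smoothing, basis_input=basis_input,
    lambda_e=1.0, lambda_d=0.1, lambda_c=1.0, t=t, sim_matrix=sim_matrix,
)
fae.train(coeffs, epochs=100, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)
S, labels = fae.predict(coeffs, batch_size=16)
```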
The above examples and parameters serve as a guide; users are encouraged to experiment with the basis sizes (`l_basis`, `m_basis`), network depth (`layers`), and loss weights (`lambda_e`, `lambda_d`, `lambda_c`) to best fit their specific datasets.

