
FAEclust: Cluster Analysis of Multi-Dimensional Functional Data

FAEclust is the first autoencoder architecture designed specifically for clustering multi-dimensional functional data. In FAEclust, we employ univariate functions as weights instead of integral kernels:

  1. Unlike traditional regression models, where integral kernels are used for functional predictors to capture full dependence, our encoder is designed to learn latent representations. Consequently, the univariate functions serve as a coordinate system in the Hilbert space.
  2. Using univariate functions not only reduces computational cost but also improves interpretability. In particular, the shape of the learned functional weights reveals which regions of the input functional data contribute most to the construction of the embedded representations.

Both the encoder and decoder are universal approximators. We introduce a shape-informed clustering objective that is robust to phase variation in functional data, and we develop a path-following homotopy algorithm with complexity $O(n \log n)$ to obtain the optimal clustering of the latent representations. If you use this package in your research, please cite our paper:

@inproceedings{SVS2025NeurIPS,
 author = {Samuel Singh and Shirley Coyle and Mimi Zhang},
 booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
 editor = { },
 pages = { },
 publisher = {Curran Associates, Inc.},
 title = {Shape-Informed Clustering of Multi-Dimensional Functional Data via Deep Functional Autoencoders},
 volume = { },
 year = {2025}
}

🏗️ The joint network training and clustering framework.

*(Figure: schematic of the joint network training and clustering framework.)*

  1. Network Update: In the backward phase, we update the network parameters by minimizing a unified objective function (Loss) that combines the network training objective (Penalized Reconstruction Loss) with the clustering regularization (Clustering Loss); its form is sketched below.
  2. Cluster Update: In the forward phase, we update the learned latent representations, which necessitates a concurrent update of the clustering results.
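
Schematically, the unified objective combines the reconstruction loss with the three penalties documented under FunctionalAutoencoder below (the notation here is ours; see the paper for the exact definitions):

$$\mathcal{L} \;=\; \mathcal{L}_{\text{rec}} \;+\; \lambda_e\,\mathcal{P}_{\text{orth}} \;+\; \lambda_d\,\mathcal{P}_{\text{rough}} \;+\; \lambda_c\,\mathcal{L}_{\text{clust}}$$

where $\mathcal{L}_{\text{rec}}$ is the reconstruction loss, $\mathcal{P}_{\text{orth}}$ the orthogonality penalty on the encoder functional weights, $\mathcal{P}_{\text{rough}}$ the roughness penalty on the functional weights and biases, and $\mathcal{L}_{\text{clust}}$ the clustering loss.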

🛠️ Core Modules

The modular pipeline for FAEclust has the following structure:

*(Figure: the FAEclust modular pipeline.)*

  1. Similarity: Compute pairwise (elastic) distances with TimeSeriesDistance(), identify the optimal number of nearest neighbors (m in the paper) via NearestNeighborsOpt(), and finally compute pairwise similarity measures.
  2. Smoothing: Convert raw curves into basis functions and expansion coefficients via smoothing_features().
  3. FAE network: Configure and train the functional network via FunctionalAutoencoder().
  4. Convex Clustering: Cluster analysis of the latent representations via ConvexClustering(), where the clustering objective function is convex.

Class: Smoothing

Smoothing is a utility class that implements the smoothing step. It supports B-spline, Fourier series, and Wavelet-based smoothing. This class is used internally by smoothing_features().

Smoothing(
    dis_p,          # number of grid points for evaluating functions
    fit,            # basis function type: 'bspline', 'fourier', or a Wavelet name
    n,              # for Fourier: number of harmonics (2n+1 basis functions)
    smoothing_str,  # initial smoothing parameter for B-splines if terms is not given
    terms,          # number of basis terms/knots to use (if None, auto-optimize)
    wavelet_level,  # Wavelet decomposition level (for Wavelet fits) if terms is not given
    data = None     # input data of shape (n_samples, n_features, n_timesteps)
)

Parameters

  • dis_p (int): Number of grid points for evaluating functions. Default: 300

  • fit (str): The type of basis expansion to use for smoothing. Options are the same as in smoothing_features: 'bspline', 'fourier', or a Wavelet name (e.g., 'db4'). Default: 'bspline'

  • n (int): Applicable if fit='fourier'. It specifies the number of Fourier harmonics to include. The total number of Fourier basis functions is $2n + 1$ (the constant term, $n$ cosine terms, and $n$ sine terms). If n=None, the smoothing is adaptive: Generalized Cross-Validation (GCV) is used to find the optimal number of harmonics. Default: None

  • smoothing_str (float): Parameter for B-spline fitting. It is passed to the spline fitting routine (scipy.interpolate.splrep()) to control the trade-off between smoothness and fidelity: higher values yield smoother curves (more regularization), while smoothing_str=0 fits the spline through all points (interpolation). If terms (number of knots/basis functions) is not specified for B-splines, this parameter is internally optimized via GCV. Default: 0.3

  • terms (int or None): Applicable if fit='bspline'. The number of basis functions or knots to use. If terms=None, the smoothing is adaptive and the class uses GCV to find the optimal fit and the corresponding number of terms. Default: None

  • wavelet_level (int): The level of decomposition for Wavelet smoothing. Higher levels capture coarser structure. If terms=None and fit is a Wavelet name, the code attempts to find an optimal level via GCV. Default: 5

  • data (np.ndarray, shape (n_samples, n_features, n_timesteps)): The raw sample paths to smooth.

Returns

  • coeffs (np.ndarray, shape=(n_samples, n_features, m_basis)): Array of basis expansion coefficients.

  • fn_s (list of callables): List of the smoothed functions evaluated/defined on the time grid.

  • smoothing_basis (list of callables): The list of basis functions used for smoothing the raw sample paths.
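
A minimal usage sketch via the smoothing_features() wrapper (the import path and the exact return order are assumptions; the parameters mirror those of Smoothing):

```python
import numpy as np
from FAEclust import smoothing_features  # import path assumed

# 50 two-dimensional sample paths observed at 100 time points
X = np.random.rand(50, 2, 100)

# Adaptive B-spline smoothing: terms=None triggers GCV-based selection
coeffs, fn_s, smoothing_basis = smoothing_features(data=X, fit='bspline',
                                                   terms=None, dis_p=300)
print(coeffs.shape)  # (50, 2, m_basis)
```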

Class: TimeSeriesDistance

TimeSeriesDistance computes the pairwise distance matrix for the raw sample paths.

TimeSeriesDistance(
    X,          # raw sample paths of shape (n_samples, n_features, n_timesteps)
    metric,     # distance metric to use ('fastdtw' or 'elastic')
    n_jobs      # number of parallel jobs for computation
)

Parameters

  • X (np.ndarray, shape=(n_samples, n_features, n_timesteps)): An array containing the raw multi-dimensional functional data. Functional data should be standardized before distance computation to ensure comparability.

  • metric (str): Metric to use for distance computation. Options include:

    • 'fastdtw': Dynamic time warping distance computed with the FastDTW approximation.

    • 'elastic' (default): The elastic distance metric.

  • n_jobs (int): Number of parallel jobs for computation. Default: -1

Methods

  • compute_distances(self): Compute the pairwise distance matrix.

    • Returns
      • dist_matrix (np.ndarray, shape (n_samples, n_samples)) : The pairwise distance matrix.
  • plot_extremes(self): Visualize the most similar and the most distinct pair of samples.

    • Returns
      • None.
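
A minimal usage sketch (the import path is an assumption; X is a raw, standardized data array as above):

```python
from FAEclust import TimeSeriesDistance  # import path assumed

tsd = TimeSeriesDistance(X, metric='elastic', n_jobs=-1)
dist_matrix = tsd.compute_distances()  # (n_samples, n_samples)
tsd.plot_extremes()                    # most similar vs. most distinct pair
```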

Class: NearestNeighborsOpt

NearestNeighborsOpt is a utility class for determining the optimal number of nearest neighbors (m) used in the pairwise similarity measure of the clustering objective function. Given a pairwise distance matrix, it examines how the structure of the k-nearest neighbor graph evolves as k varies, using two complementary criteria: the distance "knee" and graph connectivity.

NearestNeighborsOpt(
    dist_matrix 			# pairwise distance matrix of shape=(n_samples, n_samples)
) 

Parameters

  • dist_matrix (np.ndarray, shape (n_samples, n_samples)) : Distance matrix returned by TimeSeriesDistance().

Methods

  • estimate_optimal_m(method='avg_distance', max_m=None): Select the optimal number of nearest neighbors.

    • method (str) : Method to use for neighbourhood optimization. Options include:
      • 'avg_distance' (default) : Find the $m$ at which the average $m$-th neighbor distance exhibits the largest jump (knee).
      • 'connectivity' : Find the smallest $m$ at which the m-NN graph is fully connected.
    • max_m (int) : Maximum number of neighbors to consider. Default max_m=n_samples-1
    • Returns
      • m (int) : The optimum number of nearest neighbors.
  • get_nearest_neighbors(opt_m): Construct the adjacency list (neighbor index list) for each data point given the neighborhood value m.

    • opt_m (int): The neighborhood size, e.g. estimated via estimate_optimal_m().
    • Returns
      • neighbors_dict (dict) : A dictionary mapping each data point to its m nearest neighbors.
  • compute_similarity(neighbors_dict): Calculate the pairwise similarities from the distance matrix and the m-NN graph.

    • neighbors_dict (dict) : The adjacency dictionary returned by get_nearest_neighbors().
    • Returns
      • sim_matrix (np.ndarray, shape (n_samples, n_samples)): The pairwise similarity matrix (n×n).
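
A minimal sketch chaining the three methods (dist_matrix from TimeSeriesDistance above; import path assumed):

```python
from FAEclust import NearestNeighborsOpt  # import path assumed

nno = NearestNeighborsOpt(dist_matrix)
m = nno.estimate_optimal_m(method='avg_distance')    # knee of the m-th neighbor distance
neighbors_dict = nno.get_nearest_neighbors(opt_m=m)  # m-NN adjacency lists
sim_matrix = nno.compute_similarity(neighbors_dict)  # (n_samples, n_samples)
```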

Class: FunctionalAutoencoder

FAEclust is a deep learning framework for clustering multivariate functional data. It integrates three key components: (1) functional data smoothing via basis function expansion (e.g., B-splines, Fourier series, or Wavelets) to provide a smooth representation of each sample path, (2) a functional autoencoder consisting of an encoder for learning complex relationships among the features and a decoder for flexibly reconstructing intricate functional patterns, and (3) a shape-informed convex clustering algorithm that automatically determines the optimal number of clusters. The FunctionalAutoencoder class is designed to handle these steps end-to-end, while providing various hyperparameters to tailor the model to different datasets.

FunctionalAutoencoder(
    p,                      # number of component random functions (dimensions)
    layers,                 # list specifying encoder/decoder layer width
    l_basis,                # number of basis functions for encoder functional weights
    m_basis,                # number of basis functions for smoothing the sample paths
    basis_smoothing,        # list of basis functions used for smoothing (e.g. Fourier basis)
    basis_input,            # list of basis functions for encoder functional weights (e.g. B-spline basis)
    lambda_e,               # penalty parameter for the orthogonality regularization on encoder functional weights
    lambda_d,               # penalty parameter for the roughness regularization on encoder functional weights and biases
    lambda_c,               # penalty parameter for the clustering loss in the integrated objective function
    t,                      # time grid (array of length T) over which the smoothed functions are evaluated
    sim_matrix              # pairwise similarity matrix (n×n) in the clustering objective function
)

Parameters

  • p (int) – Number of component random functions (dimensions). For example, p=2 for two-dimensional functional data.

  • layers (list of int) – Architecture specification for the autoencoder’s layers. This list should include the width of each encoder layer, the latent dimension, and the width of each decoder layer. The format is [q1, q2, ..., s, ..., Q2, Q1, Z1, Z2], where:

    • q1 is the number of nodes in the encoder's first hidden layer.

    • q2, ..., s are the sizes of the successive hidden layers in the encoder, with s being the final latent dimension (the size of the bottleneck vector).

    • ..., Q2 are the sizes of the dense layers in the decoder (mirroring the encoder’s dense layers).

    • Q1, Z1, Z2 are the sizes of the last three layers of the decoder that output functions.

  • l_basis (int) – Functional weights and biases are represented as linear combinations of basis functions. l_basis is the number of basis functions for the functional weights in the encoder.

  • m_basis (int) – Number of basis functions used for converting the raw sample paths into smooth functions.

  • basis_smoothing (list of callables) – A list of m_basis basis functions (evaluated on the time grid t) for smoothing the raw sample paths. Each basis function is a callable basis_smoothing(x: array_like[T, p]) -> np.ndarray[T, p]. The provided utility smoothing_features() can generate this list along with the expansion coefficients.

  • basis_input (list of callables) – A list of l_basis basis functions (evaluated on the time grid t) for representing the functional weights in the encoder. Each basis function is a callable basis_input(x: array_like[T, p]) -> np.ndarray[T, p].

  • lambda_e (float) – Penalty parameter for the orthogonality regularization on encoder functional weights. The parameter controls the amount of regularization on encoder functional weights, encouraging within-component functional weights to be orthogonal.

  • lambda_d (float) – Penalty parameter for the roughness regularization on encoder functional weights and biases. The parameter controls the amount of smoothness of the encoder functional weights and biases.

  • lambda_c (float) – Penalty parameter for the clustering loss in the integrated objective function. A higher lambda_c places more emphasis on forming well-separated clusters in the latent space (at the potential cost of reconstruction accuracy).

  • t (array-like of shape (T,)) – Time grid (array of length T) over which the input functions, functional weights, functional biases and output functions are evaluated/defined.

  • sim_matrix (numpy.ndarray of shape (n_samples, n_samples)) – The pairwise similarity matrix among the $N$ sample paths, a term in the clustering objective function. The similarity matrix is essentially a weighted m-nearest neighbor graph, and the function NearestNeighborsOpt() will construct the graph with the optimal m value.

Methods

  • model_summary(self): A generic function used to produce a summary of the trained model.

    • Returns
      • None
  • train(self, coeffs, epochs, learning_rate, batch_size, neighbors_dict, sim_matrix) : Model training is performed using mini-batch gradient descent with momentum.

    • coeffs (np.ndarray, shape (n_samples, n_features, m_basis)) – Basis expansion coefficients of the smoothed sample paths (see smoothing_features()).
    • epochs (int) – Number of training epochs (full passes over the dataset). Default: 100
    • learning_rate (float) – Learning rate of the training algorithm. Use a smaller value if the loss oscillates or diverges; a larger value can speed up convergence when the loss is stable. Default: 1e-3
    • batch_size (int) – The mini-batch size. Default: 16
    • neighbors_dict (dict) – A dictionary mapping each data point to its m nearest neighbors, obtained from the get_nearest_neighbors() method.
    • sim_matrix (numpy.ndarray of shape (n_samples, n_samples)) – The pairwise similarity matrix among the $N$ sample paths.
  • predict(coeffs, batch_size): Return the embedded data and the cluster labels of the functional data.

    • Returns
      • S (np.ndarray, shape=(n_samples, s)) : Array of the embedded data.
      • labels (np.ndarray, shape=(n_samples,)) : Functional data cluster labels.
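
A minimal usage sketch (the import path, the basis_input construction, and all hyperparameter values are illustrative assumptions; coeffs, smoothing_basis, neighbors_dict, and sim_matrix come from the earlier steps):

```python
import numpy as np
from FAEclust import FunctionalAutoencoder  # import path assumed

t = np.linspace(0, 1, 300)     # time grid matching dis_p in the smoothing step
basis_input = smoothing_basis  # reusing the smoothing basis purely for illustration

fae = FunctionalAutoencoder(
    p=2,                                # two-dimensional functional data
    layers=[16, 8, 4, 8, 16, 16, 16],   # 4-dimensional latent space
    l_basis=len(basis_input),
    m_basis=coeffs.shape[-1],
    basis_smoothing=smoothing_basis,
    basis_input=basis_input,
    lambda_e=1.0, lambda_d=0.1, lambda_c=1.0,  # illustrative penalty values
    t=t,
    sim_matrix=sim_matrix,
)
fae.model_summary()
fae.train(coeffs, epochs=150, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)
S, labels = fae.predict(coeffs, batch_size=16)  # embeddings and cluster labels
```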

Class: ConvexClustering

ConvexClustering is the path-following homotopy algorithm that produces a hierarchy of clusters and determines the optimal number of clusters via an internal validation metric.

ConvexClustering(
    X,              # embedded data of shape (n_samples, s)
    neighbors_dict, # adjacency dictionary from get_nearest_neighbors()
    sim_matrix,     # the pairwise similarity matrix
    verbose         # whether to print out the merging process and the Silhouette scores
)

Parameters

  • X (np.ndarray, shape=(n_samples, s)) – The embedded data in the latent space.
  • neighbors_dict (dict): A dictionary mapping each data point to its m nearest neighbors, obtained from the get_nearest_neighbors() method.
  • sim_matrix (np.ndarray, shape (n_samples, n_samples)) – The pairwise similarity matrix among the sample paths.
  • verbose (bool, optional): If True, the algorithm will print out the hierarchy of clusters and the corresponding Silhouette scores. Default: False

Methods

  • fit(self): Perform clustering on the embedded data.
    • Returns
      • cluster_labels (np.ndarray, shape=(n_samples,)) : The (optimal) cluster labels.
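
A minimal usage sketch for standalone clustering of the embeddings (S, neighbors_dict, and sim_matrix as constructed above; import path assumed):

```python
from FAEclust import ConvexClustering  # import path assumed

cc = ConvexClustering(S, neighbors_dict, sim_matrix, verbose=True)
cluster_labels = cc.fit()  # optimal labels selected via Silhouette scores
```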

Code Example

Example 1: Clustering Synthetic Data

Below is an example of using FAEclust on a simulated dataset (the pendulum dataset in the paper).

🔗 view the full notebook

We compute the similarity matrix using the elastic distance (metric='elastic'). After smoothing the data with B-spline basis functions, we configure a functional autoencoder with an encoder-decoder architecture (16 → 8 → 4 (latent) → 8 → 16 → 16 → 16, with functional layers of size 16 at both ends). We train the model for 150 epochs. The output labels from predict() are the cluster assignments found by the path-following homotopy algorithm. We print the Adjusted Rand Index (ARI) and Adjusted Mutual Information (AMI) to evaluate how well the predicted clusters match the true clusters (y_true). A condensed sketch of this workflow follows.
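
A condensed sketch, under the same assumptions as the per-class examples above (X is the pendulum data array and y_true the hypothetical ground-truth labels):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score
from FAEclust import (TimeSeriesDistance, NearestNeighborsOpt,  # import paths assumed
                      smoothing_features, FunctionalAutoencoder)

# 1. Similarity
dist_matrix = TimeSeriesDistance(X, metric='elastic', n_jobs=-1).compute_distances()
nno = NearestNeighborsOpt(dist_matrix)
m = nno.estimate_optimal_m()
neighbors_dict = nno.get_nearest_neighbors(opt_m=m)
sim_matrix = nno.compute_similarity(neighbors_dict)

# 2. Smoothing (return order assumed)
coeffs, fn_s, smoothing_basis = smoothing_features(data=X, fit='bspline', terms=None)

# 3. FAE network: 16 -> 8 -> 4 (latent) -> 8 -> 16 -> 16 -> 16
fae = FunctionalAutoencoder(p=2, layers=[16, 8, 4, 8, 16, 16, 16],
                            l_basis=len(smoothing_basis), m_basis=coeffs.shape[-1],
                            basis_smoothing=smoothing_basis, basis_input=smoothing_basis,
                            lambda_e=1.0, lambda_d=0.1, lambda_c=1.0,
                            t=np.linspace(0, 1, 300), sim_matrix=sim_matrix)
fae.train(coeffs, epochs=150, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)

# 4. Convex clustering runs on the latent representations inside predict()
S, labels = fae.predict(coeffs, batch_size=16)
print('ARI:', adjusted_rand_score(y_true, labels))
print('AMI:', adjusted_mutual_info_score(y_true, labels))
```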

Example 2: Clustering Real Data

In this example, we demonstrate FAEclust on the "Plane" dataset using a similar workflow: similarity computation, smoothing, and clustering via the functional autoencoder.

🔗 view the full notebook

We compute the similarity matrix using FastDTW (metric='fastdtw'). After building the similarity matrix, we smooth each sample path with B-spline basis functions. The functional autoencoder is configured with the architecture [32, 16, 8, 16, 32, 32, 32], which produces a latent representation of dimension 8. After 100 epochs, we obtain the final cluster labels. The key differences from Example 1 are sketched below.
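
The key differences from Example 1, sketched under the same assumptions:

```python
# FastDTW distances instead of the elastic metric
dist_matrix = TimeSeriesDistance(X, metric='fastdtw', n_jobs=-1).compute_distances()

# A wider network with an 8-dimensional latent space, trained for 100 epochs
fae = FunctionalAutoencoder(p=X.shape[1], layers=[32, 16, 8, 16, 32, 32, 32],
                            l_basis=len(smoothing_basis), m_basis=coeffs.shape[-1],
                            basis_smoothing=smoothing_basis, basis_input=smoothing_basis,
                            lambda_e=1.0, lambda_d=0.1, lambda_c=1.0,
                            t=np.linspace(0, 1, 300), sim_matrix=sim_matrix)
fae.train(coeffs, epochs=100, learning_rate=1e-3, batch_size=16,
          neighbors_dict=neighbors_dict, sim_matrix=sim_matrix)
S, labels = fae.predict(coeffs, batch_size=16)
```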

The above examples and parameters serve as a guide, but users are encouraged to experiment with the basis sizes (l_basis, m_basis), network depth (layers), and loss weights (lambda_e, lambda_d, lambda_c) to best fit their specific datasets.
