Tools for HDLSS data

Our laboratory provides [Tools] for high-dimension, low-sample-size (HDLSS) data. Please read the [License] and use the tools only if you agree to its terms. For details on each analytical method, refer to the corresponding manual and paper.


Package Installation

From GitHub

Use the following command in the terminal to install packages locally.

git clone


Principal Component Analysis

[R] [Python] [Manual]

The "Noise-Reduction Methodology (NRM)" gives estimators of the eigenvalues, eigenvectors, and principal component scores.

Reference : K. Yata, M. Aoshima, Effective PCA for High-Dimension, Low-Sample-Size Data with Noise Reduction via Geometric Representations, Journal of Multivariate Analysis, 105 (2012) 193-215.
DOI: [10.1016/j.jmva.2011.09.002]
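
As an illustration of the eigenvalue step, the sketch below applies the noise-reduction correction described in Yata and Aoshima (2012) to the eigenvalues $\hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_{n-1}$ of the dual sample covariance matrix $S_D$: each eigenvalue is reduced by the average of the eigenvalues that follow it, $\tilde{\lambda}_j = \hat{\lambda}_j - (\mathrm{tr}(S_D) - \sum_{i \le j} \hat{\lambda}_i)/(n-1-j)$. The function name `nrm_eigenvalues` and the plain-list input are assumptions made for this example; they are not the interface of the linked R or Python code.

```python
def nrm_eigenvalues(dual_eigs):
    """Apply the noise-reduction correction to dual-covariance eigenvalues.

    dual_eigs: the n-1 eigenvalues of the n x n dual sample covariance
    matrix, sorted in descending order (hypothetical input format).
    Returns the corrected estimates for j = 1, ..., n-2.
    """
    m = len(dual_eigs)          # m = n - 1
    total = sum(dual_eigs)      # tr(S_D)
    corrected = []
    cumulative = 0.0
    for j in range(1, m):       # j = 1, ..., n-2
        cumulative += dual_eigs[j - 1]
        # Average of the eigenvalues after the j-th one, treated as noise.
        noise = (total - cumulative) / (m - j)
        corrected.append(dual_eigs[j - 1] - noise)
    return corrected


# Example with made-up eigenvalues (n = 5 samples, so n - 1 = 4 values).
estimates = nrm_eigenvalues([10.0, 2.0, 1.0, 1.0])
```

The last eigenvalue has no trailing eigenvalues to average, so the sketch returns estimates only for the first n-2 components.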

[R] [Python] [Manual]

The "Cross-Data-Matrix (CDM) Methodology" gives estimators of the eigenvalues, eigenvectors, and principal component scores.

Reference : K. Yata, M. Aoshima, Effective PCA for High-Dimension, Low-Sample-Size Data with Singular Value Decomposition of Cross Data Matrix, Journal of Multivariate Analysis, 101 (2010) 2060-2077.
DOI: [10.1016/j.jmva.2010.04.006]
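
To make the idea concrete, the following sketch builds a cross data matrix as described in Yata and Aoshima (2010): the sample is split into two halves of sizes $n_1 = \lceil n/2 \rceil$ and $n_2 = n - n_1$, each half is centered by its own mean, and the $n_1 \times n_2$ matrix of inner products is scaled by $\{(n_1-1)(n_2-1)\}^{-1/2}$. Its singular values serve as eigenvalue estimators. The function names, the row-per-sample input, and the use of power iteration for the largest singular value are assumptions for this example and are not taken from the linked code.

```python
import math


def cross_data_matrix(X):
    """Build the n1 x n2 cross data matrix from samples X (list of rows)."""
    n = len(X)
    if n < 4:
        raise ValueError("need at least 4 samples so both halves have n >= 2")
    n1 = math.ceil(n / 2)
    n2 = n - n1

    def center(rows):
        p = len(rows[0])
        mean = [sum(r[k] for r in rows) / len(rows) for k in range(p)]
        return [[r[k] - mean[k] for k in range(p)] for r in rows]

    C1, C2 = center(X[:n1]), center(X[n1:])
    scale = math.sqrt((n1 - 1) * (n2 - 1))
    # Entry (i, j) is the scaled inner product of centered samples i and j.
    return [[sum(a * b for a, b in zip(u, v)) / scale for v in C2] for u in C1]


def top_singular_value(M, iters=200):
    """Largest singular value of M by power iteration on M^T M."""
    rows, cols = len(M), len(M[0])
    v = [1.0 + 0.1 * k for k in range(cols)]  # arbitrary starting vector
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(M[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))


# Toy data: 4 samples in 3 dimensions.
X = [[1, 0, 0], [-1, 0, 0], [2, 0, 0], [-2, 0, 0]]
first_eigenvalue_estimate = top_singular_value(cross_data_matrix(X))
```

In practice a full singular value decomposition from a numerical library would replace the power iteration; it is used here only to keep the example free of dependencies.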

[R] [Python] [Manual]

The "Automatic Sparse PCA (A-SPCA)" gives estimators of the eigenvalues and eigenvectors.

Reference : K. Yata, M. Aoshima, Automatic Sparse PCA for High-Dimensional Data, Statistica Sinica 35 (2025) (in press).
DOI: [10.5705/ss.202022.0319] [Supplement]

Correlation Test

[R] [Python] [Manual]

The "Extended Cross-Data-Matrix (ECDM) Methodology" gives an estimator of $\mathrm{Tr}(\Sigma^2)$, where $\Sigma$ is the covariance matrix. The code tests the correlation coefficient matrix using the ECDM estimator.

Reference : K. Yata, M. Aoshima, High-Dimensional Inference on Covariance Structures via the Extended Cross-Data-Matrix Methodology, Journal of Multivariate Analysis, 151 (2016) 151-166.
DOI: [10.1016/j.jmva.2016.07.011]

Outlier Detection

[R] [Python] [Manual]

The "PC-scores-based Outlier Detection (PC-OD)" identifies outliers based on the PC scores. The algorithm is given in Section 3.2 of Nakayama et al. (2024).

Reference : Y. Nakayama, K. Yata and M. Aoshima, Test for High-Dimensional Outliers with Principal Component Analysis, Japanese Journal of Statistics and Data Science (2024) (in print).
DOI: [10.1007/s42081-024-00255-0]

Discriminant Analysis

[R] [Python] [Manual]

The "Distance-Based Discriminant Analysis (DBDA)" provides high-dimensional discriminant analysis for multiclass data. The algorithm is provided in Aoshima and Yata (2014).

Reference : M. Aoshima and K. Yata, A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data, Annals of the Institute of Statistical Mathematics (2014).
DOI: [10.1007/s10463-013-0435-8]
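
The core of a distance-based rule can be sketched as follows: a new observation $x_0$ is assigned to the class $i$ that minimizes $\|x_0 - \bar{x}_i\|^2 - \mathrm{tr}(S_i)/n_i$, where the second term removes the bias that the sample mean introduces in high dimensions. This is a minimal sketch under that assumption; it omits the misclassification rate adjustment of Aoshima and Yata (2014), and the names `class_stats` and `dbda_classify` are hypothetical, not the API of the linked code.

```python
def class_stats(samples):
    """Return (mean vector, trace of sample covariance, sample size)."""
    n = len(samples)
    p = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(p)]
    # tr(S) is the sum of the per-feature sample variances (divisor n - 1).
    tr_s = sum((x[k] - mean[k]) ** 2 for x in samples for k in range(p)) / (n - 1)
    return mean, tr_s, n


def dbda_classify(x0, classes):
    """Assign x0 to the class with the smallest bias-corrected distance.

    classes: dict mapping a label to a list of training samples
    (each class needs at least two samples).
    """
    best_label, best_score = None, float("inf")
    for label, samples in classes.items():
        mean, tr_s, n = class_stats(samples)
        sq_dist = sum((a - b) ** 2 for a, b in zip(x0, mean))
        score = sq_dist - tr_s / n   # bias-corrected squared distance
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Toy two-class example.
training = {
    "A": [[0.0, 0.0], [2.0, 0.0]],
    "B": [[10.0, 10.0], [12.0, 10.0]],
}
label = dbda_classify([1.0, 1.0], training)
```

Because the rule needs only class means and traces, it never inverts a covariance matrix, which is what makes it usable when the dimension exceeds the sample size.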

[R] [Python] [Manual]

The "Geometrical Quadratic Discriminant Analysis (GQDA)" provides high-dimensional discriminant analysis for multiclass data. The algorithm is provided in Aoshima and Yata (2015).

Reference : M. Aoshima and K. Yata, Geometric Classifier for Multiclass, High-Dimensional Data, Sequential Analysis, 34 (2015) 279-294.
DOI: [10.1080/07474946.2015.1063256]
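
One way to read a geometric quadratic rule is as ordinary quadratic discriminant analysis with each covariance matrix $\Sigma_i$ replaced by the scaled identity $(\mathrm{tr}(\Sigma_i)/p) I_p$. Under that assumption the score for class $i$ becomes $p\|x_0 - \bar{x}_i\|^2/\mathrm{tr}(S_i) - p/n_i + p\log(\mathrm{tr}(S_i)/p)$, and $x_0$ goes to the class with the smallest score. The sketch below implements this reading only; the exact form used in the linked code should be checked against the manual, and the function names are hypothetical.

```python
import math


def trace_stats(samples):
    """Return (mean vector, trace of sample covariance, sample size)."""
    n = len(samples)
    p = len(samples[0])
    mean = [sum(x[k] for x in samples) / n for k in range(p)]
    tr_s = sum((x[k] - mean[k]) ** 2 for x in samples for k in range(p)) / (n - 1)
    return mean, tr_s, n


def gqda_classify(x0, classes):
    """Assign x0 to the class with the smallest geometric quadratic score.

    classes: dict mapping a label to a list of training samples.
    Each class needs at least two samples and a positive trace.
    """
    p = len(x0)
    best_label, best_score = None, float("inf")
    for label, samples in classes.items():
        mean, tr_s, n = trace_stats(samples)
        sq_dist = sum((a - b) ** 2 for a, b in zip(x0, mean))
        # Scaled distance, bias correction, and a penalty for class spread.
        score = p * sq_dist / tr_s - p / n + p * math.log(tr_s / p)
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Toy example: class "A" is tight, class "B" is widely spread.
training = {
    "A": [[0.0, 0.0], [2.0, 0.0]],
    "B": [[0.0, 0.0], [20.0, 0.0]],
}
label = gqda_classify([1.0, 0.5], training)
```

Unlike a pure distance rule, this score also uses the overall spread of each class through $\mathrm{tr}(S_i)$, so classes with similar means but different scales can still be separated.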


Covariance Structures Test


Copyright (C) 2024 Makoto Aoshima

This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International license.
To view a copy of this license, visit or
send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Makoto Aoshima, University of Tsukuba