This is the official PyTorch implementation of NUMAP, a new and generalizable UMAP implementation.
See our GitHub repository for more information and the latest updates.
NUMAP can be used to visualize many types of data in a low-dimensional space, while enabling a simple out-of-sample extension. One application of NUMAP is visualizing time-series data to help understand the underlying process in a given system. For example, the following figure shows the transition of a set of points from one state to another, using NUMAP. From a biological point of view, this can be seen as a simplified simulation of the cellular differentiation process.
The package is based on UMAP and GrEASE (Generalizable and Efficient Approximate Spectral Embedding). It is easy to use and works with any PyTorch dataset, on both CPU and GPU. The package also includes a test dataset and a test script that runs the model on the 2 Circles dataset.
Incorporating GrEASE lets NUMAP preserve both the local and global structure of the data, as UMAP does, while adding the new capability of out-of-sample extension.
To install the package, simply use the following command:
pip install numap
Basic usage follows the familiar fit/transform pattern, e.g.,
from numap import NUMAP
numap = NUMAP(n_components=2) # n_components is the number of dimensions in the low-dimensional representation
numap.fit(X) # X is the dataset and it should be a torch.Tensor
X_reduced = numap.transform(X) # Get the low-dimensional representation of the dataset
Y_reduced = numap.transform(Y) # Get the low-dimensional representation of a test dataset
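The out-of-sample workflow above can be wrapped in a small helper. This is a minimal sketch, not part of the numap package: `holdout_embed` is a hypothetical name, and it assumes only the `fit`/`transform` interface shown in the example, plus an input that supports `len()` and slicing (a torch.Tensor does).

```python
def holdout_embed(model, X, train_fraction=0.8):
    """Fit `model` on the first part of X and embed the held-out remainder.

    Hypothetical helper, not part of numap. `model` is any object exposing
    fit(X) and transform(X), such as the NUMAP instance shown above.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")

    n_train = int(len(X) * train_fraction)
    if n_train == 0 or n_train == len(X):
        raise ValueError("X is too small to split with this train_fraction")

    # Plain slicing keeps the original order; shuffle X beforehand if its
    # rows are sorted (for example by class or by time).
    X_train, X_test = X[:n_train], X[n_train:]

    model.fit(X_train)

    # Out-of-sample extension: X_test was never seen by fit().
    return model.transform(X_train), model.transform(X_test)
```

With the package installed, a call might look like `train_emb, test_emb = holdout_embed(NUMAP(n_components=2), X)`, where `X` is a torch.Tensor as in the example above.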
See the code documentation for more information and additional functionality.
To run the model on the 2 Circles dataset, either run the file tests/run_numap.py directly or use the following command:
python tests/run_numap.py
This will run NUMAP and UMAP on the 2 Circles dataset and plot the results.
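For experimenting outside the test script, a similar toy dataset can be generated by hand. The sketch below is an assumption about what a "2 Circles" dataset looks like (two concentric circles with optional radial noise); the dataset shipped with the package may be constructed differently, and `make_two_circles` is a hypothetical name.

```python
import math
import random


def make_two_circles(n_per_circle=100, radii=(1.0, 2.0), noise=0.0, seed=0):
    """Sample 2-D points on concentric circles (illustrative only).

    Returns (points, labels): points is a list of [x, y] pairs and labels
    gives the index of the circle each point was drawn from.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in enumerate(radii):
        for _ in range(n_per_circle):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            # Optionally perturb the radius with Gaussian noise.
            jitter = rng.gauss(0.0, noise) if noise > 0 else 0.0
            r = radius + jitter
            points.append([r * math.cos(angle), r * math.sin(angle)])
            labels.append(label)
    return points, labels
```

The nested list can be converted with `torch.tensor(points, dtype=torch.float32)` before being passed to `fit`, since the example above expects a torch.Tensor.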