2dShapeSpacePortable is a structured, self-contained pipeline for quantifying the shape of cells and nuclei from 2D fluorescence microscopy images and mapping how proteins distribute within those shapes. Starting from pre-segmented masks, it encodes each cell's contour as Fourier coefficients, reduces the population to a compact set of principal shape modes, and measures per-cell protein intensity in a shape-aware coordinate system — making it possible to compare protein localizations across the cell shape space.
This portable version generalizes the research code written by Trang Le and Will Leineweber for the Cell Shape paper: https://www.sciencedirect.com/science/article/pii/S2405471226000712?via%3Dihub
- Python 3.10 or higher (tested on 3.11 and 3.12)
- The GUI (`GUI.py`) additionally requires `python3-tk`:
  - Linux: `sudo apt-get install python3-tk`
  - macOS: included with the standard Python installer
  - Windows: included with the standard Python installer
Setting up the virtual environment
Navigate to the directory where you want to install the project:
cd /path/to/your/working/directory
Create and activate a virtual environment:
python3 -m venv 2dShapeSpacePortable
source 2dShapeSpacePortable/bin/activate # Linux / macOS
2dShapeSpacePortable\Scripts\activate.bat # Windows
Place the project files inside the virtual environment directory, then install all dependencies:
pip install -r requirements.txt
The pipeline answers a practical question in cell biology: given a population of cells, what are the characteristic shapes they adopt, and where does a protein of interest tend to sit within those shapes?
Each pipeline step addresses one part of that question:
Step 1 — FFT coefficients (fftcoeff_step)
The outlines of each cell and its nucleus are traced and described mathematically using Fourier coefficients. This converts an irregular biological shape into a compact, comparable set of numbers, while also recording basic cell statistics such as area and protein intensity. Cells whose cell-area-to-nucleus-area ratio exceeds the dismiss_ratio threshold (likely segmentation errors) are flagged and excluded from downstream steps.
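To make the idea of a Fourier description concrete, here is a minimal standard-library sketch. It assumes the contour is already sampled as ordered x and y coordinate lists and applies a plain discrete Fourier transform to each; the helper name `dft_coefficients` and the toy circle are illustrative only, and the pipeline's own contour tracing, alignment, and coefficient layout may differ.

```python
import cmath
import math


def dft_coefficients(values, n_coeffs):
    """Return the first n_coeffs normalised DFT coefficients of a 1D sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values)) / n
        for k in range(n_coeffs)
    ]


# Toy closed contour: a circle of radius 5 centred at (10, 20), 64 samples.
angles = [2 * math.pi * t / 64 for t in range(64)]
xs = [10 + 5 * math.cos(a) for a in angles]
ys = [20 + 5 * math.sin(a) for a in angles]

x_coeffs = dft_coefficients(xs, n_coeffs=4)
y_coeffs = dft_coefficients(ys, n_coeffs=4)

# Coefficient 0 is the mean coordinate (the centroid); coefficient 1 carries
# the circle itself; higher coefficients are close to zero for this smooth shape.
print(x_coeffs[0], y_coeffs[0])
```

A smooth outline needs only a few coefficients, while ragged outlines spread energy into higher ones, which is the trade-off that the n_coeffs parameter controls.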
Step 2 — Shape modes (shapemode_step)
Principal Component Analysis (PCA) is applied to the Fourier coefficients of all cells together. This identifies the main "axes of variation" in shape across the dataset — the most common ways in which cells differ from the average cell. The result is a low-dimensional shape space where each cell occupies a position, and the principal components (PC1, PC2, …) each describe an interpretable shape deformation. An average cell shape is also computed and saved.
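The PCA step can be sketched as follows. This is an illustration using NumPy's SVD on a synthetic feature matrix, not the pipeline's own PCA call; the function name `pca`, the matrix size, and the random data are assumptions. The last lines show how a shape can be reconstructed at seven positions from -1.5 to +1.5 standard deviations along PC1.

```python
import numpy as np


def pca(features, n_components):
    """PCA via SVD: returns mean, principal axes, per-cell scores, variance ratios."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, components, scores, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for the coefficient table: 50 cells x 6 coefficient values.
features = rng.normal(size=(50, 6)) * np.array([5.0, 2.0, 1.0, 0.5, 0.2, 0.1])
mean, components, scores, explained = pca(features, n_components=2)

# Reconstruct coefficient vectors at 7 steps along PC1 (the middle step is the mean).
steps = np.linspace(-1.5, 1.5, 7)
pc1_sd = scores[:, 0].std()
variations = mean + steps[:, None] * pc1_sd * components[0]
```

Each row of `variations` is a coefficient vector that could be inverse-transformed back into a contour to visualise the deformation described by that principal component.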
Step 3 — Protein parametrization (protparam_step)
For each cell, the protein fluorescence signal is sampled in a shape-aware coordinate system anchored to the cell and nuclear contours. Two modes are available: rings samples the protein along concentric isocontours interpolated between the nucleus and cell membrane, producing a compact 2D intensity map; warp morphs the cell image into the average cell shape using thin-plate spline warping, so all cells can be directly compared pixel-by-pixel. Per-PC-bin averages are then computed, showing the typical protein distribution for cells of each shape.
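The rings mode can be pictured with a small standard-library sketch. It assumes the nucleus and cell contours have the same number of corresponding points, that intermediate rings are plain linear interpolations between them, and that intensities are read from the nearest pixel; all three are simplifying assumptions, and the helper names `ring_contours` and `sample_rings` are hypothetical.

```python
def ring_contours(nucleus_pts, cell_pts, n_rings):
    """Linearly interpolate n_rings contours from the nucleus (first) to the membrane (last)."""
    rings = []
    for r in range(n_rings):
        w = r / (n_rings - 1)
        rings.append([
            ((1 - w) * nx + w * cx, (1 - w) * ny + w * cy)
            for (nx, ny), (cx, cy) in zip(nucleus_pts, cell_pts)
        ])
    return rings


def sample_rings(image, rings):
    """Nearest-pixel intensity per ring point; rows are rings, columns are contour points."""
    return [[image[round(y)][round(x)] for x, y in ring] for ring in rings]


# Toy 9x9 image whose intensity equals the column index.
image = [[col for col in range(9)] for _ in range(9)]
nucleus = [(5, 4), (4, 5), (3, 4), (4, 3)]   # points at distance 1 from the centre (4, 4)
membrane = [(8, 4), (4, 8), (0, 4), (4, 0)]  # points at distance 4 from the centre

intensity_map = sample_rings(image, ring_contours(nucleus, membrane, n_rings=4))
for row in intensity_map:
    print(row)
```

Because every cell yields a map of the same size regardless of its outline, maps from different cells can be averaged element by element.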
Step 4 — Comparison (comparison_step)
If cells carry a known protein location label (e.g. cytoplasm, nucleus, vesicles), this step groups them by label and computes per-label average intensity maps within each PC bin. A Pearson correlation heatmap is then generated comparing all location labels against each other, revealing which protein patterns co-vary across the shape space.
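The correlation matrix behind the heatmap can be sketched with the standard library alone. The example assumes each label's average map is flattened row by row before correlation, which may not match the pipeline's exact procedure; the function name `pearson` and the tiny 2x2 maps are made up for illustration.

```python
import math


def pearson(map_a, map_b):
    """Pearson correlation between two equally sized 2D maps, flattened row by row."""
    x = [v for row in map_a for v in row]
    y = [v for row in map_b for v in row]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sd_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    sd_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (sd_x * sd_y)


# Made-up average intensity maps for three location labels in one PC bin.
avg_maps = {
    "Nucleus": [[9, 8], [2, 1]],
    "Cytoplasm": [[1, 2], [8, 9]],
    "Vesicles": [[8, 9], [1, 2]],
}
labels = sorted(avg_maps)
matrix = {a: {b: pearson(avg_maps[a], avg_maps[b]) for b in labels} for a in labels}

for a in labels:
    print(a, [round(matrix[a][b], 2) for b in labels])
```

In this toy data the Nucleus and Cytoplasm maps are mirror images, so their correlation is -1, while Nucleus and Vesicles are strongly positively correlated.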
Activate your virtual environment, navigate to the project directory, and run either:
python GUI.py # graphical launcher — configure and run from a form
python process.py # command-line launcher — uses config.yaml and optional CLI flags
Input data
Each cell must be provided as three separate single-channel grayscale images:
- a nucleus mask (binary or label image)
- a cell mask (binary or label image)
- a protein channel image (raw fluorescence intensity)
All images for a given cell should have the same pixel dimensions.
path_list.csv
The pipeline reads the list of cells to process from path_list.csv in the project directory. Each row describes one cell:
#image_id,nuclei_mask,cell_mask,protein,location
cell_001,input/cell_001_nuc.png,input/cell_001_cell.png,input/cell_001_prot.png,Nucleus
cell_002,input/cell_002_nuc.png,input/cell_002_cell.png,input/cell_002_prot.png,Cytoplasm
| Column | Description |
|---|---|
| `image_id` | Unique identifier for the cell; used as filename stem throughout all outputs |
| `nuclei_mask` | Path to the nucleus mask image (relative to the project directory) |
| `cell_mask` | Path to the cell mask image (relative to the project directory) |
| `protein` | Path to the protein fluorescence channel image |
| `location` | Subcellular location label for the protein (e.g. Nucleus, Cytoplasm). Used by the comparison step; set to any placeholder if unknown |
Lines beginning with # are treated as comments and ignored.
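For checking a path list before a run, the file format above can be read with Python's `csv` module. This is a sketch, not the pipeline's own reader: the function name `read_path_list` is hypothetical, and the column order is taken from the commented header line shown earlier.

```python
import csv
import io

COLUMNS = ["image_id", "nuclei_mask", "cell_mask", "protein", "location"]


def read_path_list(handle):
    """Parse path_list.csv rows, skipping blank lines and lines starting with '#'."""
    data_lines = [
        line.strip()
        for line in handle
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return list(csv.DictReader(data_lines, fieldnames=COLUMNS))


# In-memory stand-in for the file; replace with open("path_list.csv", newline="").
sample = io.StringIO(
    "#image_id,nuclei_mask,cell_mask,protein,location\n"
    "cell_001,input/cell_001_nuc.png,input/cell_001_cell.png,input/cell_001_prot.png,Nucleus\n"
    "cell_002,input/cell_002_nuc.png,input/cell_002_cell.png,input/cell_002_prot.png,Cytoplasm\n"
)
cells = read_path_list(sample)
for cell in cells:
    print(cell["image_id"], cell["location"])
```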
Configuration
All parameters can be set in config.yaml, overridden by CLI flags, or configured interactively via the GUI. Priority order: CLI flags > config.yaml > built-in defaults.
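The priority order can be illustrated with a layered dictionary merge. This sketch is not the code in process.py: the flag names are hypothetical, only a few parameters are shown, and `file_values` stands in for whatever has already been parsed from config.yaml.

```python
import argparse

# Built-in defaults for a handful of parameters, as listed in the tables of this README.
DEFAULTS = {"output_dir": "results", "plot": True, "seed": 0, "n_coeffs": 128}


def resolve_config(file_values, cli_args):
    """Merge settings so that CLI flags beat config-file values, which beat defaults."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag names; check the launcher's help output for the real ones.
    parser.add_argument("--output_dir")
    parser.add_argument("--seed", type=int)
    parser.add_argument("--n_coeffs", type=int)
    parsed = vars(parser.parse_args(cli_args))
    cli_values = {key: value for key, value in parsed.items() if value is not None}
    # Later dictionaries win, which encodes the priority order.
    return {**DEFAULTS, **file_values, **cli_values}


config = resolve_config({"seed": 42, "n_coeffs": 64}, ["--n_coeffs", "32"])
print(config)
```

Here seed comes from the file values, n_coeffs from the command line, and the remaining settings fall back to the defaults.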
General
| Parameter | Default | Description |
|---|---|---|
| `output_dir` | `results` | Directory where all output files are written; created if it does not exist |
| `plot` | `True` | Generate intermediate diagnostic plots alongside the main outputs |
| `seed` | `0` | Random seed passed to PCA for reproducibility |
FFT coefficients step
| Parameter | Default | Description |
|---|---|---|
| `fftcoeff_step` | `True` | Run the FFT coefficient extraction step |
| `n_coeffs` | `128` | Number of Fourier coefficients used to describe each contour; higher values capture finer shape detail at the cost of increased dimensionality |
| `alignment` | `fft_major_axis_polarized` | Contour alignment method before coefficient extraction. `fft_major_axis` rotates the cell to align its longest axis horizontally; `fft_major_axis_polarized` additionally flips the cell so the nucleus is always on the same side; `fft_centroid` aligns based on the centroid position only |
| `dismiss_ratio` | `8` | Cells with a cell-area-to-nucleus-area ratio above this threshold are excluded from the shape modes step as likely segmentation artefacts |
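The dismiss_ratio rule amounts to a one-line filter. The sketch below uses the `image`, `cell_area`, and `nuc_area` column names from fft_coeffs.csv with made-up area values; the helper name `passes_dismiss_ratio` is hypothetical and the pipeline's own check may be implemented differently.

```python
def passes_dismiss_ratio(cell_area, nuc_area, dismiss_ratio=8):
    """Return True when cell area / nucleus area does not exceed the threshold."""
    return cell_area / nuc_area <= dismiss_ratio


# Made-up measurements for three cells.
cells = [
    {"image": "cell_001", "cell_area": 4000, "nuc_area": 1000},  # ratio 4.0
    {"image": "cell_002", "cell_area": 9000, "nuc_area": 1000},  # ratio 9.0
    {"image": "cell_003", "cell_area": 8000, "nuc_area": 1000},  # ratio 8.0, not above
]
kept = [c["image"] for c in cells if passes_dismiss_ratio(c["cell_area"], c["nuc_area"])]
print(kept)
```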
Shape modes step
| Parameter | Default | Description |
|---|---|---|
| `shapemode_step` | `True` | Run the PCA shape modes step; requires FFT coefficients output |
Protein parametrization step
| Parameter | Default | Description |
|---|---|---|
| `protparam_step` | `True` | Run the protein parametrization step; requires FFT coefficients output |
| `protparam_mode` | `rings` | Parametrization method. `rings` samples protein intensity along concentric isocontours between nucleus and cell membrane (fast, shape-independent). `warp` morphs each cell into the average cell shape using thin-plate spline warping before sampling (slower, requires shape modes output, enables direct pixel-level comparison) |
Comparison step
| Parameter | Default | Description |
|---|---|---|
| `comparison_step` | `False` | Run the location comparison step; requires protein parametrization and shape modes output, and meaningful `location` values in `path_list.csv` |
All outputs are written under output_dir (default: results/).
FFT coefficients step → results/shapespace/
| File | Description |
|---|---|
| `fft_coeffs.csv` | One row per cell containing: `image`, `nuc_area`, `cell_area`, `prot_int_sum_nuc`, `prot_int_sum_cell`, `theta` (alignment angle), `centroid_y`, `centroid_x`, `e_c` (cell eccentricity), `e_n` (nucleus eccentricity), followed by `n_coeffs` × 4 Fourier coefficient columns (nucleus x/y, cell x/y) |
| `{image_id}_fft_reconstruction.png` (if `plot=True`) | Overlay of the original contour and the FFT reconstruction for visual quality control |
Shape modes step → results/shapemode/
| File | Description |
|---|---|
| `Avg_cell.npz` | Numpy archive with the average cell contour points (`ix_n`, `iy_n`, `ix_c`, `iy_c`) used by the warp mode and downstream steps |
| `Avg_cell.jpg` | Plot of the average nucleus and cell membrane contours |
| `PCA_scree.jpg` | Scree plot showing explained variance per PC with cumulative threshold markers |
| `shapevar_PC{n}.png` | Strip of 7 shape outlines showing the cell deformation along PCn from −1.5 to +1.5 standard deviations |
| `shapevar_PC{n}.gif` | Animated version of the shape variation strip |
| `shapevar_PC{n}_hist.jpg` | Histogram of per-cell PCn scores with bin boundary markers |
| `shapevar_PC{n}.npz` | Raw nucleus and membrane contour arrays for each variation step along PCn |
| `cells_assigned_to_pc_bins.json` | Dictionary mapping each PC to a list of 7 bins, each containing the `image_id` values of cells assigned to that bin |
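The bin assignment file can be inspected with the `json` module. The sketch assumes the structure described in the table (one key per PC, each holding a list of 7 bins of image_id values); the key name "PC1", the sample contents, and the helper `bin_of` are made up for illustration.

```python
import json

# In-memory stand-in for results/shapemode/cells_assigned_to_pc_bins.json.
sample_text = '{"PC1": [["cell_001"], [], ["cell_002", "cell_003"], [], [], [], []]}'
assignments = json.loads(sample_text)

# Number of cells in each bin, per PC.
counts = {pc: [len(bin_ids) for bin_ids in bins] for pc, bins in assignments.items()}


def bin_of(assignments, pc, image_id):
    """Return the index of the bin holding image_id along the given PC, or None."""
    for index, bin_ids in enumerate(assignments[pc]):
        if image_id in bin_ids:
            return index
    return None


print(counts)
print(bin_of(assignments, "PC1", "cell_002"))
```

Sparse bins are worth checking before interpreting per-bin averages, since an average over one or two cells is dominated by individual variation.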
Protein parametrization step → results/protparam/
| File | Description |
|---|---|
| `{image_id}_protein.npy` (rings mode) | 2D array of sampled protein intensities; rows are isocontours from nucleus to membrane, columns are points along the contour |
| `{image_id}_protein_interpolation.png` (rings mode, `plot=True`) | Diagnostic image showing the rotated cell, protein channel, isocontour sampling grid, and sampled intensities |
| `{image_id}_warp.png` (warp mode) | Protein channel image warped into the average cell shape coordinate system |
| `{image_id}_warp_plot.png` (warp mode, `plot=True`) | Five-panel diagnostic showing the original shape, resized images, and each warping stage |
| `avg/{PC}_bin{idx}_protein.npy` (rings mode) | Average intensity map across all cells in PC bin idx |
| `avg/{PC}_bin{idx}_protein.png` (rings mode, `plot=True`) | Heatmap of the averaged intensity map |
| `avg/{PC}_bin{idx}_warp.png` (warp mode) | Average warped protein image across all cells in PC bin idx |
Comparison step → results/comparison/
| File | Description |
|---|---|
| `avg_by_location/{PC}_bin{idx}_{location}.npy` | Average protein intensity array for cells of a given location label within a PC bin |
| `heatmaps/{PC}_bin{idx}_pearsonr.csv` | Pairwise Pearson correlation matrix between all location labels for a given PC bin |
| `heatmaps/{PC}_bin{idx}_pearsonr.png` (if `plot=True`) | Annotated heatmap of the correlation matrix |