This repo contains code related to the manuscript "Power analysis for spatial omics".
If you use this in your work, please cite: Power analysis for spatial omics. Ethan Alexander García Baker, Denis Schapiro, Bianca Dumitrascu, Sanja Vickovic, Aviv Regev. bioRxiv 2022.01.26.477748; doi: https://doi.org/10.1101/2022.01.26.477748
codex_spleen/
- Spleen_IST_generation.ipynb: IST generation for mouse spleen, Figure 2k
- Spleen_NegBinom.ipynb: Sampling experiments for spleen, Figure 2l
- ProspectivePower_2mn.ipynb: Prospective power analysis, Figure 2m-n
- FigS7_2021-12-03.ipynb: FOV size experiments and visualization, Supplementary Figure 7
- Spleen_Cohort_Comparison.ipynb: Interaction enrichment statistic, Supplementary Figure 8
- spleen_binning.ipynb: Resolution analysis for spleen dataset, Figure 3
- spleen_multisample.ipynb: Analysis on the impact of inclusion of multiple samples on power, Figure 3
- spleen_data/: Contains support files
simulation/
- FigureS3.ipynb: Sampling experiments for synthetic data, Supplementary Figure 3
- FigureS4Heatmaps-2021-12-02.ipynb: Clustering experiments for ISTs, Supplementary Figure 4
osmfish_cortex/
- osmfish_generation.ipynb: IST generation for mouse cortex, Figure 2g
- NB_Cell_Discov_clean.ipynb: Cell type discovery sampling experiments, Figure 2h, Supplementary Figure 6d
- data/: Contains support files
hdst_breastcancer/
- BreastCancer_IST_Generation.ipynb: IST generation for breast cancer, Figure 2c
- BreastCancer_NB.ipynb: Cell type discovery sampling experiments, Figure 2d
- HDST_binning.ipynb: Resolution analysis for HDST breast cancer data, Figure 3
- BreastCancer_multisample.ipynb: Analysis on the impact of inclusion of multiple samples on power, Figure 3
- data/: Contains support files
spatialpower/: Package for IST generation and supporting analysis
scripts/: Contains support scripts for other analyses
- generate_tiles.py: Generates tiles for shuffling analysis corresponding to Figure 2m-n
- random_self_pref_cluster.py: Generates ISTs for clustograms in Supplementary Figure 4
We provide a clone of the conda environment used to generate these results in the env.yml file. To install the environment, run:
conda env create -n <environment name> --file env.yml
We provide a command line Python tool to generate tissue in silico.
Generating an in silico tissue (IST) requires two parameters: p, a vector describing the abundance of the k cell types, and H, the k x k matrix describing the probability that two cell types are directly adjacent. These objects are described in the paper. We suggest estimating p and H from pilot data; the IST generation notebooks above illustrate how one might do this.
The generalized steps for the construction of the IST are:
- Generate a tissue scaffold
- Estimate p and H
- Label the tissue scaffold.
To construct a tissue scaffold, execute random_circle_packing.py:
python spatialpower/tissue_generation/random_circle_packing.py -x 1000 -y 1000 -o sample_results
Key arguments that can be adjusted to tune the circle packing are -x and -y, which control the width and height, respectively, of the rectangular tissue area, and --rmin and --rmax, which control the minimum and maximum radius, respectively, of the circles that are packed within the bounding rectangle to generate the random planar graph.
A full enumeration of the arguments, including controls for visualization and export, is available using the -h flag.
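To give a feel for what the -x, -y, --rmin and --rmax arguments control, here is a minimal sketch of random circle packing by rejection sampling. It is an illustration only and is not taken from random_circle_packing.py, which may use a different algorithm; the function name pack_circles and the max_attempts and seed parameters are hypothetical. The sketch stops at placing circles and does not build the planar graph.

```python
import math
import random


def pack_circles(width, height, rmin, rmax, max_attempts=2000, seed=0):
    """Place non-overlapping circles in a width x height rectangle.

    Hypothetical helper for illustration: candidates are drawn at random
    and rejected if they overlap a circle that was already accepted.
    """
    rng = random.Random(seed)
    circles = []  # accepted circles as (x, y, r)
    for _ in range(max_attempts):
        r = rng.uniform(rmin, rmax)
        # Draw the center so the whole circle stays inside the rectangle.
        x = rng.uniform(r, width - r)
        y = rng.uniform(r, height - r)
        # Accept only if the candidate overlaps no accepted circle.
        if all(math.hypot(x - cx, y - cy) >= r + cr for cx, cy, cr in circles):
            circles.append((x, y, r))
    return circles


circles = pack_circles(width=100, height=100, rmin=2, rmax=5)
print(f"placed {len(circles)} circles")
```

In this sketch, a smaller radius range yields more, smaller circles in the same area, and each circle center would become one node (cell) of the tissue graph.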
We suggest obtaining values for p and H from a pilot experiment or estimating them from prior knowledge.
We provide a function for the efficient computation of H given the adjacency matrix A and the one-hot encoded assignment matrix B for a graph representation of a tissue (real or simulated). To calculate H:
import spatialpower.neighborhoods.permutationtest as perm_test
H = perm_test.calculate_neighborhood_distribution(A, B)
The cell type abundance p can be computed directly from the one-hot encoded assignment matrix B:
import numpy as np
p = np.sum(B, axis=0) / np.sum(B)
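For readers who have not built A and B before, the following toy example constructs both for a five-cell graph and derives p from B. It also computes type-by-type adjacency counts as B^T A B and row-normalizes them. That normalization is an assumption made here for illustration, so the result (named H_sketch) is not guaranteed to match the output of calculate_neighborhood_distribution; the toy edges and labels are invented.

```python
import numpy as np

# Toy tissue graph: 5 cells on a path 0-1-2-3-4, with k = 2 cell types.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = np.array([0, 0, 0, 1, 1])  # cell type index of each cell
n_cells, k = len(labels), 2

# Symmetric adjacency matrix A (n_cells x n_cells).
A = np.zeros((n_cells, n_cells), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1

# One-hot assignment matrix B (n_cells x k): B[i, t] = 1 if cell i has type t.
B = np.eye(k, dtype=int)[labels]

# Cell type abundance vector p (fraction of cells of each type).
p = B.sum(axis=0) / B.sum()

# Entry (s, t) counts ordered neighbor pairs (i, j) in which
# cell i has type s and cell j has type t.
pair_counts = B.T @ A @ B

# ASSUMPTION: row-normalize so each row is a distribution over neighbor types.
H_sketch = pair_counts / pair_counts.sum(axis=1, keepdims=True)

print(p)
print(pair_counts)
print(H_sketch)
```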
To perform a labeling of the tissue scaffold using the optimization approach:
import spatialpower.tissue_generation.assign_labels as assign_labels
cell_assignments = assign_labels.optimize(A, p, H, learning_rate=1e-5, iterations=10)
To perform a labeling of the tissue scaffold using the heuristic approach:
import networkx as nx
import spatialpower.tissue_generation.assign_labels as assign_labels
G = nx.from_numpy_array(A)
cell_assignments = assign_labels.heuristic_assignment(G, p, H, mode='graph', dim=1000, position_dict=position_dict, grid_size=50, revision_iters=100, n_swaps=25)
We provide simulated_tissue.ipynb as an example of how to use our tissue generation method on a theoretical tissue (e.g. for testing methods to recover a specific spatial feature) following the approach above.
Additionally, we provide example usage of our approach for tissue generation. See the osmfish_cortex/osmfish_generation.ipynb file for a complete example with generated tissue from raw osmFISH data. The run time for the full notebook should be below 30 minutes on a modern laptop.
Our work provides a general framework for the considerations that should be taken into account in spatial experimental design. In our manuscript, we consider experiments to detect several spatial features, including the discovery of a cell type of interest and the detection of cell-cell interactions.
We examine experiments to discover these spatial features as illustrative examples of our general framework; we encourage individual users to adapt these approaches for their particular question of interest.
In general, the overall procedure for cell type discovery is:
- Obtain pilot data
- Estimate model parameters
- Calculate probability of detecting cell type of interest given some level of sampling (e.g. number of cells or FOVs sampled)
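As a rough illustration of the last step, the sketch below computes the probability of observing at least a given number of cells of a type when cells are sampled independently, using a plain binomial model. This is a deliberate simplification and not the model used in the notebooks; the function names, the min_cells threshold, and the 1% abundance are all hypothetical.

```python
from math import comb


def detection_probability(n_sampled, abundance, min_cells=1):
    """P(at least min_cells cells of the type among n_sampled cells).

    ASSUMPTION: cells are sampled independently, so the count of the
    cell type of interest is Binomial(n_sampled, abundance).
    """
    p_fewer = sum(
        comb(n_sampled, i) * abundance**i * (1 - abundance) ** (n_sampled - i)
        for i in range(min_cells)
    )
    return 1.0 - p_fewer


def cells_needed(abundance, min_cells=1, target_power=0.95, max_cells=100_000):
    """Smallest sample size reaching target_power, or None if none is found."""
    for n in range(min_cells, max_cells + 1):
        if detection_probability(n, abundance, min_cells) >= target_power:
            return n
    return None


# Hypothetical pilot estimate: the cell type makes up 1% of all cells.
n_required = cells_needed(abundance=0.01, min_cells=1, target_power=0.95)
print(n_required)
```

Spatial structure generally breaks the independence assumption when whole FOVs are sampled rather than individual cells, which is one reason to estimate model parameters from pilot data as described above.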
We provide notebooks implementing this framework for the three differently-structured data sets discussed in our manuscript:
- FigureS3.ipynb: Sampling experiments for synthetic data, Supplementary Figure 3
- Spleen_NegBinom.ipynb: Sampling experiments for spleen, Figure 2l
- NB_Cell_Discov_clean.ipynb: Cell type discovery sampling experiments, Figure 2h, Supplementary Figure 6d
- BreastCancer_NB.ipynb: Cell type discovery sampling experiments, Figure 2d
As discussed in the manuscript, we suggest detecting cell-cell interactions via a permutation test, for which we provide code in spatialpower/neighborhoods/permutationtest.py.
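The general idea of a permutation test for cell-cell interactions can be sketched as follows: count the edges joining two cell types, then compare that count with the counts obtained after shuffling the cell type labels over the fixed graph. This is a generic label-shuffling sketch and not the implementation in permutationtest.py; the function names and the toy graph are hypothetical.

```python
import numpy as np


def count_type_edges(A, labels, a, b):
    """Count undirected edges joining a type-a cell and a type-b cell."""
    is_a = (labels == a).astype(int)
    is_b = (labels == b).astype(int)
    ordered_pairs = int(is_a @ A @ is_b)
    # Within a single type, every undirected edge is counted twice.
    return ordered_pairs // 2 if a == b else ordered_pairs


def permutation_test(A, labels, a, b, n_permutations=1000, seed=0):
    """One-sided test for enrichment of a-b adjacency under label shuffling."""
    rng = np.random.default_rng(seed)
    observed = count_type_edges(A, labels, a, b)
    null_counts = np.array([
        count_type_edges(A, rng.permutation(labels), a, b)
        for _ in range(n_permutations)
    ])
    # The +1 terms keep the estimated p-value strictly above zero.
    p_value = (1 + np.sum(null_counts >= observed)) / (1 + n_permutations)
    return observed, p_value


# Toy tissue: 12 cells on a path with alternating types, so every edge is 0-1.
n_cells = 12
A = np.zeros((n_cells, n_cells), dtype=int)
for i in range(n_cells - 1):
    A[i, i + 1] = A[i + 1, i] = 1
labels = np.array([0, 1] * (n_cells // 2))

observed, p_value = permutation_test(A, labels, a=0, b=1)
print(observed, p_value)
```

Shuffling labels keeps both the graph and the cell type abundances fixed, so the null distribution reflects only how the types are arranged.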
Additionally, we show an illustrative example of this analysis in:
- FigS7_2021-12-03.ipynb: FOV size experiments and visualization, Supplementary Figure 7
- ProspectivePower_2mn.ipynb: Prospective power analysis, Figure 2m-n
We introduce the interaction enrichment statistic, which is implemented in the functions calculate_enrichment_statistic and z_test in the module spatialpower/neighborhoods/permutationtest.py.
We provide an illustrative example of the interaction enrichment statistic (IES) test as implemented in our manuscript:
- Spleen_Cohort_Comparison.ipynb: Interaction enrichment statistic, Supplementary Figure 8