Skip to content

anthbapt/MOSAIK

                

MOSAIK

Documentation Status

Please cite the following paper if you use MOSAIK in your research: A.Baptista et al., MOSAIK: Multi-Origin Spatial Transcriptomics Analysis and Integration Kit, arXiv:2505.11384 (2025)

⭐ New: FOVs Stitching and Resegmentation with MOSAIK Stitch CosMx Field of Views (FOVs) and resegment your samples (CosMx and Xenium support) using MOSAIK's enhanced pipeline. The workflow preprocesses images, applies Cellpose-SAM for segmentation, and uses geometric features to automatically remove artifacts. These features can be found in src/resegmentation folder.

Update 06/02/2026: Update of the resegmentation in two steps (CellposeSAM + Proseg) for both CosMx and Xenium. Possibility to integrate MERSCOPE data with MOSAIK (the segmentation will be released soon)

Update 24/11/2025: Two new jupyter notebook have been upload to notebooks/.. to replicate the analysis done in the arXiv preprint for both CosMx and Xenium. The data can be downloaded at the following links: DOI 10.5281/zenodo.15365592

Update 03/11/2025: The Xenium reader has been updated in order to be aligned with the new naming of the channels in the morphology_focus folder. This update solve the following error:

- ValueError: Expected files in the morphology focus directory to be named as morphology_focus_0000.ome.tif to morphology_focus_0003.ome.tif, found {'ch0000_dapi.ome.tif', 'ch0001_atp1a1_cd45_e-cadherin.ome.tif', 'ch0003_alphasma_vimentin.ome.tif', 'ch0002_18s.ome.tif'}

📝 Introduction

At King's College London, the Spatial Biology Facility utilises both CosMx and Xenium technologies. We have identified a gap in the integration of spatial transcriptomics data, from raw output to Python-compatible formats, regardless of whether the data originates from 10X Xenium or NanoString CosMx instruments.

To address this, we have developed MOSAIK (Multi-Origin Spatial Transcriptomics Analysis and Integration Kit): a dynamic, evolving workflow designed to fully harness the power of spatial biology and support seamless multimodal integration.

MOSAIK is built to adapt to the latest versions of key platform providers, including 10X Genomics and NanoString, while staying aligned with emerging trends in the spatial biology community.

In the following section, we will guide you through the process of using MOSAIK for data integration and analysis.

Numerical materials included in MOSAIK

  • Napari v0.4.17 + CosMx plugin (napari/napari_CosMx-0.4.17.3-py3-none-any.whl)
  • CosMx/Xenium conda environment (env/..)
  • CosMx/Xenium Reader (src/qc/..)
  • CosMx/Xenium QC scripts (src/qc/..)
  • CosMx/Xenium notebooks (notebooks/..)
  • Tools
    • export_tiff.py: This script converts CosMx spatial transcriptomics image data from Zarr format to multiscale OME-TIFF. It supports selecting specific imaging channels, optionally includes segmentation label edge maps, and embeds spatial metadata such as pixel size and contrast limits. Channels are grouped into batches, processed with Dask for scalability, and written to OME-TIFF using tifffile, producing a tiled, compressed, pyramid image stack.
    • make_composite_revised_image.py: This script processes layered 2D morphology TIFF images by extracting individual channels, converting them to 8-bit grayscale, applying optional autocontrast, and generating colorized composite images. It validates user input for clipping percentage and output format, ensures required folders are created, and handles exceptions for folder conflicts or invalid input. The script supports JPG, PNG, and TIFF formats, and creates both raw and enhanced composite outputs using predefined color mappings.
    • remove_slice_tiff.py: This script removes a specific slice (index 4) from each multi-layer .TIF file in a folder and saves the modified files to an output directory.
    • CellLabels.sh: This bash script copies all CellLabels *.tif files from subdirectories into a single destination folder, extracts compressed .csv.gz files, and renames any *-polygons.csv files by replacing the hyphen with an underscore for consistency.
    • renaming_composite.sh: This bash script renames files in a specified directory by removing _composite or _composite_autocontrast from filenames and replacing the first occurrence of F with CellComposite_F, then outputs the changes made.
    • Xenium_data.xlsx: Xlsx table that explain each xenium data output and which one is important for the visualisation, analysis, and re-preprocessing.

MOSAIK workflow

Workflow

📋 Integrating the data step by step

💡 Alternatively to running cosmx_QC.py or xenium_QC.py (localised in src/qc/..), users interested in a more interactive approach can use the notebooks CosMx_Spatial_Analysis_Workflow.ipynb and Xenium_Spatial_Analysis_Workflow.ipynb (localised in notebooks/..) to proceed to the following steps.

⚠️ If you opt to run the notebook, we recommend running it from the project directory.

CosMx

  1. Export the data from the AtoMx platform. The export should include all flatFiles and DecodedFiles (Formerly referred to as RawFiles during export.)

    • AnalysisResults
    • CellStatsDir (which include Morphology2D folder)
    • RunSummary
      (In AtoMx v1.4, this includes all RawFiles. In earlier versions, it includes all RawFiles except the Spot files.)
  2. Now, the raw images should be included in the flatFiles folder. To do this, you can use the bash script CellLabels.sh available in the tools/ folder of the GitHub repository.

  3. Firstly, you have to change the SOURCE_DIR and the DEST_DIR variables in the script to match your own directories.

  4. Secondly, make the script executable and run it using the following commands:

    chmod +x CellLabels.sh
    ./CellLabels.sh
  5. This will create a CellLabels folder inside the flatFiles folder.

  6. The last folder to include is the CellComposite folder from Morphology2D. Depending on your choice:

    • If you opt for a composite .jpg image (CellComposite)
    • Or for raw multichannel .TIF images (Morphology2D, approximately 200 times larger than composite images)

    Both folders are located in the RawFiles directory, under the subfolder named CellStatsDir.

  7. If the CellComposite folder is not present or is unsatisfactory, and you don't want to use raw multichannel .TIF images (Morphology2D), a new one can be created using the Python script by specifying the folder that contains the TIF images:

    python tools/make_composite_revised_image.py

When the code finishes running, it generates multiple folders inside the Morphology2D directory. The relevant folders are either composite or composite_autocontrast. Whichever one you choose, you need to rename it to CellComposite, and move into the flatFiles folder. You can use the renaming_composite.sh script to rename the images in a suitable and consistent format. This script is in the tools/ folder.

  1. When the flatFiles folder is ready, i.e., enriched with the CellComposite/Morphology2D folder and the CellLabels folder, you can either import and create the .zarr object with the Python code src/qc/cosmx_QC.py.
  2. Create the Napari visualisation (see the section How to use Napari).
  3. The Napari visualisation can be used to proceed with the QC using the same code as the one used for the .zarr object creation (src/qc/cosmx_QC.py), especially for defining the different sample FOVs.
  4. You can create and/or import the .zarr object with the Python code src/qc/cosmx_QC.py.
  5. ❗The code src/qc/cosmx_QC.py needs to call functions from the following libraries:
  • src/qc/sbf.py
  • src/qc/cosmx.py
  • src/qc/_constants.py

⚠️ To run cosmx_QC.py without any issues, we recommend creating a conda environment based on the YAML file located in the env folder, and then running the following command in a terminal:

 conda env create -f cosmx.yml

Xenium

  1. Export the data from the Xenium instrument, the folder contains a lot of files that are described in tools/Xenium_data.xlsx file of the GitHub repository.
  2. Exploring the data with Xenium Explorer
  3. You can create and/or import the .zarr object with the Python code src/qc/xenium_QC.py.
  4. ❗The code src/qc/xenium_QC.py needs to call functions from the following libraries:
  • src/qc/sbf.py
  • src/qc/xenium.py
  • src/qc/_constants.py
  1. When the .zarr object is created, you can proceed with the QC using the same code as before (src/qc/xenium_QC.py).

⚠️ To run xenium_QC.py without any issues, we recommend creating a conda environment based on the YAML file located in the env folder, and then running the following command in a terminal:

 conda env create -f xenium.yml

🔬 How to use Napari

Install Napari and CosMx plugin

  1. Install Napari 0.4.17.

⚠️ We recommend creating a conda environment based on the YAML file (napari.yml) located in the env folder. This environment will install Napari 0.4.17 and all required dependencies during its creation. To create the corresponding conda environment, run the following command in a terminal:

 conda env create -f napari.yml
  1. Launch Napari from the terminal by typing napari, using the previously created conda environment.

  2. Open the IPython console (symbol ">_").

  3. Install the CosMx plugin:

    pip install napari_CosMx-0.4.17.3-py3-none-any.whl

Use CosMx plugin

  1. Drag the napari_cosmx_launcher folder into the Napari window. You can download it from this link
  2. In the right panel, select the parent folder that contains your CosMx raw data, this folder should include the following folders: AnalysisResults, CellStatsDir, RunSummary.
  3. Choose the output folder.
  4. Click the "Stitch" button.
  5. Wait for the stitching to finish (the only indicator is the loading cursor).
  6. After stitching, the output directory will contain:
  • an images folder with all FOVs
  • a targets.hdf5 file with the transcripts
  1. Restart Napari 0.4.17 and drag the project folder into the window.

  2. Once loaded, use the panels to explore:

    • Morphology Images: Add fluorescent channels.
    • RNA Transcripts: Add transcripts.
    • Layer list: Manage transcripts, channels, and segmentation.
    • Layer controls: Adjust visualisation.

🗺️ SpatialData object overview

The SpatialData object forms the foundation for analysing spatial omics data.

Core concepts

A SpatialData object is a container for Elements, either:

  • SpatialElements:
    • Images: e.g. H&E stains.
    • Labels: Segmentation maps.
    • Points: Transcripts or landmarks.
    • Shapes: Cell/nucleus boundaries or ROIs.
  • Tables: Annotate spatial elements or store metadata (non-spatial).

Categories

  • Rasters: Pixel-based data (Images, Labels)
  • Vectors: Geometry-based data (Points, Shapes)

Transformations

  • Vectorisation: Converts Labels → Shapes (shapely polygons)
  • Rasterisation: Converts Shapes/Points → Labels (2D image representation)

You can explore a SpatialData object visually using the spatialdata-napari plugin.

For tutorials, see the spatialdata-napari documentation.

SpatialData Object

Resegmentation

New: FOV Stitching and Resegmentation with MOSAIK

MOSAIK now supports stitching of CosMx Field of Views (FOVs) and provides comprehensive resegmentation capabilities for both CosMx and Xenium samples. The enhanced pipeline includes:

  • FOV Stitching: Seamlessly combine multiple fields of view into a single spatial dataset for whole-sample analysis
  • Image Preprocessing: Intelligent channel selection and rescaling to optimize image quality for segmentation
  • Cellpose-based Segmentation: Leverages Cellpose with customizable parameters (flow threshold, cell probability threshold, tile overlap) for accurate cell boundary detection
  • Artifact Removal: Applies geometric feature-based quality control metrics to automatically filter out segmentation artifacts
  • SpatialData Integration: Automatically updates your spatial transcriptomics data with improved segmentation masks

The workflow dramatically improves segmentation quality compared to default methods (see figure), enabling more accurate downstream spatial analysis and cell-level quantification.

Workflow

Key Functions: stitching(), Resegmentation(), qc_metric(), qc_plot(), filter_norm()

Downstream analysis

Working with a SpatialData object enables seamless integration with many tools from the scverse ecosystem. We highly recommend using this ecosystem to contribute to the broader community in an interoperable and standardised way.

We also recommend the following benchmark for Xenium user: Xenium best practices

Finally, for a more comprehensive list of spatial transcriptomics tools, please visit our Github page: Spatial-Biology-Tools

Segmentation

Name Released Documentation Links
sopa 06/2024

Cell typing

Name Released Documentation Links
TANGRAM 10/2021
cell2location 01/2022
cellTypist 05/2022

Domaine identification

Name Released Documentation Links
SpaGCN 10/2021 N/A
STAGATE 04/2022
Banksy 02/2024 N/A

Genes imputation

Name Released Documentation Links
SpaOTsc 04/2020 N/A
SpaGE 09/2020 N/A
TANGRAM 10/2021

Spatially variable genes

Name Released Documentation Links
SpatialDE 03/2018 N/A
SINFONIA 02/2023

Cell-cell communications

Name Released Documentation Links
NCEM 10/2022
COMMOT 01/2023
FlowSig 08/2024 N/A
DeepTalk 08/2024
NicheCompass 03/2025

📫 Contact

For any questions, reach out to Anthony Baptista: 📧 anthony.baptista@kcl.ac.uk

🤝 Contributing

Contributions are welcome and appreciated! 💙
Whether it’s fixing a bug, improving documentation, or adding a new feature — every bit helps make this project better.

🛠️ How to Contribute

  1. Fork the repository on GitHub.
  2. Clone your fork locally:
    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name
  3. Create a new branch for your feature or fix:
    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name
  4. Commit your changes with a clear and concise message::
    git add .
    git commit -m "Add a short description of your changes"
  5. Push your branch to your fork and open a Pull Request::
    git push origin feature/your-feature-name
    

♻️ License

This work is licensed under:

  • MIT license (for code)
  • Creative Commons Attribution 4.0 International license (for docs)

You may share/adapt for any purpose, including commercially, with proper credit and no added restrictions.

✨ Contributors & Contacts

Anthony Baptista
Anthony Baptista

Diane Cruiziat
Diane Cruiziat

About

Documentation

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE-CCBY.md
MIT
LICENSE-MIT.md

Code of conduct

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors