Pycytominer is a suite of common functions used to process high dimensional readouts from high-throughput cell experiments. The tool is most often used for processing data through the following pipeline:
Image data flow from the microscope to segmentation and feature extraction tools (e.g. CellProfiler or DeepProfiler). From here, additional single cell processing tools curate the single cell readouts into a form manageable for pycytominer input. For CellProfiler, we use cytominer-database or cytominer-transport. For DeepProfiler, we include single cell processing tools in pycytominer.cyto_utils.
From the single cell output, we perform five steps using a simple API (described below), before passing along our data to cytominer-eval for quality and perturbation strength evaluation.
The API is consistent for the five major processing functions:
- Aggregate
- Annotate
- Normalize
- Feature select
- Consensus
Each processing function has unique arguments, see our documentation for more details.
Pycytominer is still in beta, and can only be installed from GitHub:
pip install git+git://github.com/cytomining/pycytominer
Since the project is actively being developed, with new features added regularly, we recommend installation using a hash:
# Example:
pip install git+git://github.com/cytomining/pycytominer@2aa8638d7e505ab510f1d5282098dd59bb2cb470
Using pycytominer is simple and fun.
# Real world example
import pandas as pd
import pycytominer
commit = "da8ae6a3bc103346095d61b4ee02f08fc85a5d98"
url = f"https://media.githubusercontent.com/media/broadinstitute/lincs-cell-painting/{commit}/profiles/2016_04_01_a549_48hr_batch1/SQ00014812/SQ00014812_augmented.csv.gz"
df = pd.read_csv(url)
normalized_df = pycytominer.normalize(
profiles=df,
method="standardize",
samples="Metadata_broad_sample == 'DMSO'"
)
Pycytominer was written with a goal of processing any high-throughput profiling data. However, the initial use case was developed for processing image-based profiling experiments specifically. And, more specifically than that, image-based profiling readouts from CellProfiler measurements from Cell Painting data.
Therefore, we have included some custom tools in pycytominer/cyto_utils
.