
Input and Output

Jeff Winchell edited this page Aug 19, 2024 · 1 revision

This page details the input and output of the ScaleFEx pipeline.

Input

1. Parameters File

The default parameters file, parameters.yaml, is included in the git repository and should be used as a template when running the tool on your own data.

For descriptions of specific parameters, consult the wiki's Home page or the default parameters file.

Pass your parameters file using the -p flag:

python scalefex_main.py -p /path/to/your/parameters.yaml

2. Imaging Data

Your data should generally adhere to the following structure:

└── experiment
    ├── plate1
    │   ├── well1_site1_ch1.png
    │   ├── well1_site2_ch1.png
    │   ├── well2_site1_ch1.png
    │   └── well2_site2_ch1.png
    └── plate2
        ├── well1_site1_ch1.png
        ├── well1_site2_ch1.png
        ├── well2_site1_ch1.png
        └── well2_site2_ch1.png

That is, a main "experiment" directory containing one folder of images for each plate you wish to analyze.

Any number of subfolders can sit between a plate folder (e.g. experiment/plate1/) and its images, as long as every plate's images follow the same structure with the same subfolder names.
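Before running the pipeline, it can help to confirm that every plate uses the same subfolder layout. The sketch below is an illustration, not part of ScaleFEx: the helper names plate_layouts and layouts_consistent are hypothetical, and the set of image extensions is an assumption (only .png appears in the example tree).

```python
from pathlib import Path

# Image extensions to look for; .png matches the example tree, the rest are assumptions.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff"}


def plate_layouts(experiment_dir):
    """For each plate folder, collect the relative subfolder paths that contain images."""
    layouts = {}
    for plate_dir in sorted(p for p in Path(experiment_dir).iterdir() if p.is_dir()):
        folders = set()
        for path in plate_dir.rglob("*"):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                # Folder between the plate directory and the image ("." if none)
                folders.add(path.parent.relative_to(plate_dir).as_posix())
        layouts[plate_dir.name] = folders
    return layouts


def layouts_consistent(layouts):
    """True if every plate stores its images under the same subfolder names."""
    return len({frozenset(folders) for folders in layouts.values()}) <= 1
```

For the example tree above, plate_layouts("experiment") would map both plate1 and plate2 to {"."}, since their images sit directly in the plate folders.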

For more information, see the wiki page on Querying Data.

3. Pre-computed Coordinates (Optional)

If you have saved pre-computed coordinates, you can pass them to ScaleFEx as input and avoid re-computing cell locations (i.e., segmenting nuclei). Specify the path to the file using the csv_coordinates parameter.

The file should be a .csv file structured as follows:

[Image: example structure of the coordinates .csv file]

Output

The output files generally share a similar structure, which makes them easy to aggregate, both within and across experiments. Output filenames include the experiment_name specified in the parameters file, and some outputs are split across multiple files to ease data loading and analysis after computation.
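As a minimal sketch of such aggregation, assuming the parts of one output share an identical header row, the hypothetical helper below concatenates them with Python's standard csv module. Choosing which files belong together (for example, by matching experiment_name in the filename) is left to the caller, since the exact output filenames are not specified here.

```python
import csv


def concat_csv_parts(paths):
    """Concatenate CSV files that share one header row.

    Returns (header, rows); raises ValueError if a part's header differs.
    Empty files are skipped.
    """
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            part_header = next(reader, None)
            if part_header is None:  # empty file: nothing to add
                continue
            if header is None:
                header = part_header
            elif part_header != header:
                raise ValueError(f"{path}: header differs from the first part")
            rows.extend(reader)
    return header, rows
```

If pandas is available, pd.concat([pd.read_csv(p) for p in paths], ignore_index=True) does the same job and also parses numeric columns.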

1. ScaleFEx Features

The ScaleFEx feature vectors are written to CSV files whose maximum size is set by the max_file_size parameter.

Aside from metadata, the CSV includes feature columns named according to the feature name, image channel, and the feature parameters (when applicable).
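To pull out the features of a single channel, something along these lines may work, assuming the channel name appears as a separate underscore-delimited token in each feature column name. The separator, the helper name columns_for_channel, and the column names used to try it out are assumptions, not the documented ScaleFEx naming scheme, so inspect your own CSV header first.

```python
def columns_for_channel(header, channel, sep="_", keep=()):
    """Select column names containing `channel` as a whole token, plus `keep` columns.

    Matching whole tokens (rather than substrings) avoids, e.g., "ch1"
    also selecting "ch10" columns. `sep` is an assumed separator.
    """
    return [name for name in header if name in keep or channel in name.split(sep)]
```

Passing the metadata column names through keep retains them alongside the selected features.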

For specific descriptions of the features, please consult the paper [LINK TO PAPER].

2. Site Computation Tracking

Each time the feature vectors for a site's cells are computed, information about that site is written to a separate CSV file named <experiment_name>_sites-computed.csv.

This file tracks cells where the ScaleFEx vector is:

  1. computed successfully
  2. not computed because the cell is on the edge of the image
  3. not computed because the computation failed

Because ScaleFEx relies on binary masks of individual channels to compute its features, issues with an individual channel, although rare, can cause the computation to fail for specific cells.

Glancing at this file after a run is also a good sanity check that the features were computed correctly: a large number of "failed" cells may indicate a problem with the parameters or the data.
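That sanity check can be automated with a short script. This is a sketch under loud assumptions: it supposes the tracking file has one row per cell with a status column, and the column name, the "failed" label, and the 5% threshold are all hypothetical; adapt them to the columns actually written to <experiment_name>_sites-computed.csv.

```python
import csv
from collections import Counter


def check_failures(csv_path, status_column, failed_label="failed", max_failed=0.05):
    """Count outcomes in a (hypothetical) status column and flag a high failure rate.

    Returns (counts, failed_share, flagged).
    """
    with open(csv_path, newline="") as fh:
        counts = Counter(row[status_column] for row in csv.DictReader(fh))
    total = sum(counts.values())
    failed_share = counts[failed_label] / total if total else 0.0
    return counts, failed_share, failed_share > max_failed
```

If the file instead stores per-site counts for each outcome, summing those columns and dividing the failed total by the overall total gives the same share.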

3. Copy of Input Parameters File

When the pipeline starts, a time-stamped copy of the parameters file is saved to the specified saving_folder. This is intended to aid in documentation and experimental standardization.

If the pipeline is interrupted partway through, you can pass the time-stamped parameters file back to the pipeline to ensure the computation resumes with the same parameters (see the -p syntax under Input).
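A possible way to do this from Python is sketched below. It assumes the time-stamped copies are .yaml files directly inside saving_folder and picks the most recently modified one; the naming of those copies is not documented here, so the glob pattern and both helper names are hypothetical. The command itself uses only the -p flag shown under Input.

```python
import sys
from pathlib import Path


def latest_parameters_copy(saving_folder, pattern="*.yaml"):
    """Return the most recently modified file matching `pattern`, or None if none exist."""
    candidates = [p for p in Path(saving_folder).glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def resume_command(params_path, script="scalefex_main.py"):
    """Build the argument list for re-running ScaleFEx with a given parameters file."""
    return [sys.executable, script, "-p", str(params_path)]
```

Once a copy is found, subprocess.run(resume_command(path), check=True) would relaunch the run. If the timestamp is embedded in the filename, sorting by name may be more reliable than modification time.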

4. Quality Control Metrics (Optional)

When the QC parameter is set to True, a comprehensive set of quality control metrics is computed at the site (image) level for each channel and recorded to a CSV in a subfolder of the saving_folder called QC_analysis/.

5. Cell Coordinates (Optional)

If the save_coordinates parameter is set to True, coordinates will be written to a separate file in the specified saving_folder.
