Loom_schema_OUTDATED.md

What's in the Optimus Pipeline Loom File?

The Loom file is an HDF5 file generated using Loompy v2.0.17. It contains UMI-corrected counts as well as metrics for individual cells (the columns of the matrix; Table 1) and individual genes (the rows of the matrix; Table 2). The tables below define each metric and list the tool that generates it. The Loom file is an optional output of the Optimus pipeline; the default matrix output is a Zarr array. The Loom file is derived directly from the Zarr array and contains the same information, with only minor header updates for schema compatibility.

Note: Loom files generated by Optimus are different from the final Loom file distributed on the Human Cell Atlas Data Portal, which removes some of the metadata detailed in this document and contains additional metadata relating to each individual project.
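
As a minimal sketch of how such a file might be inspected with Loompy, the helpers below list the cell and gene metric names stored in the file. The names `list_metrics` and `inspect_loom` are hypothetical and not part of Optimus or Loompy, and the file path is whatever the caller supplies. The Loompy import is deferred so that the first helper can be used without the library installed.

```python
def list_metrics(column_attrs, row_attrs):
    """Return the sorted metric names for cells (columns) and genes (rows).

    Both arguments only need a keys() method, so plain dicts work as well as
    the attribute managers that Loompy exposes as ds.ca and ds.ra.
    """
    return {
        "cell_metrics": sorted(column_attrs.keys()),
        "gene_metrics": sorted(row_attrs.keys()),
    }


def inspect_loom(path):
    """Open a Loom file read-only and report its shape and metric names."""
    import loompy  # third-party dependency, imported only when needed

    with loompy.connect(path, mode="r") as ds:
        n_genes, n_cells = ds.shape  # rows are genes, columns are cells
        return n_genes, n_cells, list_metrics(ds.ca, ds.ra)
```

Given a Loom file produced by Optimus, `inspect_loom` should report the column attributes of Table 1 under `cell_metrics` and the row attributes of Table 2 under `gene_metrics`.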

Table 1. Column Attributes (Cell Metrics)

| Cell Metrics | Program | Details |
| --- | --- | --- |
| CellID | SC Tools | The unique identifier for each cell, based on 10x cell barcodes |
| n_reads | SC Tools | The number of reads associated with this entity. Metrics Definitions |
| noise_reads | SC Tools | Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. Metrics Definitions |
| perfect_molecule_barcodes | SC Tools | The number of reads with molecule barcodes that have no errors. Metrics Definitions |
| reads_mapped_exonic | SC Tools | The number of reads for this entity that are mapped to exons. Metrics Definitions |
| reads_mapped_intronic | SC Tools | The number of reads for this entity that are mapped to introns. Metrics Definitions |
| reads_mapped_utr | SC Tools | The number of reads for this entity that are mapped to 3' untranslated regions (UTRs). Metrics Definitions |
| reads_mapped_uniquely | SC Tools | The number of reads mapped to a single unambiguous location in the genome. Metrics Definitions |
| reads_mapped_multiple | SC Tools | The number of reads mapped to multiple genomic positions with equal confidence. Metrics Definitions |
| duplicate_reads | SC Tools | The number of reads that are duplicates (see README.md for definition of a duplicate). Metrics Definitions |
| spliced_reads | SC Tools | The number of reads that overlap splicing junctions. Metrics Definitions |
| antisense_reads | SC Tools | The number of reads that are mapped to the antisense strand instead of the transcribed strand. Metrics Definitions |
| molecule_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. Metrics Definitions |
| molecule_barcode_fraction_bases_above_30_variance | SC Tools | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. Metrics Definitions |
| genomic_reads_fraction_bases_quality_above_30_mean | SC Tools | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity (included for 10x Cell Ranger count comparison). Metrics Definitions |
| genomic_reads_fraction_bases_quality_above_30_variance | SC Tools | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity (included for 10x Cell Ranger count comparison). Metrics Definitions |
| genomic_read_quality_mean | SC Tools | Average quality of Illumina base calls in the genomic reads corresponding to this entity. Metrics Definitions |
| genomic_read_quality_variance | SC Tools | Variance in quality of Illumina base calls in the genomic reads corresponding to this entity. Metrics Definitions |
| n_molecules | SC Tools | Number of molecules corresponding to this entity. See README.md for the definition of a Molecule. Metrics Definitions |
| n_fragments | SC Tools | Number of fragments corresponding to this entity. See README.md for the definition of a Fragment. Metrics Definitions |
| reads_per_fragment | SC Tools | The average number of reads associated with each fragment in this entity. Metrics Definitions |
| fragments_per_molecule | SC Tools | The average number of fragments associated with each molecule in this entity. Metrics Definitions |
| fragments_with_single_read_evidence | SC Tools | The number of fragments associated with this entity that are observed by only one read. Metrics Definitions |
| molecules_with_single_read_evidence | SC Tools | The number of molecules associated with this entity that are observed by only one read. Metrics Definitions |
| perfect_cell_barcodes | SC Tools | The number of reads whose cell barcodes contain no errors. Metrics Definitions |
| reads_mapped_intergenic | SC Tools | The number of reads mapped to an intergenic region for this cell. Metrics Definitions |
| reads_mapped_too_many_loci | SC Tools | The number of reads that were mapped to too many loci across the genome and are consequently reported as unmapped by the aligner. Metrics Definitions |
| cell_barcode_fraction_bases_above_30_variance | SC Tools | The variance of the fraction of Illumina base calls for the cell barcode sequences with quality scores greater than 30, across molecules. Metrics Definitions |
| cell_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of Illumina base calls for the cell barcode sequences with quality scores greater than 30, across molecules. Metrics Definitions |
| n_genes | SC Tools | The number of genes detected in this cell. Metrics Definitions |
| genes_detected_multiple_observations | SC Tools | The number of genes that are observed by more than one read in this cell. Metrics Definitions |
| reads_unmapped | SC Tools | The number of reads that are non-transcriptomic |
| emptydrops_FDR | DropletUtils | False Discovery Rate (FDR) for being a non-empty droplet |
| emptydrops_IsCell | DropletUtils | Binarized call of cell/background based on a predefined FDR cutoff |
| emptydrops_Limited | DropletUtils | Indicates whether a lower p-value could be obtained by increasing the number of iterations |
| emptydrops_LogProb | DropletUtils | The log-probability of observing the barcode’s count vector under the null model |
| emptydrops_PValue | DropletUtils | Numeric, the Monte Carlo p-value against the null model |
| emptydrops_Total | DropletUtils | Numeric, the total UMI count for each barcode |
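
As a rough illustration of how the cell metrics above might be combined for quality control, the sketch below keeps the barcodes that emptyDrops called as cells and computes the fraction of each cell's reads that map to exons. The function names are hypothetical, plain lists stand in for the arrays read from the column attributes, and the code assumes `emptydrops_IsCell` is stored as numeric or boolean flags; the dtype in an actual file may differ.

```python
def select_called_cells(cell_ids, is_cell):
    """Keep the CellID values whose emptydrops_IsCell flag is truthy.

    Assumption: flags are numeric or boolean (1/0, True/False).
    String-encoded flags would need converting first.
    """
    return [cell_id for cell_id, flag in zip(cell_ids, is_cell) if flag]


def exonic_fraction(n_reads, reads_mapped_exonic):
    """Per-cell fraction of reads mapped to exons; 0.0 for cells with no reads."""
    return [
        exonic / total if total else 0.0
        for total, exonic in zip(n_reads, reads_mapped_exonic)
    ]


# Toy values for three barcodes, made up for illustration only.
cell_ids = ["AAAC", "AAAG", "AAAT"]
called = select_called_cells(cell_ids, [1, 0, 1])
fractions = exonic_fraction([100, 0, 50], [80, 0, 10])
```

With a real file, the inputs could be read from the column attributes, for example `ds.ca["CellID"]` and `ds.ca["emptydrops_IsCell"]` on a Loompy connection `ds`.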

Table 2. Row Attributes (Gene Metrics)

| Gene Metrics | Program | Details |
| --- | --- | --- |
| Accession | GENCODE GTF | The gene_id listed in the GENCODE GTF |
| Gene | GENCODE GTF | The unique gene_name provided in the GENCODE GTF |
| n_reads | SC Tools | The number of reads associated with this entity. Metrics Definitions |
| noise_reads | SC Tools | Number of reads that are categorized by 10x Genomics Cell Ranger as "noise". Refers to long polymers, or reads with high numbers of N (ambiguous) nucleotides. Metrics Definitions |
| perfect_molecule_barcodes | SC Tools | The number of reads with molecule barcodes that have no errors. Metrics Definitions |
| reads_mapped_exonic | SC Tools | The number of reads for this entity that are mapped to exons. Metrics Definitions |
| reads_mapped_intronic | SC Tools | The number of reads for this entity that are mapped to introns. Metrics Definitions |
| reads_mapped_utr | SC Tools | The number of reads for this entity that are mapped to 3' untranslated regions (UTRs). Metrics Definitions |
| reads_mapped_uniquely | SC Tools | The number of reads mapped to a single unambiguous location in the genome. Metrics Definitions |
| reads_mapped_multiple | SC Tools | The number of reads mapped to multiple genomic positions with equal confidence. Metrics Definitions |
| duplicate_reads | SC Tools | The number of reads that are duplicates (see README.md for definition of a duplicate). Metrics Definitions |
| spliced_reads | SC Tools | The number of reads that overlap splicing junctions. Metrics Definitions |
| antisense_reads | SC Tools | The number of reads that are mapped to the antisense strand instead of the transcribed strand. Metrics Definitions |
| molecule_barcode_fraction_bases_above_30_mean | SC Tools | The average fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. Metrics Definitions |
| molecule_barcode_fraction_bases_above_30_variance | SC Tools | The variance in the fraction of bases in molecule barcodes that receive quality scores greater than 30 across the reads of this entity. Metrics Definitions |
| genomic_reads_fraction_bases_quality_above_30_mean | SC Tools | The average fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity (included for 10x Cell Ranger count comparison). Metrics Definitions |
| genomic_reads_fraction_bases_quality_above_30_variance | SC Tools | The variance in the fraction of bases in the genomic read that receive quality scores greater than 30 across the reads of this entity (included for 10x Cell Ranger count comparison). Metrics Definitions |
| genomic_read_quality_mean | SC Tools | Average quality of Illumina base calls in the genomic reads corresponding to this entity. Metrics Definitions |
| genomic_read_quality_variance | SC Tools | Variance in quality of Illumina base calls in the genomic reads corresponding to this entity. Metrics Definitions |
| n_molecules | SC Tools | Number of molecules corresponding to this entity. See README.md for the definition of a Molecule. Metrics Definitions |
| n_fragments | SC Tools | Number of fragments corresponding to this entity. See README.md for the definition of a Fragment. Metrics Definitions |
| reads_per_molecule | SC Tools | The average number of reads associated with each molecule in this entity. Metrics Definitions |
| reads_per_fragment | SC Tools | The average number of reads associated with each fragment in this entity. Metrics Definitions |
| fragments_per_molecule | SC Tools | The average number of fragments associated with each molecule in this entity. Metrics Definitions |
| fragments_with_single_read_evidence | SC Tools | The number of fragments associated with this entity that are observed by only one read. Metrics Definitions |
| molecules_with_single_read_evidence | SC Tools | The number of molecules associated with this entity that are observed by only one read. Metrics Definitions |
| number_cells_detected_multiple | SC Tools | The number of cells in which more than one read of this gene is observed. Metrics Definitions |
| number_cells_expressing | SC Tools | The number of cells in which this gene is detected. Metrics Definitions |
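
The gene metrics can be used in a similar way. The sketch below, under the assumption that the `Gene` and `number_cells_expressing` row attributes have been read into plain sequences, keeps the genes detected in at least a minimum number of cells. The function name and the default threshold of 3 are illustrative choices, not values prescribed by Optimus.

```python
def genes_detected_in_min_cells(gene_names, number_cells_expressing, min_cells=3):
    """Return the gene names detected in at least min_cells cells.

    min_cells=3 is an arbitrary illustrative default, not an Optimus setting.
    """
    return [
        gene
        for gene, n_cells in zip(gene_names, number_cells_expressing)
        if n_cells >= min_cells
    ]


# Toy values for three genes, made up for illustration only.
kept = genes_detected_in_min_cells(["geneA", "geneB", "geneC"], [0, 3, 10])
```

With a Loompy connection `ds`, the two inputs would correspond to `ds.ra["Gene"]` and `ds.ra["number_cells_expressing"]`.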