Output vcf_files
S M Ashiqul Islam edited this page Jan 27, 2026
This page describes the structure of the vcf_files output folder.
The vcf_files folder contains tab-delimited text files that pair each original mutation with its SigProfilerMatrixGenerator classification.
```
vcf_files/
├── DBS/    # Dinucleotide substitutions
├── MNS/    # Multinucleotide substitutions
├── SNV/    # Single nucleotide variants
└── ID/     # Small insertions and deletions
```
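To illustrate roughly how mutations map onto these subfolders, the sketch below routes a single variant by the lengths of its REF and ALT alleles. It is a simplified assumption and not SigProfilerMatrixGenerator's own logic. It expects VCF-style alleles, where indels carry an anchor base, and it ignores details such as how the tool merges adjacent single-base changes into DBS or MNS calls. The function name `vcf_files_subfolder` is hypothetical.

```python
def vcf_files_subfolder(ref: str, alt: str) -> str:
    """Guess which vcf_files/ subfolder a variant belongs to.

    Hypothetical helper: a simplified rule based only on allele lengths,
    assuming VCF-style alleles (indels include an anchor base).
    """
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt):
        # Alleles of different length: small insertion or deletion
        return "ID"
    if len(ref) == 1:
        # One base replaced by one base
        return "SNV"
    if len(ref) == 2:
        # Two adjacent bases substituted together
        return "DBS"
    # Equal-length substitutions spanning 3 or more nucleotides
    return "MNS"


for ref, alt in [("C", "T"), ("TC", "AA"), ("GCA", "TTG"), ("A", "AT")]:
    print(ref, alt, "->", vcf_files_subfolder(ref, alt))
```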
The `DBS/` folder contains doublet base substitutions with their DBS classifications.
Output includes:
- Original mutation coordinates
- Reference dinucleotide
- Alternate dinucleotide
- DBS classification category
The `MNS/` folder contains multinucleotide substitutions, that is, substitutions spanning 3 or more consecutive nucleotides.
Output includes:
- Mutation coordinates
- Reference sequence
- Alternate sequence
- MNS classification
The `SNV/` folder contains single base substitutions with their SBS classifications.
Output includes:
- Chromosome and position
- Reference nucleotide
- Alternate nucleotide
- Sequence context
- SBS96 classification
- Transcriptional strand information
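The SBS96 classification combines a substitution with its immediate 5' and 3' neighbors and reports it from the pyrimidine (C or T) strand. Below is a minimal sketch of that labeling. It assumes a plus-strand trinucleotide and a plus-strand alternate base as input. The helper names are hypothetical, and the exact string SigProfilerMatrixGenerator writes to these files may differ, for example by carrying a strand prefix.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sbs96_label(trinucleotide: str, alt: str) -> str:
    """Build an SBS96-style label such as 'A[C>A]A'.

    trinucleotide: 5' base, reference base, 3' base on the + strand.
    alt: alternate base on the + strand.
    """
    context, alt = trinucleotide.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt")
    # SBS96 uses the pyrimidine reference base, so purine (A/G) references
    # are flipped to the opposite strand along with the alternate base.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"
```

For example, a G>T change in the plus-strand context `TGT` is reported on the opposite strand, so `sbs96_label("TGT", "T")` gives the same label as `sbs96_label("ACA", "A")`.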
The `ID/` folder contains small insertions and deletions with their ID classifications.
Output includes:
- Mutation coordinates
- Indel sequence
- Repeat/microhomology context
- ID classification category
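An indel's repeat context is essentially the number of times the inserted or deleted unit repeats in the adjacent sequence. The sketch below counts tandem copies of a unit at the start of a downstream flank. It is a deliberate simplification. `count_repeat_units` is a hypothetical name, and SigProfilerMatrixGenerator's ID classification applies further rules that are not reproduced here, such as microhomology handling for longer deletions.

```python
def count_repeat_units(unit: str, flank: str) -> int:
    """Count consecutive copies of `unit` at the start of `flank`.

    Hypothetical helper: a simplified view of indel repeat context, e.g. how
    many further copies of a deleted 'CA' follow the deletion site.
    """
    unit, flank = unit.upper(), flank.upper()
    if not unit:
        raise ValueError("unit must be non-empty")
    count = 0
    # Step through the flank one unit length at a time while it still matches
    while flank.startswith(unit, count * len(unit)):
        count += 1
    return count
```

For instance, `count_repeat_units("CA", "CACACAGT")` returns 3, and `count_repeat_units("C", "AAAA")` returns 0.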
Each file is tab-delimited with the following general columns:
| Column | Description |
|---|---|
| Sample | Sample name |
| Chromosome | Chromosome identifier |
| Position | Genomic position |
| Ref | Reference allele |
| Alt | Alternate allele |
| Context | Sequence context |
| Classification | SigProfilerMatrixGenerator category |
| Strand | Transcriptional strand (if applicable) |
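The table lists the general column set, and the exact layout may vary between mutation types. The sketch below therefore assumes the column order shown and no header row, so check both against your own files. It reads rows into simple records, padding a missing trailing column such as Strand with an empty string. The class and function names are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator

# Assumed column order, taken from the table above; verify against your files.
COLUMNS = ["Sample", "Chromosome", "Position", "Ref", "Alt",
           "Context", "Classification", "Strand"]


@dataclass
class ClassifiedMutation:
    sample: str
    chromosome: str
    position: int
    ref: str
    alt: str
    context: str
    classification: str
    strand: str  # empty string when the row has no strand column


def read_classified_mutations(lines: Iterable[str]) -> Iterator[ClassifiedMutation]:
    """Parse tab-delimited rows (no header assumed) into ClassifiedMutation records."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        # Pad missing trailing columns (e.g. Strand) with empty strings
        row = row + [""] * (len(COLUMNS) - len(row))
        yield ClassifiedMutation(
            sample=row[0], chromosome=row[1], position=int(row[2]),
            ref=row[3], alt=row[4], context=row[5],
            classification=row[6], strand=row[7],
        )
```

An open file handle can be passed directly as `lines`, for example `read_classified_mutations(open(path))`.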
These files are useful for:
- Quality control: Verify mutation classifications
- Custom analysis: Export mutations with classifications for downstream analysis
- Visualization: Create custom plots with mutation-level data
- Integration: Combine with other genomic annotations
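For quality control, one simple check is to tally the mutation-level classifications per sample and compare the totals with the corresponding count matrix. Below is a minimal sketch. It assumes the sample name is in the first column and the classification is in the seventh, following the column table above. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def tally_classifications(lines: Iterable[str], class_col: int = 6) -> Counter:
    """Count mutations per (sample, classification) pair.

    Assumes tab-delimited rows with the sample name in column 0 and the
    classification in `class_col` (0-based), as in the column table above.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= class_col:
            continue  # skip blank or malformed lines
        counts[(fields[0], fields[class_col])] += 1
    return counts
```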
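To generate the vcf_files output, pass `seqInfo=True` to `SigProfilerMatrixGeneratorFunc`:

```python
from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen

# seqInfo=True additionally writes the mutation-level vcf_files/ output
matrices = matGen.SigProfilerMatrixGeneratorFunc(
    "project",         # project name
    "GRCh37",          # reference genome
    "/path/to/input",  # folder containing the input mutation files
    seqInfo=True,
)
```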
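After a run, a quick way to see what was produced is to list the files in each mutation-type subfolder. The sketch below assumes that results are written under `<input folder>/output/`, so vcf_files/ would sit at `/path/to/input/output/vcf_files`. That location and the helper name `list_vcf_files` are assumptions, so adjust the path to match your run.

```python
from pathlib import Path


def list_vcf_files(vcf_files_dir) -> dict:
    """Map each subfolder (e.g. SNV, DBS, MNS, ID) to its sorted file names."""
    root = Path(vcf_files_dir)
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Assumed output location; only listed if the folder actually exists.
example_dir = Path("/path/to/input/output/vcf_files")
if example_dir.is_dir():
    for mut_type, files in list_vcf_files(example_dir).items():
        print(mut_type, len(files), "files")
```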