This Perl script identifies genetic variants that are consistently present across multiple samples in cohort studies or population-level analyses. It processes Variant Call Format (VCF) files to detect single-nucleotide variants (SNVs) and indels shared at defined frequency thresholds (10% to 100% of samples, in 10% increments). Unlike simple presence/absence tools, it parses genotype fields so that heterozygous calls (e.g., "0/1" genotypes) are accounted for and missing data ("./." entries) are filtered out. The report lists each variant's position (chromosome and coordinate), its reference and alternate alleles, and both the absolute count and the percentage of samples containing it. This threshold-based approach helps researchers identify core genomic elements in pathogen populations, conserved mutations in study cohorts, or transmission clusters in outbreak investigations.
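To make the genotype-aware counting and the 10% threshold steps concrete, here is a minimal sketch. The helper names `carries_alt` and `thresholds_met` are hypothetical and do not come from the script itself. The sketch assumes that any non-reference allele index counts as carrying the variant, and that both unphased ("/") and phased ("|") separators may appear.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a GT value contains at least one
# non-reference allele, 0 otherwise. Missing calls such as "./." and
# homozygous-reference calls such as "0/0" both return 0.
sub carries_alt {
    my ($gt) = @_;
    return 0 unless defined $gt;
    my @alleles = split m{[/|]}, $gt;    # "/" is unphased, "|" is phased
    my $n_alt = grep { /^\d+$/ && $_ > 0 } @alleles;
    return $n_alt ? 1 : 0;
}

# Hypothetical helper: returns the 10%-step thresholds (10, 20, ..., 100)
# that a variant reaches when $count of $total samples carry it.
# Integer arithmetic avoids floating-point rounding at the boundaries.
sub thresholds_met {
    my ($count, $total) = @_;
    return () unless $total > 0;
    my @steps = map { $_ * 10 } 1 .. 10;
    return grep { $count * 100 >= $_ * $total } @steps;
}
```

For example, a variant carried by 2 of 3 samples (about 66.7%) would meet the thresholds from 10% through 60% under these assumptions.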
The script validates its input before analysis, automatically verifying file paths and VCF integrity. Output is sorted by genomic position and prevalence, which simplifies downstream interpretation in tools such as Excel or R. It handles large variant sets efficiently by using hashing while keeping memory use low. Applications range from identifying vaccine targets in viral quasispecies to detecting founder mutations in genetic epidemiology studies. The command-line interface supports integration into automated pipelines, and the tab-separated output can be loaded directly into genomic databases or visualization platforms. The tool is particularly useful for studies that prioritize variants by how widely they are shared, bridging the gap between raw variant calling and population-level biological interpretation.
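The hash-based counting and the sorted output described above could look roughly like the following sketch. The function names, the in-memory input shape, and the exact sort order (chromosome as a string, then position numerically, then sample count descending) are assumptions made for illustration, not the script's confirmed behavior.

```perl
use strict;
use warnings;

# Hypothetical tally: counts how many samples carry each variant.
# Each sample is an array ref of [chrom, pos, ref, alt] records
# (an illustrative input shape, not the script's internal format).
sub tally_variants {
    my (@samples) = @_;
    my %count;
    for my $sample (@samples) {
        my %seen;    # ensures a variant is counted once per sample
        for my $rec (@$sample) {
            my $key = join "\t", @$rec;
            $count{$key}++ unless $seen{$key}++;
        }
    }
    return \%count;
}

# One plausible ordering (an assumption): chromosome as a string,
# position numerically, then sample count from high to low.
sub sorted_keys {
    my ($count) = @_;
    return sort {
        my @x = split /\t/, $a;
        my @y = split /\t/, $b;
        $x[0] cmp $y[0]
            || $x[1] <=> $y[1]
            || $count->{$b} <=> $count->{$a}
            || $a cmp $b
    } keys %$count;
}
```

Keying the hash on the joined chromosome, position, reference, and alternate fields keeps one entry per distinct variant, so memory grows with the number of distinct variants instead of the number of input records.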
**Features**:
- Multiple VCF file analysis
- Genotype-aware variant counting
- Threshold reporting (10-100% sample sharing)
- Comprehensive output with percentages
- Input validation and error checking

**Requirements**:
- Perl 5.20+
- Perl modules: Getopt::Long, List::Util

**Usage**:
perl shared_variant_analyzer.pl -i vcf_list.txt > variant_report.tsv
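As a rough illustration of how the `-i` option in the command above might be read with Getopt::Long, consider the sketch below. The `parse_args` wrapper is hypothetical, and only the `-i` flag is taken from this README. Any other options the real script accepts are not shown.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper: returns the list-file path given with -i,
# or undef if the option is absent or the arguments fail to parse.
sub parse_args {
    my @args = @_;
    my $list_file;
    GetOptionsFromArray(\@args, 'i=s' => \$list_file)
        or return;
    return $list_file;
}
```

A caller could pass `@ARGV` to `parse_args` and print a usage message when it returns `undef`.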
**vcf_list.txt**:
/path/to/sample1.vcf
/path/to/sample2.vcf
/path/to/sample3.vcf
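A list file like the one above could be read and checked along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, blank lines and lines starting with "#" are skipped, inputs are plain-text (not gzipped) VCFs, and "integrity" is reduced to checking for the `##fileformat=VCF` header line that the VCF specification requires. The real script's checks may differ.

```perl
use strict;
use warnings;

# Hypothetical check: true if the first line of the file is a VCF
# fileformat header. Assumes an uncompressed, plain-text VCF.
sub looks_like_vcf {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    my $first = <$fh>;
    close $fh;
    return 0 unless defined $first;
    return $first =~ /^##fileformat=VCF/ ? 1 : 0;
}

# Hypothetical reader: returns two array refs, the usable VCF paths
# and the paths that are missing, unreadable, or not VCF-like.
sub read_vcf_list {
    my ($list_path) = @_;
    open my $fh, '<', $list_path
        or die "Cannot open list file '$list_path': $!\n";
    my (@ok, @rejected);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;                 # trim whitespace and newline
        next if $line eq '' || $line =~ /^#/;    # skip blanks and comments
        if (-f $line && -r _ && looks_like_vcf($line)) {
            push @ok, $line;
        }
        else {
            push @rejected, $line;
        }
    }
    close $fh;
    return (\@ok, \@rejected);
}
```

Collecting rejected paths instead of dying on the first one lets a caller report every problem in the list file at once.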
**Output columns** (tab-separated): Chrom | Position | Ref | Alt | SampleCount | Percentage
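A single report row with these columns could be assembled as in the sketch below. The `format_row` helper and the one-decimal percentage format are assumptions for illustration. The script's actual number formatting is not documented here.

```perl
use strict;
use warnings;

# Hypothetical formatter: builds one tab-separated report row.
# The percentage is printed with one decimal place (an assumption).
sub format_row {
    my ($chrom, $pos, $ref, $alt, $count, $total) = @_;
    my $pct = $total ? sprintf('%.1f', 100 * $count / $total) : '0.0';
    return join "\t", $chrom, $pos, $ref, $alt, $count, $pct;
}
```

For instance, a variant at position 100 on chr1 that is found in 2 of 4 samples would yield a row ending in a count of 2 and a percentage of 50.0.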
MIT License.