MergeGenome - Toolkit for Merging VCF files

This repository includes a Python implementation of the MergeGenome toolkit, which underscores the importance of cleaning genomic sequences prior to analysis. The MergeGenome toolkit is designed to integrate DNA sequences from a query dataset and a reference dataset in Variant Call Format (VCF) while preserving data quality. MergeGenome is a robust pipeline of comprehensive steps for merging the two datasets: standardization of chromosome nomenclature, removal of ambiguous SNPs, detection of SNP flips, elimination of SNP mismatches, and detection and/or correction of query/reference mismatches. MergeGenome works with DNA sequences from any organism, offering a general solution to a common problem: having access to more than one source of data but being able to exploit only one for statistical analysis.
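To make the SNP-level cleaning steps more concrete, here is a minimal sketch of how a single SNP might be classified. It is not MergeGenome's code: the helper names `is_ambiguous` and `classify_snp` are hypothetical, and the definitions are assumptions (an "ambiguous" SNP is taken to be a strand-ambiguous pair A/T, T/A, C/G, or G/C; a "flip" is taken to be a SNP whose REF and ALT are swapped between query and reference; anything else that differs is a "mismatch"). The toolkit's own definitions may differ.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the MergeGenome API.

# Assumption: "ambiguous" means a strand-ambiguous (palindromic) allele pair.
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def is_ambiguous(ref, alt):
    """Return True if the REF/ALT pair is strand-ambiguous."""
    return (ref.upper(), alt.upper()) in AMBIGUOUS_PAIRS


def classify_snp(query, reference):
    """Compare (REF, ALT) tuples found at the same CHROM and POS.

    Returns "match", "flip" (REF and ALT swapped), or "mismatch".
    """
    q = (query[0].upper(), query[1].upper())
    r = (reference[0].upper(), reference[1].upper())
    if q == r:
        return "match"
    if q == (r[1], r[0]):
        return "flip"
    return "mismatch"


if __name__ == "__main__":
    print(is_ambiguous("C", "G"))
    print(classify_snp(("A", "G"), ("G", "A")))
    print(classify_snp(("A", "G"), ("A", "C")))
```

Under these assumptions, ambiguous SNPs would be dropped before comparison, because a swap of A/T or C/G cannot be told apart from a strand difference.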

This repository also includes the implementation of other common tasks related to merging genomic sequences, such as identifying the common markers (i.e. SNPs with identical CHROM, POS, REF, and ALT fields) between two datasets and subsetting the available data to those common markers.
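The notion of common markers described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the toolkit's implementation: the function names `marker_keys` and `subset_to_common` are hypothetical, the VCF content is passed in as a list of tab-separated lines, and the sample records are made up for the example.

```python
# Illustrative sketch only; function names and sample records are hypothetical.

def marker_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the data lines of a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def subset_to_common(vcf_lines, common):
    """Keep header lines plus the data lines whose key is in `common`."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if (f[0], int(f[1]), f[3], f[4]) in common:
            kept.append(line)
    return kept


# Made-up records: the second SNP differs in ALT, so it is not a common marker.
query = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t200\t.\tC\tT",
]
reference = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t200\t.\tC\tA",
]

common = marker_keys(query) & marker_keys(reference)
query_subset = subset_to_common(query, common)
```

Using a set of tuples keeps the intersection cheap and makes the matching rule explicit: all four fields must be identical for two records to count as the same marker.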

Preprocessing Steps

Partition data into a separate VCF file per chromosome
Rename chromosome notation
Clean VCF files
Impute
Subset SNPs to common markers with another dataset
Machine Learning Source Classifiers for SNP Filtering
SNP Error Correction
Concat VCF files
Merge VCF files
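
As an illustration of the "Rename chromosome notation" step listed above, the sketch below converts between two common notations. It assumes the datasets differ only by a `chr` prefix (for example `chr1` versus `1`), which may not hold for every dataset; the function names `rename_chromosome` and `rename_vcf_line` are hypothetical and do not reflect the toolkit's own commands or options. Header lines, including any `##contig` entries, are left untouched in this simplified version.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the MergeGenome API.

def rename_chromosome(name, add_prefix=False, prefix="chr"):
    """Add or strip a chromosome prefix, leaving other names unchanged."""
    if add_prefix:
        return name if name.startswith(prefix) else prefix + name
    return name[len(prefix):] if name.startswith(prefix) else name


def rename_vcf_line(line, add_prefix=False, prefix="chr"):
    """Rename the CHROM field of one VCF data line; headers pass through."""
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    return rename_chromosome(chrom, add_prefix, prefix) + sep + rest


if __name__ == "__main__":
    print(rename_vcf_line("chr2\t50\t.\tA\tG"))
    print(rename_vcf_line("2\t50\t.\tA\tG", add_prefix=True))
```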

Merging Evaluation

Other util commands

License

This project is under the CC BY-NC 4.0 license. See LICENSE for details.
