Would like contig name remapping #22

Open
@jblachly

Description

https://github.com/dpryan79/ChromosomeMappings is an outstanding repository of contig name maps across builds.

In the simplest case, UCSC/Gencode naming calls the first human chromosome chr1, while Ensembl calls it 1. However, it is not enough merely to slice off (or add) the chr prefix, because chrM corresponds to MT, and there are numerous unlocalized and unplaced contigs. In addition, UCSC and Gencode names are identical only for the basic 23 chromosomes; they use different names for alt/unlocalized/unplaced contigs.
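To make the gap concrete, here is a minimal Python sketch comparing prefix stripping with a table lookup. The function names are hypothetical, and the three-entry dictionary is only an illustrative excerpt of a UCSC-to-Ensembl mapping for GRCh38 as I understand it; a real table would be loaded from a full mapping file.

```python
from typing import Dict, Optional


def strip_chr_prefix(name: str) -> str:
    """Naive remapping: drop a leading 'chr' if present."""
    return name[3:] if name.startswith("chr") else name


# Illustrative excerpt only (assumed UCSC -> Ensembl names for GRCh38).
UCSC_TO_ENSEMBL: Dict[str, str] = {
    "chr1": "1",
    "chrM": "MT",
    "chrUn_KI270742v1": "KI270742.1",
}


def remap_with_table(name: str, table: Dict[str, str]) -> Optional[str]:
    """Table-based remapping: return the mapped name, or None if unknown."""
    return table.get(name)


if __name__ == "__main__":
    for ucsc_name in UCSC_TO_ENSEMBL:
        naive = strip_chr_prefix(ucsc_name)
        mapped = remap_with_table(ucsc_name, UCSC_TO_ENSEMBL)
        status = "ok" if naive == mapped else "MISMATCH"
        print(f"{ucsc_name}\tnaive={naive}\ttable={mapped}\t{status}")
```

Only chr1 survives the naive approach; the mitochondrial and unplaced contigs need the lookup.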

CrossMap takes the naive approach of renaming based on the chr prefix. This is of course a huge help to users who face the very real problem of mismatched contig names, but it is an incomplete solution.

Here, I propose two possible remappings:

  1. Remap the contig name from the BED or VCF file immediately, before hitting the chain file. This would be useful, for instance, if you had a VCF from gnomAD with Ensembl contigs, but your chain file expected UCSC/Gencode-style contigs.
  2. Remap the contig name coming out of the liftover. Incidentally, this also means we could create "identity" chain files that perform zero coordinate translation, in which case the tool would essentially become a contig renaming tool (an overcomplicated one, but I don't know of another good tool that does this).
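As a rough sketch of what either remapping step might look like for BED input, the Python below loads a two-column, tab-separated mapping (the layout I believe the ChromosomeMappings files use, with an empty second column where no equivalent exists) and rewrites the first column of each record. The function names, the example records, and the choice to drop unmapped contigs are all assumptions for illustration, not existing CrossMap behavior.

```python
import io
from typing import Dict, Iterable, Iterator, TextIO


def load_contig_map(handle: TextIO) -> Dict[str, str]:
    """Read 'source<TAB>target' lines; skip entries with no target name."""
    mapping: Dict[str, str] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # blank line or contig without an equivalent
        mapping[fields[0]] = fields[1]
    return mapping


def remap_bed_lines(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Rename the contig column of tab-separated BED records.

    Header and blank lines pass through unchanged. Records whose contig is
    not in the mapping are dropped (an assumption; a real tool might instead
    write them to an 'unmapped' file).
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track ", "browser ")):
            yield line
            continue
        contig, sep, rest = line.partition("\t")
        new_name = mapping.get(contig)
        if new_name is None:
            continue
        yield new_name + sep + rest


# Made-up mapping text; 'NO_EQUIVALENT' stands in for a contig with no target.
MAP_TEXT = "1\tchr1\nMT\tchrM\nKI270742.1\tchrUn_KI270742v1\nNO_EQUIVALENT\t\n"
ensembl_to_ucsc = load_contig_map(io.StringIO(MAP_TEXT))

bed_in = [
    "1\t100\t200\tfeatA\n",
    "MT\t5\t50\tfeatB\n",
    "NO_EQUIVALENT\t0\t10\tfeatC\n",
]
bed_out = list(remap_bed_lines(bed_in, ensembl_to_ucsc))

if __name__ == "__main__":
    print("".join(bed_out), end="")
```

The same function could be applied to the input before the chain file is consulted (option 1) or to the lifted-over output (option 2), just with a different mapping table.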
