
Create tabix index from scratch #208

Answered by zaeleus
holtgrewe asked this question in Q&A

The strategy for building an index is to track the virtual positions at the start and end of each record as it is written or read. Each such pair is called a chunk. For tabix, the indexer is then given the reference sequence name, the record start position, the record end position, and the chunk.
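To make the chunk idea concrete, here is a minimal, dependency-free sketch. It does not use the noodles API: `VirtualPosition`, `Chunk`, `IndexEntry`, and `collect_entries` are hypothetical names made up for illustration. The only outside fact it relies on is the BGZF virtual offset layout (compressed block offset in the upper 48 bits, offset within the uncompressed block in the lower 16 bits). It assumes you have recorded the virtual position before the first record and after every record.

```rust
/// A BGZF virtual position: the compressed block offset in the upper 48 bits
/// and the offset within the uncompressed block in the lower 16 bits.
/// (Illustrative type, not the noodles one.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct VirtualPosition(u64);

impl VirtualPosition {
    fn new(compressed: u64, uncompressed: u16) -> Self {
        Self((compressed << 16) | u64::from(uncompressed))
    }

    fn compressed(self) -> u64 {
        self.0 >> 16
    }

    fn uncompressed(self) -> u16 {
        (self.0 & 0xffff) as u16
    }
}

/// The pair of virtual positions that delimit one record.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Chunk {
    start: VirtualPosition,
    end: VirtualPosition,
}

/// The four pieces of information handed to an indexer for each record.
struct IndexEntry {
    reference_sequence_name: String,
    start: u64,
    end: u64,
    chunk: Chunk,
}

/// Hypothetical helper: `positions` holds the virtual position before the
/// first record followed by the position after each record, so it has one
/// more element than `records`. Each adjacent pair becomes one chunk.
fn collect_entries(
    records: &[(&str, u64, u64)],
    positions: &[VirtualPosition],
) -> Vec<IndexEntry> {
    positions
        .windows(2)
        .zip(records.iter())
        .map(|(pair, &(name, start, end))| IndexEntry {
            reference_sequence_name: name.to_string(),
            start,
            end,
            chunk: Chunk {
                start: pair[0],
                end: pair[1],
            },
        })
        .collect()
}
```

In this sketch the end of one record's chunk is the start of the next record's chunk, which mirrors sampling the writer's (or reader's) virtual position once per record boundary. A real indexer would then bin and merge these chunks; that step is left to the library.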

For your first use case, see tabix_write. This shows writing and indexing a BED-like record structure, but it can be applied to VCF as well.

vcf_index shows an example of reading a bgzip-compressed VCF and writing a tabix index for it.

Please also update to noodles 0.54.0 / noodles-csi 0.26.0. While adding the tabix_write example, I discovered a bug (fixed in 599aefa) in the builder, so thank you for asking about indexing!

Answer selected by holtgrewe