Tags: aprilweilab/grgl
Reworking GRG file format, version 2 of GRGL (#39)

* BREAKING CHANGE: The GRG file format is bumped to version 5, which stores Node and Edge data on disk in the same layout it uses in RAM. The representation is identical to the CSRGRG (immutable) and uses libvbyte to encode the edge lists. GRG files are often 50% smaller and load time can be 20x faster, but traversing the entire graph is 20-40% slower.
* MapMutations is simplified: the topological order is now based only on NodeID ordering, which is guaranteed to preserve the order under the current algorithm.
* MapMutations is sped up: a new topological traversal algorithm based on bitvectors and NodeID ordering can be significantly faster on larger graphs. This can speed up MapMutations by 30-40%.
* The mutation-to-node mapping is stored in a sorted vector instead of a multimap, which reduces RAM usage and speeds up access.
* Previously, GRG construction used the major allele as the reference by default, unless the user specified `--no-maf-flip`. This has a small impact on the speed and size of GRGs for real data, but can have a larger impact on simulated data. Having this behavior as the default would probably have confused users who expect the GRG to "faithfully" represent their VCF/IGD file. The default is now to keep the reference unchanged and to flip only when the user specifies `--maf-flip`.
* Update documentation significantly
* Update packaging script
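
To give a rough sense of why a CSR layout with variable-byte encoded edge lists is smaller on disk but slower to traverse, here is a minimal Python sketch. It is not GRGL's implementation and does not reproduce libvbyte's exact byte format: the function names (`vbyte_encode`, `vbyte_decode`, `build_csr`, `children_of`) are hypothetical, and the delta-encoding of sorted child IDs is an assumption made for illustration. The idea shown is that small gaps between NodeIDs fit in one byte each, while every edge has to be decoded again on each traversal.

```python
from typing import List, Tuple


def vbyte_encode(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte, low bits first."""
    if value < 0:
        raise ValueError("vbyte encodes non-negative integers only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def vbyte_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one integer starting at buf[pos]; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def build_csr(adjacency: List[List[int]]) -> Tuple[List[int], bytes]:
    """Pack per-node child lists into (byte offsets, one shared byte buffer)."""
    offsets = [0]
    data = bytearray()
    for children in adjacency:
        prev = 0
        for child in sorted(children):
            data += vbyte_encode(child - prev)  # store the gap, not the raw ID
            prev = child
        offsets.append(len(data))
    return offsets, bytes(data)


def children_of(offsets: List[int], data: bytes, node: int) -> List[int]:
    """Decode the child list of one node from the packed buffer."""
    pos, end = offsets[node], offsets[node + 1]
    result = []
    prev = 0
    while pos < end:
        gap, pos = vbyte_decode(data, pos)
        prev += gap
        result.append(prev)
    return result


# Toy graph with four nodes; node i's list holds the IDs of its children.
offsets, data = build_csr([[1, 2], [2, 3], [3], []])
```

In this toy graph the five edges occupy five bytes in `data`, versus 20 bytes as raw 32-bit IDs, and `children_of(offsets, data, 1)` decodes back to the original child list. The decode loop in `children_of` is the per-edge cost paid on every traversal.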