
support different lengths for each type of sequence #19

@d-laub


A common setup for sequence-to-function modeling is to use a larger window of genetic context as input to predict a smaller window of functional data. To reduce storage requirements and improve performance, GVL could write and/or decompress only the data that each sequence type needs, rather than using a single length for all sequence types. I'm not sure what the best API would look like here. Users could conceivably even want sequences that are not centered on each other, in which case each type of sequence would need its own set of regions, paired index-by-index with the regions of all the other types. This could look like passing a BED file for each reader, where every BED file has the same number of regions. Downstream, this would require expanding the definition of "output sequence length" described in gvl.Dataset.
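To make the two options concrete, here is a minimal sketch in plain Python. It is not GVL code. `Region`, `recenter`, `windows_per_type`, and `pair_regions` are hypothetical helpers, and the sequence-type names `"haplotypes"` and `"tracks"` are placeholders. `windows_per_type` covers the centered case: every sequence type shares the same midpoints but gets its own length. `pair_regions` covers the per-reader BED case: it pairs region lists index-by-index and rejects them if their counts differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def recenter(region: Region, length: int) -> Region:
    """Return a `length` bp window centered on the midpoint of `region`.

    Hypothetical helper: no clipping or padding is done at chromosome ends.
    """
    mid = (region.start + region.end) // 2
    start = mid - length // 2
    return Region(region.chrom, start, start + length)


def windows_per_type(regions, lengths):
    """Map each sequence type to its own windows, all sharing the same centers."""
    return {name: [recenter(r, n) for r in regions] for name, n in lengths.items()}


def pair_regions(region_sets):
    """Pair per-type region lists (e.g. one BED file per reader) by index.

    Raises ValueError if the lists do not all have the same number of regions.
    """
    counts = {name: len(rs) for name, rs in region_sets.items()}
    if len(set(counts.values())) != 1:
        raise ValueError(f"region counts differ across sequence types: {counts}")
    names = list(region_sets)
    # zip(*...) yields one tuple per region index, holding one region per type.
    return [dict(zip(names, rows)) for rows in zip(*(region_sets[n] for n in names))]
```

For example, `windows_per_type([Region("chr1", 1000, 1100)], {"haplotypes": 2048, "tracks": 128})` gives a 2048 bp input window `chr1:26-2074` and a 128 bp target window `chr1:986-1114`, both centered on position 1050. Only those bases would need to be decompressed for each type.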
