Description
A common setup for sequence-to-function modeling is to use a larger window of genetic context as input to predict a smaller window of functional data. To reduce storage requirements and improve performance, GVL could write and/or decompress only the data each sequence type actually needs, rather than using a single length for all sequence types.

I'm not sure what the best API would look like here. Users could conceivably even want sequences that are not centered on each other, in which case each type of sequence would need its own set of regions, paired one-to-one with the regions of every other type. One option is to pass a BED file for each reader, where every BED file has the same number of regions. Downstream, this would require expanding the definition of "output sequence length" used by gvl.Dataset.
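A minimal sketch of the paired-regions idea in plain Python, assuming BED-style half-open coordinates. `Region`, `recenter`, and `pair_regions` are hypothetical names for illustration only, not part of GVL's API, and the reader names "reference" and "tracks" are placeholders. It shows deriving a larger centered input window from each output window, and checking that per-reader region lists line up row by row.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)

    @property
    def length(self) -> int:
        return self.end - self.start


def recenter(region: Region, length: int) -> Region:
    """Return a `length`-bp window sharing (approximately) the midpoint of `region`.

    The result may extend past chromosome bounds (e.g. start < 0);
    padding or clipping is left to the caller in this sketch.
    """
    mid = (region.start + region.end) // 2
    start = mid - length // 2
    return Region(region.chrom, start, start + length)


def pair_regions(regions_by_reader: dict[str, list[Region]]) -> list[dict[str, Region]]:
    """Zip per-reader region lists into one dict per example.

    Hypothetical analogue of "one BED file per reader": row i of every
    list describes example i, so all lists must have the same length.
    Regions need not be centered on each other.
    """
    counts = {name: len(regions) for name, regions in regions_by_reader.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"each reader needs the same number of regions, got {counts}")
    names = list(regions_by_reader)
    return [dict(zip(names, rows)) for rows in zip(*regions_by_reader.values())]
```

For example, 1 kb functional-track windows paired with 4 kb centered sequence windows:

```python
tracks = [Region("chr1", 10_000, 11_000), Region("chr2", 5_000, 6_000)]
reference = [recenter(r, 4_000) for r in tracks]
examples = pair_regions({"reference": reference, "tracks": tracks})
# Each example now carries a different length per reader, which is where a
# single "output sequence length" would no longer be enough.
```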