Proposal: remove "static block" usage for some track types #5171
Background
The "block-based rendering" system has been a basic building block of jbrowse visualizations since jbrowse 1 days. This method divides the visible region into a number of small 'blocks' that are rendered independently. However, it causes a lot of complexity because some track types need to synchronize e.g. the y-scalebar across all the blocks, and it can also actually creates slowness because all this synchronization creates multiple independent calls to the renderers and data stores that are difficult to reason about, cache efficiently, and reprocesses the same data in the visible region many times over just to render one screen of data.
Overview of current rendering pipeline with static blocks, for a single alignment track
Here is an overview of creating and rendering a BAM track, just to highlight the volume of data fetching and rendering requests:
An alignments track has two subtracks: a "pileup" subtrack and a "snpcoverage" subtrack
Each screen of data has approximately 3 "static blocks" (we hardcode blocks to a width of 800px, and average screen widths are about 3x that)
Subroutine 1: check if there is too much data to display by estimating the byte download from the CRAM or BAM index. This tends to be fast and avoids actually processing any BAM data.
Subroutine 2: calculate the y-scalebar of the data in the visible region
Subroutine 3: calculate the types of 'modifications' in the visible region. This is done to generate the track menu options for coloring by modifications.
Subroutine 4: render a single SNPCoverage block
Subroutine 5: render a single pileup block
The pileup subtrack calls subroutine 1, and then subroutine 5 three times
Then the snpcoverage subtrack calls subroutine 1, then subroutine 2, then subroutine 3, and then calls subroutine 4 three times (sketched below).
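To make the fan-out concrete, here is a hedged sketch of the call pattern described above, using hypothetical function names standing in for the subroutines (the real code lives in the alignments plugin and the RPC/renderer layer):

```ts
// Hypothetical stand-ins for illustration; not JBrowse's actual APIs
interface Block {
  refName: string
  start: number
  end: number
}
declare function estimateBytesFromIndex(blocks: Block[]): Promise<void> // subroutine 1
declare function calculateYScale(blocks: Block[]): Promise<void> // subroutine 2
declare function detectModificationTypes(blocks: Block[]): Promise<void> // subroutine 3
declare function renderSnpCoverageBlock(block: Block): Promise<void> // subroutine 4
declare function renderPileupBlock(block: Block): Promise<void> // subroutine 5

async function renderAlignmentsTrack(blocks: Block[]) {
  // pileup subtrack: subroutine 1, then subroutine 5 once per static block
  await estimateBytesFromIndex(blocks)
  await Promise.all(blocks.map(b => renderPileupBlock(b)))

  // snpcoverage subtrack: subroutines 1-3, then subroutine 4 once per static block
  await estimateBytesFromIndex(blocks)
  await calculateYScale(blocks)
  await detectModificationTypes(blocks)
  await Promise.all(blocks.map(b => renderSnpCoverageBlock(b)))
}
```

With ~3 static blocks on screen, that is 6 per-block render calls plus the stats/modification passes, each of which can hit the adapter independently.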
This overall process can result in re-processing the same data many times, and that is just for the initial render. After side scrolling or zooming, all of this is fully re-done. Side scrolling sometimes saves a little processing because it only introduces maybe one new 'static block', but it often still reprocesses everything because e.g. the y-scalebar changes slightly, which forces all the blocks to re-render.
All of this code also transits through the BamAdapter getFeatures function. The returned values of the adapter are not really effectively 'cached' for the rendering system, and it is surprisingly hard to cache the results because different numbers of bgzip chunks of the BAM file are decompressed for the varying side-by-side blocks. Figuring out how to cache better is an orthogonal issue not addressed here, but before this I looked into fully removing the caching in the bam-js library because the cache was seemingly ineffective (spoiler: completely removing it wasn't a great idea, but it still needs work), and produced this figure showing the request pattern to the library.
Figure showing data store requests just for a single initial rendering of a BAM file, from GMOD/bam-js#110
Note also that the current static blocks concept is somewhat "overfit" to the side-scrolling use case: zooming in and out does not benefit at all from static blocks, and zooming could arguably be an even more important case to optimize for. While this PR doesn't really address zooming performance, it potentially removes a lot of conceptual complexity by removing the overfitting for side-scroll blocks, which may help down the line.
Example of how we can speed it up: the alternative pipeline
Pileup subtrack does a single render call
Snpcoverage subtrack does a single render call (see sketch below)
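Roughly, the whole pipeline collapses to something like this hedged sketch (again with hypothetical names):

```ts
// Hypothetical sketch: one render call per subtrack over the whole visible
// region (a single dynamic block), instead of one call per static block
interface Region {
  refName: string
  start: number
  end: number
}
declare function renderPileupRegion(region: Region): Promise<void>
declare function renderSnpCoverageRegion(region: Region): Promise<void>

async function renderAlignmentsTrackDynamic(visible: Region) {
  await Promise.all([
    renderPileupRegion(visible),
    renderSnpCoverageRegion(visible),
  ])
}
```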
This PR
This PR is a demonstration of using dynamic blocks for the alignments track. It has an additional trick: instead of using an autorun to pre-calculate the y-scalebar and visible modifications, it does this on the fly in the renderer when only one region is being rendered, which reduces the 're-processing' of data via redundant hits to the data store.
(see caveat below for the "why one region?")
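As a rough illustration of that trick (types and names here are made up, not the actual renderer code), the renderer can derive the y-scale from the features it has already fetched when it is only rendering one region, and only fall back to a pre-computed, synchronized scale otherwise:

```ts
// Hypothetical sketch of the single-region fast path
interface Region {
  refName: string
  start: number
  end: number
}
interface Feature {
  score: number
}
interface RenderArgs {
  regions: Region[]
  // pre-computed, block-synchronized scale, only needed in the multi-region
  // case (see caveat below)
  precomputedScale?: { min: number; max: number }
}

declare function getFeaturesForRegion(region: Region): Promise<Feature[]>

async function renderSnpCoverage(args: RenderArgs) {
  const features = (
    await Promise.all(args.regions.map(r => getFeaturesForRegion(r)))
  ).flat()

  const scale =
    args.regions.length === 1
      ? {
          // single region: compute the scale from the features we already
          // fetched for this render, instead of a separate autorun pass
          min: Math.min(0, ...features.map(f => f.score)),
          max: Math.max(0, ...features.map(f => f.score)),
        }
      : args.precomputedScale

  // ...draw coverage (and modification colors) using `features` and `scale`...
  return { scale, featureCount: features.length }
}
```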
Caveat: we still have to retain block-based calculations when rendering more than one region
If, for example, two entirely different chromosomes are being rendered, we still need to render two different blocks and synchronize e.g. the y-scalebar across both chromosomes (imagine rendering bigwig across the whole genome...then it needs synchronizing across all the chromosomes)
In this case, we will continue to need routines that calculate the y-scalebar on the main thread and then synchronize that scale across blocks (sketched below)
We will continue to need the block-based logic unless renderers are programmed to render multiple 'regions' at a time. This may potentially be a bridge too far, though...or at least farther in the future.
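For that multi-region case, the synchronization amounts to something like this sketch (hypothetical names; in reality it would be driven from the main thread, e.g. via an autorun, before dispatching the per-block renders):

```ts
// Hypothetical sketch: merge per-block score stats into one shared y-scale
// that is then passed to every block's render call
interface Region {
  refName: string
  start: number
  end: number
}
interface ScoreStats {
  min: number
  max: number
}

declare function getBlockStats(block: Region): Promise<ScoreStats>

async function getSharedYScale(blocks: Region[]): Promise<ScoreStats> {
  const stats = await Promise.all(blocks.map(b => getBlockStats(b)))
  return {
    min: Math.min(...stats.map(s => s.min)),
    max: Math.max(...stats.map(s => s.max)),
  }
}
```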
Benchmark
Benchmark still to be done
Next steps
There is still work needed. The dynamic blocks user experience is not ideal because the instant you e.g. side scroll, the data is removed from the view and you then wait for it to re-render. I think we would want the old data to stay on screen and only swap in the new rendering once it is ready, instead of blanking.
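One possible approach, just as a sketch of the idea (not code in this PR): hold on to the last completed rendering and keep showing it until the new one resolves, e.g. with a small hook like the hypothetical one below.

```ts
// Hypothetical sketch of "keep showing the previous rendering while the new
// one is in flight"; not part of this PR
import { useEffect, useRef } from 'react'

export function useLastDefined<T>(current: T | undefined): T | undefined {
  const last = useRef<T | undefined>(undefined)
  useEffect(() => {
    if (current !== undefined) {
      last.current = current
    }
  }, [current])
  // while `current` is undefined (re-render in flight), fall back to the
  // previous completed value instead of blanking
  return current ?? last.current
}
```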