@cmdcolin commented Sep 30, 2025

Background

The "block-based rendering" system has been a basic building block of jbrowse visualizations since jbrowse 1 days. This method divides the visible region into a number of small 'blocks' that are rendered independently. However, it causes a lot of complexity because some track types need to synchronize e.g. the y-scalebar across all the blocks, and it can also actually creates slowness because all this synchronization creates multiple independent calls to the renderers and data stores that are difficult to reason about, cache efficiently, and reprocesses the same data in the visible region many times over just to render one screen of data.

Overview of current rendering pipeline with static blocks, for a single alignment track

Here is an overview of rendering a BAM track, just to highlight the volume of data fetching and rendering requests:

An alignments track has two subtracks: a "pileup" subtrack and a "snpcoverage" subtrack

Each screen of data has approximately 3 "static blocks" (we hardcode block width to 800px, and average screen widths are about 3x that)

Subroutine 1: check if there is too much data to display by estimating the byte download from the CRAM or BAM index. This tends to be fast and avoids actually processing any BAM data.

Subroutine 2: calculate the y-scalebar of the data in the visible region

Subroutine 3: calculate the types of 'modifications' in the visible region. This is done to generate the 'track menu options' for color-by-modifications.

Subroutine 4: render a single SNPCoverage block

Subroutine 5: render a single pileup block

The pileup subtrack calls subroutine 1, and then subroutine 5 three times

Then the snpcoverage subtrack calls subroutine 1, then subroutine 2, then subroutine 3, and then subroutine 4 three times (a sketch of this call pattern follows below).
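
An illustrative sketch of that per-screen call pattern; all names here are placeholders, not real JBrowse APIs:

```ts
// Illustrative sketch of the per-screen call pattern described above;
// all names are placeholders, not real JBrowse APIs
type Region = { refName: string; start: number; end: number }

declare function estimateDataVolume(blocks: Region[]): Promise<void>
declare function calculateYScalebar(blocks: Region[]): Promise<void>
declare function detectModificationTypes(blocks: Region[]): Promise<void>
declare function renderPileupBlock(block: Region): Promise<void>
declare function renderSnpCoverageBlock(block: Region): Promise<void>

async function renderAlignmentsTrack(blocks: Region[]) {
  // pileup subtrack: subroutine 1, then subroutine 5 once per block (~3x)
  await estimateDataVolume(blocks)
  await Promise.all(blocks.map(b => renderPileupBlock(b)))

  // snpcoverage subtrack: subroutines 1-3, then subroutine 4 per block (~3x)
  await estimateDataVolume(blocks)
  await calculateYScalebar(blocks)
  await detectModificationTypes(blocks)
  await Promise.all(blocks.map(b => renderSnpCoverageBlock(b)))
}
```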

This whole process can result in re-processing the same data many times, and that is just for the initial render. After side scrolling or zooming, all of it is fully re-done. Side scrolling sometimes saves a little processing because it only introduces maybe one new 'static block', but oftentimes everything is still reprocessed because, for example, the y-scalebar changes slightly, forcing all the blocks to rerender.

All of this code also fully transits through the BamAdapter getFeatures function. The values returned by the adapter are not effectively 'cached' for the rendering system, and it is surprisingly hard to cache the results, because varying numbers of bgzip chunks of the BAM file are decompressed for the different side-by-side blocks. Figuring out how to cache better is an orthogonal issue not addressed here, but before this PR I looked into fully removing the caching in the bam-js library because the cache was seemingly ineffective (spoiler: completely removing it wasn't a great idea, but it still needs work), and produced this figure showing the request pattern to the library.
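
A hypothetical illustration of why naive result caching misses here: two side-by-side blocks decompress overlapping but non-identical sets of bgzip chunks, so a cache keyed on the whole request never hits.

```ts
// Hypothetical illustration of the caching problem; this is not the
// bam-js cache, just a sketch of the failure mode
type ChunkId = number

const cache = new Map<string, Uint8Array[]>()

function getChunks(
  ids: ChunkId[],
  decompressChunk: (id: ChunkId) => Uint8Array,
): Uint8Array[] {
  const key = ids.join(',')
  const hit = cache.get(key)
  if (hit) {
    return hit
  }
  const data = ids.map(id => decompressChunk(id))
  cache.set(key, data)
  return data
}

// e.g. block 1 needs chunks [5,6,7,8,9] and block 2 needs [8,9,10,11,12]:
// chunks 8 and 9 are decompressed twice because the request keys differ
```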

Figure: data store requests for a single initial rendering of a BAM file, from GMOD/bam-js#110

Note also that the current static blocks concept is somewhat "overfit" to the side-scrolling use case: zooming in and out does not benefit from static blocks at all, and zooming could be seen as an even more important case to optimize for. While this PR doesn't directly address zooming performance, it removes a lot of the conceptual complexity that came from overfitting to side-scroll blocks, which may help.

Example of how we can speed it up: the alternative pipeline

Pileup subtrack does a single render call
Snpcoverage subtrack does a single render call
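
In sketch form (again with placeholder names, not the actual PR code), the alternative pipeline collapses to:

```ts
// Sketch of the dynamic-block pipeline: one render call per subtrack
// over the whole visible region; names are placeholders
type Region = { refName: string; start: number; end: number }

declare function renderPileup(region: Region): Promise<void>
declare function renderSnpCoverage(region: Region): Promise<void>

async function renderAlignmentsTrackDynamic(visibleRegion: Region) {
  // each subtrack fetches and processes the visible data exactly once
  await Promise.all([
    renderPileup(visibleRegion),
    renderSnpCoverage(visibleRegion),
  ])
}
```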

This PR

This PR is a demonstration of using dynamic blocks for the alignments track. It has an additional trick: instead of using an autorun to pre-calculate the y-scalebar and visible modifications, it does this on the fly in the renderer when only one region is being rendered, which reduces the re-processing of data by redundant hits to the data store.

(see caveat below for the "why one region?")
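"
A minimal sketch of the idea, assuming hypothetical renderer arguments and helper names (not the exact PR code): when only one region is being rendered, the renderer can derive the y-scalebar and modification types from the features it has already fetched, rather than hitting the data store again in separate autoruns.

```ts
// Sketch of computing stats on the fly inside the renderer when only
// one region is rendered; all names here are hypothetical
type Region = { refName: string; start: number; end: number }

interface Feature {
  score: number
  modifications: string[]
}

declare function draw(
  features: Feature[],
  scoreMax: number,
  visibleModifications: Set<string>,
): void
declare function renderBlockBased(regions: Region[], features: Feature[]): void

function renderSnpCoverage(regions: Region[], features: Feature[]) {
  if (regions.length === 1) {
    // single region: derive the y-scalebar and modification menu from
    // the same feature pass used for drawing, avoiding the separate
    // autorun round-trips to the data store
    const scoreMax = Math.max(0, ...features.map(f => f.score))
    const visibleModifications = new Set(
      features.flatMap(f => f.modifications),
    )
    draw(features, scoreMax, visibleModifications)
  } else {
    // multiple displayed regions: fall back to the block-based path
    // where stats are precomputed and synchronized (see caveat below)
    renderBlockBased(regions, features)
  }
}
```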

Caveat: we still have to retain block-based calculations when rendering more than one region

If, for example, two entirely different chromosomes are being rendered, we still need to render two different blocks and synchronize e.g. the y-scalebar across both chromosomes (imagine rendering a bigwig across the whole genome...then the scalebar needs synchronizing across all the chromosomes)

In this case, we will still need routines that calculate the y-scalebar on the main thread and then synchronize that score across blocks (sketched below).

We will continue to need the block based logic unless renderers are programmed to render multiple 'regions' at a time. This may potentially be a bridge too far, though...or at least farther in the future.
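
A sketch of that main-thread synchronization step, assuming a hypothetical per-block stats shape:

```ts
// Sketch of the main-thread synchronization step across blocks; the
// stats shape here is hypothetical
interface RegionStats {
  scoreMin: number
  scoreMax: number
}

// combine per-block stats into one shared domain so that every block
// renders against the same y-scalebar
function combineStats(perBlock: RegionStats[]): RegionStats {
  return {
    scoreMin: Math.min(...perBlock.map(s => s.scoreMin)),
    scoreMax: Math.max(...perBlock.map(s => s.scoreMax)),
  }
}
```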

Benchmark

Benchmarks are still to be done.

Next steps

There is still work needed. The dynamic blocks user experience is not ideal: the instant you side scroll, for example, the data is removed from the view and you wait for it to re-render. I think we would want the old data to stay on screen and re-render once the new result is ready, instead of blanking.
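
One way this could be done (a sketch, not what the PR currently implements) is to keep the previous render result around and only swap it out when the new one resolves:

```ts
// Sketch of a "keep the previous result while re-rendering" pattern;
// hypothetical, not what this PR currently does
type RenderResult = { imageData: ImageBitmap }

let current: RenderResult | undefined

declare function requestRepaint(result: RenderResult): void

async function rerender(render: () => Promise<RenderResult>) {
  // keep painting `current` (possibly at a stale scroll offset) instead
  // of blanking the track while the new render is in flight
  const next = await render()
  current = next
  requestRepaint(current)
}
```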

@cmdcolin force-pushed the newstyle branch 4 times, most recently from 4f90303 to 63c5074 on October 3, 2025