Plotting rework: the collection #184

Open
casblaauw wants to merge 235 commits into main from v2.0

@casblaauw commented Feb 5, 2026

This is the collective branch for my big plotting rework, along with other code cleanup accompanying a large version increase.

Executive summary

  • The plotting functions have been reorganised into new modules (pl.qc, pl.region, pl.corr, pl.dist, pl.explain, pl.locus, pl.modisco, and pl.design) and renamed accordingly. See Plotting rework 3: renaming functions #183 or the docs for an overview.
  • Existing plotting functions:
    • Almost all functions now accept an optional ax object to plot their data on, and will optionally return their ax/axs as well. This allows for composite plots and further easy customisation.
    • All plotting functions now take plot_kws, allowing customization of the underlying plotting function. All existing defaults (like dot size, color, alpha, ...) can also be adjusted with this.
    • Many plotting functions had their defaults (labels, size) improved and new customization arguments added.
    • Almost all functions now use render_plot, making them customizable by passing kwargs to the plotting function directly. render_plot itself is reworked to primarily work at the ax level, and allows easy setting of labels, titles, tick rotation, etc, per subplot (if passing a list) or for all subplots together (if passing a single value).
  • New plotting functions:
    • crested.pl.qc.filter_cutoff lets you visualize the impact of your gini specificity filtering.
    • crested.pl.qc.sort_and_filter_cutoff lets you visualize the gini distributions of your top k regions when doing per-class specificity filtering.
    • crested.pl.region.scatter lets you plot a single region's ground truth vs predictions as a scatterplot.
  • Almost all plotting functions are now tested.
  • Enhancer design functions are now grouped in tl.design. Again, see the docs for an overview.
  • Added a crested.tl.evaluate function, so you no longer have to re-instantiate the Crested object just to get test metrics.
  • All tutorials are updated with new plotting function names and new functionality (where I remembered).
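
To make the plot_kws and per-subplot behaviour described above concrete, here is a minimal sketch of the two ideas: user-supplied keyword arguments overriding defaults, and a setting being either one value for all subplots or a list with one value per subplot. The helper names (merge_plot_kws, per_axis) and the example defaults are hypothetical and only illustrate the pattern; they are not the actual render_plot internals.

```python
from __future__ import annotations

from typing import Any


def merge_plot_kws(defaults: dict[str, Any], plot_kws: dict[str, Any] | None) -> dict[str, Any]:
    """Return the defaults, overridden by any user-supplied plot_kws (hypothetical helper)."""
    return {**defaults, **(plot_kws or {})}


def per_axis(value: Any, n_axes: int) -> list[Any]:
    """Expand a setting to one value per subplot (hypothetical helper).

    A list or tuple is read as one value per subplot; anything else is
    repeated for every subplot.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != n_axes:
            raise ValueError(f"Expected {n_axes} values, got {len(value)}")
        return list(value)
    return [value] * n_axes


# Assumed example defaults: the user only changes alpha, the dot size is kept.
scatter_kws = merge_plot_kws({"s": 10, "alpha": 0.5}, {"alpha": 0.8})

# One title per subplot, but a single x label shared by both subplots.
titles = per_axis(["Astro", "Oligo"], n_axes=2)
xlabels = per_axis("Region", n_axes=2)
```

Raising on a length mismatch (rather than silently truncating or recycling) is a design choice of this sketch, not something the changelog specifies.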

Changelog

Features

  • All plotting functions:
    • All functions now use render_plot (except a few pl.modisco clustermaps).
    • (Almost) all functions now use create_plot, which means they accept an axis to plot their data on, if plotting a single panel, and have unified width/height/sharex/sharey support.
    • (Almost) all functions now accept plot_kws to add and customize the underlying plotting function's arguments.
    • All functions take width and height to set dimensions, and multi-plot functions also take sharex/sharey.
    • render_plot now can also set ax-level labels and titles, set x/y lims, control both axis and sup title/label fontsizes, and add a (nicely behaving) grid.
    • Rotated labels now align with their ticks, with the alignment chosen by a few heuristics. This is primarily important with longer cell type names.
    • Returning a plot with show=False now returns both the fig and (list of) axes, instead of just the fig.
    • Default axis labels now denote whether you're using log_transform or not.
    • Lots of plotting functions had their figure size, labels, etc defaults improved (see below), but the core plotting has been untouched.
    • Newly created plots now use matplotlib's recommended 'constrained' layout engine by default (set at plot creation), making fig.tight_layout() unnecessary. This should improve layouts with suptitles, and the arrangement of contribution_scores when plotting multiple sequences.
  • crested.pl.qc.filter_cutoff is a new plotting function to show the impact of different possible cutoffs when doing pp.sort_regions_by_specificity.
  • crested.pl.qc.sort_and_filter_cutoff is a new plotting function to show the gini scores of different classes when doing pp.sort_and_filter_regions_by_specificity, to establish the top k regions you want to take.
  • crested.pl.region.scatter is a new plotting function to show ground truth vs predictions for a single region. Especially useful instead of region_predictions for many-class models, where the bars are not easily interpretable.
  • crested.pl.locus.track was expanded into a fully fledged function, and now supports large multi-class inputs, center zooming, and highlights. (Expand pl.track.locus #161)
  • crested.pl.explain.contribution_scores now supports plotting on genomic coordinates via the coordinates argument.
  • The region-based barplots are now one multifunctional plotting function, crested.pl.region.bar. This combines pl.bar.region, pl.bar.region_predictions and pl.bar.prediction. It supports both a single prediction matrix (like bar.prediction) and just an anndata+region combo, and can show multiple models and/or the ground truth (like bar.region_predictions), while still being able to plot a single model/ground truth from anndata (like bar.region).
  • Added crested.utils.strand_reverser and crested.utils.parse_region, which make working with region strings/tuples easier.
  • Enhancer design functions (tl.enhancer_design_*, utils.EnhancerOptimizer, utils.derive_intermediate_sequences) are now grouped in tl.design.
  • Added default config 'peak_regression_count' to represent the intended default with cut sites (the 'alternative config' shown in the tutorial).
  • Removed ZeroPenaltyMetric from default configs.
  • Added crested.tl.evaluate function: calculates metrics on a set of your choice (like the test set), from a model or from saved predictions. This means you no longer need to re-instantiate the Crested object just to get test metrics, only to ignore it in the rest of the workflow.
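
As an illustration of the kind of region string/tuple handling that helpers like crested.utils.parse_region are meant to simplify, here is a minimal standalone sketch. It assumes region strings of the form "chrom:start-end" with an optional ":+" or ":-" strand suffix; the function name parse_region_sketch, its return type, and the accepted format are assumptions for this example and may differ from the real crested implementation.

```python
from __future__ import annotations

import re

# Assumed format: chrom:start-end, optionally followed by :+ or :-
_REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region_sketch(region: str) -> tuple[str, int, int, str | None]:
    """Split a region string into (chrom, start, end, strand).

    strand is None when the string carries no strand suffix.
    """
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"Could not parse region string: {region!r}")
    return (
        match["chrom"],
        int(match["start"]),
        int(match["end"]),
        match["strand"],
    )


unstranded = parse_region_sketch("chr1:100-200")
stranded = parse_region_sketch("chr2:5-10:-")
```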

Minor plotting function improvements

  • pl.region.bar: now uses a y-only grid by default, since an x-grid is superfluous with a categorical axis. Now takes log_transform to transform the values before plotting.
  • pl.corr.heatmap[_self]:
    • Colormap is now customizable.
    • Colorbar now has a label to show its units (pearson correlation), indicating log1p-transformation if used.
    • Heatmaps are now square (sns.heatmap(square=True)) by default, and default fig size was slightly changed to make it fit a square heatmap + a colorbar well.
  • pl.dist.histogram: Add nice default axis labels, including denoting log-transformation if used. Non-used plots in the plot grid (if plotting multiple classes) are now hidden by default.
  • pl.locus.locus_scoring now takes separate plot_kws for both the locus and bigwig plots. Previous custom arguments are now folded into the plot_kws or render_plot kwargs. Highlights can now also be customized with highlight_kws.
  • pl.explain.contribution_scores:
    • Class labels are updated to be consistently at 70% of plot height (instead of 70% of the positive values) and at 2.5% of the plot width (instead of at x=5). For 'mutagenesis', it's at 30% by default since we expect those values to be mostly negative.
    • Inputs with missing dimensions are now automatically expanded where possible.
    • Highlights can now be customized with highlight_kws.
    • y-limit sharing between sequences can now explicitly be customized with sharey=True/False/'sequence'.
    • Internals were cleaned up, which also makes some behavior more consistent.
    • Now takes coordinates, to plot the explainer on genomic coordinates rather than just range(0, seq_len).
  • pl.design.steps_predictions: Spelling mistake in the arguments fixed. Now always creates a square grid of plots if supplying a lot of classes, following hist.distributions.
  • pl.design.step_contribution_scores is fully reworked to wrap around contribution_scores and do all nice things contribution_scores can do.
  • pl.modisco.* (prev patterns stuff):
    • All functions now take width/height, and the non-clustermap functions now all use render_plot. Clustermap functions now use g.savefig() as recommended by seaborn instead of fig.savefig.
    • pl.modisco.clustermap_with_pwm_logos pwm positioning logic was slightly adjusted, since they were all overlapping on my test run.
    • pl.modisco.selected_instances now takes an axis if plotting a single index.
    • All clustermaps/heatmaps in this module should now have cmap as an argument.
  • pl.corr.scatter:
    • Now has a square argument, to make the subplot square and unify the axes and their aspect ratios (so that y=x is a perfect diagonal).
    • Now has an optional argument for an identity (y=x) line.
    • Now allows disabling of the colorbar (off by default).
    • Now has nicer default labels.
  • pl.corr.violin: Label adjusted if using log-transformed data.
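
The contribution_scores label change above is easiest to see with numbers. The sketch below compares a label placed at a fraction of the full plot height with one placed at a fraction of the positive values only. It assumes that "70% of the positive values" meant 0.7 times the upper y-limit and that plot height and width refer to the axis limits; the function names are hypothetical and this is not the crested code itself.

```python
def fraction_of_range(low: float, high: float, frac: float) -> float:
    """Position at `frac` of the way from `low` to `high` (axis-relative placement)."""
    return low + frac * (high - low)


def fraction_of_positive(high: float, frac: float) -> float:
    """Assumed old behaviour: `frac` of the positive extent only, ignoring negatives."""
    return frac * max(high, 0.0)


# Example axis limits for a sequence with sizeable negative contributions.
ymin, ymax = -0.5, 1.5
xmin, xmax = 0.0, 500.0

new_label_y = fraction_of_range(ymin, ymax, 0.7)          # about 0.9
old_label_y = fraction_of_positive(ymax, 0.7)             # about 1.05
mutagenesis_label_y = fraction_of_range(ymin, ymax, 0.3)  # about 0.1
new_label_x = fraction_of_range(xmin, xmax, 0.025)        # about 12.5
```

With the axis-relative version, the label sits at the same relative height in every subplot regardless of how negative the scores get, which is the consistency the changelog entry describes.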

Bugfixes

  • crested.pl.design.step_contribution_scores (ex-patterns.enhancer_design_contribution_scores)'s zoom_n_bases argument now works (zoom_n_bases broken in enhancer_design_steps_contribution_scores #167)
  • crested.pl.modisco.clustermap_with_pwm_logos's pwm positioning and sizing logic was improved, since the pwms previously overlapped with the clustermap.
  • crested.pl.corr.scatter's colorbar now properly shows the color range (without alpha diluting the colors, and properly using the range of z fit in the function).
  • crested.tl.modisco.find_pattern_matches: support new modisco versions using 'pval' rather than 'qval' columns, rename cutoff argument to 'p_val_cutoff'
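
The 'pval' vs 'qval' fix can be pictured as a small column fallback. The sketch below is only an illustration under stated assumptions: it works on plain dictionaries rather than the real modisco output, prefers 'pval' when both columns exist, and treats the cutoff as inclusive. None of those details are confirmed by the changelog, and the function names are hypothetical.

```python
from typing import Any, Iterable


def pick_significance_column(columns: Iterable[str]) -> str:
    """Return 'pval' if present, otherwise fall back to 'qval' (assumed priority)."""
    available = set(columns)
    for name in ("pval", "qval"):
        if name in available:
            return name
    raise KeyError("Expected a 'pval' or 'qval' column")


def filter_matches(rows: list[dict[str, Any]], p_val_cutoff: float) -> list[dict[str, Any]]:
    """Keep rows whose significance value is at or below the cutoff."""
    if not rows:
        return []
    column = pick_significance_column(rows[0].keys())
    return [row for row in rows if row[column] <= p_val_cutoff]


# Made-up example rows in the newer ('pval') and older ('qval') layouts.
new_style = [{"match": "A", "pval": 0.01}, {"match": "B", "pval": 0.2}]
old_style = [{"match": "C", "qval": 0.03}, {"match": "D", "qval": 0.5}]

kept_new = filter_matches(new_style, p_val_cutoff=0.05)
kept_old = filter_matches(old_style, p_val_cutoff=0.05)
```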

Documentation and infrastructure

  • (Almost) all plotting functions are now tested.

Function deprecation warnings

  • All old names in crested.pl, except crested.pl.locus.
  • grad_times_input_to_df
  • grad_times_input_to_df_mutagenesis
  • grad_times_input_to_df_mutagenesis_letters

Function removals

  • The methods of crested.tl.Crested which were superseded by crested.tl functions:
    • get_embeddings -> crested.tl.extract_layer_embeddings
    • predict -> crested.tl.predict
    • predict_regions -> crested.tl.predict
    • predict_sequence -> crested.tl.predict
    • score_gene_locus -> crested.tl.score_gene_locus
    • calculate_contribution_scores -> crested.tl.contribution_scores
    • calculate_contribution_scores_regions -> crested.tl.contribution_scores
    • calculate_contribution_scores_sequence -> crested.tl.contribution_scores
    • calculate_contribution_scores_enhancer_design -> crested.tl.contribution_scores
    • tfmodisco_calculate_and_save_contribution_scores_sequences -> crested.tl.contribution_scores_specific
    • tfmodisco_calculate_and_save_contribution_scores -> crested.tl.contribution_scores_specific
    • enhancer_design_motif_implementation -> crested.tl.design.motif_insertion
    • enhancer_design_in_silico_evolution -> crested.tl.design.in_silico_evolution
    • _derive_intermediate_sequences -> crested.tl.design.derive_intermediate_sequences
  • Aliases for models that didn't properly reflect their nature:
    • chrombpnet -> crested.tl.zoo.dilated_cnn
    • chrombpnet_decoupled -> crested.tl.zoo.dilated_cnn_decoupled
  • Superseded or obsolete utility functions:
    • extract_bigwig_values_per_bp -> crested.utils.read_bigwig_region
    • get_value_from_dataframe -> df.loc[row_name, column_name]
For more info on the rationale behind each change, see the individual PRs #166, #182 and #183.

Compatibility

I've endeavored to keep code as backward compatible as possible.

  • All renamed arguments still work, and raise a warning explaining how to use the renamed version or new syntax.
  • All renamed functions still work under their old aliases, and log an info message (for now) pointing to the renamed version.
  • If using show=False, render_plot does now return both a fig and ax(s), so code previously doing fig = crested.pl.func(show=False); axs = fig.axes or something similar will have to update to fig, axs = crested.pl.func(show=False).
  • title as a kwarg now refers to the axis title rather than the suptitle; the suptitle is now set with suptitle. This leads to better titles and nicer plots in 90% of cases, but might need some manual changes in multi-panel plots where you expected a suptitle.
  • I've tested all base functions (everything except _modisco_results) pretty thoroughly (also adjusting plot_kws, etc), but something might've slipped through.
    • For _modisco_results, I tested that all functions at least work as used in the enhancer code analysis tutorial, but I haven't tried different arguments or parameters.
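
The alias and renamed-argument behaviour described in this list follows a common pattern, sketched below with the standard library only. The decorator names, the example function bar, the old argument name figsize_x, and the choice of FutureWarning are all hypothetical; the real crested helpers and messages may look different.

```python
import functools
import logging
import warnings

logger = logging.getLogger("plotting_sketch")


def deprecated_alias(new_func, old_name):
    """Wrap new_func so calls via the old name log an info message first."""
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        logger.info("%s is deprecated; use %s instead.", old_name, new_func.__name__)
        return new_func(*args, **kwargs)
    return wrapper


def renamed_argument(old, new):
    """Accept the old keyword, warn, and forward its value under the new name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"Pass only '{new}', not both '{old}' and '{new}'.")
                warnings.warn(
                    f"'{old}' was renamed to '{new}'.", FutureWarning, stacklevel=2
                )
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@renamed_argument(old="figsize_x", new="width")
def bar(values, width=6.0):
    """Stand-in for a plotting function; returns what it would have drawn."""
    return {"n_bars": len(values), "width": width}


# Old entry point kept alive as an alias of the new function.
bar_region = deprecated_alias(bar, old_name="bar_region")
```

Calling bar_region([1, 2, 3], figsize_x=4.0) in this sketch still works: it logs the info message, emits the rename warning, and runs bar with width=4.0.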

…size, fix single-string labels in step_contrib_scores
Plotting rework 4: update tutorials