…lor to region_predictions
…size, fix single-string labels in step_contrib_scores
…and_filter_cutoff
Plotting rework 4: update tutorials
This is the collective branch for my big plotting rework, along with other code cleanup accompanying a large version increase.
Executive summary
- … (`pl.qc`, `pl.region`, `pl.corr`, `pl.dist`, `pl.explain`, `pl.locus`, `pl.modisco`, and `pl.design`) and renamed accordingly. See Plotting rework 3: renaming functions #183 or the docs for an overview.
- … `ax` object to plot their data on, and will optionally return their `ax`/`axs` as well. This allows for composite plots and further easy customisation.
- … `plot_kws`, allowing customization of the underlying plotting function. All existing defaults (like dot size, color, alpha, ...) can also be adjusted with this.
- … `render_plot`, making them customizable by passing kwargs to the plotting function directly.
- `render_plot` itself is reworked to primarily work at the ax level, and allows easy setting of labels, titles, tick rotation, etc., per subplot (if passing a list) or for all subplots together (if passing a single value).
- `crested.pl.qc.filter_cutoff` lets you visualize the impact of your gini specificity filtering.
- `crested.pl.qc.sort_and_filter_cutoff` lets you visualize the gini distributions of your top k regions when doing per-class specificity filtering.
- `crested.pl.region.scatter` lets you plot a single region's ground truth vs predictions as a scatterplot.
- … `tl.design`. Again, see the docs for an overview.
- … `crested.tl.evaluate` function to not have to reinstate the Crested object just to get test metrics.

Changelog
Features
- … `render_plot` (except a few `pl.patterns` modisco clustermaps)
- … `create_plot`, which means they accept an axis to plot their data on, if plotting a single panel, and have unified width/height/sharex/sharey support.
- … `plot_kws` to add and customize the underlying plotting function's arguments.
- … `width` and `height` to set dimensions, and multi-plot functions also take `sharex`/`sharey`.
- `render_plot` now can also set ax-level labels and titles, set x/y lims, control both axis and sup title/label fontsizes, and add a (nicely behaving) grid.
- `show=False` now returns both the fig and (list of) axes, instead of just the fig.
- … `log_transform` or not.
- … `fig.tight_layout()` unneeded. This should improve layouts with any suptitles, and organisation in `contribution_scores` if plotting multiple sequences.
- `crested.pl.qc.filter_cutoff` is a new plotting function to show the impact of different possible cutoffs when doing `pp.sort_regions_by_specificity`.
- `crested.pl.qc.sort_and_filter_cutoff` is a new plotting function to show the gini scores of different classes when doing `pp.sort_and_filter_regions_by_specificity`, to establish the top k regions you want to take.
- `crested.pl.region.scatter` is a new plotting function to show ground truth vs predictions for a single region. Especially useful over `region_predictions` for many-class models where the bars are not easily interpretable.
- `crested.pl.locus.track` was expanded into a fully fledged function, and now supports large multi-class inputs, center zooming, and highlights. (Expand pl.track.locus #161)
- `crested.pl.patterns.contribution_scores` now supports plotting on genomic coordinates with `coordinates`.
- … `crested.pl.region.bar`. This combines `pl.bar.region`, `pl.bar.region_predictions` and `pl.bar.prediction`. It supports both a single prediction matrix (like `bar.prediction`) as well as just an anndata+region combo, and can show multiple models and/or the ground truth (like `bar.region_predictions`), while still of course being able to plot a single model/ground truth from anndata (like `bar.region`).
- … `crested.utils.strand_reverser` and `crested.utils.parse_region`, which make working with region strings/tuples easier.
- … (`tl.enhancer_design_*`, `utils.EnhancerOptimizer`, `utils.derive_intermediate_sequences`) are now grouped in `tl.design`.
- … `crested.tl.test` function to not have to reinstate the Crested object just to get test metrics.
- … `'peak_regression_count'` to represent the intended default with cut sites (the 'alternative config' shown in the tutorial)
- … `crested.tl.evaluate` function: calculates metrics on a set of choice (like the test set), from a model or saved preds. This means you no longer need to reinstate the Crested object just to get test metrics, only to ignore it in the rest of the workflow.

Minor plotting function improvements
- `pl.region.bar`: now uses a y-only grid by default, since an x-grid is superfluous with a categorical axis. Now takes `log_transform` to transform the values before plotting.
- `corr.heatmap[_self]`:
  - … (`sns.heatmap(square=True)`) by default, and default fig size was slightly changed to make it fit a square heatmap + a colorbar well.
- `pl.dist.histogram`: Add nice default axis labels, including denoting log-transformation if used. Non-used plots in the plot grid (if plotting multiple classes) are now hidden by default.
- `pl.locus.locus_scoring` now takes separate plot_kws for both the locus and bigwig plots. Previous custom arguments are now folded into the plot_kws or render_plot kwargs. Highlights can now also be customized with highlight_kws.
- `pl.explain.contribution_scores`:
  - … `highlight_kws`.
  - … `sharey=True/False/'sequence'`.
  - … `coordinates`, to plot the explainer on genomic coordinates rather than just `range(0, seq_len)`.
- `pl.design.steps_predictions`: Spelling mistake in the arguments fixed. Now always creates a square grid of plots if supplying a lot of classes, following `hist.distributions`.
- `pl.design.step_contribution_scores` is fully reworked to wrap around `contribution_scores` and do all nice things `contribution_scores` can do.
- `pl.modisco.*` (prev `patterns` stuff):
  - … `render_plot`. Clustermap functions now use `g.savefig()` as recommended by seaborn instead of `fig.savefig`.
  - `pl.modisco.clustermap_with_pwm_logos` pwm positioning logic was slightly adjusted, since they were all overlapping on my test run.
  - `pl.modisco.selected_instances` now takes an axis if plotting a single index.
  - … `cmap` as an argument.
- `pl.corr.scatter`:
  - … `square` argument, to make the subplot square and unify the axes and their aspect ratios (so that y=x is a perfect diagonal).
- `pl.corr.violin`: Label adjusted if using log-transformed data.

Bugfixes
- `crested.pl.design.step_contribution_scores` (ex-`patterns.enhancer_design_contribution_scores`)'s `zoom_n_bases` argument now works (zoom_n_bases broken in enhancer_design_steps_contribution_scores #167)
- `crested.pl.modisco.clustermap_with_pwm_logos`'s pwm adjusting logic was improved, since they overlapped with the clustermap previously.
- `crested.pl.corr.scatter`'s colorbar now properly shows the color range (without `alpha` diluting the colors, and properly using the range of `z` fit in the function).
- `crested.pl.modisco.clustermap_with_pwm_logos`: improved pwm positioning and sizing
- `crested.tl.modisco.find_pattern_matches`: support new modisco versions using 'pval' rather than 'qval' columns, rename cutoff argument to 'p_val_cutoff'

Documentation and infrastructure
Function deprecation warnings
- … `crested.pl`, except `crested.pl.locus`.
- `grad_times_input_to_df`
- `grad_times_input_to_df_mutagenesis`
- `grad_times_input_to_df_mutagenesis_letters`

Function removals
- … `~crested.tl.Crested` which were superseded by {mod}`~crested.tl` functions:
  - `get_embeddings` -> {func}`~crested.tl.extract_layer_embeddings`
  - `predict` -> {func}`~crested.tl.predict`
  - `predict_regions` -> {func}`~crested.tl.predict`
  - `predict_sequence` -> {func}`~crested.tl.predict`
  - `score_gene_locus` -> {func}`~crested.tl.score_gene_locus`
  - `calculate_contribution_scores` -> {func}`~crested.tl.contribution_scores`
  - `calculate_contribution_scores_regions` -> {func}`~crested.tl.contribution_scores`
  - `calculate_contribution_scores_sequence` -> {func}`~crested.tl.contribution_scores`
  - `calculate_contribution_scores_enhancer_design` -> {func}`~crested.tl.contribution_scores`
  - `tfmodisco_calculate_and_save_contribution_scores_sequences` -> {func}`~crested.tl.contribution_scores_specific`
  - `tfmodisco_calculate_and_save_contribution_scores` -> {func}`~crested.tl.contribution_scores_specific`
  - `enhancer_design_motif_implementation` -> {func}`~crested.tl.design.motif_insertion`
  - `enhancer_design_in_silico_evolution` -> {func}`~crested.tl.design.in_silico_evolution`
  - `_derive_intermediate_sequences` -> {func}`~crested.tl.design.derive_intermediate_sequences`
- `chrombpnet` -> {func}`~crested.tl.zoo.dilated_cnn`
- `chrombpnet_decoupled` -> {func}`~crested.tl.zoo.dilated_cnn_decoupled`
- `extract_bigwig_values_per_bp` -> {func}`~crested.utils.read_bigwig_region`
- `get_value_from_dataframe` -> `df.loc[row_name, column_name]`

For more info on the rationale behind each change, see the individual PRs #166, #182 and #183.
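To find calls in your own code that are affected by the removals listed above, a small scanner along these lines may help. This is only a sketch and not part of CREsted: `REMOVED` and `find_removed_calls` are hypothetical names, the mapping is a subset copied from the list above, and the bare name `predict` is left out on purpose because it would also match unrelated `.predict(` calls.

```python
import re

# Old name -> suggested replacement, copied from the removal list above.
# Only a subset is included; extend as needed.
REMOVED = {
    "get_embeddings": "crested.tl.extract_layer_embeddings",
    "predict_regions": "crested.tl.predict",
    "predict_sequence": "crested.tl.predict",
    "calculate_contribution_scores": "crested.tl.contribution_scores",
    "enhancer_design_in_silico_evolution": "crested.tl.design.in_silico_evolution",
    "extract_bigwig_values_per_bp": "crested.utils.read_bigwig_region",
    "get_value_from_dataframe": "df.loc[row_name, column_name]",
}

# Match a removed name only as a whole identifier directly followed by "(".
_NAMES = "|".join(re.escape(name) for name in sorted(REMOVED, key=len, reverse=True))
_PATTERN = re.compile(r"(?<![A-Za-z0-9_])(" + _NAMES + r")\s*\(")


def find_removed_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, old name, replacement) for each removed call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, REMOVED[old]))
    return hits
```

The scanner is purely textual, so it will also flag matches inside comments or strings; treat its output as a list of places to check by hand.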
Compatibility
I've endeavored to keep the code as backward compatible as possible.
- … `show=False`, `render_plot` does now return both a fig and ax(s), so code previously doing `fig = crested.pl.func(show=False); axs = fig.axes` or something similar will have to update to `fig, axs = crested.pl.func(show=False)`.
- … `_modisco_results`, I tested that all functions at least work as used in the enhancer code analysis tutorial, but I haven't tried different arguments or parameters.