Plotting a subset of the data with a dendrogram in e.g. dotplot should retain the dendrogram ordering #3668

@timslittle

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

Hi Team,

Thank you again for this invaluable tool.

I've encountered this warning often and think it is worth requesting an enhancement, since it can make analysis trickier.

Some plots can use a dendrogram to order the grouping variable by similarity. However, this does not work if the data have been subset: a warning is raised and the groups fall back to a default order, such as alphabetical.

As an example:

# Set-up
import scanpy as sc
pbmc = sc.datasets.pbmc68k_reduced()
sc.tl.leiden(pbmc, key_added='clusters', resolution=0.5, flavor="igraph", n_iterations=2)
# Dendrogram
sc.tl.dendrogram(pbmc,
                 groupby="clusters")
sc.pl.dendrogram(pbmc,
                 groupby="clusters")

The dendrogram shows which clusters are more similar to one another. When I combine this with sc.pl.rank_genes_groups_dotplot (with dendrogram=True) in the next chunk, the groups are ordered by the dendrogram instead of '0', '1', '2', etc.

sc.tl.rank_genes_groups(
    pbmc,
    groupby="clusters",
    method="wilcoxon",
    key_added="dea_clusters",
)
sc.pl.rank_genes_groups_dotplot(
    pbmc,
    groupby="clusters",
    standard_scale="var",
    key="dea_clusters",
    show=True,
    dendrogram=True,
)

However, if I (harshly) filter the ranked genes to get a more refined view of the differentially expressed genes in each cluster, the subsequent dotplot is no longer ordered by the dendrogram. This is because some of the groups are effectively removed when none of their genes meet the criteria. As you can see, the groups are now ordered alphabetically.

sc.tl.filter_rank_genes_groups(
    pbmc,
    groupby="clusters",
    min_in_group_fraction=0.9,  # default 0.25
    min_fold_change=1,  # default 1
    max_out_group_fraction=0.1,  # default 0.5
    key="dea_clusters",
    key_added="dea_clusters_filtered",
)
sc.pl.rank_genes_groups_dotplot(
    pbmc,
    groupby="clusters",
    standard_scale="var",
    key="dea_clusters_filtered",
    show=True,
    dendrogram=True,
)
WARNING: No genes found for group 1
WARNING: No genes found for group 2
WARNING: No genes found for group 4
WARNING: No genes found for group 8
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 0, 3, 5, etc.

I suggest that the dendrogram ordering be used even if the data have been subset. A warning could state that the dendrogram was computed on the full set of groups and might differ if recomputed on the subset, but in many cases people would still want the plot ordered by similarity rather than by the default.
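A rough sketch of the requested behaviour, written as plain Python so it does not depend on scanpy internals. The helper name `order_by_dendrogram` is hypothetical, and the example group labels are made up for illustration. It assumes the full dendrogram ordering is available as a list of group labels; in scanpy this is probably the `categories_ordered` entry that `sc.tl.dendrogram` stores under `pbmc.uns["dendrogram_clusters"]`, but that location is an assumption and should be checked against the installed version.

```python
from typing import Iterable, List


def order_by_dendrogram(
    dendrogram_order: Iterable[str],
    surviving_groups: Iterable[str],
) -> List[str]:
    """Return the surviving groups, keeping the relative order of the full dendrogram.

    Hypothetical helper: groups that are missing from the dendrogram are
    appended at the end in their original order instead of being dropped.
    """
    surviving = list(surviving_groups)
    surviving_set = set(surviving)

    # Walk the full dendrogram order and keep only the groups that survived filtering.
    ordered = [g for g in dendrogram_order if g in surviving_set]

    # Groups unknown to the dendrogram go last, in their original order.
    known = set(ordered)
    ordered.extend(g for g in surviving if g not in known)
    return ordered


# Made-up example: a full dendrogram order over nine clusters, of which
# clusters 1, 2, 4 and 8 lost all their genes during filtering.
full_order = ["3", "0", "5", "1", "2", "4", "7", "6", "8"]
remaining = ["0", "3", "5", "6", "7"]

print(order_by_dendrogram(full_order, remaining))
# ['3', '0', '5', '7', '6']
```

The remaining groups keep the similarity-based order from the full dendrogram rather than falling back to '0', '3', '5', etc. How such an ordering would be wired into the plotting functions is left open here.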

What do y'all think?
