Plotting a subset of the data with a dendrogram in e.g. dotplot should retain the dendrogram ordering

### What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

### Please describe your wishes

Hi Team,

Thank you again for this invaluable tool.

I've run into this warning often, and I think it is worth requesting an enhancement because it can make analysis trickier.

Some plots can use a dendrogram to order the grouping variable by similarity. However, this does not work if the data has been subset: a warning is raised and the groups fall back to a default order, such as alphabetical.

As an example:
```python
# Set-up
import scanpy as sc

pbmc = sc.datasets.pbmc68k_reduced()
sc.tl.leiden(pbmc, key_added="clusters", resolution=0.5, flavor="igraph", n_iterations=2)

# Dendrogram
sc.tl.dendrogram(pbmc, groupby="clusters")
sc.pl.dendrogram(pbmc, groupby="clusters")
```

The dendrogram shows which clusters are more similar to one another. When I combine this with `sc.pl.rank_genes_groups_dotplot` in the next chunk, the groups are ordered by the dendrogram instead of '0', '1', '2', etc.

```python
sc.tl.rank_genes_groups(
    pbmc,
    groupby="clusters",
    method="wilcoxon",
    key_added="dea_clusters",
)
sc.pl.rank_genes_groups_dotplot(
    pbmc,
    groupby="clusters",
    standard_scale="var",
    key="dea_clusters",
    show=True,
    dendrogram=True,
)
```

However, if I (harshly) filter the gene rankings to get a more refined view of the differentially expressed genes in each cluster, the subsequent dotplot is no longer ordered by the dendrogram. Some of the groups are effectively removed because none of their genes meet the criteria, so the remaining groups no longer match the `groupby` categories. As you can see, the groups are now ordered alphabetically.

```python
sc.tl.filter_rank_genes_groups(
    pbmc,
    groupby="clusters",
    min_in_group_fraction=0.9,  # default 0.25
    min_fold_change=1,  # default 1
    max_out_group_fraction=0.1,  # default 0.5
    key="dea_clusters",
    key_added="dea_clusters_filtered",
)
sc.pl.rank_genes_groups_dotplot(
    pbmc,
    groupby="clusters",
    standard_scale="var",
    key="dea_clusters_filtered",
    show=True,
    dendrogram=True,
)
```
```
WARNING: No genes found for group 1
WARNING: No genes found for group 2
WARNING: No genes found for group 4
WARNING: No genes found for group 8
WARNING: Groups are not reordered because the `groupby` categories and the `var_group_labels` are different.
categories: 0, 1, 2, etc.
var_group_labels: 0, 3, 5, etc.
```

I suggest that the dendrogram ordering still be used when the data has been subset. A warning could note that the dendrogram ordering may differ when the data is subset, but in many cases people would still want the plot ordered by similarity rather than by the default.
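
To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not scanpy code: `restrict_dendrogram_order` is a hypothetical helper, and the cluster order below is made up for illustration rather than taken from the actual dendrogram. The idea is to keep the leaf order computed on all groups and simply drop the groups that are no longer present.

```python
def restrict_dendrogram_order(categories_ordered, present_groups):
    """Keep the full dendrogram leaf order, dropping groups that are absent.

    Hypothetical helper for illustration; not part of scanpy.
    """
    present = set(present_groups)
    return [group for group in categories_ordered if group in present]


# Made-up leaf order standing in for the one computed by sc.tl.dendrogram
# on all clusters (assumption: nine clusters, '0' to '8').
full_order = ["3", "0", "5", "1", "8", "2", "4", "6", "7"]

# Groups that still have genes after filtering; 1, 2, 4 and 8 were dropped,
# matching the warnings above.
remaining = ["0", "3", "5", "6", "7"]

ordered = restrict_dendrogram_order(full_order, remaining)
print(ordered)
```

With these made-up inputs, the remaining groups keep their relative dendrogram positions ('3' before '0' before '5', and so on) instead of being sorted alphabetically.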

What do y'all think?


Plotting a subset of the data with a dendrogram in e.g. dotplot should retain the dendrogram ordering #3668

What kind of feature would you like to request?

Please describe your wishes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Plotting a subset of the data with a dendrogram in e.g. dotplot should retain the dendrogram ordering #3668

Description

What kind of feature would you like to request?

Please describe your wishes

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions