Seurat FindMarkers calculates wrong log2 fold change

related issue: [#4178](https://github.com/satijalab/seurat/issues/4178)

I discovered great difference between log2fc calculated by Seurat FindMarkers function and the script I wrote myself. Usually, the log2fc is underestimated as mentioned in issue #4178.

I didn't find the source code of FindMarkers function, but I guess you use exp install of expm1, or add the pseudocount 1 when convert the log1p transformed data to depth normalized counts. scRNA-seq expression matrix is usually sparse and the pseudocount 1 may greatly change the original distribution.

See the case below, when i use exp to process the data slot, I get the exact log2fc calculated by FindMarkers. But if I use expm1 to process the data slot or just use the count slot, I get great larger log2fc (1.7 vs 2.8).

Also, I guess it is a better way to add the pseudocount: `log2(sum(expm1(data)[gene,cells.1])+pseudocount) - log2(sum(expm1(data)[gene,cells.2])+pseudocount) +log2(cells.2 number) - log2(cells.1 number)`

So, please check the log2fc calculated by FindMarkers. Also, I am not quite familiar with R, so I would be very grateful if you can show me the link to your FindMarkers source code. 

Thank you very much, and thank you for your amazing work.
Yiming

```r
> #load data
> setwd('/data/User/sunym/temp/')
> .libPaths('/data/User/sunym/software/R_lib/R_4.1.3/')
> library(Seurat)
Attaching SeuratObject
Attaching sp
> library(SeuratData)
── Installed datasets ────────────────────────────────────────────────────────────────────────────────────── SeuratData v0.2.2 ──
✔ ifnb         3.1.0                                            ✔ pbmc3k       3.1.4
✔ panc8        3.0.2                                            ✔ pbmcMultiome 0.1.2

────────────────────────────────────────────────────────────── Key ──────────────────────────────────────────────────────────────
✔ Dataset loaded successfully
❯ Dataset built with a newer version of Seurat than installed
❓ Unknown version of Seurat installed

> data('pbmc3k')
> 
> #normalize
> pbmc3k <- NormalizeData(object = pbmc3k,assay = 'RNA',normalization.method = 'LogNormalize',scale.factor = 10000)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> 
> #subset
> temp_T <- pbmc3k[,pbmc3k$seurat_annotations == 'CD8 T']
> temp_NK <- pbmc3k[,pbmc3k$seurat_annotations == 'NK']
> 
> #find marker using Seurat findmarkers
> FindMarkers(object = pbmc3k,ident.1 = 'CD8 T',ident.2 = 'NK',group.by = 'seurat_annotations',assay = 'RNA',slot = 'data',features = 'CD8A')
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
            p_val avg_log2FC pct.1 pct.2    p_val_adj
CD8A 9.280891e-18   1.755037 0.506 0.084 1.272781e-13
> 
> #the process I guess how seurat calculate log2fc
> log2(rowMeans(exp(temp_T@assays$RNA@data['CD8A',,drop = FALSE]))/rowMeans(exp(temp_NK@assays$RNA@data['CD8A',,drop = FALSE])))
    CD8A 
1.755037 
> 
> #calculate log2fc myself
> #use slot data
> log2(rowMeans(expm1(temp_T@assays$RNA@data['CD8A',,drop = FALSE]))/rowMeans(expm1(temp_NK@assays$RNA@data['CD8A',,drop = FALSE])))
    CD8A 
2.840186 
> #use slot count
> log2(mean(temp_T@assays$RNA@counts['CD8A',]/temp_T$nCount_RNA)/mean(temp_NK@assays$RNA@counts['CD8A',]/temp_NK$nCount_RNA))
[1] 2.840186
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seurat FindMarkers calculates wrong log2 fold change #6202

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Seurat FindMarkers calculates wrong log2 fold change #6202

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions