Skip to content

Issue with FindMarkers() methodology #795

@danielcgingerich

Description

@danielcgingerich

The two parts of the find markers function: 1) fold change calculation and 2) p value calculation. Fold change is calculated using raw counts, while the p value is calculated using normalized data. Why is this? I do not believe raw counts are comparable to normalized data in this scenario, and here is why:

Raw counts vs normalized counts result in a change of directionality of many peaks. Example: peak A might have a positive logFC value with raw counts, but it becomes negative when using normalized data. This means that the input data for the p value and the input data for the fold change are contradicting each other.

I calculated fold change on the raw counts and normalized counts and looked at how this changes direction of peak fold changes.

# raw counts
u.load <- mtx[, load.cells]
u.load <- rowMeans(u.load)+1
u.normal <- mtx[, norm.cells]
u.normal <- rowMeans(u.normal)+1
fc <- u.load/u.normal

table(fc > 1)
FALSE  TRUE 
55299 47509

a <- fc > 1

# normalized counts
mtx2 <- atac[['peaks']]@data
u.load <- mtx2[, load.cells]
u.load <- rowMeans(u.load)+1
u.normal <- mtx2[, norm.cells]
u.normal <- rowMeans(u.normal)+1
fc <- u.load/u.normal

table(fc > 1)
FALSE  TRUE 
19302 83506

b <- fc > 1

table(a == b)
FALSE  TRUE 
36399 66409

In this case, out of 102808 peaks, 36399 of them change direction depending on whether raw vs normalized data is used as input. For these peaks, the input for the p value and the input for the fold change calculation are directly contradicting each other.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions