Add 'equal_frequency' option to highly_variable_genes #572
This fixes #415 by allowing one to find variable genes using the `equal_frequency` option. It also adds an option to change the number of bins for the `cell_ranger` flavor.

I originally tried to copy the implementation in Seurat, which would allow a test similar to the one already present for the `equal_width` implementation. However, the Seurat code has an error: the `-1` in the code means that the first bin, which goes from -1 to the minimum value, always contains only one value. I'm not sure why they have this, but it leads to different answers, because the Scanpy code in `highly_variable_genes` always marks the gene in a single-gene bin as significant (to correct another error in Seurat, which normally excludes these bins/genes even though they often contain some highly expressed genes).

Additionally, the `cut` function in R sometimes returns rounded bin edges in the Seurat implementation, since Seurat does not change the default `dig.lab = 3`. In contrast, I believe pandas uses the actual cutoffs in the data.
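A minimal pandas sketch of the binning schemes discussed above, on made-up data. `pd.cut` with an integer `bins` gives equal-width bins and `pd.qcut` gives equal-frequency (quantile-based) bins. The Seurat-style break vector (a `-1` prepended to quantiles of the positive means) is my reconstruction from the description above, not code copied from Seurat, Scanpy, or this PR. It shows the single-gene first bin and why a per-bin standard deviation is undefined for it.

```python
import numpy as np
import pandas as pd

# Hypothetical per-gene statistics: distinct, strictly positive means and dispersions.
rng = np.random.default_rng(0)
n_genes, n_bins = 1000, 20
df = pd.DataFrame({
    "means": rng.lognormal(mean=0.0, sigma=1.0, size=n_genes),
    "dispersions": rng.lognormal(mean=0.0, sigma=0.5, size=n_genes),
})

# equal_width: n_bins intervals of equal width over the range of the means.
df["bin_equal_width"] = pd.cut(df["means"], bins=n_bins)

# equal_frequency: n_bins quantile-based intervals with roughly equal gene counts.
df["bin_equal_frequency"] = pd.qcut(df["means"], q=n_bins)

# Seurat-style breaks (assumed form): -1 followed by n_bins quantiles (0 to 1)
# of the positive means, which yields n_bins intervals in total.
positive = df.loc[df["means"] > 0, "means"]
seurat_breaks = np.concatenate([[-1.0], np.quantile(positive, np.linspace(0, 1, n_bins))])
df["bin_seurat_style"] = pd.cut(df["means"], bins=seurat_breaks)

counts_freq = df["bin_equal_frequency"].value_counts(sort=False)
counts_seurat = df["bin_seurat_style"].value_counts(sort=False)
print(counts_freq.min(), counts_freq.max())  # roughly equal counts per bin
print(counts_seurat.iloc[0])                 # first bin (-1, min] holds a single gene

# With one gene in a bin, the sample standard deviation (ddof=1) is NaN, so a
# per-bin z-score of the dispersions is undefined for that gene.
disp_std = df.groupby("bin_seurat_style", observed=True)["dispersions"].std(ddof=1)
print(disp_std.iloc[0])  # nan
```

Because that first bin always holds exactly one gene, code that special-cases single-gene bins, as `highly_variable_genes` does according to the description above, will treat that gene differently from Seurat.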
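If a test were to compare pandas' bins against Seurat's output, the edge rounding mentioned above would presumably need to be accounted for. A minimal sketch, assuming that rounding to 3 significant digits is a fair stand-in for what `dig.lab = 3` produces (an assumption, not verified against R); `signif` is a hypothetical helper, not a pandas or Scanpy function. `pd.cut(..., retbins=True)` returns the exact edges pandas used for binning.

```python
import numpy as np
import pandas as pd

def signif(x, digits=3):
    """Hypothetical helper: round each value to `digits` significant digits,
    similar in spirit to R's signif()."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nonzero = x != 0
    magnitude = np.floor(np.log10(np.abs(x[nonzero])))
    scale = 10.0 ** (digits - 1 - magnitude)
    out[nonzero] = np.round(x[nonzero] * scale) / scale
    return out

# Hypothetical gene means; retbins=True returns the exact edges used for binning.
means = pd.Series([0.0123456, 0.234567, 1.98765, 3.45678, 7.65432])
_, exact_edges = pd.cut(means, bins=3, retbins=True)

# The same edges rounded to 3 significant digits differ slightly from the exact ones,
# so an exact comparison against 3-digit edges would fail.
rounded_edges = signif(exact_edges, digits=3)
print(exact_edges)
print(rounded_edges)
```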