
Diffxpy returns log2fc and coef_mle values as dask array #223

@lippspatrick

Description

Hello,
thanks a lot for your package.

I'm currently facing an issue: after running the Wald test, the values for log2fc and coef_mle are returned as dask arrays rather than as numeric values.

I'm using the following versions:
anndata 0.7.8
diffxpy 0.7.4+103.gceb79aa ig/batchglm_api_update
batchglm 0.7.4.dev_0
dask 2021.4.1
sparse 0.9.1

I'm using batchglm and diffxpy from those branches because of issue #93.

I'm running the following code:

import diffxpy.api as de

# use the raw counts as input for the model fit
adata.X = adata.layers["raw_counts"]
res = de.test.wald(data=adata, formula_loc="~ 1 + GROUP", factor_loc_totest="GROUP")

This prints:

The provided param_names are ignored as the design matrix is of type <class 'patsy.design_info.DesignMatrix'>.
The provided param_names are ignored as the design matrix is of type <class 'patsy.design_info.DesignMatrix'>.
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:102: PerformanceWarning: Slicing is producing a large chunk. To accept the large chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]

  np.vstack([np.mean(densify(x[np.where(grouping == g)[0], :]), axis=0) for g in np.unique(grouping)])
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:151: PerformanceWarning: Slicing is producing a large chunk. To accept the large chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]

  np.vstack([np.mean(densify(x[np.where(grouping == g)[0], :]), axis=0) for g in np.unique(grouping)])
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:168: PerformanceWarning: Slicing is producing a large chunk. To accept the large chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]

  [np.mean(np.square(densify(x[np.where(grouping == g)[0], :])), axis=0) for g in np.unique(grouping)]

training location model: False
training scale model: True
iter 0: ll=70663625.678866
iter 1: ll=70663625.678866, converged: 0.00% (loc: 100.00%, scale update: False), in 0.00sec
iter 2: ll=50303650.242279, converged: 11.08% (loc: 11.08%, scale update: True), in 177.80sec
iter 3: ll=50303650.242279, converged: 11.08% (loc: 100.00%, scale update: False), in 0.00sec
iter 4: ll=50076239.645603, converged: 79.02% (loc: 79.02%, scale update: True), in 149.27sec
iter 5: ll=50076239.645603, converged: 79.02% (loc: 100.00%, scale update: False), in 0.00sec
iter 6: ll=50039459.932892, converged: 92.41% (loc: 92.41%, scale update: True), in 59.11sec
iter 7: ll=50039459.932892, converged: 92.41% (loc: 100.00%, scale update: False), in 0.00sec
iter 8: ll=50031801.602276, converged: 98.12% (loc: 98.12%, scale update: True), in 44.64sec
iter 9: ll=50031801.602276, converged: 98.12% (loc: 100.00%, scale update: False), in 0.00sec
iter 10: ll=50030709.007629, converged: 99.70% (loc: 99.70%, scale update: True), in 41.18sec
iter 11: ll=50030709.007629, converged: 99.70% (loc: 100.00%, scale update: False), in 0.00sec
iter 12: ll=50030695.725333, converged: 99.97% (loc: 99.97%, scale update: True), in 36.95sec
iter 13: ll=50030695.725333, converged: 99.97% (loc: 100.00%, scale update: False), in 0.00sec
iter 14: ll=50030652.029279, converged: 99.99% (loc: 99.99%, scale update: True), in 23.52sec
iter 15: ll=50030652.029279, converged: 99.99% (loc: 100.00%, scale update: False), in 0.00sec
iter 16: ll=50030652.029279, converged: 100.00% (loc: 100.00%, scale update: True), in 155.19sec
/mambaforge/envs/diffxpy/lib/python3.9/site-packages/dask/array/core.py:2912: RuntimeWarning: divide by zero encountered in true_divide
size = (limit / dtype.itemsize / largest_block) ** (1 / len(autos))
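
As an aside, the PerformanceWarnings above only concern dask's chunking, not the results; following the suggestion in the warning message itself, they can be silenced around the test call. A minimal sketch, using the same de.test.wald call as above:

import dask

# Accept the large chunks produced by the slicing inside batchglm and
# silence the PerformanceWarning for everything run inside this context.
with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    res = de.test.wald(data=adata, formula_loc="~ 1 + GROUP", factor_loc_totest="GROUP")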

However, res.summary() then returns:

gene        pval      qval      log2fc                                              mean      zero_mean  grad          coef_mle                                            coef_sd       ll
AL627309.1  0.080515  0.237762  [dask.array<getitem, shape=(), dtype=float64, ...  0.001634  False      3.163008e-03  [dask.array<getitem, shape=(), dtype=float64, ...  6.267832e-01  0.000000

I also tried converting the matrix to dense first (adata.X = adata.X.toarray()), which returns the same result.
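
As a temporary workaround I can force the affected columns to plain floats by computing the lazy values; a minimal sketch, assuming the cells are zero-dimensional dask arrays as the repr above suggests:

summary = res.summary()

# log2fc and coef_mle hold lazy 0-d dask arrays; compute() turns each one
# into a concrete scalar that float() can consume.
for col in ["log2fc", "coef_mle"]:
    summary[col] = summary[col].map(
        lambda v: float(v.compute()) if hasattr(v, "compute") else float(v)
    )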

Looking forward to your response. In case you need additional information, please let me know.

Best,
Patrick
