Diffxpy returns log2fc and coef_mle values as dask array #223
Description
Hello,
thanks a lot for your package.
I'm currently facing an issue where, after running a Wald test, the values for log2fc and coef_mle are returned as dask arrays rather than as numeric values.
I'm using the following versions:

```
anndata   0.7.8
diffxpy   0.7.4+103.gceb79aa (ig/batchglm_api_update branch)
batchglm  0.7.4.dev_0
dask      2021.4.1
sparse    0.9.1
```
I'm using batchglm and diffxpy from those branches because of issue #93.
I'm running the following code:

```python
adata.X = adata.layers["raw_counts"]
res = de.test.wald(data=adata, formula_loc="~ 1 + GROUP", factor_loc_totest="GROUP")
```
This prints:

```
The provided param_names are ignored as the design matrix is of type <class 'patsy.design_info.DesignMatrix'>.
The provided param_names are ignored as the design matrix is of type <class 'patsy.design_info.DesignMatrix'>.
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:102: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]
To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  np.vstack([np.mean(densify(x[np.where(grouping == g)[0], :]), axis=0) for g in np.unique(grouping)])
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:151: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]
To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  np.vstack([np.mean(densify(x[np.where(grouping == g)[0], :]), axis=0) for g in np.unique(grouping)])
/lippat00/python/diffxpy_packages/batchglm/batchglm/models/base_glm/utils.py:168: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]
To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  [np.mean(np.square(densify(x[np.where(grouping == g)[0], :])), axis=0) for g in np.unique(grouping)]
training location model: False
training scale model: True
iter 0: ll=70663625.678866
iter 1: ll=70663625.678866, converged: 0.00% (loc: 100.00%, scale update: False), in 0.00sec
iter 2: ll=50303650.242279, converged: 11.08% (loc: 11.08%, scale update: True), in 177.80sec
iter 3: ll=50303650.242279, converged: 11.08% (loc: 100.00%, scale update: False), in 0.00sec
iter 4: ll=50076239.645603, converged: 79.02% (loc: 79.02%, scale update: True), in 149.27sec
iter 5: ll=50076239.645603, converged: 79.02% (loc: 100.00%, scale update: False), in 0.00sec
iter 6: ll=50039459.932892, converged: 92.41% (loc: 92.41%, scale update: True), in 59.11sec
iter 7: ll=50039459.932892, converged: 92.41% (loc: 100.00%, scale update: False), in 0.00sec
iter 8: ll=50031801.602276, converged: 98.12% (loc: 98.12%, scale update: True), in 44.64sec
iter 9: ll=50031801.602276, converged: 98.12% (loc: 100.00%, scale update: False), in 0.00sec
iter 10: ll=50030709.007629, converged: 99.70% (loc: 99.70%, scale update: True), in 41.18sec
iter 11: ll=50030709.007629, converged: 99.70% (loc: 100.00%, scale update: False), in 0.00sec
iter 12: ll=50030695.725333, converged: 99.97% (loc: 99.97%, scale update: True), in 36.95sec
iter 13: ll=50030695.725333, converged: 99.97% (loc: 100.00%, scale update: False), in 0.00sec
iter 14: ll=50030652.029279, converged: 99.99% (loc: 99.99%, scale update: True), in 23.52sec
iter 15: ll=50030652.029279, converged: 99.99% (loc: 100.00%, scale update: False), in 0.00sec
iter 16: ll=50030652.029279, converged: 100.00% (loc: 100.00%, scale update: True), in 155.19sec
/mambaforge/envs/diffxpy/lib/python3.9/site-packages/dask/array/core.py:2912: RuntimeWarning: divide by zero encountered in true_divide
  size = (limit / dtype.itemsize / largest_block) ** (1 / len(autos))
```
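The PerformanceWarnings in the log above suggest a dask config option. A minimal sketch of applying it around the test call, mirroring the snippet printed in the warning (this only silences the chunking warning; it does not address the dask-array return values):

```python
import dask

# Apply the option the warning suggests around the slicing-heavy call.
# Sketch only: replace `pass` with the de.test.wald(...) call from above.
with dask.config.set(**{"array.slicing.split_large_chunks": False}):
    pass  # e.g. res = de.test.wald(...)
```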
However, `res.summary()` then returns:
| gene | pval | qval | log2fc | mean | zero_mean | grad | coef_mle | coef_sd | ll |
|---|---|---|---|---|---|---|---|---|---|
| AL627309.1 | 0.080515 | 0.237762 | [dask.array<getitem, shape=(), dtype=float64, ... | 0.001634 | False | 3.163008e-03 | [dask.array<getitem, shape=(), dtype=float64, ... | 6.267832e-01 | 0.000000 |
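Until the underlying issue is resolved, one possible workaround sketch is to force evaluation of any dask-backed cells in the summary DataFrame to plain floats. The DataFrame below only simulates the symptom; the column names mirror the output above and the values are made up:

```python
import dask.array as da
import pandas as pd

# Simulate the symptom: summary columns holding 0-d dask arrays instead of floats.
summary = pd.DataFrame({
    "gene": ["AL627309.1"],
    "log2fc": [da.from_array([0.5])[0]],    # 0-d dask array, as in the table above
    "coef_mle": [da.from_array([1.2])[0]],
})

# Workaround: compute any dask-backed cells down to plain Python floats.
for col in ["log2fc", "coef_mle"]:
    summary[col] = summary[col].apply(
        lambda v: float(v.compute()) if hasattr(v, "compute") else float(v)
    )

print(summary["log2fc"].iloc[0])  # now a plain float
```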
I also tried converting the matrix to dense with `adata.X = adata.X.toarray()`, which returns the same result.
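For the dense conversion mentioned above, a small standalone sketch (not diffxpy API) that densifies only when the matrix is actually sparse:

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for adata.X; a small random CSR matrix.
X = sp.random(5, 3, density=0.3, format="csr", random_state=0)

# Densify only if needed, mirroring the adata.X.toarray() call above.
if sp.issparse(X):
    X = X.toarray()

assert isinstance(X, np.ndarray)
```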
Looking forward to your response. If you need any additional information, please let me know.
Best,
Patrick