Possible solutions for GRNBoost2/GENIE3 Dask issues #163

Open
@cflerin

A recurring problem is that the GRN inference step of pySCENIC (using Arboreto's GRNBoost2/GENIE3 implementation) fails to complete. This appears to be caused by newer Dask releases that are incompatible with the existing GRNBoost2/GENIE3 implementation.

Possible errors

  • ValueError: Metadata mismatch found in from_delayed
  • Expected partition of type DataFrame but got NoneType
  • ValueError: tuple is not allowed for map key
  • ...

Possible solutions

  1. In many cases, installing an older version of the dask/distributed packages fixes this. The easiest route is to use the Docker images, which already contain stable versions of these packages (see here for usage details). Alternatively, install them via pip:
    pip install dask==1.0.0 distributed'>=1.21.6,<2.0.0'
    
  2. Another option is the helper script arboreto_with_multiprocessing.py, which runs the Arboreto GRN algorithms (GRNBoost2, GENIE3) without Dask and so avoids the compatibility problem.
    See here for details; the basic usage is:

    arboreto_with_multiprocessing.py \
        expr_mat.loom \
        allTFs_hg38.txt \
        --output adj.tsv \
        --num_workers 20
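
Before choosing between the two options, it can help to confirm which dask/distributed versions are actually installed in the environment. The following is a minimal sketch, not part of pySCENIC or Arboreto: the helper names (`parse_version`, `matches_pins`, `installed_version`) are hypothetical, and the version parsing is deliberately simple (pre-release suffixes such as `rc1` are ignored). It only checks against the pins from the pip command in option 1.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.21.6' into (1, 21, 6), padded to three numeric parts.

    Simplified parsing: anything after the leading digits of a part
    (e.g. 'rc1', '+local') is dropped.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def matches_pins(dask_version, distributed_version):
    """True if the versions satisfy dask==1.0.0 and distributed>=1.21.6,<2.0.0."""
    dask_ok = parse_version(dask_version) == (1, 0, 0)
    dist_ok = (1, 21, 6) <= parse_version(distributed_version) < (2, 0, 0)
    return dask_ok and dist_ok


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    dask_v = installed_version("dask")
    dist_v = installed_version("distributed")
    print("dask:", dask_v, "| distributed:", dist_v)
    if dask_v is None or dist_v is None:
        print("dask and/or distributed is not installed")
    elif matches_pins(dask_v, dist_v):
        print("versions match the pins from option 1")
    else:
        print("versions differ from the pins from option 1")
```

If the versions differ and one of the errors listed above appears, either pinning the packages (option 1) or switching to the multiprocessing script (option 2) is worth trying.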
    
