Error when running flight process #33

Closed
mmac0026 opened this issue Jan 23, 2023 · 6 comments

@mmac0026

Hi! I am having an issue running rosella recover on a Linux server. I have (as far as I can tell) installed rosella and flight within a conda environment using the instructions given. I also have CoverM installed within the same environment.

I am passing a depth file from MetaBAT2 to the -i argument, and a contig coassembly FASTA file to -r.
I saw on another issue that the quality of the assembly may be to blame, and I'm pretty sure my assembly isn't great quality either... so that could be the issue 🤷‍♂️

[2023-01-23T22:43:07Z INFO  rosella] rosella version 0.4.2
[2023-01-23T22:43:07Z INFO  rosella] Using min-covered-fraction 0%
[2023-01-23T22:43:07Z INFO  rosella] Using min-read-aligned-percent 0%
[00:00:02] ████████████████████████████████████████   77524/77524   Read results from previous run. If this is not desired please rerun with --force... ETA: [0s]
[00:00:42] ⠈ Calculating UMAP embeddings and clustering...    3/6   
[2023-01-23T22:43:53Z ERROR bird_tool_utils::command] Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14
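
For readers decoding the log above: `unix_wait_status(256)` is a raw POSIX wait status, not an exit code. On Linux the exit code sits in the high byte of that value, so 256 corresponds to exit code 1, a generic failure that by itself says nothing about the cause. A minimal sketch of the decoding, assuming Python 3.9+ for `os.waitstatus_to_exitcode`:

```python
import os

raw_status = 256  # value reported in the rosella log above

# Standard-library decoding of a wait status into an exit code.
exit_code = os.waitstatus_to_exitcode(raw_status)

# Equivalent manual decoding: the exit code occupies bits 8-15.
manual = (raw_status >> 8) & 0xFF

print(exit_code, manual)
```

Both values come out as 1, which is why the actual stderr from flight is needed to diagnose the failure.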

Here are my contig stats from BBMap:

A	C	G	T	N	IUPAC	Other	GC	GC_stdev
0.2697	0.2334	0.2317	0.2652	0.0000	0.0000	0.0000	0.4651	0.0586

Main genome scaffold total:         	77524
Main genome contig total:           	77524
Main genome scaffold sequence total:	69.275 MB
Main genome contig sequence total:  	69.275 MB  	0.000% gap
Main genome scaffold N/L50:         	23691/874
Main genome contig N/L50:           	23691/874
Main genome scaffold N/L90:         	64500/553
Main genome contig N/L90:           	64500/553
Max scaffold length:                	89.952 KB
Max contig length:                  	89.952 KB
Number of scaffolds > 50 KB:        	4
% main genome in scaffolds > 50 KB: 	0.40%


Minimum 	Number        	Number        	Total         	Total         	Scaffold
Scaffold	of            	of            	Scaffold      	Contig        	Contig  
Length  	Scaffolds     	Contigs       	Length        	Length        	Coverage
--------	--------------	--------------	--------------	--------------	--------
    All 	        77,524	        77,524	    69,274,591	    69,274,591	 100.00%
    250 	        77,524	        77,524	    69,274,591	    69,274,591	 100.00%
    500 	        77,524	        77,524	    69,274,591	    69,274,591	 100.00%
   1 KB 	        17,278	        17,278	    28,708,070	    28,708,070	 100.00%
 2.5 KB 	         1,661	         1,661	     6,906,158	     6,906,158	 100.00%
   5 KB 	           268	           268	     2,386,936	     2,386,936	 100.00%
  10 KB 	            39	            39	       862,445	       862,445	 100.00%
  25 KB 	             8	             8	       448,746	       448,746	 100.00%
  50 KB 	             4	             4	       275,586	       275,586	 100.00%

If you have any suggestions on what is happening here, that would be great. Thanks!

@rhysnewell
Owner

Hi!

Sorry, just seeing this now. Yeah, I would assume that this has to do with your assembly quality not being great. The python component (flight) is unable to generate any suitable clusters and just breaks. I'll work on getting some better error handling for these scenarios in future.

If you can produce a better assembly, please retry with rosella and let me know whether the error happens again.

Cheers,
Rhys

@rhysnewell
Owner

Hi,

Just a follow-up on this. I've noticed some dependency issues arising recently. Could you run flight independently of rosella? For example:

flight bin --input MEGAHIT-group-MetaGeno-depth.txt --kmer_frequencies rosella_test/rosella_kmer_table.tsv --assembly MEGAHIT-group-MetaGeno.contigs.fa --cores 8 --output_directory rosella_test/

and check what the actual error is?
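
If it is more convenient to do this from a script, the idea can be sketched in Python as below. This is only an illustration of capturing the exit code and stderr of a child process; `run_and_capture` is a hypothetical helper, and a stand-in command is used so the sketch runs without flight installed.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of strings) and return (exit_code, stderr_text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Stand-in for the real invocation; to test flight, replace this list with
# ["flight", "bin", "--input", ...] using the arguments from the command above.
stand_in = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('boom'); sys.exit(1)",
]

code, err = run_and_capture(stand_in)
print("exit code:", code)
print("stderr:", err)
```

Running the command directly in a shell gives the same information; the point is just to see the Python traceback that rosella 0.4.2 hides.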

@maureenbug

Hi!

Jumping into this thread because I'm coming across the same issue. I ran the flight bin command and this is what I got:

(I'm using the CAMI dataset just to test that it's installing correctly on our new cluster)
flight bin --input depth-low.txt --kmer_frequencies rosella_output/rosella_kmer_table.tsv --assembly CAMI_low_RL_S001__insert_270_GoldStandardAssembly.fasta --output_directory rosella_test/

and this is what I got:

05/23/2023 11:25:57 AM INFO: Time - 11:25:57 23-05-2023
05/23/2023 11:25:57 AM INFO: Command - /clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/bin/flight bin --input depth-low.txt --kmer_frequencies rosella_output/rosella_kmer_table.tsv --assembly CAMI_low_RL_S001__insert_270_GoldStandardAssembly.fasta --output_directory rosella_test/
Traceback (most recent call last):
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/bin/flight", line 8, in <module>
    sys.exit(main())
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/flight.py", line 449, in main
    args.func(args)
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/flight.py", line 565, in bin
    rosella = rosella_engine_constructor(args)
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/flight.py", line 536, in rosella_engine_constructor
    from flight.rosella.rosella import Rosella
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/rosella/rosella.py", line 48, in <module>
    from flight.rosella.validating import Validator
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/rosella/validating.py", line 49, in <module>
    from flight.rosella.clustering import Clusterer, iterative_clustering_static, kmeans_cluster
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/rosella/clustering.py", line 56, in <module>
    from flight.rosella.binning import Binner
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/flight/rosella/binning.py", line 43, in <module>
    import skbio.stats.composition
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/skbio/__init__.py", line 11, in <module>
    import skbio.io  # noqa
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/skbio/io/__init__.py", line 243, in <module>
    import_module('skbio.io.format.clustal')
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/skbio/io/format/clustal.py", line 148, in <module>
    from skbio.alignment import TabularMSA
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/skbio/alignment/__init__.py", line 204, in <module>
    from ._pairwise import (
  File "/clusterfs/jgi/groups/science/homes/mberg/.micromamba/envs/flight/lib/python3.9/site-packages/skbio/alignment/_pairwise.py", line 15, in <module>
    from skbio.alignment._ssw_wrapper import StripedSmithWaterman
  File "skbio/alignment/_ssw_wrapper.pyx", line 1, in init skbio.alignment._ssw_wrapper
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I had problems installing rosella with the instructions, so I went and installed flight first (with their .yml file), and then installed rosella via the instructions.

thanks in advance!
maureen

@rhysnewell
Owner

Hi all, sorry for the delay on this.

I've managed to get a working environment for the current version of rosella on the main branch by using the following yaml:

channels:
  - conda-forge
  - numba
  - bioconda
  - defaults
dependencies:
  - pip
  - gcc
  - cxx-compiler
  - numba
  - numpy<=1.21
  - rosella==0.4.2
  - checkm-genome==1.1.3
  - flight-genome>=1.5.0
  - joblib==1.1.0 # For https://github.com/scikit-learn-contrib/hdbscan/pull/563 which is used by flight. Can remove when hdbscan releases past 0.8.28
  - scikit-bio>=0.5.7
  - seaborn
  - umap-learn >= 0.5.3
  - scipy <= 1.8.1
  - pandas >= 1.3
  - pynndescent >= 0.5.7
  - hdbscan >= 0.8.28
  - imageio
  - matplotlib
  - tqdm
  - tbb
  - joblib
  - pebble
  - scikit-learn==1.0.2
  - threadpoolctl
  - biopython
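
The `numpy.ndarray size changed` error in the traceback above typically means a compiled extension (here scikit-bio's `_ssw_wrapper`) was built against a different NumPy than the one installed, which is why the pins matter. A rough way to check an existing environment against a few of these pins is sketched below. The helper names are hypothetical, and the version comparison is a simplified numeric one, not conda's full matching rules.

```python
import re
from importlib import metadata

# Subset of the pins from the yaml above: (distribution name, operator, bound)
PINS = [
    ("numpy", "<=", "1.21"),
    ("scipy", "<=", "1.8.1"),
    ("scikit-learn", "==", "1.0.2"),
    ("joblib", "==", "1.1.0"),
]


def as_tuple(version, width=3):
    """Turn '1.21' into (1, 21, 0); only leading digits of each part are used."""
    nums = []
    for piece in version.split(".")[:width]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    nums += [0] * (width - len(nums))  # pad short versions with zeros
    return tuple(nums)


def satisfies(installed, op, bound):
    """Simplified check of an installed version against one pin."""
    a, b = as_tuple(installed), as_tuple(bound)
    return {"<=": a <= b, ">=": a >= b, "==": a == b}[op]


def check(pins):
    """Print the status of each pinned package in the current environment."""
    for name, op, bound in pins:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if satisfies(installed, op, bound) else "MISMATCH"
        print(f"{name} {installed} (want {op}{bound}): {status}")


check(PINS)
```

Run it with the Python interpreter of the environment that contains flight; any `MISMATCH` line points at a package worth reinstalling from the yaml.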

Let me know if this fixes the C header issue for you.

Cheers,
Rhys

@maureenbug

Hi!

Just following up to say that the new yaml file worked great, and it runs perfectly now!

thanks so much!

@rhysnewell
Owner

Flight error logs should be exposed in v0.5.0
