Flight error #36

Closed
greenmna opened this issue Mar 8, 2023 · 7 comments

@greenmna

greenmna commented Mar 8, 2023

Hello there!

I have been trying to use Rosella for a while now, but I keep running into issues when trying to execute it. I've lost count of how many fixes I've tried.

For context, I have metagenomic assemblies from short- and long-read data. Before running Rosella, I map the reads with minimap2-ont for long reads or bwa-mem for short reads. The mappings are then converted and sorted with samtools, and I run the sorted BAM files through CoverM. Everything up to this point works great, with no errors.

Now the issue is when I try to execute Rosella. When I do, I get the following error:

    ERROR bird_tool_utils::command] Error when running flight process. Exitstatus was : ExitStatus(unix_wait_status(256))
    thread 'main' panicked at 'Failed to grab stderr from failed flight process', /home/conda/.cargo/registry/src/github.com-1ecc6299db9ec823/bird_tool_utils-0.3.0/src/command.rs:17:14
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

I've seen this error in several recent issues while browsing the active ones. As near as I can tell, it's a generic message: to see the underlying error you have to rerun with the RUST_BACKTRACE=1 variable it mentions. That isn't the most convenient when you're trying to debug an issue, but it is what it is.
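The `ExitStatus(unix_wait_status(256))` part of the log is a raw Unix wait status, not the exit code itself. On Linux the exit code sits in bits 8 to 15, so 256 decodes to exit code 1. Python also exits with code 1 on an uncaught exception, so this is consistent with flight raising an error rather than being killed by a signal.

A small decoder sketch follows. The function name is hypothetical. On Unix, the standard `os.WIFEXITED` and `os.WEXITSTATUS` do the same job, as does `os.waitstatus_to_exitcode` on Python 3.9+.

```python
from typing import Tuple


def decode_wait_status(status: int) -> Tuple[str, int]:
    """Decode a raw Linux wait status into ("exited", code), ("signaled", sig) or ("stopped", sig)."""
    low = status & 0x7F
    if low == 0:
        # Normal exit: the exit code is stored in bits 8-15.
        return ("exited", (status >> 8) & 0xFF)
    if low == 0x7F:
        # Stopped (e.g. by job control): the stop signal is stored in bits 8-15.
        return ("stopped", (status >> 8) & 0xFF)
    # Terminated by a signal: the signal number is in the low 7 bits.
    return ("signaled", low)


print(decode_wait_status(256))  # ('exited', 1)
```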

As mentioned, I've gone through numerous attempted fixes. I've attached the environment that has gotten me closest to success, as pip and conda package lists at the end of this post. The fixes have varied, but many involve annoying dependency conflicts, notably with scikit-bio, numba, numpy and hdbscan. From what I could tell, they all trace back to numpy, specifically an error where a C header expects one size in bytes but gets a different value from the PyObject (mentioned in Issue #35).
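The "C header ... PyObject" size message typically means a compiled package was built against a different numpy than the one installed. When comparing a working environment with a broken one, it helps to pin down exactly which versions are present and which tool installed each package.

A minimal sketch using only the standard library's `importlib.metadata` follows. It rests on two assumptions:

* The package list is drawn from the names in this thread.
* `INSTALLER` is optional metadata. pip writes `pip`, but conda and other tools may write something else or nothing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Packages this thread names as the usual sources of conflicts (pip distribution names).
PACKAGES = ("numpy", "numba", "hdbscan", "scikit-bio", "umap-learn")


def package_report(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[Tuple[str, str]]]:
    """Map each package to (version, installer), or None if it is not installed."""
    report: Dict[str, Optional[Tuple[str, str]]] = {}
    for name in names:
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        # INSTALLER is optional; read_text returns None when the file is absent.
        installer = (dist.read_text("INSTALLER") or "unknown").strip()
        report[name] = (dist.version, installer)
    return report


if __name__ == "__main__":
    for name, info in package_report().items():
        if info is None:
            print(f"{name}: not installed")
        else:
            version, installer = info
            print(f"{name}: {version} (installer: {installer})")
```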

I seem to have solved that problem, but a new one has arisen. When I run flight bin with a coverage file and a kmer table (an older one, admittedly), I get this error:
    Traceback (most recent call last):
      File "/home/noah/miniconda3/envs/rosellaEnvironment/bin/flight", line 8, in <module>
        sys.exit(main())
      File "/home/noah/miniconda3/envs/rosellaEnvironment/lib/python3.8/site-packages/flight/flight.py", line 449, in main
        args.func(args)
      File "/home/noah/miniconda3/envs/rosellaEnvironment/lib/python3.8/site-packages/flight/flight.py", line 566, in bin
        rosella.perform_binning(args)
      File "/home/noah/miniconda3/envs/rosellaEnvironment/lib/python3.8/site-packages/flight/rosella/rosella.py", line 242, in perform_binning
        self.filter()
      File "/home/noah/miniconda3/envs/rosellaEnvironment/lib/python3.8/site-packages/flight/rosella/embedding.py", line 96, in filter
        initial_disconnections = self.check_contigs(self.large_contigs['tid'], minimum_connections, close_check)
      File "/home/noah/miniconda3/envs/rosellaEnvironment/lib/python3.8/site-packages/flight/rosella/embedding.py", line 190, in check_contigs
        for x in index_rho.neighbor_graph[1][idx, 1:(minimum_connections + 1)]) # exclude first index since it is to itself
    IndexError: index 251 is out of bounds for axis 0 with size 251
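The failing line takes row `idx` of a per-contig neighbour array (`index_rho.neighbor_graph[1]`) and drops column 0, which the inline comment says is the contig itself. An axis of size 251 only has valid indices 0 to 250, so `idx = 251` is one past the end.

Below is a sketch of the same lookup using plain lists and an explicit bounds check, so that case produces a readable message instead of a bare IndexError. The function name and the list-of-lists representation are assumptions, not flight's actual code.

```python
from typing import List, Sequence


def neighbours_excluding_self(
    neighbor_indices: Sequence[Sequence[int]], idx: int, minimum_connections: int
) -> List[int]:
    """Return up to `minimum_connections` neighbours of row `idx`, skipping column 0."""
    n_rows = len(neighbor_indices)
    if not 0 <= idx < n_rows:
        # The condition behind "index 251 is out of bounds for axis 0 with size 251",
        # reported with the table size so the cause is visible.
        raise ValueError(
            f"row {idx} requested, but the neighbour table has only {n_rows} rows "
            f"(valid rows are 0 to {n_rows - 1})"
        )
    # Column 0 is the point itself, so start at 1, mirroring 1:(minimum_connections + 1).
    return list(neighbor_indices[idx][1 : minimum_connections + 1])


# Three contigs, each row listing itself first and then its nearest neighbours.
table = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]
print(neighbours_excluding_self(table, 1, 1))  # [0]
```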

Here's the weirdest part: I ran Rosella with the same steps on a different dataset, and it worked! I am at a total loss as to why it fails on some datasets and not on others.

Any insight would be greatly appreciated.

Pip package list.txt
rosella environment package list.txt

@willem-stock

I'm not able to help with your issue, but I'm keen to know how you resolved the numpy issue. Could you share that fix? Thank you.

@rhysnewell
Owner

Hey,

I'm very sorry that I missed this. I'll spend some more time looking into it. I've been pressed for time with work and have let the repo slip into disarray, it seems.

I'll work on making error messages more legible for users so they can gain access to flight errors.
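In the original report, the wrapper panicked while trying to read the failed child's stderr, so the real flight error was lost. The general pattern for surfacing it is to capture the child's stderr and attach it to the error raised on a non-zero exit.

This is a minimal sketch in Python, for illustration only. The actual wrapper is Rust code in bird_tool_utils, and `run_and_surface_errors` is a hypothetical name.

```python
import subprocess
import sys
from typing import List


def run_and_surface_errors(cmd: List[str]) -> str:
    """Run `cmd`; return its stdout, or raise with the child's stderr on failure."""
    # capture_output=True collects stdout and stderr; text=True decodes them to str.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # A child Python process that writes a message to stderr and exits with status 1.
    failing = [sys.executable, "-c", "import sys; sys.exit('flight-style failure')"]
    try:
        run_and_surface_errors(failing)
    except RuntimeError as err:
        print(err)
```

Because everything is buffered in memory, a very chatty child process would be better served by streaming stderr to a log file instead.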

In our pipelines we mainly access Rosella via Aviary, a Snakemake workflow wrapper that handles installing Rosella for the user. This seems to work in most cases, so it is frustrating that things fall apart when users try to install it on their own.

As for your current error, could you run some basic stats on your assembly using stats.sh from BBMap and post them here?

Cheers,
Rhys

@greenmna
Author

greenmna commented May 9, 2023

Hello @rhysnewell, sorry for taking so long to get back to you on this.

I see the value in the Snakemake workflow. I wonder whether the issue comes from conda not always playing nicely with programs installed via pip.

I ran stats.sh on one of the assemblies that Rosella did not work on, and here's the result:
[image: stats.sh output for the assembly]

The exact command I ran was: conda run -n <environment> stats.sh <assembly>.fasta

I haven't tried running Rosella again recently, though I would like to return to it if I can get it to work.

Thank you for the assistance!

@rhysnewell
Owner

Hi @greenmna,

No worries! I took ages to get back to you in the first place; GitHub notifications can be really busted sometimes.

I see why flight would fail on that assembly. It looks like a really good assembly, but it has very few contigs. The part of flight that is failing expects at least 250 contigs in the assembly, which yours does not have. This is definitely an oversight on my end.
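Since the failure comes down to contig count, a cheap pre-check before running flight is to count the records in the assembly FASTA and compare the total against the figure of about 250 mentioned above. The sketch below also reports a couple of the headline numbers stats.sh gives (total length, N50).

The function names are hypothetical, and the threshold is taken from this comment rather than from flight's code.

```python
import sys
from typing import Iterable, List

# Threshold quoted in this thread; an approximation, not a documented flight parameter.
MIN_CONTIGS = 250


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in a FASTA file (multi-line sequences allowed)."""
    lengths: List[int] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def precheck(lengths: List[int], min_contigs: int = MIN_CONTIGS) -> None:
    """Fail early with a readable message instead of an IndexError deep inside binning."""
    if len(lengths) < min_contigs:
        raise ValueError(
            f"assembly has {len(lengths)} contigs; at least {min_contigs} are expected"
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        lengths = contig_lengths(handle)
    print(f"contigs: {len(lengths)}  total: {sum(lengths)}  N50: {n50(lengths)}")
    precheck(lengths)
```

Saved as, say, `precheck.py`, it would be run as `python precheck.py <assembly>.fasta`.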

I've opened a new branch, ISS-36, with some significant changes to Rosella's code and workflow. The main idea of this branch is to eliminate the need for any Python backing library, as was requested in #25.

The Rust ecosystem for UMAP has evolved to a degree where I think we can move away from Python, thus eliminating all of these errors people are having with flight. The numpy, UMAP, hdbscan, scikit, skbio web of dependencies has broken on me too many times and I really do wish to be done with it for good.

I would not yet recommend switching to the new branch (it works, but many features are missing). I just wanted to let you know that work is underway, and I hope to have a more stable version to show you in the near future. So please stay tuned for the good news.

Cheers,
Rhys

@greenmna
Author

greenmna commented May 10, 2023

Thanks for such a quick reply @rhysnewell 👍

I did not realize that flight expected a minimum number of contigs, which is surprising. The assembly is actually from a simulated metagenome made up of only 10 genomes. I imagine this wouldn't be as big an issue for real samples, since de novo assembly would produce far more contigs and easily meet the threshold.

I am excited to hear of the development and look forward to the release of the new branch! Thank you for doing this, Rosella is such a good program!

Best of luck and much thanks,
Noah

@rhysnewell
Owner

Thank you for your kind words, that's very uplifting to hear!

And yes, luckily the minimum contig count limitation won't have to stick around in future (fingers crossed).

Cheers,
Rhys

@rhysnewell
Owner

v0.5.0 should fix this issue if you create a new conda environment and install from the bioconda channel. If you still get an error from flight, please open a new issue and post the error log that gets produced; flight errors should now be exposed to users. That said, I believe this particular run will still fail due to the low contig count.
