Remote FASTA reference support? #837

Closed
kyleabeauchamp opened this issue Apr 27, 2018 · 2 comments

Comments

@kyleabeauchamp

I noticed that samtools view -T http://genome.fa.gz my.cram fails with an exception:

[samfaipath] fail to read file http://genome.fa.gz.
http://genome.fa.gz: No such file or directory

I'm curious what the "optimal" workflow is for handling remote reference genomes when operating on CRAMs. Obviously I can manually copy the reference to local disk first, but is that the recommended workflow?

@daviesrob
Member

No, that doesn't work (even with a valid URL). The code that loads CRAM references from FASTA files calls functions like stat() and fopen(), which only work on local files. There are plans to replace it with a version that uses hfile throughout, which would allow loading over HTTP.
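
Until that happens, the "manual copy" workflow from the question can be scripted. The following is only a rough sketch in Python: `fetch_reference` and `view_command` are hypothetical helper names, the cache directory is an assumption, and the command list simply mirrors the `samtools view -T` invocation quoted above with a local path in place of the URL.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_reference(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` once and return the local path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / url.rsplit("/", 1)[-1]
    if not local_path.exists():
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a complete reference.
        partial = local_path.with_name(local_path.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(local_path)
    return local_path


def view_command(reference: Path, cram: str) -> list:
    """Build the samtools invocation with a local reference path."""
    return ["samtools", "view", "-T", str(reference), cram]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Because the download is skipped when the file already exists, repeated runs reuse the local copy.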

It is possible to fetch references by MD5 checksum over HTTP if your @SQ headers have M5 tags (CRAM files should have these). The default is to try to get them from the EBI server and then cache them locally (to reduce load on the EBI). If you instead set REF_PATH to point to your own server, one that takes the MD5 and serves up the corresponding sequence, that will work too.
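
To make the "own server" idea concrete, here is a minimal sketch using only the Python standard library. It is not the EBI implementation: the function names are hypothetical, the checksum follows my reading of the SAM specification's M5 definition (characters outside the printable range '!' to '~' stripped, the rest uppercased), and returning the bare sequence as the response body is an assumption about what the client expects.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer


def normalize(sequence: str) -> bytes:
    """Drop characters outside '!'..'~' and uppercase the rest."""
    kept = "".join(c for c in sequence if 33 <= ord(c) <= 126)
    return kept.upper().encode("ascii")


def m5_checksum(sequence: str) -> str:
    """MD5 hex digest of the normalized sequence (assumed M5 value)."""
    return hashlib.md5(normalize(sequence)).hexdigest()


def build_index(sequences: dict) -> dict:
    """Map each sequence's checksum to its normalized bytes."""
    return {m5_checksum(seq): normalize(seq) for seq in sequences.values()}


def make_handler(index: dict):
    class MD5Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Treat the last path component as the requested MD5.
            path = self.path.split("?", 1)[0].rstrip("/")
            body = index.get(path.rsplit("/", 1)[-1])
            if body is None:
                self.send_error(404, "unknown MD5")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MD5Handler


def make_server(index: dict, host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Create the server; the caller decides when to call serve_forever()."""
    return HTTPServer((host, port), make_handler(index))
```

With `make_server(build_index({...})).serve_forever()` running, setting REF_PATH to something like http://127.0.0.1:8000/%s (where %s stands for the MD5) should direct lookups to it. Holding whole sequences in memory is only reasonable for small references; a real deployment would more likely read them from disk.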

@kyleabeauchamp
Author

SGTM, I'll close this.


2 participants