Remote FASTA reference support? #837

kyleabeauchamp · 2018-04-27T20:29:06Z

I noticed that samtools view -T http://genome.fa.gz my.cram fails with an exception:

[samfaipath] fail to read file http://genome.fa.gz.
http://genome.fa.gz: No such file or directory

I'm curious what the "optimal" workflow is for handling remote reference genomes when operating on CRAMs. Obviously I can copy the reference manually first, but I'm wondering whether that's the recommended workflow.

daviesrob · 2018-05-01T17:03:42Z

No, that doesn't work (even with a valid URL). The code that loads CRAM references from FASTA files calls functions like stat() and fopen(), which only work on local files. There are plans to replace this with a version that uses hfile throughout, which would allow loading over HTTP.
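
To illustrate why the URL is rejected, here is a minimal Python sketch (not htslib's actual code) of what happens when a local-file call such as stat() is handed a URL: it is treated as an ordinary path, and the lookup fails with "No such file or directory". The helper name `is_local_file` is made up for this example.

```python
import os
import stat


def is_local_file(path: str) -> bool:
    """Return True only if `path` is an existing regular file on the local filesystem."""
    try:
        info = os.stat(path)  # a URL is interpreted as a plain relative path here
    except OSError:
        # Typically ENOENT ("No such file or directory") for something like http://genome.fa.gz
        return False
    return stat.S_ISREG(info.st_mode)


if __name__ == "__main__":
    print(is_local_file("http://genome.fa.gz"))
```

Any loader built on calls like this needs the reference to be on disk first, which is why copying the FASTA locally works.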

It is possible to get references by MD5 checksum over HTTP if you have @SQ headers with M5 tags (CRAM files should have these). The default is to try to get them from the EBI server and then cache them locally (to reduce load on the EBI), but if you set REF_PATH to point to your own server that takes the MD5 and serves up the corresponding sequence, it will work.
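
As a rough sketch of the lookup described above, the Python below computes an M5-style checksum for a sequence and fills it into a REF_PATH-style template. It assumes the M5 value is the MD5 of the sequence with whitespace removed and letters uppercased, and that the template marks the checksum position with `%s`. The host `refs.example.org` and both function names are hypothetical, and this is not how htslib itself is implemented.

```python
import hashlib


def m5_checksum(sequence: str) -> str:
    """MD5 hex digest of the sequence, uppercased, with whitespace and newlines dropped."""
    cleaned = "".join(ch for ch in sequence if 33 <= ord(ch) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def reference_url(template: str, md5: str) -> str:
    """Substitute the checksum into a template such as 'https://host/md5/%s'."""
    return template.replace("%s", md5)


if __name__ == "__main__":
    # Hypothetical server; a real one must return the sequence matching the MD5.
    template = "https://refs.example.org/md5/%s"
    checksum = m5_checksum("acgtacgt\nACGT\n")
    print(reference_url(template, checksum))
```

A server behind such a URL only needs to map each checksum to its sequence, so it never has to know reference names or assembly versions.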

kyleabeauchamp · 2018-05-01T17:04:50Z

SGTM, I'll close this.

kyleabeauchamp closed this as completed May 1, 2018

dnil mentioned this issue Oct 23, 2024

cram file support? MariaNattestad/Ribbon#123

Open
