cram input necessary? #88

Closed
gbdias opened this issue Apr 22, 2024 · 4 comments · Fixed by #96

Comments


gbdias commented Apr 22, 2024

I was wondering whether the CRAM input requirement for Hi-C data alignment makes sense, since the first step of the pipeline is to convert the CRAM back to FASTQ. Would it be OK to also accept FASTQ as input and skip this step?

// Convert from CRAM to FASTQ

@muffato muffato added this to the 1.3.0 milestone Apr 23, 2024
@muffato muffato added enhancement Improvement of the existing features user request Requests made by users and public labels Apr 23, 2024
Member

muffato commented Apr 23, 2024

Hi @gbdias. You're absolutely right

@yumisims

@gbdias We use CRAM files as input for two reasons:
(1) To save storage.
(2) CRAM files can be split into batches of 10k containers with cram_filter and streamed into BWA-MEM2/Minimap2, which lets us parallelize the alignment. This significantly improves alignment performance.
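
The batching described in (2) can be sketched roughly as below. This is only an illustration, assuming the total number of containers in the CRAM file is already known; the function name `container_batches` and the inclusive index ranges are hypothetical and not taken from the pipeline code.

```python
from typing import Iterator, Tuple


def container_batches(
    total_containers: int, batch_size: int = 10_000
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) container index ranges, one per batch.

    Hypothetical helper: the 10k default mirrors the batch size mentioned
    in the comment above, not a value read from the pipeline.
    """
    if total_containers < 0:
        raise ValueError("total_containers must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for first in range(0, total_containers, batch_size):
        # The last batch may be shorter than batch_size.
        last = min(first + batch_size, total_containers) - 1
        yield first, last


if __name__ == "__main__":
    for first, last in container_batches(25_000):
        print(f"containers {first}-{last}")
```

Each range could then be handed to an independent alignment job, which is where the parallelism would come from.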

Author

gbdias commented Apr 24, 2024

Hi @yumisims, thanks for the info. I can appreciate the advantage of streaming CRAM to bwa mem. I was just confused because the current code doesn't seem to use CRAM that way, in which case taking FASTQ directly would be faster.

Member

muffato commented Apr 25, 2024

You're right, @gbdias: the pipeline here doesn't implement the parallel aligner @yumisims is talking about. That aligner requires CRAM and is currently implemented in https://github.com/sanger-tol/treeval

@muffato muffato removed the enhancement Improvement of the existing features label Jun 1, 2024
@muffato muffato added the enhancement Improvement of the existing features label Jun 17, 2024
@reichan1998 reichan1998 self-assigned this Jun 28, 2024
@muffato muffato moved this from Todo to In Progress in Genome After Party Jul 12, 2024
@muffato muffato linked a pull request Jul 12, 2024 that will close this issue
9 tasks
@tkchafin tkchafin moved this from In Progress to Done in Genome After Party Jul 22, 2024
5 participants