-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MEM/SMEM in target coordinate space #4
Comments
Possible, but will be times slower. |
What about a single (random) position? |
I would be happy also with a flag to emit all (ref_name,ref_positions) hits, highlighting in the flag documentation "super slow".
…________________________________
From: Erik Garrison ***@***.***>
Sent: Wednesday, October 2, 2024 21:51
To: lh3/ropebwt3 ***@***.***>
Cc: Andrea Guarracino ***@***.***>; Author ***@***.***>
Subject: Re: [lh3/ropebwt3] MEM/SMEM in target coordinate space (Issue #4)
What about a single (random) position?
—
Reply to this email directly, view it on GitHub<#4 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AO26XHTGC4GVTKBZ3XLJF63ZZSWKHAVCNFSM6AAAAABPITDWBKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOJQGM4TCNRVHA>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Finding one position will be faster, but with a naive implementation it will probably be slower than MEM finding itself. There should be faster ways with standard suffix array samples. I will try. r-index or sr-index will be faster but will take a lot more effort. |
Just added |
The positions in output are unpredictable and should be close to, but not exactly, random. |
I tried, but I get a seg fault when using ropebwt3 build scerevisiae8.fa -bo scerevisiae8.fmr
[M::main_build::0.195*1.01] read 192511286 symbols from file 'scerevisiae8.fa'
[M::main_build::9.092*2.23] constructed partial BWT for 192511286 symbols
[M::main_build::9.461*2.23] encoded the partial BWT for 192511286 symbols
[M::main] Version: 3.7-r244-dirty
[M::main] CMD: ropebwt3 build -bo scerevisiae8.fmr scerevisiae8.fa
[M::main] Real time: 9.531 sec; CPU: 21.201 sec; Peak RSS: 2.175 GB ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq
[M::mr_restore] ($, A, C, G, T, N) = (272, 59461631, 36672377, 36672377, 59461631, 242998)
[M::rb3_fmi_load_all::0.038*1.03] loaded the BWT
simulated.1 14 30 8
simulated.1 16 32 8
simulated.1 39 54 7
[M::worker_pipeline::0.039*1.03] processed 1 sequences
[M::main] Version: 3.7-r244-dirty
[M::main] CMD: ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq
[M::main] Real time: 0.045 sec; CPU: 0.046 sec; Peak RSS: 0.062 GB ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq -p 1
[M::mr_restore] ($, A, C, G, T, N) = (272, 59461631, 36672377, 36672377, 59461631, 242998)
[M::rb3_fmi_load_all::0.040*1.04] loaded the BWT
[1] 51729 segmentation fault (core dumped) ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq -p With gdb --args ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq -p 1
GNU gdb (Ubuntu 14.0.50.20230907-0ubuntu1) 14.0.50.20230907-git
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ropebwt3...
(gdb) run
Starting program: ropebwt3 mem -l15 scerevisiae8.fmr 1read.fq -p 1
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) n
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[M::mr_restore] ($, A, C, G, T, N) = (272, 59461631, 36672377, 36672377, 59461631, 242998)
[M::rb3_fmi_load_all::0.044*1.05] loaded the BWT
[New Thread 0x7ffff38bc6c0 (LWP 51861)]
[New Thread 0x7fffebfff6c0 (LWP 51862)]
[New Thread 0x7ffff30bb6c0 (LWP 51863)]
[New Thread 0x7ffff28ba6c0 (LWP 51864)]
Thread 4 "ropebwt3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff30bb6c0 (LWP 51863)]
ssa_add_intv (ssa=ssa@entry=0x0, aux=aux@entry=0x7ffff30bacf0, lo=lo@entry=147357453, hi=hi@entry=147357461, off=off@entry=0) at ssa.c:141
141 int64_t k = ((lo - m) >> ssa->ss << ssa->ss) + m;
(gdb) bt
#0 ssa_add_intv (ssa=ssa@entry=0x0, aux=aux@entry=0x7ffff30bacf0, lo=lo@entry=147357453, hi=hi@entry=147357461, off=off@entry=0) at ssa.c:141
#1 0x000055555558ff7f in rb3_ssa_multi (km=0x7fffec008fe0, f=f@entry=0x7fffffffd420, ssa=0x0, lo=147357453, hi=147357461, max_sa=<optimized out>, sa=0x7ffff18b7dc8) at ssa.c:168
#2 0x000055555559264b in worker_for_seq (data=<optimized out>, i=<optimized out>, tid=<optimized out>) at search.c:127
#3 0x000055555557d3c3 in ktf_worker (data=0x7fffec0090e0) at kthread.c:47
#4 0x00007ffff7c97b5a in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#5 0x00007ffff7d285fc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 |
Have you generated suffix array samples and the sequence names? See README. Of course, ropebwt3 should throw an error rather than segfault. |
Oops, thanks! After also creating a sampled suffix array and a list of sequence names and lengths, I get the expected output: ropebwt3 mem -l15 scerevisiae8.fmd 1read.fq -p 3 | column -t
simulated.1 14 30 8 3 SGDref#1#chrXIV:+:458823 DBVPG6765#1#chrXIV:+:446379 UWOPS034614#1#chrXIV:+:446906
simulated.1 16 32 8 3 YPS128#1#chrX:+:327075 Y12#1#chrX:+:330139 SGDref#1#chrX:+:336423
simulated.1 39 54 7 3 Y12#1#chrXIV:-:560052 DBVPG6044#1#chrXIV:-:542558 S288C#1#chrXIV:-:543079 |
Great! Thanks for the confirmation. I will close this issue. Let me know if you see problems. |
Thanks @lh3, this is extremely useful! |
@lh3, what does it mean when the reported reference position in -1? A00404:155:HV27LDSXX:3:2515:17011:18552/2 121 150 89 1 NA20129#1#JAHEPE010000077.1:3240254-3317239:+:-1 |
This is a bug. Now fixed at HEAD. |
FYI: the latest v3.8 release now has this feature. Thanks again for the suggestion. |
For each MEM/SMEM found in a query, is it also possible to get in output in which sequences and positions of the pangenome it was found?
The text was updated successfully, but these errors were encountered: