When running "make check", which involves many OSv boots (each test runs in a separate boot), I virtually always see one of the tests hang (apparently a different one each time, at random).
The hang happens because of a segfault we have in sched::start_early_threads(). The relevant part of the stack trace:
#3 0x000000000030e05a in mmu::vm_sigsegv (addr=addr@entry=0,
    ef=ef@entry=0xffffc0003fdfc008) at /home/nyh/osv/core/mmu.cc:904
#4 0x000000000030e13b in mmu::vm_fault (addr=, addr@entry=8,
    ef=ef@entry=0xffffc0003fdfc008) at /home/nyh/osv/core/mmu.cc:916
#5 0x00000000003345d5 in page_fault (ef=0xffffc0003fdfc008)
    at /home/nyh/osv/arch/x64/mmu.cc:35
#6
#7 sched::start_early_threads () at /home/nyh/osv/core/sched.cc:1165
#8 0x000000000035387f in sched::cpu::idle (this=0xffffc0003ffe0000)
    at /home/nyh/osv/core/sched.cc:370
The crash always happens in the same place.
The crashing line (1165 in sched.cc) is:
t->remote_thread_local_var(s_current) = t;
In issue #145 I reported a crash during boot in start_early_threads().
I wasn't actually able to replicate this bug on master, but it happens
quite frequently (e.g., on virtually every "make check" run) with some
patches of mine that seem unrelated to this bug.
The problem is that start_early_threads() (added in 63216e8)
iterates over the threads in the thread list, and uses
t->remote_thread_local_var() for each thread. This can only work if
the thread has its TLS initialized, but unfortunately the thread
constructor first added the new thread to the list, and only later
called setup_tcb() (which allocates and initializes the TLS). If we're
unlucky, start_early_threads() can find a thread on the list which
doesn't yet have its TLS allocated, so remote_thread_local_var() will
crash.
The simple fix is to switch the order of the construction: First
set up the new thread's TLS, and only then add it to the list of
threads.
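
The reordering described above can be illustrated with a minimal, self-contained sketch. This is not the OSv code: sketch_thread, tls_block, thread_registry, registry() and remote_tls() are hypothetical stand-ins, and the mutex around the list is an assumption of the sketch rather than something the commit message states. Only the names setup_tcb() and start_early_threads() are borrowed from the text. The point it shows is that the constructor publishes the thread on the shared list only after its TLS exists, so anything walking the list never sees a thread without TLS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a thread's TLS area.
struct tls_block {
    void* current = nullptr;  // plays the role of s_current in the text
};

class sketch_thread;

// Hypothetical global thread list (the lock is an assumption of this sketch).
struct thread_registry {
    std::mutex mtx;
    std::vector<sketch_thread*> threads;
};

inline thread_registry& registry() {
    static thread_registry r;
    return r;
}

class sketch_thread {
public:
    sketch_thread() {
        // Step 1: allocate and initialize the TLS first.
        setup_tcb();
        // Step 2: only now make the thread visible on the shared list.
        std::lock_guard<std::mutex> lock(registry().mtx);
        registry().threads.push_back(this);
    }

    ~sketch_thread() {
        // Unpublish before the TLS member is destroyed.
        std::lock_guard<std::mutex> lock(registry().mtx);
        auto& v = registry().threads;
        v.erase(std::remove(v.begin(), v.end(), this), v.end());
    }

    sketch_thread(const sketch_thread&) = delete;
    sketch_thread& operator=(const sketch_thread&) = delete;

    // Stand-in for remote_thread_local_var(): requires the TLS to exist.
    tls_block& remote_tls() {
        assert(_tls && "thread was published before its TLS was set up");
        return *_tls;
    }

private:
    void setup_tcb() { _tls = std::make_unique<tls_block>(); }

    std::unique_ptr<tls_block> _tls;
};

// Walks the list and touches every thread's TLS, as the text describes.
// Returns the number of threads visited.
inline std::size_t start_early_threads() {
    std::lock_guard<std::mutex> lock(registry().mtx);
    std::size_t visited = 0;
    for (sketch_thread* t : registry().threads) {
        t->remote_tls().current = t;
        ++visited;
    }
    return visited;
}
```

With the two steps in the constructor swapped, a concurrent start_early_threads() could reach a thread whose _tls is still null, which is the analogue of the crash reported in the issue.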
Fixes #145.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>