
OSv often crashes during boot #145

Closed
nyh opened this issue Jan 1, 2014 · 2 comments

Comments

@nyh
Contributor

nyh commented Jan 1, 2014

When running "make check", which involves many OSv boots (each test runs in a separate boot), I virtually always see one of the tests hang (apparently a different one each time, at random).

The hang is caused by a segfault in sched::start_early_threads(). The relevant part of the stack trace:
#3 0x000000000030e05a in mmu::vm_sigsegv (addr=addr@entry=0, ef=ef@entry=0xffffc0003fdfc008) at /home/nyh/osv/core/mmu.cc:904
#4 0x000000000030e13b in mmu::vm_fault (addr=, addr@entry=8, ef=ef@entry=0xffffc0003fdfc008) at /home/nyh/osv/core/mmu.cc:916
#5 0x00000000003345d5 in page_fault (ef=0xffffc0003fdfc008) at /home/nyh/osv/arch/x64/mmu.cc:35
#6
#7 sched::start_early_threads () at /home/nyh/osv/core/sched.cc:1165
#8 0x000000000035387f in sched::cpu::idle (this=0xffffc0003ffe0000) at /home/nyh/osv/core/sched.cc:370

The crash always happens in the same place.

The crashing line (1165 in sched.cc) is:
t->remote_thread_local_var(s_current) = t;

@nyh
Contributor Author

nyh commented Jan 1, 2014

Hmm, maybe a false alarm? Now I don't see this bug on master. Need to test some more...

@nyh
Contributor Author

nyh commented Jan 1, 2014

I was wrong; this bug doesn't happen on master.

nyh closed this as completed Jan 1, 2014
penberg pushed a commit that referenced this issue Jan 2, 2014
In issue #145 I reported a crash during boot in start_early_threads().
I wasn't actually able to replicate this bug on master, but it happens
quite frequently (e.g., on virtually every "make check" run) with some
patches of mine that seem unrelated to this bug.

The problem is that start_early_threads() (added in 63216e8)
iterates over the threads in the thread list and calls
t->remote_thread_local_var() for each one. This can only work if
the thread's TLS has already been initialized, but unfortunately the
thread constructor first added the new thread to the list, and only later
called setup_tcb() (which allocates and initializes the TLS). If we're
unlucky, start_early_threads() can find a thread on the list that doesn't
have its TLS allocated yet, so remote_thread_local_var() will crash.

The simple fix is to switch the order of the construction: First
set up the new thread's TLS, and only then add it to the list of
threads.

Fixes #145.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>