Lazily allocate thread stacks

By default, we currently allocate a 1 MB stack for each pthread. The allocation is done by mmap with mmu::mmap_populate, i.e., we force the entire stack to be populated up front. This is a terrible waste: thread stacks are an excellent candidate for lazy allocation - they cannot use huge pages anyway (they are smaller than 2 MB, and have a guard page), and they often use much less memory than the full 1 MB allotment.
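To make the difference between the two allocation modes concrete, here is a minimal user-space sketch using the Linux mmap() call rather than OSv's internal mmu code. The names allocate_stack and free_stack, and the placement of the guard page below the 1 MB stack, are assumptions made for illustration only; MAP_POPULATE is used here as a rough analogue of mmu::mmap_populate.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

constexpr std::size_t stack_size = 1 << 20;  // 1 MB, the default mentioned above
constexpr std::size_t guard_size = 4096;     // one guard page (assumed layout)

// Hypothetical helper: map a stack with a guard page at its low end.
// With populate == true every page is faulted in up front; with
// populate == false pages are only allocated when first touched.
inline void* allocate_stack(bool populate)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (populate) {
        flags |= MAP_POPULATE;  // Linux-specific: pre-fault the whole mapping
    }
#else
    (void)populate;             // no eager population available on this platform
#endif
    void* p = mmap(nullptr, stack_size + guard_size,
                   PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
    // Make the lowest page inaccessible so a stack overflow faults.
    if (mprotect(p, guard_size, PROT_NONE) != 0) {
        munmap(p, stack_size + guard_size);
        return nullptr;
    }
    return p;
}

// Hypothetical helper: release a stack obtained from allocate_stack().
inline void free_stack(void* p)
{
    assert(p != nullptr);
    munmap(p, stack_size + guard_size);
}
```

With populate set to false, a thread that only ever touches a few pages near the top of its stack costs a few pages of memory instead of the full 1 MB.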

Unfortunately, OSv crashes with lazy population of thread stacks, e.g., in tst-pipe.so, as explained in commit 41efdc1cb8b192, which made the allocation immediate (with mmu::mmap_populate). The problem is that with lazily allocated stacks, _any_ code can touch a not-yet-allocated stack page and cause a page fault, and handling that fault may sleep (waiting on a mutex for memory allocation, for example). The crash happens when the fault is taken in some OSv "kernel" code which has preemption or interrupts disabled but runs on the user's stack (e.g., RCU lock, wait_until, the scheduler, or numerous other places), where sleeping is not allowed.

On the OSv mailing list, I proposed the following workaround: in preempt_disable() and irq_disable(), before we disable preemption or interrupts, read a byte 4096 bytes ahead of the current stack top (in the direction the stack grows). Usually this adds only a tiny performance penalty (a single read, not even an if()), but if the next page of the stack is not yet allocated, it causes the page fault to be taken here and now, while sleeping is still allowed.
We also need to read a byte from the stack in thread::init_stack(), because new threads start with preemption disabled, so we need them to start with a minimal part of the stack already allocated.
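The probe described above could look roughly like the following sketch. It is not OSv's actual code: preempt_counter and probe_stack_ahead are hypothetical names, the stack is assumed to grow toward lower addresses, and the address of a local variable stands in for the current stack pointer.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t page_size = 4096;

// Hypothetical stand-in for the per-thread preemption counter.
thread_local unsigned preempt_counter = 0;

// Read one byte a page beyond the current stack position. If that page is
// not resident yet, the page fault is taken here, while sleeping is allowed.
inline void probe_stack_ahead()
{
    volatile char marker = 0;  // a local, so its address is near the stack top
    auto here = reinterpret_cast<std::uintptr_t>(&marker);
    auto probe = reinterpret_cast<const volatile char*>(here - page_size);
    char sink = *probe;        // volatile read, cannot be optimized away
    (void)sink;
}

inline void preempt_disable()
{
    probe_stack_ahead();       // fault now, before preemption is off
    ++preempt_counter;
}

inline void preempt_enable()
{
    assert(preempt_counter > 0);
    --preempt_counter;
}
```

The probe is unconditional, which matches the "single read, not even an if()" cost mentioned above; an irq_disable() wrapper would call the same helper before masking interrupts.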

If we want more than 4096 bytes of available stack to be guaranteed, we can read several bytes at a 4096-byte stride, but this is slower than reading a single byte. Instead, we can consider reading just one byte a few pages from the current stack position, and modifying the page-fault handler to allocate more than one adjacent page on each page fault.
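The address arithmetic behind that second variant can be sketched as below. This is only an illustration under stated assumptions: the stack grows toward lower addresses, the handler populates the faulting page plus the pages above it (those lying between the probe address and the current stack position), and the names page_range, probe_address and pages_to_populate are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t page_size = 4096;

// Half-open range [begin, end) of page-aligned addresses.
struct page_range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

inline std::uintptr_t page_align_down(std::uintptr_t addr)
{
    return addr & ~(page_size - 1);
}

// Address of the single byte to read, pages_ahead pages beyond sp.
inline std::uintptr_t probe_address(std::uintptr_t sp, std::size_t pages_ahead)
{
    return sp - static_cast<std::uintptr_t>(pages_ahead) * page_size;
}

// Pages a fault handler would populate for a fault at fault_addr:
// the faulting page and the pages above it, clamped to the end of the
// stack mapping (stack_end is the highest address of the stack, exclusive).
inline page_range pages_to_populate(std::uintptr_t fault_addr,
                                    std::uintptr_t stack_end,
                                    std::size_t pages_per_fault)
{
    assert(pages_per_fault > 0);
    std::uintptr_t begin = page_align_down(fault_addr);
    std::uintptr_t span = static_cast<std::uintptr_t>(pages_per_fault) * page_size;
    return {begin, std::min(begin + span, stack_end)};
}
```

If the probe distance and pages_per_fault are equal, one fault fills the whole gap between the probed page and the page the thread is currently running on, so a single read guarantees several pages of resident stack.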

On the mailing list, Avi Kivity suggested other options to solve our lazily-allocated stack problem:
- Insert "thunk code" between user and kernel code that switches the stacks to known resident stacks. We could abuse the ELF linker code to do that for us, at run time.
- Use -fsplit-stack to allow a dynamically allocated, discontiguous stack on physical memory.


Lazily allocate thread stacks #143