
[LibOS] Use RW locks in the VMA tree #1794

Open
dimakuv opened this issue Mar 5, 2024 · 1 comment · May be fixed by #1795
dimakuv commented Mar 5, 2024

Problem

Multi-threaded workloads with many syscalls put heavy stress on the VMA subsystem, because almost all syscalls verify their buffers for read/write access using the following functions:

  • is_user_memory_readable()
  • is_user_memory_writable()
  • is_user_string_readable()
  • is_user_memory_writable_no_skip()

All these functions call the test_user_memory() helper:

/*
 * Tests whether whole range of memory `[addr; addr+size)` is readable, or, if `writable` is true,
 * writable. The intended usage of this function is checking memory pointers passed to system calls.
 * Note that this does not check the accesses to the memory themselves and is only meant to handle
 * invalid syscall arguments (e.g. LTP test suite checks syscall arguments validation).
 */
static bool test_user_memory(const void* addr, size_t size, bool writable) {
    if (!access_ok(addr, size)) {
        return false;
    }
    return is_in_adjacent_user_vmas(addr, size, writable ? PROT_WRITE : PROT_READ);
}

This helper in turn calls the is_in_adjacent_user_vmas() function:

bool is_in_adjacent_user_vmas(const void* addr, size_t length, int prot) {
    uintptr_t begin = (uintptr_t)addr;
    uintptr_t end = begin + length;
    assert(begin <= end);

    struct adj_visitor_ctx ctx = {
        .prot = prot,
        .is_ok = true,
    };

    spinlock_lock(&vma_tree_lock);
    bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);
    spinlock_unlock(&vma_tree_lock);

    return is_continuous && ctx.is_ok;
}

The important part is the spinlock_lock(&vma_tree_lock) / spinlock_unlock(&vma_tree_lock) pair: in multi-threaded apps, contention on this lock becomes the bottleneck.

Gramine introduced a workaround to sidestep this bottleneck, via the libos.check_invalid_pointers manifest option; it translates to the g_check_invalid_ptrs variable. However, this cannot be used in all cases:

  • Some runtimes, such as Java, rely on being able to check invalid pointers, so they cannot set libos.check_invalid_pointers = false; doing so would lead to Java apps failing.
  • The is_user_memory_writable_no_skip() function does not honor the libos.check_invalid_pointers manifest option, because in certain situations Gramine really must determine whether a VMA is writable or read-only; see e.g. the ppoll() case, which emulates Linux behavior.
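
For reference, the workaround short-circuits the pointer check roughly as follows (a sketch; the exact placement of the g_check_invalid_ptrs test in Gramine's sources may differ):

bool is_user_memory_readable(const void* addr, size_t size) {
    if (!g_check_invalid_ptrs) {
        /* libos.check_invalid_pointers = false: skip the VMA-tree walk
         * (and thus the lock) entirely and accept the pointer */
        return true;
    }
    return test_user_memory(addr, size, /*writable=*/false);
}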

Solution

Use the RW lock that was previously introduced in the Gramine codebase: https://github.com/gramineproject/gramine/blob/master/libos/include/libos_rwlock.h

Example usage of this RW lock: f071450
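
For the read-only path shown above, the conversion could look roughly like this (a minimal sketch, assuming the rwlock_read_lock()/rwlock_read_unlock() API from libos_rwlock.h and an illustrative global g_vma_tree_lock):

static struct libos_rwlock g_vma_tree_lock; /* replaces the spinlock */

bool is_in_adjacent_user_vmas(const void* addr, size_t length, int prot) {
    uintptr_t begin = (uintptr_t)addr;
    uintptr_t end = begin + length;
    assert(begin <= end);

    struct adj_visitor_ctx ctx = {
        .prot = prot,
        .is_ok = true,
    };

    /* read lock: many threads can verify syscall buffers concurrently */
    rwlock_read_lock(&g_vma_tree_lock);
    bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);
    rwlock_read_unlock(&g_vma_tree_lock);

    return is_continuous && ctx.is_ok;
}

Mutating paths (e.g. the mmap()/munmap()/mprotect() emulation that modifies the VMA tree) would take rwlock_write_lock()/rwlock_write_unlock() instead.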

Benchmark results

TODO

dimakuv commented Mar 5, 2024

There is one problematic point:

  • The current VMA lock is a lightweight spinlock.
  • The proposed RW VMA lock is more heavyweight (on its slow path), as it uses PalEventSet() and PalEventWait().

Under contention, PalEventSet() and PalEventWait() may perform a futex OCALL, which could outweigh the benefits of switching to the RW lock. On the other hand, sleeping on a futex is typically the right behavior under contention, and these PAL APIs are optimized to elide the OCALL when possible.
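
To illustrate why the OCALL is only hit under contention, a waiter typically spins in user space first and falls back to the PAL event only if the lock stays busy (an illustrative sketch, not the actual libos_rwlock code; PalEventWait()/PalEventSet() are the PAL APIs mentioned above, everything else is hypothetical):

/* Hypothetical spin-then-sleep wait on a flag set by the unlocking thread. */
static void wait_for_flag(int* flag, PAL_HANDLE event) {
    for (int i = 0; i < 1000; i++) {
        if (__atomic_load_n(flag, __ATOMIC_ACQUIRE))
            return; /* fast path: flag got set while spinning, no OCALL */
        CPU_RELAX();
    }
    while (!__atomic_load_n(flag, __ATOMIC_ACQUIRE)) {
        /* slow path: sleep until the unlocking thread calls PalEventSet();
         * this wait may perform a futex OCALL */
        (void)PalEventWait(event, /*timeout_us=*/NULL);
    }
}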

@dimakuv dimakuv linked a pull request Mar 5, 2024 that will close this issue