Tags: bjackman/linux
SQUASHME: mm: asi: keep ASI domains initialized until the mm_struct is destroyed

BJ: In this branch, this is a bugfix. Otherwise we're taking a mutex from asi_destroy() when we are atomic.

Currently, ASI domains are initialized individually when needed, and destroyed when no longer being used. Multiple initializations are allowed, and an init_count is used to keep track of when the domain should be destroyed. This design allows an ASI domain of a specific class to be initialized, destroyed, and re-initialized again by the same process. However, the common case is that once an ASI domain is initialized, it remains in use until the end of the process lifetime (or shortly before then).

Remove this complexity by keeping initialized ASI domains alive until the containing mm_struct is destroyed (after mm->mm_count drops to zero). __asi_destroy() is mostly emptied: we no longer need to increment the TLB gen to make sure the TLB is flushed if the ASI domain is re-initialized. asi_destroy() is replaced with asi_destroy_mm_state(), which destroys all ASI domains in an mm_struct.

asi_destroy_mm_state() can be called from unsleepable contexts and therefore cannot take mm->asi_init_lock, so make sure asi_init() can only be called from a process where current->mm == asi->mm. This guarantees that we cannot race with asi_destroy_mm_state(), which only runs after all users of the mm_struct are gone. Ideally, asi_destroy_mm_state() is cheap enough that it does not impact the process exit path.

Keeping the initialized ASI domains around has two effects:

(a) A following change will allocate these domains dynamically, so delaying their destruction until the mm_struct is destroyed misses a chance to free their memory earlier. However, the size of struct asi is trivial, and the window between an ASI domain going out of use and the destruction of the mm_struct is expected to be small.

(b) Keeping mm->asi[*] initialized when the ASI domain is no longer used means that we will unnecessarily flush the TLB in that ASI domain in asi_tlb_flush_one_user() -> asi_invpcid_nonsensitive_one(). However, this is probably fine because it is a single-address flush, and the window between an ASI domain going out of use and the destruction of the mm_struct is expected to be small.

The goal of this goes beyond code simplification. Upcoming changes will support context switching and exiting to userspace without exiting ASI in some cases, which means that arbitrary kernel code can run in an ASI domain. This requires a protection mechanism to make sure that ASI domains are not destroyed while they are being used or referenced.

Tying the lifetime of ASI domains to the mm_struct sidesteps this problem. As long as a process is running in an ASI domain, it naturally holds a ref to the containing mm_struct (through task->mm or task->active_mm), so the ASI domain cannot be destroyed. To enforce this, add a warning in asi_enter() if asi->mm is not the same as current->mm.

This applies to ASI domains retrieved with asi_get_current() as long as preemption is disabled. If preemption is enabled, task->active_mm may change (e.g. for kthreads), so the ASI domain may be destroyed if the containing mm_struct is destroyed. Add a comment to document this and a warning to enforce it. Stop using asi_get_current() in asi_in_nonsensitive() to avoid the warning when preemption is enabled; asi_in_nonsensitive() is inherently racy and does not access the ASI domain.

It also applies naturally to ASI domains retrieved through mm->asi[*], assuming the retriever already holds a ref to the mm_struct before dereferencing it.

Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
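The lifetime rule in this commit can be modeled in userspace to show why teardown needs no lock. This is a minimal sketch under stated assumptions, not the kernel implementation: toy_mm, toy_asi, toy_asi_init(), toy_asi_destroy_mm_state(), toy_mmdrop() and toy_current_mm are hypothetical stand-ins for mm_struct, struct asi, asi_init(), asi_destroy_mm_state(), mmdrop() and current->mm. Domains are embedded in the mm here, so "destroying" one only clears its owner pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define TOY_ASI_MAX_CLASSES 4

struct toy_mm;

struct toy_asi {
	struct toy_mm *mm;	/* non-NULL once initialized */
};

struct toy_mm {
	int mm_count;					/* models mm->mm_count */
	struct toy_asi asi[TOY_ASI_MAX_CLASSES];	/* models mm->asi[*] */
};

/* Models current->mm of the calling task. */
static struct toy_mm *toy_current_mm;

/*
 * Initialize the domain of @class in @mm. Only the owning process is
 * accepted, so this can never run once mm_count has reached zero.
 * Re-initialization just returns the existing domain: no init_count.
 */
static struct toy_asi *toy_asi_init(struct toy_mm *mm, int class)
{
	if (mm != toy_current_mm || class < 0 || class >= TOY_ASI_MAX_CLASSES)
		return NULL;	/* caller is not the owning process: refuse */
	if (!mm->asi[class].mm)
		mm->asi[class].mm = mm;
	return &mm->asi[class];
}

/*
 * Tear down every initialized domain. No lock is taken: by the time
 * this runs, no task can be inside toy_asi_init() for this mm.
 */
static void toy_asi_destroy_mm_state(struct toy_mm *mm)
{
	for (int i = 0; i < TOY_ASI_MAX_CLASSES; i++)
		mm->asi[i].mm = NULL;
}

/* Models mmdrop(): dropping the last reference destroys all domains. */
static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0)
		toy_asi_destroy_mm_state(mm);
}
```

Because toy_asi_init() rejects any caller whose toy_current_mm differs from the target mm, nothing can initialize a domain concurrently with the final toy_mmdrop(). In this model, that is the property that allows teardown from an atomic context without a mutex.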
SQUASHME: mm: asi: keep ASI domains initialized until the mm_struct is destroyed

BJ: In this branch, this is a bugfix. Otherwise we're taking a mutex from asi_destroy() when we are atomic.

Currently, ASI domains are initialized individually when needed, and destroyed when no longer being used. Multiple initializations are allowed, and an init_count is used to keep track of when the domain should be destroyed. This design allows an ASI domain of a specific class to be initialized, destroyed, and re-initialized again by the same process. However, the common case is that once an ASI domain is initialized, it remains in use until the end of the process lifetime (or shortly before then).

Remove this complexity by keeping initialized ASI domains alive until the containing mm_struct is destroyed (after mm->mm_count drops to zero). __asi_destroy() is mostly emptied: we no longer need to increment the TLB gen to make sure the TLB is flushed if the ASI domain is re-initialized. asi_destroy() is replaced with asi_destroy_mm_state(), which destroys all ASI domains in an mm_struct.

asi_destroy_mm_state() can be called from unsleepable contexts and therefore cannot take mm->asi_init_lock, so make sure asi_init() can only be called from a process where current->mm == asi->mm. This guarantees that we cannot race with asi_destroy_mm_state(), which only runs after all users of the mm_struct are gone. Ideally, asi_destroy_mm_state() is cheap enough that it does not impact the process exit path.

Keeping the initialized ASI domains around has two effects:

(a) A following change will allocate these domains dynamically, so delaying their destruction until the mm_struct is destroyed misses a chance to free their memory earlier. However, the size of struct asi is trivial, and the window between an ASI domain going out of use and the destruction of the mm_struct is expected to be small.

(b) Keeping mm->asi[*] initialized when the ASI domain is no longer used means that we will unnecessarily flush the TLB in that ASI domain in asi_tlb_flush_one_user() -> asi_invpcid_restricted_one(). However, this is probably fine because it is a single-address flush, and the window between an ASI domain going out of use and the destruction of the mm_struct is expected to be small.

The goal of this goes beyond code simplification. Upcoming changes will support context switching and exiting to userspace without exiting ASI in some cases, which means that arbitrary kernel code can run in an ASI domain. This requires a protection mechanism to make sure that ASI domains are not destroyed while they are being used or referenced.

Tying the lifetime of ASI domains to the mm_struct sidesteps this problem. As long as a process is running in an ASI domain, it naturally holds a ref to the containing mm_struct (through task->mm or task->active_mm), so the ASI domain cannot be destroyed. To enforce this, add a warning in asi_enter() if asi->mm is not the same as current->mm.

This applies to ASI domains retrieved with asi_get_current() as long as preemption is disabled. If preemption is enabled, task->active_mm may change (e.g. for kthreads), so the ASI domain may be destroyed if the containing mm_struct is destroyed. Add a comment to document this and a warning to enforce it. Stop using asi_get_current() in asi_is_restricted() to avoid the warning when preemption is enabled; asi_is_restricted() is inherently racy and does not access the ASI domain.

It also applies naturally to ASI domains retrieved through mm->asi[*], assuming the retriever already holds a ref to the mm_struct before dereferencing it.

Suggested-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
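The two warnings this commit adds, in asi_enter() and for asi_get_current() with preemption enabled, can be sketched as a userspace model. This is a hedged illustration, not the kernel code: every toy_* name is a hypothetical stand-in (toy_current for current, toy_preempt_count for the preemption count, toy_curr_asi for the per-CPU current domain), and warnings are counted instead of raised with WARN_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; toy_* names are not kernel APIs. */
struct toy_mm;

struct toy_asi {
	struct toy_mm *mm;		/* owning mm */
};

struct toy_mm {
	struct toy_asi asi;
};

struct toy_task {
	struct toy_mm *mm;		/* NULL for a kthread */
	struct toy_mm *active_mm;	/* may change while preemptible */
};

static struct toy_task *toy_current;	/* models current */
static int toy_preempt_count;		/* > 0 means preemption disabled */
static struct toy_asi *toy_curr_asi;	/* models the per-CPU current domain */
static int toy_warnings;		/* counts warning hits */

/* Stand-in for WARN_ON(): record the hit and return the condition. */
static int toy_warn_on(int cond)
{
	if (cond)
		toy_warnings++;
	return cond;
}

/* The result is only stable while preemption is disabled. */
static struct toy_asi *toy_asi_get_current(void)
{
	toy_warn_on(toy_preempt_count == 0);
	return toy_curr_asi;
}

/* Racy by design: checks state without dereferencing the domain. */
static int toy_asi_is_restricted(void)
{
	return toy_curr_asi != NULL;
}

/* Entering a domain of another mm would not pin that mm: refuse. */
static void toy_asi_enter(struct toy_asi *asi)
{
	if (toy_warn_on(asi->mm != toy_current->mm))
		return;
	toy_curr_asi = asi;
}
```

In this sketch toy_asi_is_restricted() reads toy_curr_asi directly rather than calling toy_asi_get_current(). That mirrors the commit's reasoning: the check is inherently racy and never dereferences the domain, so it has no reason to trip the preemption warning.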