-
Notifications
You must be signed in to change notification settings - Fork 163
nvme_driver: update driver to use Arc / Weak semantics to enforce unique ownership for namespaces by nsid
#2657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvme_driver: update driver to use Arc / Weak semantics to enforce unique ownership for namespaces by nsid
#2657
Conversation
|
In general, |
Like you, I also have some concerns around the brittleness of this approach. Although in the classical sense of memory safety and availability I think rust should have us covered! |
| return Ok(handle.namespace.clone()); | ||
| } | ||
| pub async fn namespace(&mut self, nsid: u32) -> Result<NamespaceHandle, NamespaceError> { | ||
| if let Some(namespace) = self.namespaces.get_mut(&nsid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a more idiomatic way to do this ..... it looks a little off to me.
Arc / Weak semantics to enforce single-owner semantics for namespaces
Arc / Weak semantics to enforce single-owner semantics for namespacesArc / Weak semantics to enforce unique ownership for namespaces
Arc / Weak semantics to enforce unique ownership for namespacesArc / Weak semantics to enforce unique ownership for namespaces by nsid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the NVMe driver to use Arc/Weak semantics to address two critical issues: (1) namespace persistence after removal from the controller, and (2) namespace reuse after disk removal from VTL2 settings. The solution introduces a non-cloneable NamespaceHandle wrapper around Arc<Namespace> to enforce single-ownership, while the driver stores only Weak references internally.
Changes:
- Introduced
NamespaceHandlewrapper to prevent cloning and enforce single ownership of namespaces - Implemented
WeakOrStrongenum to manage namespace lifecycle during normal operation and save/restore - Updated all callsites to use
NamespaceHandleinstead ofArc<Namespace>
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| vm/devices/storage/disk_nvme/nvme_driver/src/namespace.rs | Added NamespaceHandle wrapper with delegated methods; made check_active() public; renamed error variant |
| vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs | Implemented WeakOrStrong enum; updated namespace() method with duplicate detection; modified save() logic |
| vm/devices/storage/disk_nvme/nvme_driver/src/lib.rs | Updated exports to use NamespaceHandle instead of Namespace |
| vm/devices/storage/disk_nvme/src/lib.rs | Updated NvmeDisk to use NamespaceHandle |
| openhcl/underhill_core/src/nvme_manager/*.rs | Updated all interfaces to use NamespaceHandle |
| let is_weak = namespace.is_weak(); // This value will change after invokeing get_arc(). | ||
| if let Some(ns) = namespace.get_arc() | ||
| && ns.check_active().is_ok() | ||
| && is_weak |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition is_weak in the save logic appears incorrect. When saving right after a restore (before any namespace() calls), all namespaces will be Strong, causing is_weak to be false for all of them. This means none of the restored namespaces will be saved, which breaks save-restore cycles.
The logic should save any namespace that can be successfully obtained via get_arc() and passes the check_active() test, regardless of whether it started as Weak or Strong. Consider removing the && is_weak condition or inverting it to && !is_weak if the intent is to only save Strong references (pre-handout namespaces).
| let is_weak = namespace.is_weak(); // This value will change after invokeing get_arc(). | |
| if let Some(ns) = namespace.get_arc() | |
| && ns.check_active().is_ok() | |
| && is_weak | |
| if let Some(ns) = namespace.get_arc() | |
| && ns.check_active().is_ok() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point so I am keeping this comment open. The idea behind NOT saving strong ref namespaces is that they were never requested for by a driver and hence there is no reasonable expectation that they should be available post restore.
Current code cuts down the size of saved data. As copilot pointed out, repeated back-to-back servicing where the namespace has not yet been requested could cause missed saves. However, I don't think we have a reasonable expectation for hammering a system with servicing operations .
…ferences to the Namespace object uncloneable
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
e903762 to
0af769f
Compare
Arc / Weak semantics to enforce unique ownership for namespaces by nsidArc / Weak semantics to enforce unique ownership for namespaces by nsid
alandau
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay! All tests passed! Perhaps do a dummy change (or trigger another test run without a change if there's a way to do it) to make sure it's not flaky and won't hunt us down the road.
I triggered a new test run by merging in main. Once you're comfy with this, @alandau, I'll approve and this can go in. Thanks for taking on the review of this change :) |
|
@mattkur, it was consistently failing on every run, and now it passed on 2 runs in a row, and the branch is out of date again, so another run will be needed before merging. I think this is good to go, unless @gurasinghMS has reservations. |
|
|
… unique ownership for namespaces by nsid (microsoft#2657) This PR fixes a couple of problems with the driver. ## 1. `Namespace` persisting and being reused after namespace removal from the controller . It addresses the situation where a namespace is removed from the controller (and therefore from the disk exposed via VTL0), but the driver continues to hold a reference to that namespace. In this state, the driver would return a stale or invalid Namespace object—one that can no longer perform valid I/O. The approach in this PR is to avoid keeping a strong Arc<Namespace> in the driver. Instead, the driver stores only a Weak<Namespace> for any namespace it hands out. When the disk is dropped, the strong reference disappears, causing the driver’s weak reference to naturally invalidate via normal Arc/Weak semantics. This eliminates the dangling reference issue without requiring additional bookkeeping. As discussed during office hours, the window for failure with this approach is expected to be extremely narrow, as the driver relies on strict single-ownership of the Namespace object. To further enforce this single-ownership model, this PR introduces a non-cloneable wrapper around Arc<Namespace>. This wrapper exposes the same interface as Namespace and is consumed by the disk, but it prevents cloning of the underlying Arc<Namespace> anywhere along the creation path. This ensures the driver remains the sole authority over Namespace ownership and lifecycle. ## 2. `Namespace` being reused after disk removal from VTL2 settings. <img width="7096" height="2979" alt="image" src="https://github.com/user-attachments/assets/f519db98-447c-4a95-a7a2-c70536e07ae0" /> This is the issue being described in microsoft#2656 . This PR brings back the check to verify single-ownership of the `Namespace` objects. This time around it is possible to verify using the above described `Arc`/`Weak` semantics. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Matt LaFayette (Kurjanowicz) <mattkur@microsoft.com> Co-authored-by: Alex Landau <alexlandau@microsoft.com> (cherry picked from commit 34cc2bb)
… unique ownership for namespaces by nsid (#2657) (#2697) Clean cherry pick of PR #2657 This PR fixes a couple of problems with the driver. ## 1. `Namespace` persisting and being reused after namespace removal from the controller . It addresses the situation where a namespace is removed from the controller (and therefore from the disk exposed via VTL0), but the driver continues to hold a reference to that namespace. In this state, the driver would return a stale or invalid Namespace object—one that can no longer perform valid I/O. The approach in this PR is to avoid keeping a strong Arc<Namespace> in the driver. Instead, the driver stores only a Weak<Namespace> for any namespace it hands out. When the disk is dropped, the strong reference disappears, causing the driver’s weak reference to naturally invalidate via normal Arc/Weak semantics. This eliminates the dangling reference issue without requiring additional bookkeeping. As discussed during office hours, the window for failure with this approach is expected to be extremely narrow, as the driver relies on strict single-ownership of the Namespace object. To further enforce this single-ownership model, this PR introduces a non-cloneable wrapper around Arc<Namespace>. This wrapper exposes the same interface as Namespace and is consumed by the disk, but it prevents cloning of the underlying Arc<Namespace> anywhere along the creation path. This ensures the driver remains the sole authority over Namespace ownership and lifecycle. ## 2. `Namespace` being reused after disk removal from VTL2 settings. <img width="7096" height="2979" alt="image" src="https://github.com/user-attachments/assets/f519db98-447c-4a95-a7a2-c70536e07ae0" /> This is the issue being described in #2656 . This PR brings back the check to verify single-ownership of the `Namespace` objects. This time around it is possible to verify using the above described `Arc`/`Weak` semantics. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Matt LaFayette (Kurjanowicz) <mattkur@microsoft.com> Co-authored-by: Alex Landau <alexlandau@microsoft.com>
|
Backported to release/1.7.2511 in #2697 |
This PR fixes a couple of problems with the driver.
1.
Namespacepersisting and being reused after namespace removal from the controllerIt addresses the situation where a namespace is removed from the controller (and therefore from the disk exposed via VTL0), but the driver continues to hold a reference to that namespace. In this state, the driver would return a stale or invalid Namespace object—one that can no longer perform valid I/O.
The approach in this PR is to avoid keeping a strong Arc in the driver. Instead, the driver stores only a Weak for any namespace it hands out. When the disk is dropped, the strong reference disappears, causing the driver’s weak reference to naturally invalidate via normal Arc/Weak semantics. This eliminates the dangling reference issue without requiring additional bookkeeping.
As discussed during office hours, the window for failure with this approach is expected to be extremely narrow, as the driver relies on strict single-ownership of the Namespace object. To further enforce this single-ownership model, this PR introduces a non-cloneable wrapper around Arc. This wrapper exposes the same interface as Namespace and is consumed by the disk, but it prevents cloning of the underlying Arc anywhere along the creation path. This ensures the driver remains the sole authority over Namespace ownership and lifecycle.
2.
Namespacebeing reused after disk removal from VTL2 settings.This is the issue being described in #2656 . This PR brings back the check to verify single-ownership of the
Namespaceobjects. This time around it is possible to verify using the above describedArc/Weaksemantics.