Conversation
Add lspci.expected.out.5 to accommodate output format differences in newer versions of the lspci utility (e.g., on Fedora 42). The test already supported multiple expected outputs for different lspci versions; this adds another variant. Signed-off-by: Ben Walker <ben@nvidia.com>
| } __attribute__((packed)); | ||
|
|
||
| /* VFIO_USER_IOMMUFD_BIND */ | ||
| struct vfio_user_iommufd_bind { |
There was a problem hiding this comment.
I think in the real iommufd flow you'd be selecting the device to bind to here. But in vfio-user the device is implied by the domain socket we connected to. So all this is really doing is giving us this devid value to use later (and we don't really need it later, because there's only one device). I debated whether to keep this to make the flow stay the same as when using the real iommufd, or if I should just drop it and simplify.
| uint32_t __reserved; | ||
| uint64_t user_va; /* Userspace virtual address (for tracking) */ | ||
| uint64_t length; | ||
| uint64_t iova; /* Output: server-allocated IOVA */ |
There was a problem hiding this comment.
In the real iommufd there's an extra flag defined that tells the system whether the client specifies the iova here or whether the server fills it in. I've only implemented the mode where the server fills it in. There's a few considerations here:
- If the client chooses the iova, then it's almost certainly not going to be the real DMA address that needs to be sent to the hardware. The system will assume that the server process intercepts and translates all commands.
- If the server chooses the iova then we can return the real DMA address and the server doesn't need to intercept anything (so in theory, clients could get more direct access to the NVMe queues some day, if we gain additional security features in other areas). But the client needs to translate from its own vaddr to the server returned iova on every I/O.
The biggest challenge comes in knowing when the server should do a translation. By only allowing server-choose mode with iommufd, then I know if we're in iommufd mode the server doesn't have to translate. If I allow both modes like in the real iommufd ioctl, I don't have a great way to figure out if the server should translate an address or not.
Add support for iommufd-based memory management to the vfio-user protocol as an alternative to the legacy VFIO DMA_MAP/UNMAP model. Key changes: - Add capability negotiation for iommufd support in VFIO_USER_VERSION - Implement new protocol commands for iommufd operations: * VFIO_USER_IOMMUFD_ALLOC_IOAS - Allocate I/O Address Space * VFIO_USER_IOMMUFD_DESTROY_IOAS - Destroy I/O Address Space * VFIO_USER_IOMMUFD_BIND - Bind device to iommufd context * VFIO_USER_IOMMUFD_ATTACH_PT - Attach device to IOAS * VFIO_USER_IOMMUFD_DETACH_PT - Detach device from IOAS * VFIO_USER_IOMMUFD_IOAS_MAP - Map memory region (server-allocated IOVA) * VFIO_USER_IOMMUFD_IOAS_UNMAP - Unmap memory region - Add server-side iommufd controller implementation (lib/iommufd.c) - Update client sample to support iommufd capability negotiation - Fix build dependencies in samples and tests The iommufd model uses server-controlled IOVA allocation, where the server performs mmap() and returns the mapped virtual address as the IOVA. This simplifies address management and avoids the need for separate IOVA allocators. When both client and server advertise iommufd support during version negotiation, iommufd mode is enabled for the connection. Otherwise, the connection falls back to legacy VFIO DMA_MAP/UNMAP. Signed-off-by: Ben Walker <ben@nvidia.com>
tmakatos
left a comment
There was a problem hiding this comment.
Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.
Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user. |
Makes sense, I think using |
tmakatos
left a comment
There was a problem hiding this comment.
@benlwalker this is a very superficial, partial review, I haven't had a chance to read about iommufd just yet.
| /* | ||
| * vfio-user protocol commands | ||
| * | ||
| * Commands 1-18 are part of the base protocol. Commands 19+ implement |
There was a problem hiding this comment.
Migration read/write isn't supported in practise if the device isn't migrateable, so not sure we need to state it here.
| */ | ||
| typedef struct iommufd_ioas { | ||
| uint32_t ioas_id; | ||
| TAILQ_HEAD(, iommufd_mapping) mappings; |
There was a problem hiding this comment.
So an IOAS has multiple IOVA mappings? Just checking whether a list is suitable, how many such mappings do we generally expect?
| typedef struct iommufd_controller { | ||
| struct vfu_ctx *vfu_ctx; | ||
| bool enabled; /* iommufd mode negotiated */ | ||
| bool device_bound; /* Device bound to iommufd */ |
There was a problem hiding this comment.
Since there's no /dev/iommufd, what does device_bound mean in practise?
| } iommufd_controller_t; | ||
|
|
||
| /* | ||
| * Create a new iommufd controller. |
There was a problem hiding this comment.
I suppose returns NULL on error and sets errno?
(Here and wherever a pointer is returned?)
| /* | ||
| * Destroy an IOAS and all its mappings. | ||
| */ | ||
| int |
There was a problem hiding this comment.
-1 on error and sets errno? (elsewher as well?)
| return ERROR_INT(ERANGE) - 1; /* Need at least 1 SG entry */ | ||
| } | ||
|
|
||
| /* Use INT_MAX as special region index to mark iommufd mapping */ |
There was a problem hiding this comment.
Isn't it cleaner to use a dedicated field for that?
| for (i = 0; i < cnt; i++, sg++) { | ||
| if (sg->region == INT_MAX) { | ||
| /* This is an iommufd mapping - IOVA equals the server's virtual address */ | ||
| if (vfu_ctx->iommufd == NULL || !iommufd_is_enabled(vfu_ctx->iommufd)) { |
There was a problem hiding this comment.
That should be an assert right?
| iov[i].iov_len = sg->length; | ||
| } else { | ||
| /* Legacy DMA mapping - use existing dma_sgl_get logic */ | ||
| if (sg->region >= vfu_ctx->dma->nregions) { |
| return ERROR_INT(EINVAL); | ||
| } | ||
|
|
||
| dma_memory_region_t *region = &vfu_ctx->dma->regions[sg->region]; |
There was a problem hiding this comment.
Need to check why dma_sgl_get went away.
| return ret; | ||
| } | ||
|
|
||
| /* Return a dummy device ID (not used in vfio-user) */ |
There was a problem hiding this comment.
AFAIK vfio-user doesn't use device IDs at all, check.
Linux added a new uapi called IOMMUFD for managing the IOMMU. In recent Linux releases, many of the vfio ioctls are now just legacy wrappers around this new interface. However, the new interface has a couple of important features that I'd like to use with vfio-user. Namely:
I've tried to make this backward compatible by adding feature negotiation. Only if both client and server report support will it be used.