Skip to content

Add support for IOMMUFD IOCTLs#833

Open
benlwalker wants to merge 2 commits intonutanix:masterfrom
benlwalker:iommufd
Open

Add support for IOMMUFD IOCTLs#833
benlwalker wants to merge 2 commits intonutanix:masterfrom
benlwalker:iommufd

Conversation

@benlwalker
Copy link
Contributor

Linux added a new uapi called IOMMUFD for managing the IOMMU. In recent Linux releases, many of the vfio ioctls are now just legacy wrappers around this new interface. However, the new interface has a couple of important features that I'd like to use with vfio-user. Namely:

  1. Support for allowing the server to allocate the IOVA. This is important because the server can return the true IOVA to the client directly, which can be used as an actual DMA address by hardware. If the client puts this address into a command, they don't have to rely on the server process to intercept the command and translate it.
  2. Support for registering memory described by a dma-buf (e.g. memory on another PCI device)

I've tried to make this backward compatible by adding feature negotiation. Only if both client and server report support will it be used.

Add lspci.expected.out.5 to accommodate output format differences in
newer versions of the lspci utility (e.g., on Fedora 42). The test
already supported multiple expected outputs for different lspci
versions; this adds another variant.

Signed-off-by: Ben Walker <ben@nvidia.com>
} __attribute__((packed));

/* VFIO_USER_IOMMUFD_BIND */
struct vfio_user_iommufd_bind {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the real iommufd flow you'd be selecting the device to bind to here. But in vfio-user the device is implied by the domain socket we connected to. So all this is really doing is giving us this devid value to use later (and we don't really need it later, because there's only one device). I debated whether to keep this to make the flow stay the same as when using the real iommufd, or if I should just drop it and simplify.

uint32_t __reserved;
uint64_t user_va; /* Userspace virtual address (for tracking) */
uint64_t length;
uint64_t iova; /* Output: server-allocated IOVA */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the real iommufd there's an extra flag defined that tells the system whether the client specifies the iova here or whether the server fills it in. I've only implemented the mode where the server fills it in. There's a few considerations here:

  • If the client chooses the iova, then it's almost certainly not going to be the real DMA address that needs to be sent to the hardware. The system will assume that the server process intercepts and translates all commands.
  • If the server chooses the iova then we can return the real DMA address and the server doesn't need to intercept anything (so in theory, clients could get more direct access to the NVMe queues some day, if we gain additional security features in other areas). But the client needs to translate from its own vaddr to the server returned iova on every I/O.

The biggest challenge comes in knowing when the server should do a translation. By only allowing server-choose mode with iommufd, then I know if we're in iommufd mode the server doesn't have to translate. If I allow both modes like in the real iommufd ioctl, I don't have a great way to figure out if the server should translate an address or not.

Add support for iommufd-based memory management to the vfio-user
protocol as an alternative to the legacy VFIO DMA_MAP/UNMAP model.

Key changes:
- Add capability negotiation for iommufd support in VFIO_USER_VERSION
- Implement new protocol commands for iommufd operations:
  * VFIO_USER_IOMMUFD_ALLOC_IOAS - Allocate I/O Address Space
  * VFIO_USER_IOMMUFD_DESTROY_IOAS - Destroy I/O Address Space
  * VFIO_USER_IOMMUFD_BIND - Bind device to iommufd context
  * VFIO_USER_IOMMUFD_ATTACH_PT - Attach device to IOAS
  * VFIO_USER_IOMMUFD_DETACH_PT - Detach device from IOAS
  * VFIO_USER_IOMMUFD_IOAS_MAP - Map memory region (server-allocated IOVA)
  * VFIO_USER_IOMMUFD_IOAS_UNMAP - Unmap memory region
- Add server-side iommufd controller implementation (lib/iommufd.c)
- Update client sample to support iommufd capability negotiation
- Fix build dependencies in samples and tests

The iommufd model uses server-controlled IOVA allocation, where the
server performs mmap() and returns the mapped virtual address as the
IOVA. This simplifies address management and avoids the need for
separate IOVA allocators.

When both client and server advertise iommufd support during version
negotiation, iommufd mode is enabled for the connection. Otherwise,
the connection falls back to legacy VFIO DMA_MAP/UNMAP.

Signed-off-by: Ben Walker <ben@nvidia.com>
Copy link
Collaborator

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

@benlwalker
Copy link
Contributor Author

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user.

@tmakatos
Copy link
Collaborator

tmakatos commented Feb 6, 2026

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user.

Makes sense, I think using IOMMUFD in the protocol specification is consistent with what we've been doing so far. Not sure about whether it should be called iommufd internally though, I'll defer this to @jlevon (who has more bikesheddy inclinations).
I suppose you're going to post the protocol changes to qemu-devel?
Also, do you have any relevant QEMU patches to share?

Copy link
Collaborator

@tmakatos tmakatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benlwalker this is a very superficial, partial review, I haven't had a chance to read about iommufd just yet.

/*
* vfio-user protocol commands
*
* Commands 1-18 are part of the base protocol. Commands 19+ implement
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration read/write isn't supported in practise if the device isn't migrateable, so not sure we need to state it here.

*/
typedef struct iommufd_ioas {
uint32_t ioas_id;
TAILQ_HEAD(, iommufd_mapping) mappings;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an IOAS has multiple IOVA mappings? Just checking whether a list is suitable, how many such mappings do we generally expect?

typedef struct iommufd_controller {
struct vfu_ctx *vfu_ctx;
bool enabled; /* iommufd mode negotiated */
bool device_bound; /* Device bound to iommufd */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's no /dev/iommufd, what does device_bound mean in practise?

} iommufd_controller_t;

/*
* Create a new iommufd controller.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose returns NULL on error and sets errno?
(Here and wherever a pointer is returned?)

/*
* Destroy an IOAS and all its mappings.
*/
int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on error and sets errno? (elsewher as well?)

return ERROR_INT(ERANGE) - 1; /* Need at least 1 SG entry */
}

/* Use INT_MAX as special region index to mark iommufd mapping */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it cleaner to use a dedicated field for that?

for (i = 0; i < cnt; i++, sg++) {
if (sg->region == INT_MAX) {
/* This is an iommufd mapping - IOVA equals the server's virtual address */
if (vfu_ctx->iommufd == NULL || !iommufd_is_enabled(vfu_ctx->iommufd)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be an assert right?

iov[i].iov_len = sg->length;
} else {
/* Legacy DMA mapping - use existing dma_sgl_get logic */
if (sg->region >= vfu_ctx->dma->nregions) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

return ERROR_INT(EINVAL);
}

dma_memory_region_t *region = &vfu_ctx->dma->regions[sg->region];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check why dma_sgl_get went away.

return ret;
}

/* Return a dummy device ID (not used in vfio-user) */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK vfio-user doesn't use device IDs at all, check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants