Commit e1cac37

content: add virtio file system device
The work-in-progress virtio file system device transports Linux FUSE requests between a FUSE daemon running on the host and the FUSE driver inside the guest.

This is an early version of the spec that maps FUSE requests to virtqueues. No changes are needed to the FUSE request format.

Multiqueue is supported for normal requests. FUSE_INTERRUPT and FUSE_FORGET requests are only sent on the dedicated hiprio queue. Notifications are sent on the notifications queue.

The FUSE driver currently works in a "pull" model where userspace reads requests from /dev/fuse one at a time. Virtqueues are a "push" model where the FUSE driver will need to enqueue requests onto a specific virtqueue and wait for the device to process them. The request queue buffers are completed by the device when the request has been processed and struct fuse_out_header has been filled out. The FUSE driver then picks up the completed request and processes it as if the FUSE daemon had written to /dev/fuse.

Notifications involve device-to-driver communication. Since virtqueues live in guest RAM, the device cannot initiate communication. Instead the notifications queue is populated with empty buffers by the FUSE driver (similar to a NIC rx queue). The device then "completes" a buffer when it wishes to notify the driver. Replies to the notification are placed in a normal request queue; they do not go via the notifications queue.

Note that this design assumes that the driver knows the required buffer size for each request. My understanding is that this is true in FUSE. The only exception is FUSE_NOTIFY_STORE, and even there the FUSE implementation has a limit of 32 pages, which makes for a natural buffer size limit for the notifications queue.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
1 parent 9e57474 commit e1cac37

3 files changed

+214
-0
lines changed

content.tex

Lines changed: 3 additions & 0 deletions
@@ -2528,6 +2528,8 @@ \chapter{Device Types}\label{sec:Device Types}
25282528
\hline
25292529
24 & Memory device \\
25302530
\hline
2531+
26 & File system device \\
2532+
\hline
25312533
\end{tabular}
25322534

25332535
Some of the devices above are unspecified by this document,
@@ -5432,6 +5434,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
54325434
\input{virtio-gpu.tex}
54335435
\input{virtio-input.tex}
54345436
\input{virtio-crypto.tex}
5437+
\input{virtio-fs.tex}
54355438

54365439
\chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
54375440

introduction.tex

Lines changed: 3 additions & 0 deletions
@@ -60,6 +60,9 @@ \section{Normative References}
6060
\phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} &
6161
SCSI Multimedia Commands,
6262
\newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\
63+
\phantomsection\label{intro:FUSE}\textbf{[FUSE]} &
64+
Linux FUSE interface,
65+
\newline\url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h}\\
6366

6467
\end{longtable}
6568

virtio-fs.tex

Lines changed: 208 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,208 @@
1+
\section{File System Device}\label{sec:Device Types / File System Device}
2+
3+
The virtio file system device provides file system access. The device may
4+
directly manage a file system or act as a gateway to a remote file system. The
5+
details of how files are accessed are hidden by the device interface, allowing
6+
for a range of use cases.
7+
8+
Unlike block-level storage devices such as virtio block and SCSI, the virtio
9+
file system device provides file-level access to data. The device interface
10+
therefore contains the following file system concepts:
11+
\begin{itemize}
12+
\item Regular files are named objects that contain data. They can be resized
13+
and auxiliary data can be stored in so-called extended attributes.
14+
\item Directories are containers for files and sub-directories.
15+
\item Symbolic links store a path which is traversed to resolve the link.
16+
\item Device nodes are special files whose behavior is determined by device
17+
drivers.
18+
\end{itemize}
19+
20+
The device interface is based on the Linux Filesystem in Userspace (FUSE)
21+
interface. This consists of file system requests that traverse the file system
22+
and access the files and directories within it. The request structure is
23+
defined by \hyperref[intro:FUSE]{FUSE}. The virtio file system device acts as
24+
a transport for FUSE requests and is analogous to the /dev/fuse device.
25+
26+
TODO table explaining how FUSE concepts are mapped. "The virtio device has the role of the FUSE daemon."
27+
28+
The request types are as follows:
29+
\begin{itemize}
30+
\item Normal requests are submitted by the driver and completed by the device.
31+
\item Interrupt requests are submitted by the driver to abort requests that the
32+
device may have yet to complete.
33+
\item Notifications are submitted by the device and completed by the driver.
34+
\end{itemize}
35+
36+
This section relies on definitions from \hyperref[intro:FUSE]{FUSE}.
37+
38+
\subsection{Device ID}\label{sec:Device Types / File System Device / Device ID}
39+
26
40+
41+
\subsection{Virtqueues}\label{sec:Device Types / File System Device / Virtqueues}
42+
43+
\begin{description}
44+
\item[0] notifications
45+
\item[1] hiprio
46+
\item[2\ldots n] request queues
47+
\end{description}
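
The virtqueue numbering above can be sketched in C as follows. This is only an illustration: the names \texttt{VIRTIO\_FS\_VQ\_*}, \texttt{virtio\_fs\_total\_vqs} and \texttt{virtio\_fs\_request\_vq} are hypothetical and do not appear in the patch, and the sketch assumes that \field{num_queues} from the device configuration counts request queues only.

```c
#include <stdint.h>

/* Fixed virtqueue indices, taken from the list above.
 * The identifier names are hypothetical. */
enum {
    VIRTIO_FS_VQ_NOTIFICATIONS = 0,
    VIRTIO_FS_VQ_HIPRIO        = 1,
    VIRTIO_FS_VQ_REQUEST_BASE  = 2  /* first request queue */
};

/* Total number of virtqueues, assuming num_queues counts
 * request queues only (notifications and hiprio excluded). */
static uint32_t virtio_fs_total_vqs(uint32_t num_queues)
{
    return VIRTIO_FS_VQ_REQUEST_BASE + num_queues;
}

/* Virtqueue index of the i-th request queue (i starts at 0). */
static uint32_t virtio_fs_request_vq(uint32_t i)
{
    return VIRTIO_FS_VQ_REQUEST_BASE + i;
}
```

With four request queues, for instance, the driver would discover six virtqueues in total under this assumption.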
48+
49+
\subsection{Feature bits}\label{sec:Device Types / File System Device / Feature bits}
50+
51+
There are currently no feature bits defined.
52+
53+
\subsection{Device configuration layout}\label{sec:Device Types / File System Device / Device configuration layout}
54+
55+
All fields of this configuration are always available.
56+
57+
\begin{lstlisting}
58+
struct virtio_fs_config {
59+
char tag[36];
60+
le32 num_queues;
61+
};
62+
\end{lstlisting}
63+
64+
\begin{description}
65+
\item[\field{tag}] is the name associated with this file system. The tag is
66+
encoded in UTF-8 and padded with NUL bytes if shorter than the
67+
available space. This field is not NUL-terminated if the encoded bytes
68+
take up the entire field.
69+
\item[\field{num_queues}] is the total number of request virtqueues exposed by
70+
the device. The driver MAY use only one request queue,
71+
or it can use more to achieve better performance.
72+
\end{description}
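
Because \field{tag} is not NUL-terminated when it fills all 36 bytes, a driver cannot treat it as a C string directly. A minimal sketch of a safe conversion is shown below; the helper name \texttt{virtio\_fs\_tag\_to\_cstr} and the constant \texttt{VIRTIO\_FS\_TAG\_LEN} are hypothetical and not defined by the patch.

```c
#include <stddef.h>
#include <string.h>

#define VIRTIO_FS_TAG_LEN 36  /* size of the tag field in virtio_fs_config */

/* Copy the tag into a buffer with room for a terminating NUL.
 * Stops at the first NUL padding byte, or after all 36 bytes if
 * the tag fills the field. Returns the tag length in bytes. */
static size_t virtio_fs_tag_to_cstr(const char tag[VIRTIO_FS_TAG_LEN],
                                    char out[VIRTIO_FS_TAG_LEN + 1])
{
    size_t len = 0;

    while (len < VIRTIO_FS_TAG_LEN && tag[len] != '\0')
        len++;

    memcpy(out, tag, len);
    out[len] = '\0';
    return len;
}
```

The output buffer is one byte larger than the field so that a tag occupying the whole field still ends up terminated.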
73+
74+
\drivernormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
75+
76+
The driver MUST NOT write to device configuration fields.
77+
78+
\devicenormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
79+
80+
\drivernormative{\subsection}{Device Initialization}{Device Types / File System Device / Device Initialization}
81+
82+
On initialization the driver MUST first discover the
83+
device's virtqueues.
84+
85+
If the driver uses the notifications queue, the driver SHOULD place at least
86+
one buffer in the notifications queue.
87+
88+
TODO how is the notifications buffer size determined?
89+
90+
\subsection{Device Operation}\label{sec:Device Types / File System Device / Device Operation}
91+
92+
Device operation consists of operating the virtqueues to facilitate file system
93+
access.
94+
95+
\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / File System Device / Device Operation / Device Operation: Request Queues}
96+
97+
The driver enqueues requests on an arbitrary request queue, and
98+
they are used by the device on that same queue. It is the
99+
responsibility of the driver to ensure strict request ordering
100+
for commands placed on different queues, because they will be
101+
consumed with no order constraints.
102+
103+
Requests have the following format:
104+
105+
\begin{lstlisting}
106+
struct virtio_fs_req {
107+
// Device-readable part
108+
struct fuse_in_header in;
109+
u8 datain[];
110+
111+
// Device-writable part
112+
struct fuse_out_header out;
113+
u8 dataout[];
114+
};
115+
\end{lstlisting}
116+
117+
Note that the words "in" and "out" follow the FUSE meaning and do not indicate
118+
the direction of data transfer under VIRTIO. "In" means input to a request and
119+
"out" means output from processing a request.
120+
121+
\field{in} is the common header for all types of FUSE requests.
122+
123+
\field{datain} consists of request-specific data, if any. This is identical to
124+
the data read from the /dev/fuse device by a FUSE daemon.
125+
126+
\field{out} is the completion header common to all types of FUSE requests.
127+
128+
\field{dataout} consists of request-specific data, if any. This is identical
129+
to the data written to the /dev/fuse device by a FUSE daemon.
130+
131+
For example, the full layout of a FUSE_READ request is as follows:
132+
133+
\begin{lstlisting}
134+
struct virtio_fs_read_req {
135+
// Device-readable part
136+
struct fuse_in_header in;
137+
union {
138+
struct fuse_read_in readin;
139+
u8 datain[sizeof(struct fuse_read_in)];
140+
};
141+
142+
// Device-writable part
143+
struct fuse_out_header out;
144+
u8 dataout[out.len - sizeof(struct fuse_out_header)];
145+
};
146+
\end{lstlisting}
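
To make the split concrete, the following sketch computes how many bytes a driver would place in the device-readable and device-writable parts of a FUSE\_READ request. The three structures are local mirrors of the definitions in \hyperref[intro:FUSE]{FUSE}, reproduced here only so that the example is self-contained; \texttt{struct virtio\_fs\_read\_sizes} and \texttt{virtio\_fs\_read\_buf\_sizes} are hypothetical names. It assumes the driver sizes the writable part for the full number of bytes requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the FUSE structures, for illustration only. */
struct fuse_in_header {
    uint32_t len, opcode;
    uint64_t unique, nodeid;
    uint32_t uid, gid, pid, padding;
};

struct fuse_read_in {
    uint64_t fh, offset;
    uint32_t size, read_flags;
    uint64_t lock_owner;
    uint32_t flags, padding;
};

struct fuse_out_header {
    uint32_t len;
    int32_t  error;
    uint64_t unique;
};

/* Hypothetical helper type: byte counts for the two parts. */
struct virtio_fs_read_sizes {
    size_t readable;  /* in header + fuse_read_in */
    size_t writable;  /* out header + space for the data read */
};

/* Buffer sizes for a FUSE_READ of read_size bytes, assuming the
 * writable part must hold the largest possible reply. */
static struct virtio_fs_read_sizes virtio_fs_read_buf_sizes(uint32_t read_size)
{
    struct virtio_fs_read_sizes s;

    s.readable = sizeof(struct fuse_in_header) + sizeof(struct fuse_read_in);
    s.writable = sizeof(struct fuse_out_header) + (size_t)read_size;
    return s;
}
```

The device may return fewer bytes than requested; in that case \field{out.len} tells the driver how much of the writable part is valid.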
147+
148+
\devicenormative{\paragraph}{Device Operation: Request Queues}{Device Types / File System Device / Device Operation / Device Operation: Request Queues}
149+
150+
\drivernormative{\paragraph}{Device Operation: Request Queues}{Device Types / File System Device / Device Operation / Device Operation: Request Queues}
151+
152+
\subsubsection{Device Operation: High Priority Queue}\label{sec:Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
153+
154+
The hiprio queue follows the same request format as the request queues. This
155+
queue only contains FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
156+
requests.
157+
158+
Interrupt and forget requests have a higher priority than normal requests. In
159+
order to ensure that they can always be delivered, even if all request queues
160+
are full, a separate queue is used.
161+
162+
\devicenormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
163+
164+
The device SHOULD attempt to process the hiprio queue promptly.
165+
166+
The device MAY process request queues concurrently with the hiprio queue.
167+
168+
\drivernormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
169+
170+
The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET requests solely on the hiprio queue.
171+
172+
The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
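
The queue selection rule can be sketched as a simple opcode check. The opcode values below are copied from \hyperref[intro:FUSE]{FUSE}; the function name \texttt{virtio\_fs\_is\_hiprio} is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode values as defined in the Linux FUSE header. */
enum {
    FUSE_FORGET       = 2,
    FUSE_INTERRUPT    = 36,
    FUSE_BATCH_FORGET = 42
};

/* Returns true if a request with this opcode has to be submitted
 * on the hiprio queue rather than on a request queue. */
static bool virtio_fs_is_hiprio(uint32_t opcode)
{
    switch (opcode) {
    case FUSE_FORGET:
    case FUSE_INTERRUPT:
    case FUSE_BATCH_FORGET:
        return true;
    default:
        return false;
    }
}
```

A driver would call such a check before picking a virtqueue, sending everything else to one of the request queues.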
173+
174+
\subsubsection{Device Operation: Notifications Queue}\label{sec:Device Types / File System Device / Device Operation / Device Operation: Notifications Queue}
175+
176+
The notifications queue is used for notification requests from the device to
177+
the driver. The request queues cannot be used since they only work in the
178+
direction of the driver to the device.
179+
180+
Notifications are different from normal requests because they only contain
181+
device-writable fields. The driver sends notification replies on one of the
182+
request queues. The format of notification requests is as follows:
183+
184+
\begin{lstlisting}
185+
struct virtio_fs_notification_req {
186+
// Device-writable part
187+
struct fuse_out_header out;
188+
u8 dataout[];
189+
};
190+
\end{lstlisting}
191+
192+
\field{out} is the completion header common to all types of FUSE requests. The
193+
\field{out.unique} field is 0 and the \field{out.error} field contains a
194+
FUSE_NOTIFY_* code.
195+
196+
\field{dataout} consists of request-specific data, if any. This is identical
197+
to the data written to the /dev/fuse device by a FUSE daemon.
198+
199+
\devicenormative{\paragraph}{Device Operation: Notifications Queue}{Device Types / File System Device / Device Operation / Device Operation: Notifications Queue}
200+
201+
The device MUST set \field{out.unique} to 0 and set \field{out.error} to a FUSE_NOTIFY_* code.
202+
203+
\drivernormative{\paragraph}{Device Operation: Notifications Queue}{Device Types / File System Device / Device Operation / Device Operation: Notifications Queue}
204+
205+
The driver MUST verify that \field{out.unique} is 0.
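
A possible shape for the driver-side check is sketched below. It is not part of the patch: \texttt{virtio\_fs\_notification\_ok} is a hypothetical helper, \texttt{struct fuse\_out\_header} is a local mirror of the FUSE definition, and the sketch assumes that FUSE\_NOTIFY\_* codes are positive and that \texttt{used\_len} is the number of bytes the device reported as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the FUSE completion header, for illustration only. */
struct fuse_out_header {
    uint32_t len;
    int32_t  error;
    uint64_t unique;
};

/* Basic sanity checks on a completed notification buffer.
 * used_len is the byte count written by the device. */
static bool virtio_fs_notification_ok(const struct fuse_out_header *out,
                                      size_t used_len)
{
    if (used_len < sizeof(*out))
        return false;               /* header is incomplete */
    if (out->unique != 0)
        return false;               /* notifications carry unique == 0 */
    if (out->error <= 0)
        return false;               /* assumption: FUSE_NOTIFY_* codes are positive */
    if (out->len < sizeof(*out) || out->len > used_len)
        return false;               /* length field inconsistent with buffer */
    return true;
}
```

Only the \field{out.unique} check is required by the text above; the remaining checks are defensive additions in this sketch.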
206+
207+
TODO how to size buffers?
208+
