Replies: 9 comments 6 replies
-
If the measurement is zero in a time interval then fio does not emit a log entry. The relevant code is in stat.c:__add_samples(). You can just assume that log entries are zero if they do not appear at the expected time. This reduces the amount of memory consumed by log entries, which is beneficial for long jobs. As for your second question, I do not have an immediate answer. Fio inserts a delay as it sets up each I/O for submission. This happens when it chooses a data direction. It does seem strange that fio is able to attain the specified rate target for reads but not writes. Can you come up with a small, simple job to reproduce this issue?
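Since fio simply skips intervals whose measured value is zero, the missing intervals can be restored when post-processing the log. A minimal sketch, assuming the log was written with log_avg_msec=1000 and that each line begins with "time_ms, value, ..."; the exact column layout and the file name test1_iops.1.log may differ depending on the fio version and job:

```python
# Hedged sketch: re-insert the zero samples that fio omits from an IOPS log.
# Assumes one sample per second (log_avg_msec=1000) and lines of the form
# "time_ms, value, direction, ...". Adjust INTERVAL_MS and the path as needed.
INTERVAL_MS = 1000

def fill_gaps(path):
    samples = []
    with open(path) as f:
        for line in f:
            fields = [c.strip() for c in line.split(",")]
            samples.append((int(fields[0]), int(fields[1]), fields[2:]))

    filled, prev_t = [], None
    for t, value, rest in samples:
        # Any interval fio skipped is assumed to have been zero.
        if prev_t is not None:
            expected = prev_t + INTERVAL_MS
            while expected < t - INTERVAL_MS // 2:
                filled.append((expected, 0, rest))
                expected += INTERVAL_MS
        filled.append((t, value, rest))
        prev_t = t
    return filled

if __name__ == "__main__":
    for t, v, _ in fill_gaps("test1_iops.1.log"):
        print(t, v)
```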
-
Thanks for your feedback.
My understanding is that while the IO paths are down, fio keeps generating IO operations as if IO were normal and keeps adding them to a growing queue. Once IO is back to normal, fio starts sending out all the queued operations. In my case the read rate is not that high, so all the delayed read operations are sent to the storage relatively fast. But the 90k write rate is big enough that it takes time for the delayed operations to be sent out.
The log recording is clear to me now, thanks for your explanation. What remains unclear is why fio queues IO when the drive (storage) is unavailable for reads and writes, and whether there is any way to make it not do that.
Your opinion and advice will be greatly appreciated.
Br,
Ilya
-
During one of the tests the storage was unavailable for about 2 minutes.
After it recovered, the write IO rate jumped to ~110k IOPS (despite 90k being configured) and stayed at that level for about 9 minutes, after which the write rate stabilized at the expected mark of 90k IOPS.
110k means 90k "normal" operations plus 20k delayed ops. If you multiply 90000 by 120 and then divide by 20000, you get 540 seconds, which matches the roughly 9 minutes of elevated write rate mentioned above.
It's really strange that fio goes beyond the configured rate.
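For reference, the same arithmetic as a tiny Python sketch, using the numbers from the test above (a 120 s outage, a 90k configured write rate, and ~110k IOPS observed after recovery):

```python
# Back-of-the-envelope estimate of the post-outage catch-up duration, assuming
# fio tries to deliver every write it "owed" while the storage was unreachable.
configured_iops = 90_000   # rate_iops for writes
observed_iops = 110_000    # measured after the paths recovered
outage_seconds = 120

backlog_ops = configured_iops * outage_seconds   # 10,800,000 delayed writes
extra_iops = observed_iops - configured_iops     # 20,000 IOPS of catch-up headroom
drain_seconds = backlog_ops / extra_iops         # 540 s, i.e. 9 minutes

print(f"backlog: {backlog_ops} ops, drain time: {drain_seconds:.0f} s")
```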
Tue, 19 Mar 2024 at 00:05, Vincent Kang Fu wrote:
… You can test your hypothesis by observing fio's output when it is run with
--debug=io. There are no timestamps in the debug output but you can observe
the console when your SAN is unresponsive. I would expect that once fio
reaches the specified IO depth it will stop submitting requests.
-
Ok that's pretty convincing. I have two suggestions:
-
@vincentkfu It looks like either the Linux IO subsystem or the libaio engine works differently from windows/windowsaio: after the outage there is a short spike due to pending operations, but there is no "smeared" traffic from pending operations at the 20k IOPS level added on top of the 90k IOPS test traffic, as we observed on Windows. One more observation: fio on Linux puts almost no load on the vCPUs, whereas the Windows VM works hard and consumes a lot of CPU time under the same configuration.
-
We keep running tests, and the worst part is that after each path recovery we have to wait until write IO stabilizes. It looks like fio wants, at any cost, to make the average IOPS equal to the configured rate. So if the configured write IOPS rate is 90k and the storage can only process 60k, then once the storage is OK again fio pushes the write IOPS as high as possible to deliver all the "undelivered" IO operations.
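For illustration only (this is not fio's actual rate-limiting code, and the names here are hypothetical): a minimal simulation of why pacing against the job-wide cumulative average produces exactly this kind of catch-up burst after an outage, whereas pacing each interval independently would not:

```python
# Hypothetical model, not fio internals: compare two pacing strategies across
# a 120 s outage, second by second.
def simulate(target_iops=90_000, device_cap=110_000, outage=range(100, 220),
             total_seconds=900, cumulative=True):
    done, rates = 0, []
    for sec in range(1, total_seconds + 1):
        if sec in outage:
            rates.append(0)            # storage unreachable, nothing completes
            continue
        if cumulative:
            # Submit everything "owed" since the start of the job, limited only
            # by what the device can actually sustain.
            want = target_iops * sec - done
        else:
            # Only this second's budget; missed intervals are never repaid.
            want = target_iops
        sent = min(want, device_cap)
        done += sent
        rates.append(sent)
    return rates

print("peak rate, cumulative pacing:", max(simulate(cumulative=True)))    # 110000
print("peak rate, per-second pacing:", max(simulate(cumulative=False)))   # 90000
```

Whether fio's rate logic actually works this way is exactly the question; the simulation only shows that the observed "110k for 9 minutes" pattern is what cumulative-average pacing would predict.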
-
@vincentkfu Could you please explain what fio is supposed to do when the number of outstanding requests on the target device reaches or exceeds the iodepth value? Will fio decrease IOPS until the number of in-flight IO operations drops back to iodepth or below?
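To make the question concrete, here is a simplified illustration (not fio's internals; the names are hypothetical) of what I would expect depth-limited submission to look like: submissions stop once iodepth I/Os are in flight and resume only as completions arrive:

```python
# Hypothetical sketch, not fio code: with iodepth=128, submission stalls as soon
# as 128 I/Os are in flight and resumes only when the device completes some.
def run(iodepth=128, total_ios=1000, completions_per_step=()):
    # completions_per_step[t] = I/Os the device completes at step t
    # (0 while the storage is unreachable).
    submitted = in_flight = 0
    for step, done_now in enumerate(completions_per_step):
        # Top up the queue, but never beyond iodepth.
        can_submit = min(iodepth - in_flight, total_ios - submitted)
        submitted += can_submit
        in_flight += can_submit
        # The device completes some (possibly zero) of the in-flight I/Os.
        in_flight -= min(done_now, in_flight)
        print(f"step {step}: submitted={submitted}, in_flight={in_flight}")

# During an outage every step completes 0 I/Os: in_flight sits at 128 and no
# new I/O is submitted until completions resume.
run(completions_per_step=[64, 64, 0, 0, 0, 200, 200])
```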
-
We have just finished testing a SAN storage. A series of stress tests was run against it with fio running on a dedicated Windows VM. Some tests were supposed to affect IO on the host, so we expected to see IOPS log entries with 0 values while those tests were executed. However, I found that there were no such records at all.
Moreover, after IO recovered we saw a higher write IOPS level (about 110k) for a period of time, after which IOPS returned to normal (meaning the configured level, 90k). I figured that this might be happening because all the IO which could not be written to the storage during the IO path outage was presumably delayed/queued, and after IO became normal again all the delayed write operations were sent out along with the "normal" operations.
We've got a lot of questions about the test results, but I will focus on the two most important ones:
1. Why are there no IOPS log entries at all (not even 0 values) for the intervals when the storage was unavailable?
2. Why does the write IOPS rate exceed the configured rate_iops after IO recovers, while reads stay on target?
fio configuration file:
[global]
iodepth=128
direct=1
ioengine=windowsaio
group_reporting=1
time_based
runtime=36000
numjobs=1
rw=randrw
write_lat_log=test1
log_avg_msec=1000
write_iops_log=test1
iopsavgtime=1000
disable_slat=1
disable_clat=1
log_unix_epoch=1
[job1]
size=3G
filename=\\.\PhysicalDrive1
bs=8k
rate_iops=10k,90k