DLPX-78812 Disk IO analytics collector not running on aws #73

Merged: 1 commit into delphix:master on Dec 21, 2021

Conversation

@brad-lewis (Contributor) commented Dec 21, 2021

DESCRIPTION

This commit fixes the disk io analytics collector on aws engines. The stbtrace io script fails with the following error:

$ sudo cmd/stbtrace.py io
cannot attach kprobe, probe entry may not exist
Failed to attach BPF program b'disk_io_done' to kprobe b'blk_account_io_completion'

Though the probe cannot attach, this function is still called in the linux_kernel_aws code:

git grep blk_account_io_completion
block/blk-core.c:static void blk_account_io_completion(struct request *req, unsigned int bytes)
block/blk-core.c:       blk_account_io_completion(req, nr_bytes);

The call is from blk_update_request() which is called by blk_mq_end_request().

In the generic code we have the same lines plus an additional declaration in an include file. Note that the aws version is declared static; presumably that lets the compiler inline it, which would explain why the function is no longer probeable there. Interestingly, a few sample bpf scripts probe the function; a quick way to check probeability is sketched after the listing below.

block/blk-core.c:void blk_account_io_completion(struct request *req, unsigned int bytes)
block/blk-core.c:       blk_account_io_completion(req, nr_bytes);
block/blk.h:void blk_account_io_completion(struct request *req, unsigned int bytes);
samples/bpf/task_fd_query_kern.c:SEC("kretprobe/blk_account_io_completion")
samples/bpf/task_fd_query_user.c:       CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_completion",
samples/bpf/tracex3_kern.c:SEC("kprobe/blk_account_io_completion")
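
One way to confirm which of the two symbols a given kernel actually exposes as a kprobe target is to consult tracefs. A minimal sketch, assuming tracefs is mounted at the usual debugfs path and run as root (the helper name is illustrative, not part of this change):

# check_probeable.py - hypothetical helper, not part of this change
def is_probeable(func):
    # available_filter_functions lists every symbol the kernel accepts
    # as a kprobe/ftrace target, one per line ("name [module]").
    path = "/sys/kernel/debug/tracing/available_filter_functions"
    with open(path) as f:
        return any(line.split()[0] == func for line in f)

# Expected results on the aws kernel, per the failure above:
print(is_probeable("blk_account_io_completion"))  # False (static, inlined)
print(is_probeable("blk_account_io_done"))        # True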

SUGGESTED FIX

A fix is suggested by the sample scripts: in the aws repo, the examples probe blk_account_io_done() instead.

$ git grep blk_account_io_done
block/blk-core.c:void blk_account_io_done(struct request *req, u64 now)
block/blk-mq.c: blk_account_io_done(rq, now);
block/blk.h:void blk_account_io_done(struct request *req, u64 now);
samples/bpf/task_fd_query_kern.c:SEC("kretprobe/blk_account_io_done")
samples/bpf/task_fd_query_user.c:       CHECK_AND_RET(test_debug_fs_kprobe(1, "blk_account_io_done",
samples/bpf/tracex3_kern.c:SEC("kprobe/blk_account_io_done")

It appears blk_account_io_done() is called shortly after blk_account_io_completion() and should provide similar latency results; the difference would be the time needed to update stats.

inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
{
        u64 now = 0;

        if (blk_mq_need_time_stamp(rq))
                now = ktime_get_ns();

        if (rq->rq_flags & RQF_STATS) {
                blk_mq_poll_stats_start(rq->q);
                blk_stat_add(rq, now);
        }

        blk_mq_sched_completed_request(rq, now);

        blk_account_io_done(rq, now);                                                                                                            

        if (rq->end_io) {
                rq_qos_done(rq->q, rq);
                rq->end_io(rq, error);
        } else {
                blk_mq_free_request(rq);
        }
}
EXPORT_SYMBOL(__blk_mq_end_request);

void blk_mq_end_request(struct request *rq, blk_status_t error)
{
        if (blk_update_request(rq, error, blk_rq_bytes(rq)))
                BUG();
        __blk_mq_end_request(rq, error);
}
EXPORT_SYMBOL(blk_mq_end_request);
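
On the tooling side the change is small. A minimal sketch of the idea, assuming the stbtrace io script attaches its probes through bcc's attach_kprobe() (bpf_program here is a stand-in for the script's actual BPF text):

from bcc import BPF

b = BPF(text=bpf_program)
# Probe blk_account_io_done(), which is exported on all of our kernels,
# instead of blk_account_io_completion(), which is static on aws.
b.attach_kprobe(event="blk_account_io_done", fn_name="disk_io_done")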

TESTING
With the fix, the stbtrace script runs on aws and returns reasonable data:

$ sudo cmd/stbtrace.py io
{"t":"1640043077", "op":"write", "device":"nvme0n1", "error":"0", "count":"5", "avgLatency":"758892", "throughput":"57344", "latency":"{700000,2},{800000,2},{900000,1}", "size":"{16383,5}"}

{"t":"1640043078", "op":"write", "device":"nvme0n1", "error":"0", "count":"6", "avgLatency":"991483", "throughput":"141824", "latency":"{800000,3},{1000000,2},{2000000,1}", "size":"{16383,2},{32767,3},{65535,1}"}

{"t":"1640043079", "op":"write", "device":"nvme0n1", "error":"0", "count":"324", "avgLatency":"1254042", "throughput":"1144832", "latency":"{500000,1},{600000,12},{700000,29},{800000,61},{900000,39},{1000000,32},{2000000,111},{3000000,18},{4000000,15},{5000000,6}", "size":"{1023,115},{2047,64},{4095,65},{8191,45},{16383,23},{32767,7},{65535,5}"}

{"t":"1640043080", "op":"write", "device":"nvme0n1", "error":"0", "count":"4", "avgLatency":"859223", "throughput":"73728", "latency":"{800000,1},{900000,2},{2000000,1}", "size":"{16383,2},{32767,2}"}

{"t":"1640043081", "op":"write", "device":"nvme0n1", "error":"0", "count":"4", "avgLatency":"988386", "throughput":"73728", "latency":"{800000,2},{1000000,1},{2000000,1}", "size":"{16383,1},{32767,3}"}

{"t":"1640043082", "op":"write", "device":"nvme0n1", "error":"0", "count":"4", "avgLatency":"885308", "throughput":"57344", "latency":"{700000,1},{800000,1},{900000,1},{2000000,1}", "size":"{16383,3},{32767,1}"}

{"t":"1640043083", "op":"write", "device":"nvme0n1", "error":"0", "count":"7", "avgLatency":"994749", "throughput":"110592", "latency":"{800000,2},{1000000,1},{2000000,4}", "size":"{16383,4},{32767,3}"}

{"t":"1640043084", "op":"write", "device":"nvme0n1", "error":"0", "count":"303", "avgLatency":"1443829", "throughput":"1190400", "latency":"{500000,1},{600000,17},{700000,35},{800000,55},{900000,24},{1000000,23},{2000000,121},{3000000,14},{4000000,11},{50000000,1},{60000000,1}", "size":"{1023,102},{2047,65},{4095,59},{8191,40},{16383,24},{32767,6},{65535,7}"}

{"t":"1640043085", "op":"write", "device":"nvme0n1", "error":"0", "count":"4", "avgLatency":"820526", "throughput":"49152", "latency":"{700000,1},{800000,1},{900000,1},{2000000,1}", "size":"{16383,4}"}

This also works on a dcol1 vmware engine:

$ sudo cmd/stbtrace.py io
{"t":"1640043155", "op":"write", "device":"sdc", "error":"0", "count":"1", "avgLatency":"10891716", "throughput":"12288", "latency":"{20000000,1}", "size":"{16383,1}"}

{"t":"1640043155", "op":"write", "device":"sdd", "error":"0", "count":"1", "avgLatency":"5501376", "throughput":"20480", "latency":"{6000000,1}", "size":"{32767,1}"}

{"t":"1640043156", "op":"write", "device":"sdb", "error":"0", "count":"3", "avgLatency":"10371131", "throughput":"110592", "latency":"{2000000,1},{20000000,2}", "size":"{16383,1},{65535,2}"}

{"t":"1640043156", "op":"write", "device":"sdc", "error":"0", "count":"2", "avgLatency":"8967005", "throughput":"139264", "latency":"{4000000,1},{20000000,1}", "size":"{16383,1},{131071,1}"}

{"t":"1640043156", "op":"write", "device":"sdd", "error":"0", "count":"2", "avgLatency":"10287815", "throughput":"81920", "latency":"{8000000,1},{20000000,1}", "size":"{16383,1},{131071,1}"}

{"t":"1640043156", "op":"read", "device":"", "error":"0", "count":"1", "avgLatency":"200380", "throughput":"8", "latency":"{300000,1}", "size":"{15,1}"}

{"t":"1640043156", "op":"write", "device":"sda", "error":"0", "count":"14", "avgLatency":"8799258", "throughput":"230912", "latency":"{3000000,1},{4000000,1},{5000000,1},{6000000,1},{8000000,3},{9000000,1},{10000000,1},{20000000,5}", "size":"{1023,1},{4095,4},{8191,3},{16383,4},{131071,2}"}

{"t":"1640043156", "op":"write", "device":"sdd", "error":"0", "count":"1", "avgLatency":"5206128", "throughput":"12288", "latency":"{6000000,1}", "size":"{16383,1}"}

{"t":"1640043157", "op":"read", "device":"sda", "error":"0", "count":"66", "avgLatency":"961235", "throughput":"68096", "latency":"{300000,2},{400000,3},{500000,4},{600000,6},{700000,2},{800000,11},{900000,7},{1000000,9},{2000000,19},{3000000,3}", "size":"{1023,61},{4095,1},{8191,1},{16383,3}"}

Estat backend-io also now works on aws:

$ sudo cmd/estat.py backend-io 5
12/21/21 - 16:38:17 UTC

 Tracing enabled... Hit Ctrl-C to end.
   microseconds                                                   write 
value range                 count ------------- Distribution ------------- 
[400, 500)                      1 |@                                       
[500, 600)                     18 |@@@                                     
[600, 700)                     43 |@@@@@@                                  
[700, 800)                     73 |@@@@@@@@@@@                             
[800, 900)                     39 |@@@@@@                                  
[900, 1000)                    29 |@@@@                                    
[1000, 2000)                   73 |@@@@@@@@@@@                             
[2000, 3000)                    5 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
write                                        56              931           123398              266


                                       iops(/s)  throughput(k/s)
total                                        56              266

@sebroy (Contributor) left a comment
Is this not also a problem for the estat backend-io.c program?

@brad-lewis (Contributor, Author)

Yes, estat has the same issue. The estat framework (with probes designated in comment lines like "@@ kprobe|blk_account_io_completion|disk_io_done") doesn't support conditional probes. We could expand that at some point but using blk_account_io_done in all environments seems fine. A sketch of the designation change follows.
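
Based on the comment-line convention quoted above, the change in an estat program such as backend-io.c would look roughly like this (the exact file layout is an assumption):

// before: fails to attach on aws kernels
// @@ kprobe|blk_account_io_completion|disk_io_done
// after: attaches on all engines
// @@ kprobe|blk_account_io_done|disk_io_done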

@sebroy (Contributor) commented Dec 21, 2021

We could expand that at some point but using blk_account_io_done in all environments seems fine.

Ok, in that case, if blk_account_io_done() is the correct place to trace an I/O completion in all cases, then why maintain the conditional in stbtrace?

@sebroy (Contributor) left a comment
Assuming this works on all of our kernel versions and platforms, this looks good.

@brad-lewis (Contributor, Author)

I also tested on the generic kernel on a dcol1 engine. By inspection, the azure, gcp, and oracle kernels follow the aws code.

@brad-lewis brad-lewis merged commit 729786d into delphix:master Dec 21, 2021
@brad-lewis brad-lewis deleted the DLPX-78812 branch December 21, 2021 20:41
brad-lewis added a commit to brad-lewis/performance-diagnostics that referenced this pull request Dec 21, 2021