
Update estat iscsi, zvol, and zpl scripts. #55

Merged: 1 commit into delphix:master on Feb 3, 2021

Conversation

brad-lewis
Contributor

closes #53, closes #54

  1. The zvol script would not compile:
/usr/src/zfs-5.4.0-64-dx2021012602-7a44e1b99-generic/include/sys/zvol_impl.h:53:2: error: unknown type name 'dataset_kstats_t'
        dataset_kstats_t        zv_kstat;       /* zvol kstats */
        ^
/virtual/main.c:168:3: warning: 'memcpy' will always overflow; destination buffer has size 5, but size argument is 6 [-Wfortify-source]
                __builtin_memcpy(&axis, "sync", WRITE_LENGTH);
                ^
/virtual/main.c:171:3: warning: 'memcpy' will always overflow; destination buffer has size 5, but size argument is 6 [-Wfortify-source]
                __builtin_memcpy(&axis, "async", WRITE_LENGTH);

There is an undefined symbol, fixed by adding an include statement. Additionally, there are some warnings about the length of the strings being passed to __builtin_memcpy. The script now compiles and runs on trunk bits:

$ sudo cmd/estat.py zvol  -m 5
01/29/21 - 19:33:19 UTC

 Tracing enabled... Hit Ctrl-C to end.


   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[70, 80)                        1 |@                                       
[100, 200)                      1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                          0               87              256                7


                                       iops(/s)  throughput(k/s)
total                                         0                7
  2. The zpl script would not compile:
$ sudo estat zpl -m 5
/virtual/main.c:94:57: error: unknown type name 'uio_t'
zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)
                                                        ^
/virtual/main.c:120:55: error: unknown type name 'uio_t'
zfs_read_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
                                                      ^
/virtual/main.c:129:56: error: unknown type name 'uio_t'
zfs_write_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
                                                       ^
/virtual/main.c:139:60: error: unknown type name 'uio_t'
zfs_read_write_exit(struct pt_regs *ctx, struct inode *ip, uio_t *uio)
                                                           ^
4 errors generated.
Traceback (most recent call last):
  File "/usr/bin/estat", line 402, in <module>
    b = BPF(text=bpf_text, cflags=cflags, debug=debug_level)
  File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 364, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>

The main issue is that all uio_t arguments have been replaced by zfs_uio_t arguments. After fixing that, the script still wouldn't compile, apparently hitting a limit on stack space. I rewrote the string copying to use global constants and a few local variables, and that is working.

02/01/21 - 22:54:52 UTC

 Tracing enabled... Hit Ctrl-C to end.
  3. The estat iscsi script was missing reads. For a refresh of an engine from a snapshot, I observed only one read being recorded:
01/28/21 - 23:43:26 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                         1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                          0                9                0           117964


                                       iops(/s)  throughput(k/s)
total                                         0           117964

Examining the code in iscsi_target.c, it seems there is another path of execution for reads that does not call iscsit_build_rsp_pdu() but instead calls iscsit_build_datain_pdu(). With this extra exit point defined, the reads are accounted for. There are some conditions involving sense data where both functions are called for a single read; in that case the call to iscsit_build_datain_pdu() goes first and removes the hashed entry-point record before the call to iscsit_build_rsp_pdu(). That seems appropriate, since iscsit_build_rsp_pdu() is only adding the sense data. Additionally, I noticed that the direction field in the iscsi_cmd structure provides a way to tell whether an operation is a read or a write, eliminating the need to store the header flags at the start of the operation.

01/29/21 - 21:09:20 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                        54 |@@@@@@@@@@@@@@@                         
[10, 20)                       45 |@@@@@@@@@@@@@                           
[20, 30)                        8 |@@                                      
[30, 40)                        2 |@                                       
[50, 60)                        2 |@                                       
[60, 70)                        1 |@                                       
[70, 80)                        2 |@                                       
[80, 90)                        2 |@                                       
[90, 100)                       3 |@                                       
[100, 200)                     24 |@@@@@@@                                 
[200, 300)                      1 |@                                       
[800, 900)                      1 |@                                       

   microseconds                                              write, sync
value range                 count ------------- Distribution ------------- 
[0, 10)                        21 |@@@@                                    
[10, 20)                        3 |@                                       
[1000, 2000)                   27 |@@@@@                                   
[2000, 3000)                   42 |@@@@@@@                                 
[3000, 4000)                   42 |@@@@@@@                                 
[4000, 5000)                   24 |@@@@                                    
[5000, 6000)                   19 |@@@                                     
[6000, 7000)                   21 |@@@@                                    
[7000, 8000)                   10 |@@                                      
[8000, 9000)                   10 |@@                                      
[9000, 10000)                   5 |@                                       
[10000, 20000)                 32 |@@@@@                                   
[20000, 30000)                  4 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                         29               43             6756             1304
write, sync                                  51             5452         24905069             1836


                                       iops(/s)  throughput(k/s)
total                                        81             3141


01/29/21 - 21:09:25 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                        66 |@@@@@@@@@@                              
[10, 20)                       73 |@@@@@@@@@@@                             
[20, 30)                       31 |@@@@@                                   
[30, 40)                        9 |@@                                      
[40, 50)                       15 |@@@                                     
[50, 60)                        7 |@                                       
[60, 70)                       14 |@@                                      
[70, 80)                        5 |@                                       
[80, 90)                        9 |@@                                      
[90, 100)                      13 |@@                                      
[100, 200)                     22 |@@@@                                    
[200, 300)                      4 |@                                       
[300, 400)                      1 |@                                       

   microseconds                                              write, sync
value range                 count ------------- Distribution ------------- 
[0, 10)                        34 |@@@@@@@@                                
[10, 20)                        2 |@                                       
[1000, 2000)                   27 |@@@@@@                                  
[2000, 3000)                   43 |@@@@@@@@@@                              
[3000, 4000)                   21 |@@@@@                                   
[4000, 5000)                    5 |@                                       
[5000, 6000)                   11 |@@@                                     
[6000, 7000)                    2 |@                                       
[7000, 8000)                    5 |@                                       
[8000, 9000)                    5 |@                                       
[9000, 10000)                   1 |@                                       
[10000, 20000)                 17 |@@@@                                    
[20000, 30000)                  1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                         53               41             2264             1441
write, sync                                  35             3702         15902775             1673


                                       iops(/s)  throughput(k/s)
total                                        88             3114

@@ -42,7 +46,7 @@ equal_to_pool(char *str)
}

static inline int
-zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)
+zfs_read_write_entry(io_info_t *info, struct inode *ip, zfs_uio_t *uio, int flags)
Contributor

Does this change work for all previous kernel versions? E.g., I'm curious about the case where a customer upgrades but continues running a kernel from a prior release (i.e. no reboot); would this script continue to work?

Contributor Author

No, that would be a problem.

We could deliver separate C scripts for each kernel version and then have the estat python script run the correct one. Is that a problem we want to solve?

Contributor

> Is that a problem we want to solve?

I'd like to defer this question to the team.

Today, AFAIK, we do support deferred/non-reboot upgrades from any prior 6.0-based release to the latest one. So it's possible that we could have a system running the kernel bits from the 6.0.0.0 release, but the userland bits from the most recent 6.0.6.0 release.

In that case, I think these scripts would no longer work, since by design we only (currently) support running the scripts on the matching kernel for that release, right? So, we'd need to decide as a team, if it's OK for the scripts not to work in such a scenario.

If the current architecture of these scripts is to only work when run on the kernel version of the matching release (e.g. 6.0.6.0 scripts only work with the 6.0.6.0 kernel and modules), then I'll approve this, since it's consistent with our existing design decisions, even though I feel like that design is lacking and prone to failure on deferred upgrades.

Contributor Author

I agree, the current perf-diag design does not address deferred upgrade. It is something we should give some thought to. Changes to stbtrace scripts could be more problematic since they are used in analytics.

It does seem to be outside the scope of this PR. At a minimum, the scripts should run on matching kernel versions.

Contributor

Brad, from my understanding those scripts are only run manually by support, is that right?

If that's the case then having them work for deferred upgrade is probably not a P1, but we should definitely create a bug and allocate time for it.

Contributor

While this specific script may only be run by support, as Brad mentioned, there's other scripts that are used by the product (for analytics) and may suffer from this same problem on deferred upgrade.

Contributor Author

Seb mentioned to me that we already have a bug tracking this. I was unsuccessful in finding it, but I'll make sure there is a Jira issue.

Contributor

Yeah, for other scripts that are used by analytics, I think supporting deferred upgrade is a must.

@brad-lewis brad-lewis merged commit b7c347d into delphix:master Feb 3, 2021
@brad-lewis brad-lewis deleted the iscsi_read branch February 3, 2021 17:34
brad-lewis added a commit to brad-lewis/performance-diagnostics that referenced this pull request Feb 3, 2021
Successfully merging this pull request may close these issues.

- estat iscsi under-reporting reads
- estat zpl and zvol scripts not compiling