
Update estat iscsi, zvol, and zpl scripts. #55

Merged: 1 commit into delphix:master on Feb 3, 2021

Conversation

brad-lewis
Contributor

closes #53, closes #54

  1. The zvol script would not compile:
/usr/src/zfs-5.4.0-64-dx2021012602-7a44e1b99-generic/include/sys/zvol_impl.h:53:2: error: unknown type name 'dataset_kstats_t'
        dataset_kstats_t        zv_kstat;       /* zvol kstats */
        ^
/virtual/main.c:168:3: warning: 'memcpy' will always overflow; destination buffer has size 5, but size argument is 6 [-Wfortify-source]
                __builtin_memcpy(&axis, "sync", WRITE_LENGTH);
                ^
/virtual/main.c:171:3: warning: 'memcpy' will always overflow; destination buffer has size 5, but size argument is 6 [-Wfortify-source]
                __builtin_memcpy(&axis, "async", WRITE_LENGTH);

There is an undefined symbol, fixed by adding an include statement. Additionally, there are some warnings about the length of the strings being passed to __builtin_memcpy. The script now compiles and runs on trunk bits:

$ sudo cmd/estat.py zvol  -m 5
01/29/21 - 19:33:19 UTC

 Tracing enabled... Hit Ctrl-C to end.


   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[70, 80)                        1 |@                                       
[100, 200)                      1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                          0               87              256                7


                                       iops(/s)  throughput(k/s)
total                                         0                7
  2. The zpl script would not compile:
$ sudo estat zpl -m 5
/virtual/main.c:94:57: error: unknown type name 'uio_t'
zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)
                                                        ^
/virtual/main.c:120:55: error: unknown type name 'uio_t'
zfs_read_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
                                                      ^
/virtual/main.c:129:56: error: unknown type name 'uio_t'
zfs_write_entry(struct pt_regs *ctx, struct inode *ip, uio_t *uio, int flags)
                                                       ^
/virtual/main.c:139:60: error: unknown type name 'uio_t'
zfs_read_write_exit(struct pt_regs *ctx, struct inode *ip, uio_t *uio)
                                                           ^
4 errors generated.
Traceback (most recent call last):
  File "/usr/bin/estat", line 402, in <module>
    b = BPF(text=bpf_text, cflags=cflags, debug=debug_level)
  File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 364, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>

The main issue is that all uio_t arguments have been replaced by zfs_uio_t arguments. After fixing that, the script still wouldn't compile, apparently hitting a limit on stack space. I rewrote the string copying to use global constants and a few local variables, and that is working.

02/01/21 - 22:54:52 UTC

 Tracing enabled... Hit Ctrl-C to end.
  3. The estat iscsi script was missing reads. For a refresh of an engine from a snapshot, I observed only one read being recorded:
01/28/21 - 23:43:26 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                         1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                          0                9                0           117964


                                       iops(/s)  throughput(k/s)
total                                         0           117964

Examining the code in iscsi_target.c, it seems there is another path of execution for reads that does not call iscsit_build_rsp_pdu() but instead calls iscsit_build_datain_pdu(). With this extra exit point defined, the reads are accounted for. There are some conditions involving sense data where both functions are called for a single read; in that case the call to iscsit_build_datain_pdu() goes first and removes the hashed entry-point record before the call to iscsit_build_rsp_pdu(). That seems appropriate, since iscsit_build_rsp_pdu() is only adding the sense data. Additionally, I noticed that the direction field in the iscsi_cmd structure provides a way to tell whether an operation is a read or a write, eliminating the need to store the header flags at the start of the operation.

01/29/21 - 21:09:20 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                        54 |@@@@@@@@@@@@@@@                         
[10, 20)                       45 |@@@@@@@@@@@@@                           
[20, 30)                        8 |@@                                      
[30, 40)                        2 |@                                       
[50, 60)                        2 |@                                       
[60, 70)                        1 |@                                       
[70, 80)                        2 |@                                       
[80, 90)                        2 |@                                       
[90, 100)                       3 |@                                       
[100, 200)                     24 |@@@@@@@                                 
[200, 300)                      1 |@                                       
[800, 900)                      1 |@                                       

   microseconds                                              write, sync
value range                 count ------------- Distribution ------------- 
[0, 10)                        21 |@@@@                                    
[10, 20)                        3 |@                                       
[1000, 2000)                   27 |@@@@@                                   
[2000, 3000)                   42 |@@@@@@@                                 
[3000, 4000)                   42 |@@@@@@@                                 
[4000, 5000)                   24 |@@@@                                    
[5000, 6000)                   19 |@@@                                     
[6000, 7000)                   21 |@@@@                                    
[7000, 8000)                   10 |@@                                      
[8000, 9000)                   10 |@@                                      
[9000, 10000)                   5 |@                                       
[10000, 20000)                 32 |@@@@@                                   
[20000, 30000)                  4 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                         29               43             6756             1304
write, sync                                  51             5452         24905069             1836


                                       iops(/s)  throughput(k/s)
total                                        81             3141


01/29/21 - 21:09:25 UTC

   microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[0, 10)                        66 |@@@@@@@@@@                              
[10, 20)                       73 |@@@@@@@@@@@                             
[20, 30)                       31 |@@@@@                                   
[30, 40)                        9 |@@                                      
[40, 50)                       15 |@@@                                     
[50, 60)                        7 |@                                       
[60, 70)                       14 |@@                                      
[70, 80)                        5 |@                                       
[80, 90)                        9 |@@                                      
[90, 100)                      13 |@@                                      
[100, 200)                     22 |@@@@                                    
[200, 300)                      4 |@                                       
[300, 400)                      1 |@                                       

   microseconds                                              write, sync
value range                 count ------------- Distribution ------------- 
[0, 10)                        34 |@@@@@@@@                                
[10, 20)                        2 |@                                       
[1000, 2000)                   27 |@@@@@@                                  
[2000, 3000)                   43 |@@@@@@@@@@                              
[3000, 4000)                   21 |@@@@@                                   
[4000, 5000)                    5 |@                                       
[5000, 6000)                   11 |@@@                                     
[6000, 7000)                    2 |@                                       
[7000, 8000)                    5 |@                                       
[8000, 9000)                    5 |@                                       
[9000, 10000)                   1 |@                                       
[10000, 20000)                 17 |@@@@                                    
[20000, 30000)                  1 |@                                       

                                       iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                         53               41             2264             1441
write, sync                                  35             3702         15902775             1673


                                       iops(/s)  throughput(k/s)
total                                        88             3114

@@ -42,7 +46,7 @@ equal_to_pool(char *str)
}

static inline int
-zfs_read_write_entry(io_info_t *info, struct inode *ip, uio_t *uio, int flags)
+zfs_read_write_entry(io_info_t *info, struct inode *ip, zfs_uio_t *uio, int flags)
Contributor

Does this change work for all previous kernel versions? E.g., I'm curious about the case where a customer upgrades but continues running a kernel from a prior release (i.e. no reboot); would this script continue to work?

Contributor Author

No, that would be a problem.

We could deliver separate C scripts for each kernel version and then have the estat python script run the correct one. Is that a problem we want to solve?

Contributor

> Is that a problem we want to solve?

I'd like to defer this question to the team.

Today, AFAIK, we do support deferred/non-reboot upgrades from any prior 6.0-based release to the latest one. So it's possible that we could have a system running the kernel bits from the 6.0.0.0 release, but the userland bits from the most recent 6.0.6.0 release.

In that case, I think these scripts would no longer work, since by design we only (currently) support running the scripts on the matching kernel for that release, right? So, we'd need to decide as a team, if it's OK for the scripts not to work in such a scenario.

If the current architecture of these scripts is to only work when run on the kernel version of the matching release (e.g. 6.0.6.0 scripts only work with the 6.0.6.0 kernel and modules), then I'll approve this, since it's consistent with our existing design decisions, even though I feel like that design is lacking and prone to failure on deferred upgrades.

Contributor Author

I agree, the current perf-diag design does not address deferred upgrade. It is something we should give some thought to. Changes to stbtrace scripts could be more problematic since they are used in analytics.

It does seem to be outside the scope of this PR. At a minimum, the scripts should run on matching kernel versions.

Contributor

Brad, from my understanding those scripts are only run manually by support, is that right?

If that's the case then having them work for deferred upgrade is probably not a P1, but we should definitely create a bug and allocate time for it.

Contributor

While this specific script may only be run by support, as Brad mentioned, there's other scripts that are used by the product (for analytics) and may suffer from this same problem on deferred upgrade.

Contributor Author

Seb mentioned to me that we already have a bug tracking this. I was unsuccessful in finding it, but I'll make sure there is a Jira issue.

Contributor

Yeah, for other scripts that are used by analytics, I think supporting deferred upgrade is a must.

@brad-lewis brad-lewis merged commit b7c347d into delphix:master Feb 3, 2021
@brad-lewis brad-lewis deleted the iscsi_read branch February 3, 2021 17:34
brad-lewis added a commit to brad-lewis/performance-diagnostics that referenced this pull request Feb 3, 2021
Successfully merging this pull request may close these issues.

- estat iscsi under-reporting reads
- estat zpl and zvol scripts not compiling