Skip to content

DLPX-77532 The iscsi estat script needs to be fixed so it reports dat… #69

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 13, 2021

Conversation

brad-lewis
Copy link
Contributor

@brad-lewis brad-lewis commented Sep 30, 2021

The script had two issues that were fixed.

  1. The size of the requests was wrong. Instead of using hdr->data_length, cmd->se_cmd.data_length is being used.
  2. The endpoint for latency calculations was wrong for large reads. The script previously look to find a function that
    was called when a request was completed. The function iscsit_response_queue() handles read and writes. It is
    simpler conceptually to use this as the completion of the function as the endpoint for read and write latency
    calculations. However; it adds some complexity as there is no common hash value to use in the entry and exit
    probes. The thread id is the only value that a kretprobe has. So a third probe function is introduced on entry
    to iscsit_response_queue() to get the entry timestamp hashed by the cmd pointer and rehash it under the thread id.

There are known issues with standard deviation that will be addressed separately.

In testing it produced consistent results that matched the zvol estat output for different sizes. Here is one example with an 80/20 read/write work and 1M size operations.

ISCSI

sudo estat iscsi -m 30
...
09/30/21 - 20:12:28 UTC

  microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[40, 50)                        1 |@                                       
[50, 60)                        5 |@                                       
[60, 70)                        9 |@                                       
[70, 80)                        4 |@                                       
[80, 90)                       15 |@                                       
[90, 100)                       8 |@                                       
[100, 200)                    275 |@                                       
[200, 300)                    521 |@                                       
[300, 400)                    663 |@                                       
[400, 500)                   1236 |@                                       
[500, 600)                   2220 |@                                       
[600, 700)                   3082 |@                                       
[700, 800)                   3852 |@                                       
[800, 900)                   4381 |@                                       
[900, 1000)                  5057 |@                                       
[1000, 2000)                52889 |@@@@@@@@@@                              
[2000, 3000)                40679 |@@@@@@@@                                
[3000, 4000)                27359 |@@@@@@                                  
[4000, 5000)                16649 |@@@@                                    
[5000, 6000)                11580 |@@@                                     
[6000, 7000)                 9110 |@@                                      
[7000, 8000)                 7998 |@@                                      
[8000, 9000)                 6945 |@@                                      
[9000, 10000)                5818 |@@                                      
[10000, 20000)              15969 |@@@                                     
[20000, 30000)               1431 |@                                       
[30000, 40000)                487 |@                                       
[40000, 50000)                179 |@                                       
[50000, 60000)                126 |@                                       
[60000, 70000)                 95 |@                                       
[70000, 80000)                 33 |@                                       
[80000, 90000)                 32 |@                                       
[90000, 100000)                13 |@                                       
[100000, 200000)               47 |@                                       

  microseconds                                                    write
value range                 count ------------- Distribution ------------- 
[400, 500)                      6 |@                                       
[500, 600)                    103 |@                                       
[600, 700)                    364 |@                                       
[700, 800)                    633 |@                                       
[800, 900)                   1019 |@                                       
[900, 1000)                  1267 |@                                       
[1000, 2000)                16323 |@@@@@@@@@@@@                            
[2000, 3000)                10948 |@@@@@@@@@                               
[3000, 4000)                 6286 |@@@@@                                   
[4000, 5000)                 3824 |@@@                                     
[5000, 6000)                 2497 |@@                                      
[6000, 7000)                 1757 |@@                                      
[7000, 8000)                 1344 |@                                       
[8000, 9000)                 1108 |@                                       
[9000, 10000)                 862 |@                                       
[10000, 20000)               5157 |@@@@                                    
[20000, 30000)                741 |@                                       
[30000, 40000)                138 |@                                       
[40000, 50000)                 38 |@                                       
[50000, 60000)                 35 |@                                       
[60000, 70000)                 11 |@                                       
[70000, 80000)                 10 |@                                       
[80000, 90000)                  2 |@                                       
[90000, 100000)                10 |@                                       
[100000, 200000)               81 |@                                       
[200000, 300000)              116 |@                                       

                                      iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                       7291             4339         20707354         14004544
write                                      1822             5234         24915778          3502208


                                      iops(/s)  throughput(k/s)
total                                      9112         17496512

ZVOL

sudo estat zvol -m 30
...
09/30/21 - 20:13:14 UTC

Tracing enabled... Hit Ctrl-C to end.
  microseconds                                                     read
value range                 count ------------- Distribution ------------- 
[20, 30)                       83 |@                                       
[30, 40)                       94 |@                                       
[40, 50)                       47 |@                                       
[50, 60)                       25 |@                                       
[60, 70)                       16 |@                                       
[70, 80)                       41 |@                                       
[80, 90)                      260 |@                                       
[90, 100)                     551 |@                                       
[100, 200)                   4336 |@                                       
[200, 300)                   1021 |@                                       
[300, 400)                   1240 |@                                       
[400, 500)                   5001 |@                                       
[500, 600)                   7032 |@@                                      
[600, 700)                   7210 |@@                                      
[700, 800)                   7158 |@@                                      
[800, 900)                   7149 |@@                                      
[900, 1000)                  6981 |@@                                      
[1000, 2000)                51253 |@@@@@@@@@@                              
[2000, 3000)                32153 |@@@@@@                                  
[3000, 4000)                18974 |@@@@                                    
[4000, 5000)                12196 |@@@                                     
[5000, 6000)                 9428 |@@                                      
[6000, 7000)                 7886 |@@                                      
[7000, 8000)                 7057 |@@                                      
[8000, 9000)                 6505 |@@                                      
[9000, 10000)                5504 |@@                                      
[10000, 20000)              14956 |@@@                                     
[20000, 30000)               1130 |@                                       
[30000, 40000)                214 |@                                       
[40000, 50000)                 56 |@                                       
[50000, 60000)                 25 |@                                       
[60000, 70000)                 12 |@                                       
[70000, 80000)                  1 |@                                       
[100000, 200000)                7 |@                                       

  microseconds                                              write, sync
value range                 count ------------- Distribution ------------- 
[300, 400)                      6 |@                                       
[400, 500)                    342 |@                                       
[500, 600)                   1124 |@                                       
[600, 700)                   1662 |@@                                      
[700, 800)                   2085 |@@                                      
[800, 900)                   2247 |@@                                      
[900, 1000)                  2383 |@@                                      
[1000, 2000)                16980 |@@@@@@@@@@@@@                           
[2000, 3000)                 7575 |@@@@@@                                  
[3000, 4000)                 4364 |@@@@                                    
[4000, 5000)                 2874 |@@@                                     
[5000, 6000)                 2092 |@@                                      
[6000, 7000)                 1681 |@@                                      
[7000, 8000)                 1258 |@                                       
[8000, 9000)                 1001 |@                                       
[9000, 10000)                 812 |@                                       
[10000, 20000)               4327 |@@@@                                    
[20000, 30000)                600 |@                                       
[30000, 40000)                 64 |@                                       
[40000, 50000)                 10 |@                                       
[50000, 60000)                  2 |@                                       
[100000, 200000)              105 |@                                       
[200000, 300000)               81 |@                                       

                                      iops(/s)  avg latency(us)       stddev(us)  throughput(k/s)
read                                       7186             3698         16010449         13797696
write, sync                                1789             4478         20810217          3435200


                                      iops(/s)  throughput(k/s)
total                                      8978         17239168

Copy link
Contributor

@sebroy sebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does iscsi.st used by our analytics subsystem have the same bug?

@brad-lewis
Copy link
Contributor Author

Yes, iscsi.st needs some attention too. I thought I'd review this first and then once an approach is approved apply it to the stbtrace script.


u64 *tsp = iscsi_start_ts.lookup((u64 *) &cmd);
if (tsp == 0) {
return (0); // missed issue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried recording that metric to see if we were hitting this case after the initial few seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the experiment @pzakha suggested and didn't see any misses after the startup period.

@brad-lewis
Copy link
Contributor Author

brad-lewis commented Oct 11, 2021

The force push made one small change. Fixed a small but obvious error in a conditional. I don't think it really affected the output from the script much as almost all of the time if the code gets to the else if, it will be true.

@@ -109,7 +109,7 @@ iscsi_target_end(struct pt_regs *ctx)
 
        if (data->direction == DMA_FROM_DEVICE) {
                aggregate_data(data, ts, READ_STR);
-       } else if (data->direction & DMA_TO_DEVICE) {
+       } else if (data->direction == DMA_TO_DEVICE) {
                aggregate_data(data, ts, WRITE_STR);
        }
        iscsi_base_data.delete(&tid);

@brad-lewis brad-lewis merged commit a9e54a1 into delphix:master Oct 13, 2021
@brad-lewis brad-lewis deleted the estat_iscsi branch October 13, 2021 14:58
brad-lewis added a commit to brad-lewis/performance-diagnostics that referenced this pull request Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants