Scalability issue visible when the number of concurrent requests > 128


```
# on latte master branch (cassandra_cpp):
$ target/release/latte run workloads/basic/read.rn -d 1000000 -t 2 --tag cassandra_cpp

# on scylla_driver_rebase branch:
$ target/release/latte run workloads/basic/read.rn -d 1000000 -t 2 --tag scylla -b read.Test_Cluster.4.0.1-SNAPSHOT.cassandra_cpp.p384.t2.c1.20211126.163116.json


CONFIG ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
                            ───────────── A ─────────────  ────────────── B ────────────     Change     
            Date            Fri, 26 Nov 2021               Fri, 26 Nov 2021                           
            Time            16:31:08 +0100                 16:33:03 +0100                             
         Cluster            Test Cluster                   Test Cluster                               
      C* version            4.0.1-SNAPSHOT                 4.0.1-SNAPSHOT                             
            Tags            cassandra_cpp                  scylla                                     
        Workload            read.rn                        read.rn                                    
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
         Threads                    2                              2                          +0.0%     
     Connections                    1                              1                          +0.0%     
     Concurrency     [req]        384                            384                          +0.0%     
        Max rate    [op/s]                                                                            
          Warmup       [s]                                                                            
              └─      [op]          1                              1                          +0.0%     
        Run time       [s]                                                                            
              └─      [op]    1000000                        1000000                          +0.0%     
        Sampling       [s]        1.0                            1.0                          +0.0%     

LOG ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
    Time  ───── Throughput ─────  ────────────────────────────────── Response times [ms] ───────────────────────────────────
     [s]      [op/s]     [req/s]         Min        25        50        75        90        95        99      99.9       Max
   0.000      137155      137155       0.645     3.923     4.895     6.227     7.915     9.391    12.663    17.263    23.727
   1.000      140517      140517       0.637     3.773     4.735     6.151     7.703     9.055    12.271    21.007    40.799
   2.000      137920      137920       1.092     3.715     4.707     6.167     8.159     9.503    15.359    28.223    39.071
   3.000      135219      135219       0.964     3.863     4.879     6.223     8.003     9.655    16.335    28.127    33.567
   4.000      138897      138897       0.780     3.799     4.819     6.367     8.031     9.055    11.487    18.319    31.967
   5.000      133410      133410       0.331     3.843     5.011     6.575     8.327     9.751    14.055    20.511    27.663
   6.000      137751      137751       0.587     3.799     4.787     6.227     8.163     9.431    12.511    27.423    42.399
   7.000      144696      144696       0.914     3.909     4.827     6.055     7.519     8.287    10.071    11.375    23.007

SUMMARY STATS ══════════════════════════════════════════════════════════════════════════════════════════════════════════════
                            ───────────── A ─────────────  ────────────── B ────────────     Change     P-value  Signif.
    Elapsed time       [s]      6.942                          7.271                          +4.7%     
        CPU time       [s]     12.746                         12.985                          +1.9%     
 CPU utilisation       [%]       23.0                           22.3                          -2.7%     
           Calls      [op]    1000000                        1000000                          +0.0%     
          Errors      [op]          0                              0                          +0.0%     
              └─       [%]        0.0                            0.0                          +0.0%     
        Requests     [req]    1000000                        1000000                          +0.0%     
              └─  [req/op]        1.0                            1.0                          +0.0%     
            Rows     [row]     500425                         499581                          -0.2%     
              └─ [row/req]        0.5                            0.5                          -0.2%     
         Samples                    7                              8                         +14.3%     
Mean sample size      [op]     142857                         125000                         -12.5%     
              └─     [req]     142857                         125000                         -12.5%     
     Concurrency     [req]      297.8                          360.0                         +20.9%     
              └─       [%]       77.5                           93.8                         +20.9%     
      Throughput    [op/s]     144068 ± 2123                  137543 ± 3159                   -4.5%     0.00011  **   
              ├─   [req/s]     144068 ± 2123                  137543 ± 3159                   -4.5%     0.00011  **   
              └─   [row/s]      72095 ± 1229                   68714 ± 1363                   -4.7%     0.00004  ***  
  Mean call time      [ms]      4.627 ± 0.103                  5.333 ± 0.120                 +15.2%     0.00000  *****
 Mean resp. time      [ms]      4.622 ± 0.103                  5.330 ± 0.120                 +15.3%     0.00000  *****


RESPONSE TIMES [ms] ════════════════════════════════════════════════════════════════════════════════════════════════════════
                            ───────────── A ─────────────  ────────────── B ────────────     Change     P-value  Signif.
          Min                   0.344 ± 0.108                  0.728 ± 0.286                +111.7%     0.00260  *    
           25                   2.498 ± 0.090                  3.820 ± 0.079                 +52.9%     0.00000  *****
           50                   3.781 ± 0.107                  4.832 ± 0.117                 +27.8%     0.00000  *****
           75                   5.621 ± 0.189                  6.267 ± 0.175                 +11.5%     0.00000  **** 
           90                   8.105 ± 0.369                  8.021 ± 0.258                  -1.0%     0.55184       
           95                  10.405 ± 0.543                  9.359 ± 0.405                 -10.1%     0.00031  **   
           98                  14.847 ± 1.472                 11.281 ± 0.747                 -24.0%     0.00006  ***  
           99                  19.435 ± 3.130                 13.376 ± 2.164                 -31.2%     0.00028  **   
           99.9                28.216 ± 6.287                 22.519 ± 6.067                 -20.2%     0.05167       
           99.99               30.291 ± 3.934                 25.367 ± 6.235                 -16.3%     0.04922       
          Max                  34.476 ± 5.021                 33.786 ± 8.295                  -2.0%     0.81925       

```

I ported the current master of latte to the ScyllaDB driver and noticed a weird issue: minimum and average latencies are much higher than with cassandra_cpp, and throughput is still a bit behind in this test. The test is short, but the phenomenon also exists when running it for longer. I've run it a few times, also in reversed order, and it is very repeatable. The difference is larger on a single thread.

Are there any tuning options for the buffering thing you've added recently? 

BTW: I'm not sure why the number of returned rows is only half the number of queries, but that looks like a bug in the workload generation / definition. It's exactly the same code on both sides, so it shouldn't affect the validity of this test.
