Distributed bench #1661

oxade · 2022-04-28T21:42:49Z

This PR allows us to provision load generators running anywhere.
These load gens can then be used to benchmark validators.
There are many intertwined parts of the PR, but at the core, here's how it works:

Generate bench configs for a bunch of IPs and a bunch of load gens

./bench_configure --host-port-stake-triplets "0.0.0.1:5000:1" "1.1.1.1:5000:1" "2.2.2.2:5000:1" "3.3.3.3:5000:1" --number-of-generators 12

This generates config files for the validators (genesis.conf) and for load gens.

ubuntu@ip-172-31-82-122:~/sui/target/release$ ls *.conf
distributed_bench_genesis_0.conf  distributed_bench_genesis_2.conf  load_gen_0.conf  load_gen_10.conf  load_gen_2.conf  load_gen_4.conf  load_gen_6.conf  load_gen_8.conf
distributed_bench_genesis_1.conf  distributed_bench_genesis_3.conf  load_gen_1.conf  load_gen_11.conf  load_gen_3.conf  load_gen_5.conf  load_gen_7.conf  load_gen_9.conf
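
For readers unfamiliar with the triplet format, here is a minimal sketch of how a "host:port:stake" argument could be parsed. It is not the code in bench_configure; the `HostPortStake` struct and `parse_triplet` function are hypothetical names, and the sketch assumes IPv4 hosts (an IPv6 literal contains extra colons and would be rejected).

```rust
use std::net::IpAddr;

/// One validator entry parsed from a "host:port:stake" triplet.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct HostPortStake {
    host: IpAddr,
    port: u16,
    stake: usize,
}

/// Parse a single "host:port:stake" string (IPv4 hosts only in this sketch).
fn parse_triplet(s: &str) -> Result<HostPortStake, String> {
    let tokens: Vec<&str> = s.split(':').collect();
    if tokens.len() != 3 {
        return Err(format!("expected host:port:stake, got '{}'", s));
    }
    let host = tokens[0]
        .parse::<IpAddr>()
        .map_err(|e| format!("bad host '{}': {}", tokens[0], e))?;
    let port = tokens[1]
        .parse::<u16>()
        .map_err(|e| format!("bad port '{}': {}", tokens[1], e))?;
    let stake = tokens[2]
        .parse::<usize>()
        .map_err(|e| format!("bad stake '{}': {}", tokens[2], e))?;
    Ok(HostPortStake { host, port, stake })
}

fn main() {
    let args = ["0.0.0.1:5000:1", "1.1.1.1:5000:1", "not-a-triplet"];
    for arg in args.iter() {
        match parse_triplet(arg) {
            Ok(entry) => println!("{:?}", entry),
            Err(e) => eprintln!("error: {}", e),
        }
    }
}
```

Parsing with the standard `parse` methods avoids hand-rolled byte splitting of the host string.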

Copy the files to the machines:
load_gen_x.conf goes to load gen machine x
distributed_bench_genesis_y.conf goes to validator machine y

Start the validators
Example

./validator --genesis-config-path distributed_bench_genesis_0.conf --force-genesis --address d22fbff52bccd733067cf8304cb4bfde8355588c  --listen-address 0.0.0.0:5000

Run the load gens
Example

./remote_load_generator --remote-config load_gen_0.conf --chunk-size 800 --period-us 100000 --num-chunks 500

I typically run one load gen as a probe when measuring latency

Once the load gens exhaust their load, they will dump the results.
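
As a rough guide to what the example flags imply, the sketch below works out the load shape. It assumes that a chunk is a batch of `chunk-size` transactions and that exactly one chunk is sent per `period-us`; neither assumption is confirmed here, and the `LoadShape` type is hypothetical.

```rust
/// Load shape implied by the generator flags (field names mirror the CLI flags).
/// Hypothetical type, for illustration only.
struct LoadShape {
    chunk_size: u64,
    period_us: u64,
    num_chunks: u64,
}

impl LoadShape {
    /// Total transactions sent before the generator exhausts its load.
    fn total_transactions(&self) -> u64 {
        self.chunk_size * self.num_chunks
    }

    /// Offered rate in tx/s, assuming exactly one chunk is sent per period.
    fn offered_tps(&self) -> f64 {
        self.chunk_size as f64 * 1_000_000.0 / self.period_us as f64
    }

    /// Nominal run time in seconds under the same assumption.
    fn duration_secs(&self) -> f64 {
        self.num_chunks as f64 * self.period_us as f64 / 1_000_000.0
    }
}

fn main() {
    // Values from the remote_load_generator example above.
    let shape = LoadShape {
        chunk_size: 800,
        period_us: 100_000,
        num_chunks: 500,
    };
    println!("total transactions: {}", shape.total_transactions());
    println!("offered rate: {} tx/s", shape.offered_tps());
    println!("nominal duration: {} s", shape.duration_secs());
}
```

Under those assumptions, the example flags give 400,000 transactions per load gen at 8,000 tx/s over about 50 seconds.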

A few other minor features:

Bulk load objects into authorities in genesis?
Genesis config objects by object id range
Optionally only provision one validator in genesis
use-move is now default

Next steps:

Finalize script to automate this process. Due to recent breaking config changes, the old script needs to be improved
Improve quorum logic measurement
Add utility for generating plots
Improve usage of gRPC
Add benchmarks for reads

bmwill · 2022-05-03T15:20:16Z

sui/src/config/mod.rs

+impl AuthorityPrivateInfo {
+    pub fn copy(&self) -> Self {
+        Self {
+            address: self.address,
+            host: self.host.clone(),
+            port: self.port,
+            db_path: self.db_path.clone(),
+            stake: self.stake,
+            consensus_address: self.consensus_address,
+            public_key: self.public_key,
+        }
+    }
+}


I'm not sure I understand the need for this over using clone.

Ah, this is a vestige from when AuthorityPrivateInfo had keypairs, which were not clone-able.
I forgot to remove it when I merged with main, which recently removed keypairs.
Will delete this.

gdanezis

Nice work. You can simplify the sending and receiving logic quite a bit, as suggested, if you want.

gdanezis · 2022-05-03T14:56:15Z

sui_types/src/base_types.rs

@@ -553,6 +553,66 @@ impl ObjectID {
            .map_err(|_| ObjectIDParseError::TryFromSliceError)
            .map(ObjectID::from)
    }
+
+    /// Increment the ObjectID by usize IDs, assuming the ObjectID hex is a number represented as an array of bytes
+    pub fn advance(&self, step: usize) -> Result<ObjectID, anyhow::Error> {


This is probably not something we want to expose anywhere away from where it is used, and even there -- why are we doing this? Could we (1) generate IDs randomly, or (2) generate IDs using a u64 counter and https://doc.rust-lang.org/beta/std/primitive.u64.html#method.to_be_bytes ?

Random ID generation will lead to huge genesis config files. Ranges allow me to compress the config and to partition load gens so that each works only on specific objects.

So an entry of 1 million object ids looks like this, instead of 1M lines of text:

{ "address": "57ce23f7b759259f2831500f637baa59a15a7023", "gas_objects": [], "gas_object_ranges": [ { "offset": "00000000000000000000000000100000007a1200", "count": 1000000, "gas_value": 18446744073709551615 } ] },

But you're right. I should try to localize it to the files where it's used.
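
To make the range idea concrete, here is a minimal sketch of advancing a 20-byte ID (the offset above is 40 hex characters) by a step, treating the bytes as one big-endian number. This is not the `advance` method from the PR; the function and `ID_LEN` constant below are illustrative, and the overflow handling is an assumption.

```rust
/// Length of an object ID in bytes, inferred from the 40-hex-char offset above.
const ID_LEN: usize = 20;

/// Add `step` to a big-endian byte array, returning None on overflow.
/// Illustrative helper, not the PR's implementation.
fn advance(id: &[u8; ID_LEN], step: u64) -> Option<[u8; ID_LEN]> {
    let mut out = *id;
    // `carry` holds whatever still has to be added at the current byte position.
    let mut carry = step as u128;
    for byte in out.iter_mut().rev() {
        if carry == 0 {
            break;
        }
        let sum = *byte as u128 + (carry & 0xff);
        *byte = (sum & 0xff) as u8;
        carry = (carry >> 8) + (sum >> 8);
    }
    if carry == 0 {
        Some(out)
    } else {
        None // the result does not fit in ID_LEN bytes
    }
}

fn main() {
    let start = [0u8; ID_LEN];
    let end = advance(&start, 1_000_000).expect("no overflow");
    let hex: String = end.iter().map(|b| format!("{:02x}", b)).collect();
    println!("{}", hex);
}
```

With this shape, a range entry needs only an offset and a count, and the individual IDs are derived by the reader of the config.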

gdanezis · 2022-05-03T14:57:55Z

sui_core/src/authority.rs

@@ -664,6 +665,12 @@ impl AuthorityState {
            .expect("TODO: propagate the error")
    }

+    pub async fn insert_genesis_objects_bulk_unsafe(&self, objects: &[&Object]) {


Let's add a comment that this must not be used outside bench / test code. I would feel even better if we used something to make it invisible outside this context.

gdanezis · 2022-05-03T14:59:44Z

sui/src/benchmark/bench_types.rs

+                tick_period_us,
+                latencies,
+            } => {
+                let tracer_avg = latencies.iter().sum::<u128>() as f64 / latencies.len() as f64;


Could we also report the p10, p90, or std here?
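
A minimal sketch of what that could look like, assuming the latencies are a slice of `u128` samples as in the diff above. The helper names are hypothetical; the percentile uses the nearest-rank method and the standard deviation is the population form, both of which are choices made for this sketch.

```rust
/// Arithmetic mean, or None for an empty slice (avoids a NaN from 0/0).
fn mean(samples: &[u128]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<u128>() as f64 / samples.len() as f64)
}

/// Population standard deviation.
fn std_dev(samples: &[u128]) -> Option<f64> {
    let m = mean(samples)?;
    let var = samples
        .iter()
        .map(|&x| {
            let d = x as f64 - m;
            d * d
        })
        .sum::<f64>()
        / samples.len() as f64;
    Some(var.sqrt())
}

/// Nearest-rank percentile for an integer percentage `p` in 0..=100.
fn percentile(samples: &[u128], p: usize) -> Option<u128> {
    if samples.is_empty() || p > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p * n / 100), clamped to at least 1, using integer arithmetic.
    let rank = ((p * sorted.len() + 99) / 100).max(1);
    Some(sorted[rank - 1])
}

fn main() {
    let latencies: Vec<u128> = (1..=10).collect();
    println!("avg = {:?}", mean(&latencies));
    println!("std = {:?}", std_dev(&latencies));
    println!("p10 = {:?}", percentile(&latencies, 10));
    println!("p90 = {:?}", percentile(&latencies, 90));
}
```

Returning `Option` also covers the case where a probe collected no samples, which the plain average would report as NaN.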

gdanezis · 2022-05-03T15:00:18Z

sui/src/benchmark/load_generator.rs

+    notif.notified().await;
+    let r = send_tx_chunks(tx_chunk, net_client.clone(), conn).await;
+
+    match result_chann_tx.send((r.0, stake)).await {


What about if let Err(...) = ...
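
For illustration, the suggested shape might look like the sketch below. To stay self-contained it uses the synchronous `std::sync::mpsc` channel rather than the async channel in the PR, so there is no `.await`; the `report` function and its tuple payload are hypothetical.

```rust
use std::sync::mpsc;

/// Send one result down the channel, logging instead of matching on the Ok arm.
/// Returns false if the receiver has gone away.
fn report(tx: &mpsc::Sender<(u64, usize)>, elapsed_us: u64, stake: usize) -> bool {
    if let Err(e) = tx.send((elapsed_us, stake)) {
        eprintln!("failed to report result: {}", e);
        return false;
    }
    true
}

fn main() {
    let (tx, rx) = mpsc::channel();
    println!("sent while receiver alive: {}", report(&tx, 1_500, 1));
    println!("received: {:?}", rx.recv());

    // Once the receiver is dropped, send fails and only the error path runs.
    drop(rx);
    println!("sent after receiver dropped: {}", report(&tx, 2_000, 1));
}
```

`if let Err(..)` keeps the happy path unindented when the `Ok` value carries nothing worth handling.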

gdanezis · 2022-05-03T15:01:21Z

sui/src/benchmark/load_generator.rs

+
+    let _: Vec<_> =
+        r.1.par_iter()
+            .map(|q| check_transaction_response(deserialize_message(&(q.as_ref().unwrap())[..])))


Should we make things fail or print errors if there are errors here? (Although I think check_transaction_response does that?)

check_transaction_response already does that

gdanezis · 2022-05-03T15:21:29Z

sui/src/benchmark/transaction_creator.rs

+        // Objects for payment
+        let next_offset = objects[objects.len() - 1].id();
+
+        ObjectID::in_range(next_offset.next_increment().unwrap(), tx_count as u64)


Could we not just create random object IDs? Or is the issue that they need to exist over all authorities so we need a deterministic generation process? This arithmetic on object IDs is very error prone and misleading.

The reason I have the range arithmetic is so that load gen TXes do not clash. I want them to operate in specific object id ranges. This makes debugging much easier

gdanezis · 2022-05-03T15:23:05Z

sui/src/bin/bench_configure.rs

+        account_private_info.push((account_keypair, obj_id_offset));
+
+        // Ensure no overlap
+        obj_id_offset = obj_id_offset


Here again, can we avoid doing arithmetic on object IDs and just generate them at random? I assume this config happens centrally?

If we generate a bunch of random object ids, we end up with a genesis config file millions of lines long, which then has to be shipped around and parsed by each validator. That slows down startup.
My solution keeps this simple by using clearly defined ranges.
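
The two positions in this thread can be combined in a sketch: keep compact ranges, but derive them from a plain u64 counter and only convert to ID bytes with `to_be_bytes` at the edge. This is an illustration under assumptions, not the PR's code; `IdRange`, `partition_ranges`, and `id_bytes` are hypothetical names, and the 20-byte ID length is inferred from the offset shown earlier.

```rust
/// A contiguous block of object IDs, described by its first counter value and length.
#[derive(Debug, PartialEq, Clone, Copy)]
struct IdRange {
    offset: u64,
    count: u64,
}

/// Hand each generator its own block of `count_per_generator` IDs, starting at `base`.
/// Returns None if the counter would overflow (this sketch is conservative and also
/// rejects the case where the last block ends exactly at u64::MAX + 1).
fn partition_ranges(base: u64, count_per_generator: u64, generators: u64) -> Option<Vec<IdRange>> {
    let mut ranges = Vec::new();
    let mut offset = base;
    for _ in 0..generators {
        ranges.push(IdRange {
            offset,
            count: count_per_generator,
        });
        // Ensure no overlap: the next block starts where this one ends.
        offset = offset.checked_add(count_per_generator)?;
    }
    Some(ranges)
}

/// Left-pad a u64 counter into a 20-byte big-endian ID.
fn id_bytes(counter: u64) -> [u8; 20] {
    let mut out = [0u8; 20];
    out[12..].copy_from_slice(&counter.to_be_bytes());
    out
}

fn main() {
    // 12 generators, as in the bench_configure example, 1M objects each.
    let ranges = partition_ranges(0, 1_000_000, 12).expect("no overflow");
    for (i, r) in ranges.iter().enumerate() {
        println!("load gen {}: offset {} count {}", i, r.offset, r.count);
    }
    println!("first ID of load gen 1: {:?}", id_bytes(ranges[1].offset));
}
```

Keeping the arithmetic on a u64 and confining the byte conversion to one helper limits how much ID arithmetic leaks into shared types.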

gdanezis · 2022-05-03T15:25:05Z

sui/src/config/mod.rs

@@ -69,7 +69,7 @@ pub struct AuthorityInfo {
    pub base_port: u16,
 }

-#[derive(Serialize, Debug)]
+#[derive(Serialize, Debug, Clone)]


It seems this is no longer the authority private info, since it has no private key? Traditionally we have tried to keep anything holding private keys non-Clone, but is that no longer an issue?

Agreed. Will rename

We should further clean out "private" info such as the db_path.

gdanezis · 2022-05-03T15:26:36Z

sui/src/config/mod.rs

@@ -80,6 +80,19 @@ pub struct AuthorityPrivateInfo {
    pub consensus_address: SocketAddr,
 }

+impl AuthorityPrivateInfo {


If it is clone, do we need an explicit copy?

This is a vestige from when AuthorityPrivateInfo had keypairs, which were not clone-able.
I forgot to remove it when I merged with main, which recently removed keypairs.
Will delete this.

gdanezis · 2022-05-03T15:29:12Z

sui/src/sui_commands.rs

@@ -380,11 +383,15 @@ impl SuiNetwork {

 pub async fn genesis(


Should we separate the bench genesis from a clean genesis? It seems we are overreaching into production code quite a bit here to support bench.

I pondered this too, but this means we will have a special validator flow for benchmarking and a different one for prod.

Is this a road we want to go down?

We actually don't need this new single_address parameter for genesis; the GenesisConfig has a keypair struct that is supposed to represent the single_address.

created an issue: #1758

After some thought, I think we could handle normal genesis and benchmark genesis differently. They are the same except for populating pre-generated objects, and we can do that by giving each a different genesis config.

oxade · 2022-05-03T16:20:50Z

Will have a fast follow enhancement PR to address the issues raised in this PR
Thanks guys

longbowlu · 2022-05-03T21:34:41Z

sui/src/sui_commands.rs

@@ -380,11 +383,15 @@ impl SuiNetwork {

 pub async fn genesis(


created an issue: #1758

longbowlu · 2022-05-03T21:48:54Z

sui/src/benchmark/load_generator.rs

+    // Confirmation step
+    let (conf_chann_tx, mut conf_chann_rx) = MpscChannel(net_clients.len() * 2);
+


there is an opportunity to extract the common logic (between intent & conf) into a function

longbowlu · 2022-05-03T21:58:44Z

sui/src/bin/bench_configure.rs

+        let db_path = format!("DB_{}", validator_address);
+        let path = Path::new(&db_path);
+
+        let host_bytes: Vec<u8> = host


longbowlu · 2022-05-03T22:04:44Z

sui/src/bin/bench_configure.rs

+    let host = tokens[0].clone();
+
+    #[allow(clippy::needless_collect)]
+    let host_bytes = host


same thing for from_str

longbowlu · 2022-05-03T22:11:15Z

sui/src/config/mod.rs

@@ -69,7 +69,7 @@ pub struct AuthorityInfo {
    pub base_port: u16,
 }

-#[derive(Serialize, Debug)]
+#[derive(Serialize, Debug, Clone)]


We should further clean out "private" info such as the db_path.

longbowlu · 2022-05-03T22:16:48Z

sui/src/sui_commands.rs

@@ -380,11 +383,15 @@ impl SuiNetwork {

 pub async fn genesis(


After some thought, I think we could handle normal genesis and benchmark genesis differently. They are the same except for populating pre-generated objects, and we can do that by giving each a different genesis config.

longbowlu · 2022-05-03T22:20:18Z

Looking good! Echoing @gdanezis, we can split benchmarking-specific stuff out of prod. Created a couple of issues to follow up.

* Remote benchmarking

oxade added 9 commits April 27, 2022 02:14

Remote benchmarking

3d49479

Merge branch 'main' of https://github.com/MystenLabs/fastnft

9272783

use u64 max gas

1e0b423

use u64 max gas

5f6aaf6

Remove prints

aa01684

SImpler genesis

0992526

Fix out of order bug

664f5ac

Not fail on disconnecr

43e4e45

improve genesis objectload

d2d4b84

oxade requested a review from longbowlu April 28, 2022 21:42

oxade marked this pull request as draft April 28, 2022 21:43

oxade added 12 commits April 28, 2022 19:13

Error expl

2251d0a

simplify print

43f08fb

authority fix

76e4633

Revert error

ef64273

Sync to main

a70ca6b

Synced to main

49b40c0

Sync to main

fb8e429

Update to main

28c71d1

Update to main

610833b

Merge branch 'main' of https://github.com/MystenLabs/fastnft

7b7edc8

Merge branch 'main' into distributed_bench

f99e1b5

Synced to main

c67b42b

oxade changed the title ~~WIP: Distributed bench~~ Distributed bench May 3, 2022

oxade marked this pull request as ready for review May 3, 2022 04:06

oxade added 2 commits May 3, 2022 00:42

license

a35f59d

fix test

009eb7f

oxade requested review from patrickkuo and velvia May 3, 2022 07:09

oxade marked this pull request as draft May 3, 2022 07:13

Code cleanup

9e110b7

oxade marked this pull request as ready for review May 3, 2022 07:52

oxade requested review from bmwill and gdanezis May 3, 2022 07:58

bmwill approved these changes May 3, 2022

View reviewed changes

gdanezis approved these changes May 3, 2022

View reviewed changes

oxade merged commit 5c44cd2 into main May 3, 2022

oxade deleted the distributed_bench branch May 3, 2022 16:21

longbowlu reviewed May 3, 2022

View reviewed changes

longbowlu mentioned this pull request May 3, 2022

Have different gas object generation logic for production & benchmarking #1763

Open

longbowlu mentioned this pull request May 3, 2022

remove address param from genesis function as we can get the address from genesis config #1758

Closed

longbowlu pushed a commit that referenced this pull request May 12, 2022

Distributed bench (#1661)

e9fb20c

* Remote benchmarking

punwai pushed a commit that referenced this pull request Jul 27, 2022

Distributed bench (#1661)

799b6b4

* Remote benchmarking
