[docs] Update docs and add narwhal dashboard for orchestrator (#13327)
## Description 

Updating the orchestrator documentation and adding a Narwhal dashboard
to be used when running the benchmarks.

## Test Plan 

How did you test the new or updated feature?

---
If your changes are not user-facing and not a breaking change, you can
skip the following section. Otherwise, please indicate what changed, and
then add to the Release Notes section as highlighted during the release
process.

### Type of Change (Check all that apply)

- [ ] protocol change
- [ ] user-visible impact
- [ ] breaking change for client SDKs
- [ ] breaking change for FNs (FN binary must upgrade)
- [ ] breaking change for validators or node operators (must upgrade
binaries)
- [ ] breaking change for on-chain data layout
- [ ] necessitate either a data wipe or data migration

### Release notes
akichidis authored Sep 12, 2023
1 parent 8001f2e commit fc0f37e
Showing 3 changed files with 6,078 additions and 2 deletions.
44 changes: 42 additions & 2 deletions crates/sui-aws-orchestrator/README.md
@@ -46,7 +46,7 @@ Create a file called `settings.json` that contains all the configuration paramet
"specs": "m5d.8xlarge",
"repository": {
"url": "https://github.com/MystenLabs/sui.git",
"commit": "main"
},
"results_directory": "./results",
"logs_directory": "./logs"
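For reference, the excerpt above can be assembled into a complete `settings.json`. The sketch below is hedged: `testbed_id` is mentioned in the FAQ as a settings field, but the `regions` field name and both values are illustrative assumptions, not confirmed parts of the schema.

```shell
# Hedged sketch of a full settings.json for the orchestrator.
# "testbed_id" is referenced in the FAQ; "regions" and all values
# here are assumptions for illustration only.
cat > settings.json <<'EOF'
{
  "testbed_id": "my-testbed",
  "regions": ["us-east-1"],
  "specs": "m5d.8xlarge",
  "repository": {
    "url": "https://github.com/MystenLabs/sui.git",
    "commit": "main"
  },
  "results_directory": "./results",
  "logs_directory": "./logs"
}
EOF
```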
@@ -78,6 +78,9 @@ cargo run --bin sui-aws-orchestrator testbed status

Instances listed with a green number are available and ready for use, while instances listed with a red number are stopped.

Also keep in mind that nothing stops you from running the `deploy` command multiple times if you find yourself
needing more instances down the line.

## Step 4. Running benchmarks

Running benchmarks involves installing the specified version of the codebase on the remote machines and running one validator and one load generator per instance. For example, the following command benchmarks a committee of 10 validators under a constant load of 200 tx/s for 3 minutes:
@@ -90,4 +93,41 @@ In a network of 10 validators, each with a corresponding load generator, each lo
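The even load split described above can be sketched as simple arithmetic: with one load generator per validator, each generator submits its share of the total fixed load. The numbers below mirror the 10-validator, 200 tx/s example.

```shell
# Hedged sketch: the orchestrator spreads a fixed total load evenly
# across the load generators (one per validator instance).
total_load=200     # tx/s, from the benchmark's fixed load
committee=10       # number of validators / load generators
per_client=$((total_load / committee))
echo "each load generator submits ${per_client} tx/s"
```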

## Step 5. Monitoring

The orchestrator provides facilities to monitor metrics on clients and nodes. The orchestrator deploys a [Prometheus](https://prometheus.io) instance and a [Grafana](https://grafana.com) instance on a dedicated remote machine. Grafana is then available on the address printed on stdout (e.g., `http://3.83.97.12:3000`) with the default username and password both set to `admin`. You can either create a [new dashboard](https://grafana.com/docs/grafana/latest/getting-started/build-first-dashboard/) or [import](https://grafana.com/docs/grafana/latest/dashboards/manage-dashboards/#import-a-dashboard) the example dashboards located in the `./assets` folder.

## Destroy a testbed
Once you no longer need the deployed testbed, you can simply run

```
cargo run --bin sui-aws-orchestrator -- testbed destroy
```

to terminate all the deployed EC2 instances. Keep in mind that AWS does not delete terminated instances immediately (this can take a few hours), so if you want to deploy a new testbed right away it is advisable
to use a different `testbed_id` in `settings.json` to avoid conflicts later (see the FAQ section for more information).

## FAQ

### I am getting an error "Failed to read settings file '"crates/sui-aws-orchestrator/assets/settings.json"': No such file or directory"
To run the tool, a `settings.json` file with the deployment configuration must exist under the directory `crates/sui-aws-orchestrator/assets`. Also, make sure
you run the orchestrator from the top-level repo folder, e.g. `/sui $ cargo run --bin sui-aws-orchestrator`.

### I am getting an error "IncorrectInstanceState" with message "The instance 'i-xxxxxxx' is not in a state from which it can be started." when I try to run a benchmark
When a testbed is deployed, the EC2 instances are tagged with the `testbed_id` dictated by the `settings.json` file. When trying to run a benchmark, the tool lists
all the EC2 instances in the regions dictated by the configuration. To successfully run the benchmark, all the listed instances should be in the
`Running` state. If any instance is in a different state, e.g. `Terminated`, the above error will arise. Note that if you `destroy` a deployment
and then immediately `deploy` a new one under the same `testbed_id`, it is possible to end up with a mix of instances in the `Running` and `Terminated` states, as AWS does not immediately
delete `Terminated` instances. That can cause the above error as a false positive as well. In this case it is advised to use a different `testbed_id` to ensure that
there is no overlap between instances.

### I am getting an error "Not enough instances: missing X instances" when running a benchmark
In the common case, to successfully run a benchmark we need enough instances available to run:
* the required validators
* the Grafana dashboard
* the benchmarking clients

For example, when running the command `cargo run --bin sui-aws-orchestrator -- benchmark --committee 4 fixed-load --loads 500 --duration 500`, we need the following instances available:
* `4 instances` to run the validators (since we set `--committee 4`)
* `1 instance` to run the Grafana dashboard (by default only 1 is needed)
* no additional instances for the benchmarking clients, as those are co-deployed on the validator nodes

So in total we must have deployed a testbed of at least `5 instances`. If we attempt to run with fewer, the above error will be thrown.
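The instance accounting above can be sketched as follows; the variable names are illustrative, not part of the orchestrator's code.

```shell
# Hedged sketch: compute how many EC2 instances a benchmark run needs.
# Clients add no extra instances because they are co-deployed on the
# validator nodes; one dedicated instance hosts Grafana by default.
committee=4        # value passed via --committee
grafana=1          # dedicated monitoring instance
clients=0          # co-deployed with the validators
required=$((committee + grafana + clients))
echo "testbed must have at least ${required} instances"
```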
