The `./net/` directory in the monorepo contains scripts for creating and manipulating a test network. A test network lets you run a fully isolated set of validators and clients on a configurable hardware setup. It is intended to be both dev and CD friendly.

Test networks can run in GCP, AWS, or colo. Whichever cloud provider you choose, you will need its credentials set up on your machine.
You will need a working `gcloud` command from the Google Cloud SDK; if you do not have it, follow the guide at https://cloud.google.com/sdk?hl=en.

Before running any scripts, authenticate with:

```bash
$ gcloud auth login
```

If you are running the scripts on a headless machine, you can use curl to issue requests to confirm your auth.

If you are doing this for the first time, you might need to set the project:

```bash
$ gcloud config set project principal-lane-200702
```
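A quick way to confirm the gcloud setup before running any scripts (standard gcloud commands):

```bash
# The account you authenticated with should be listed as ACTIVE
gcloud auth list
# Should print the project you just set
gcloud config get-value project
# Any cheap API call will fail fast if credentials are missing
gcloud compute zones list --limit=5
```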
Obtain your credentials from the AWS IAM Console and configure the AWS CLI with:

```bash
$ aws configure
```

More information on AWS CLI configuration can be found in the AWS CLI documentation.
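To confirm the AWS credentials actually work:

```bash
# Prints your account ID and ARN if the configured credentials are valid
aws sts get-caller-identity
```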
Metrics collection relies on two environment variables that are patched through to the remote nodes by `net.sh`:

- `RUST_LOG` — enables metrics reporting (it must be set to `info` or higher)
- `SOLANA_METRICS_CONFIG` — tells agave where to log the metrics
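For example (illustrative values; the host and credentials must come from your own InfluxDB setup):

```bash
export RUST_LOG=info
export SOLANA_METRICS_CONFIG="host=https://internal-metrics.solana.com:8086,db=testnet-dev-${USER},u=${USER},p=some_secret"
```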
Note: Anza employees should follow the guide in Notion to set up the InfluxDB account.
- Ensure that `${host}` is the host name of an InfluxDB instance you can access, for example `https://internal-metrics.solana.com:8086`.
- Ensure that `${user}` is the name of an InfluxDB user account with enough rights to create a new InfluxDB database, for example `solana`.
You will normally only need to do this once. Once this is done, you will be able to save the metrics configuration and load it later from the environment.
- Go to `./net/` in the agave repo
- Run `./init-metrics.sh -c testnet-dev-${user} ${user}`
  - The script will ask for a password; it is the same one you created when making a user in the InfluxDB UI
  - Use the username from the preparation step above, not your login user name
  - If you need to set the InfluxDB host, edit the script
- The script will configure the database (recreating it if necessary) and append a config line at the very end of the `net/config/config` file, like the following: `export SOLANA_METRICS_CONFIG="host=${host},db=testnet-dev-${user},u=${user},p=some_secret"`
- You can store that line somewhere and append it to the config file whenever you need to reuse the database
- You can also store it in your shell's environment so you can run `./init-metrics.sh -e` to quickly load it
- Alternatively, you will need to run `./init-metrics.sh` with appropriate arguments every time you set up a new cluster
- Assuming no errors, your InfluxDB setup is now done
- For simple cases, storing `SOLANA_METRICS_CONFIG` in your env is appropriate (see the sketch after this list), but you may want to use different databases for different runs of `net.sh`
- You can call `./init-metrics.sh` before you call `./net.sh start`; this will change the metrics config for a particular run
- You can manually write `SOLANA_METRICS_CONFIG` into the `./net/config/config` file
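A minimal sketch of that save/reload workflow (the `~/.solana-metrics-env` path is just an illustration):

```bash
# Save the metrics config line that init-metrics.sh appended
tail -n 1 net/config/config > ~/.solana-metrics-env

# Later, before configuring a fresh testnet:
source ~/.solana-metrics-env   # puts SOLANA_METRICS_CONFIG into the shell env
./init-metrics.sh -e           # patches it into the new net/config/config
```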
- By default, metrics are only logged by agave if `RUST_LOG` is set to `info` or higher. You can provide it as environment for the `./net.sh start` command, or set it in your shell environment, e.g. `RUST_LOG="info,solana_runtime=debug"`

To verify that the configuration is usable, run the following (note: this only works if you store `SOLANA_METRICS_CONFIG` in your shell environment):
```bash
cd ./scripts/
source ./configure-metrics.sh
# The script prints the resulting configuration, e.g.:
# INFLUX_HOST=https://internal-metrics.solana.com:8086
# INFLUX_DATABASE=testnet-dev-solana
# INFLUX_USERNAME=solana
# INFLUX_PASSWORD=********
./metrics-write-datapoint.sh "testnet-deploy net-create-begin=1"
```

- All commands should complete with no errors; this indicates your InfluxDB config is usable
- Ensure that `RUST_LOG` is set to `info` or `debug`
NOTE: This example uses GCE. If you are using AWS EC2, replace `./gce.sh` with `./ec2.sh` in the commands.
```bash
# In the agave repo
cd net/

# Create a GCE testnet with 4 additional validator nodes (beyond the bootstrap node)
# and 1 client (billing starts here)
./gce.sh create -n 4 -c 1

# Configure the metrics database and validate credentials using the
# SOLANA_METRICS_CONFIG environment variable (skip this if you are not using metrics)
./init-metrics.sh -c testnet-dev-${USER} ${USER}

# Deploy the network from the local workspace and start processes on all nodes,
# including bench-tps on the client node
RUST_LOG=info ./net.sh start

# Show help for sshing into any testnet node to access logs/etc
./ssh.sh

# Stop running processes on all nodes
./net.sh stop

# Dispose of the network (billing stops here)
./gce.sh delete
```
- If you expect metrics to work, make sure you have configured them before proceeding
- Go to the `./net/` directory in the agave repo
- The `./gce.sh` command controls creation and destruction of the nodes in the test net. It does not actually run any software.
- `./gce.sh create -n 4 -c 1` creates a cluster with 4 validators and 1 node for load generation; this is the minimal viable setup for all solana features to work
- If the creation succeeds, `net/config/config` will contain the config file of the testnet just created
- If you do not have `SOLANA_METRICS_CONFIG` set in your shell env, `gce.sh` may complain about metrics not being configured; this is perfectly fine
- If the creation succeeds, `./gce.sh info` lists active test cluster nodes; this allows you to get their IP addresses for SSH access and/or debugging
- `./gce.sh delete` destroys the nodes (save the electricity and $$$: destroy your test nets the moment you no longer need them)
- On GCE, if you do not delete nodes, they will self-destruct in 8 hours anyway; you can configure the self-destruct timer by supplying the `--self-destruct-hours=N` argument to `gce.sh`, as shown below
- On other cloud platforms the testnet will not self-destruct!
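For example (a sketch; pick whatever timer suits your experiment):

```bash
# Create a testnet whose nodes self-destruct after 2 hours
./gce.sh create -n 4 -c 1 --self-destruct-hours=2
```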
- To enable metrics in the testnet, at this point you need to do one of the following:
  - Run `./init-metrics.sh -c testnet-dev-${user} ${user}` to create a new metrics database from scratch
  - Manually set `SOLANA_METRICS_CONFIG` in `./net/config/config` (which is exactly what `init-metrics.sh` does for you)
  - Run `./init-metrics.sh -e` to load the metrics config from `SOLANA_METRICS_CONFIG` into the testnet config file
- `./net.sh` controls the payload on the testnet nodes, i.e. bootstrapping, the validators, and bench-tps. In principle, you can run everything by hand, but `./net.sh` makes it easier.
- `./net.sh start` actually runs the test network
  - This will upload your current sources to the bootstrap host, build them there, and upload the result to all the nodes
  - The script will take 5-10 minutes to run; in the end it should print something like:

    ```
    --- Deployment Successful
    Bootstrap validator deployment took 164 seconds
    Additional validator deployment (5 validators, 0 blockstreamer nodes) took 120 seconds
    Client deployment (1 instances) took 11 seconds
    Network start logs in /home/sol/agave/net/log
    ```

  - You can also make sure it logs successful test transfers:

    ```
    ✅ 1 lamport(s) transferred: seq=0 time= 402ms signature=33uJtPJM6ekBGrWCgWHKw1TTQJVrLxYMe3sp2PUmSRVb21LyXn3nDbQmzsgQyihE7VP2zD2iR66Du8aDUnSSd6pb
    ```

- `./net.sh start -c bench-tps=2="--tx_count 2500"` will start 2 clients with a bench-tps workload sending 2500 transactions per batch
  - The `--tx_count` argument is passed to the bench-tps program; see its manual for more options, as shown after this list
- `./net.sh sanity` tests the deployment; it is also run by the start command
- `./net.sh stop` stops the validators and clients. This does not kill the machines, so you can study the logs etc.
- `./net.sh start --nobuild` will skip the source compilation; you will generally want that if you are only changing configuration files rather than code, or just want to re-run the last test
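One way to browse the bench-tps options, assuming you have built the workspace locally (`solana-bench-tps` is the binary the workspace build produces for it):

```bash
../target/release/solana-bench-tps --help
```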
- To connect to the nodes:
  - `./gce.sh info` to get the public IPs
  - `./ssh.sh <IP>` to get a shell on the node
  - `sudo su` will give you root access on the node
  - Nodes run the latest Ubuntu LTS image
- You can also interact with the nodes using the solana CLI:

```bash
# Source the config to get the validator IP list, usable as ${validatorIpList[N]}
source net/config/config
# Airdrop
../target/release/solana -u http://${validatorIpList[1]}:8899 airdrop 1
# Check feature status
../target/release/solana -u http://${validatorIpList[1]}:8899 feature status
# Activate a feature
../target/release/solana -u http://${validatorIpList[1]}:8899 feature activate <path to .json>
# Check the stakes on current validators
../target/release/solana --url http://${validatorIpList[0]}:8899 validators
```
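A couple more read-only checks that can be handy (standard solana CLI subcommands; the `RPC` shell variable is just an illustrative convenience):

```bash
source net/config/config
RPC="http://${validatorIpList[0]}:8899"
# Confirm the nodes respond and report their software version
../target/release/solana -u "$RPC" cluster-version
# Watch slot/epoch progress to confirm the cluster is making blocks
../target/release/solana -u "$RPC" epoch-info
```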
You will want to have a script like this pretty much immediately, to avoid making mistakes in the init process:

```bash
# Create the testnet with reasonable node sizes for a small test
./gce.sh create -n4 -c2 --custom-machine-type "--machine-type n1-standard-16" --client-machine-type "--machine-type n1-standard-4"
# Patch the metrics config from the environment into the config file
./init-metrics.sh -e
# Enable metrics and start the network (this will also build the software)
RUST_LOG=info ./net.sh start -c bench-tps=2="--tx_count 25000"
```
Note that the `net.sh` and `gce.sh info` commands do not actually check whether all the nodes are still alive in gcloud; they just assume the config file information is correct. So if your nodes got killed or timed out, these commands will lie to you. In such a case, just use `gce.sh delete` to reset.
By default, private IP addresses are used, with all instances in the same availability zone, to avoid GCE network egress charges. However, to run the network over public IP addresses:

```bash
$ ./gce.sh create -P ...
```

or

```bash
$ ./ec2.sh create -P ...
```
To deploy the latest pre-built `edge` channel tarball (i.e., latest from the `master` branch), once the testnet has been created, run:

```bash
$ ./net.sh start -t edge
```
Note: CUDA is currently not available on GCE.

First ensure the network instances are created with GPU enabled:

```bash
$ ./gce.sh create -g ...
```

or

```bash
$ ./ec2.sh create -g ...
```

If deploying a tarball-based network, nothing further is required, as GPU presence is detected at runtime and the CUDA build is auto-selected.
To induce the partition:

```bash
./net.sh netem --config-file <config file path>
```

To remove the partition:

```bash
./net.sh netem --config-file <config file path> --netem-cmd cleanup
```

The partitioning is also removed if you do `./net.sh stop` or `restart`.
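For example, with the config below saved as `partition.json` (an illustrative file name):

```bash
# Apply the partition to a running network
./net.sh netem --config-file partition.json
# ... observe cluster behavior under degraded connectivity ...
# Remove the partition again
./net.sh netem --config-file partition.json --netem-cmd cleanup
```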
An example config that produces 3 almost equal partitions:
```json
{
"partitions":[
34,
33,
33
],
"interconnects":[
{
"a":0,
"b":1,
"config":"loss 15% delay 25ms"
},
{
"a":1,
"b":0,
"config":"loss 15% delay 25ms"
},
{
"a":0,
"b":2,
"config":"loss 10% delay 15ms"
},
{
"a":2,
"b":0,
"config":"loss 10% delay 15ms"
},
{
"a":2,
"b":1,
"config":"loss 5% delay 5ms"
},
{
"a":1,
"b":2,
"config":"loss 5% delay 5ms"
}
]
}
```
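Reading the example: `partitions` appears to give the percentage of nodes assigned to each partition, and each `interconnects` entry applies its tc-netem `config` string (loss, delay, etc.) to traffic from partition `a` to partition `b`, which is presumably why each pair of partitions is listed twice, once per direction.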