Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

validator: add set-public-tpu-address command #30452

Merged
merged 8 commits into from
Apr 12, 2023

Conversation

diman-io
Copy link
Contributor

@diman-io diman-io commented Feb 22, 2023

Problem

Only one way to change this now is to restart a validator.
Also it could prove useful for the watchdog in the event of a remote service failure.

Also I have a question about tpu-forward #30451

Summary of Changes

Add set-public-tpu-address subcommand to solana-validator.

@mergify mergify bot added community Community contribution need:merge-assist labels Feb 22, 2023
@mergify mergify bot requested a review from a team February 22, 2023 21:33
@joncinque
Copy link
Contributor

This overall seems reasonable to me, but to be honest, I'm not an expert in this area.

The one issue I see is the invalidation of the previous TLS certificate created at startup at

let (cert, priv_key) = new_self_signed_tls_certificate(identity_keypair, gossip_host)
-- @pgarg66 will this cause other issues later on? Does this new endpoint need to reissue the certificate as well?

@pgarg66
Copy link
Contributor

pgarg66 commented Mar 8, 2023

The QUIC connection cache currently fills in the IP address, but it is unused for connection lookup on the server side. That being said, it'll be best to update the client cert to make it future proof.

This new argument will be available for a non RPC (full) node as well. But looks like it will not actually set the TPU address. I find it a bit confusing.

@diman-io
Copy link
Contributor Author

diman-io commented Mar 8, 2023

This argument isn't new. There's a --tpu-host-addr option. I only add the subcommand to change this value on the fly.

This TPU address is only used to broadcast it through gossip. Other nodes use it to send packets to, kind of like a balancer before the node. A listen service on this address has to forward those packets to the node, possibly after performing some work on them. I didn't think it needs a cert for this purpose.

@pgarg66
Copy link
Contributor

pgarg66 commented Mar 8, 2023

This argument isn't new. There's a --tpu-host-addr option. I only add the subcommand to change this value on the fly.

This TPU address is only used to broadcast it through gossip. Other nodes use it to send packets to, kind of like a balancer before the node. A listen service on this address has to forward those packets to the node, possibly after performing some work on them. I didn't think it needs a cert for this purpose.

The node binds to the address provided via --tpu-host-addr option. This new option, as you also mentioned, is only used by the gossip broadcast to update the TPU address. The name of the option is kind of misleading. For example, a node operator (with no balancer) can set this value assuming the TPU will bind to it.

I agree that the QUIC cert doesn't need to be updated for the balancer case. But, the option name, description and details must be updated to prevent someone from accidentally misconfiguring their node.

@joncinque
Copy link
Contributor

joncinque commented Mar 8, 2023

The node binds to the address provided via --tpu-host-addr option. This new option, as you also mentioned, is only used by the gossip broadcast to update the TPU address. The name of the option is kind of misleading. For example, a node operator (with no balancer) can set this value assuming the TPU will bind to it.

I was surprised to read this, but the node doesn't actually bind to the --tpu-host-addr option, but rather only uses it in gossip, which is why this change should be OK.

Here's where the socket is bound to

let (tpu_port, tpu_sockets) =
multi_bind_in_range(bind_ip_addr, port_range, 32).expect("tpu multi_bind");
and where it's overridden only for gossip
let _ = info.set_tpu(overwrite_tpu_addr.unwrap_or_else(|| SocketAddr::new(addr, tpu_port)));

Either way, I agree that we should be consistent, and probably change the command to set-tpu-host-addr, and add some language similar to tpu-host-addr, like Specify TPU address to advertise in gossip. What do you think?

@pgarg66
Copy link
Contributor

pgarg66 commented Mar 8, 2023

I was surprised to read this, but the node doesn't actually bind to the --tpu-host-addr option, but rather only uses it in gossip, which is why this change should be OK.

Thanks @joncinque for looking into it, looks like I misread the code regarding this.

@pgarg66
Copy link
Contributor

pgarg66 commented Mar 8, 2023

@diman-io, @joncinque and I had a quick sync up on this. So, the issue is that the IP address in the QUIC server cert uses the TPU IP address from the gossip record. Since this option changes the IP in gossip, the value in the cert will not match the IP to with the TPU is actually binding.

Currently the IP field in the cert is unused. But it could change in future. So for consistency sake, we should fill the correct IP in the cert.

I'll file an issue and we can handle it in a separate PR. This PR should be good to go, but let's rename the CLI argument and description with more details. An uninformed node operator could use the option incorrectly and misconfigure their node (for example, when there is no balancer).

@diman-io diman-io changed the title validator: add set-tpu command validator: add set-tpu-host-addr command Mar 9, 2023
@codecov
Copy link

codecov bot commented Mar 9, 2023

Codecov Report

Merging #30452 (70f15fb) into master (557e4c4) will decrease coverage by 0.1%.
The diff coverage is 0.0%.

@@            Coverage Diff            @@
##           master   #30452     +/-   ##
=========================================
- Coverage    81.5%    81.5%   -0.1%     
=========================================
  Files         729      729             
  Lines      206185   206212     +27     
=========================================
- Hits       168235   168188     -47     
- Misses      37950    38024     +74     

Copy link
Contributor

@joncinque joncinque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking close, just a few bits to clean up

@behzadnouri
Copy link
Contributor

I am confused what this actually do.
You change the tpu address to some other address in contact-info.
Isn't there any thread listening on that address that should be updated to listen to the new address?

@diman-io diman-io changed the title validator: add set-tpu-host-addr command validator: add set-tpu-host-address command Mar 11, 2023
@diman-io
Copy link
Contributor Author

I am confused what this actually do.
You change the tpu address to some other address in contact-info.
Isn't there any thread listening on that address that should be updated to listen to the new address?

The validator always listens on its "default" TPU port. However, by using the --set-tpu-addr option, it's possible to specify a different TPU address (even on a different host) when starting the validator. This alternate address will be broadcast to the gossip. The service is expected to forward these packets to the validator after performing certain tasks, such as deduplication for example. The Jito client also works by receiving this traffic instead of the validator.

This pull request adds the capability to change the TPU address while the validator is running, without needing to restart it.

@behzadnouri
Copy link
Contributor

The validator always listens on its "default" TPU port. However, by using the --set-tpu-addr option, it's possible to specify a different TPU address (even on a different host) when starting the validator. This alternate address will be broadcast to the gossip. The service is expected to forward these packets to the validator after performing certain tasks, such as deduplication for example. The Jito client also works by receiving this traffic instead of the validator.

Ok, then this is going to be pretty confusing. Can you put above as comments where needed in the code.
And also change to --set-public-tpu-addr and other places from set_tpu_... to set_public_tpu_..., etc

@diman-io
Copy link
Contributor Author

Ok, then this is going to be pretty confusing. Can you put above as comments where needed in the code. And also change to --set-public-tpu-addr and other places from set_tpu_... to set_public_tpu_..., etc

I looked and found that the validator has the following address-related arguments:

--bind-address <HOST>
--public-rpc-address HOST:PORT
--rpc-bind-address <HOST>
--rpc-faucet-address HOST:PORT
--tpu-host-addr HOST:PORT

Early in this PR I already changed tpu-host-addr to tpu-host-address. Now I changed it to public-tpu-address and removed host from the name. Similarly, I changed the new command to set-public-tpu-address and all the new code. I also renamed overwrite_tpu_addr to public_tpu_addr in a pair places. You were right, the code is now more understandable.

As for the explanations, I think the ones in the validator's help are informative enough:

--public-tpu-address <HOST:PORT>
            Specify TPU address to advertise in gossip [default: ask --entrypoint or localhostwhen --entrypoint is not
            provided]

and

set-public-tpu-address     Specify TPU address to advertise in gossip

@behzadnouri Could you please review?

@diman-io diman-io changed the title validator: add set-tpu-host-address command validator: add set-public-tpu-address command Apr 10, 2023
@diman-io diman-io requested a review from behzadnouri April 11, 2023 00:03
@behzadnouri behzadnouri merged commit 9136f80 into solana-labs:master Apr 12, 2023
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Apr 14, 2023
diman-io added a commit to diman-io/solana that referenced this pull request May 3, 2023
@diman-io diman-io mentioned this pull request May 3, 2023
@diman-io diman-io deleted the diman/set-tpu branch May 13, 2023 21:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
community Community contribution need:merge-assist
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants