multi: introduce status server and sub-server manager #442
Conversation
Force-pushed from 5d1ca50 to 940e413 (Compare)
Force-pushed from 940e413 to ab91ede (Compare)
Force-pushed from ab91ede to 809bdf9 (Compare)
@guggero - as discussed offline, I have requested your review here but am mainly looking for approach comments to begin with. And also, no rush on this as I will be afk for the next 2 weeks
Very nice work 💯 This is an important step towards making it possible to unlock lnd
from the UI and also to make everything more user friendly.
Will need to do a second pass with a focus on security but looks pretty good for its WIP status!
config.go
Outdated
@@ -150,6 +150,9 @@ type Config struct {
	LetsEncryptDir string `long:"letsencryptdir" description:"The directory where the Let's Encrypt library will store its key and certificate."`
	LetsEncryptListen string `long:"letsencryptlisten" description:"The IP:port on which LiT will listen for Let's Encrypt challenges. Let's Encrypt will always try to contact on port 80. Often non-root processes are not allowed to bind to ports lower than 1024. This configuration option allows a different port to be used, but must be used in combination with port forwarding from port 80. This configuration can also be used to specify another IP address to listen on, for example an IPv6 address."`

	TLSCertPath string `long:"tlscertpath" description:"Path to write the self signed TLS certificate for LiT's RPC and REST proxy service (if Let's Encrypt is not used)."`
Do we need to explicitly mention this only applies to the HTTP(S)Listen port (default 8443)? Just to distinguish it more from the 10009 lnd RPC port.
@@ -280,9 +280,9 @@ func (c *Config) lndConnectParams() (string, lndclient.Network, string,
func defaultConfig() *Config {
	return &Config{
		HTTPSListen: defaultHTTPSListen,
		TLSCertPath: DefaultTLSCertPath,
We should also update doc/config-lnd-integrated.md and perhaps explain the difference (and reasoning) between the lnd 10009 (gRPC only) and the litd 8443 (grpc-web, REST, gRPC proxy, UI) a bit more explicitly?
Also this change should be made very explicit in the release notes (not sure where to put the reminder for us).
Yeah good call here, as it would also be a breaking change (CLI/config wise).
rpc_proxy.go
Outdated
// isHandling checks if the specified request is something to be handled by lnd
// or any of the attached sub daemons. If true is returned, the call was handled
// by the RPC proxy and the caller MUST NOT handle it again. If false is
// returned, the request was not handled and the caller MUST handle it.
func (p *rpcProxy) isHandling(resp http.ResponseWriter,
	req *http.Request) bool {

	if !p.hasStarted() {
		return false
I'm not sure if returning false here is a good idea, as the request will just be passed to the next handler. Instead I think we should check whether it's a request we would handle (the same conditions as we check below) and return an error page (perhaps 503 service unavailable) if we did, and only false if we didn't?
good point!
terminal.go
Outdated
@@ -1437,6 +1450,20 @@ func (g *LightningTerminal) showStartupInfo() error {
	return nil
}

// connectLND sets up LiT's LND connection.
func (g *LightningTerminal) connectLND(bufListener *bufconn.Listener) error {
Could turn this into a function by taking the cfg as a parameter and returning the conn. Then we could place this function and the dialBackend() and dialBufConnBackend() into a different file to reduce the size of this one a bit. Maybe lnd_connection.go or something like that?
sounds gooood
@@ -168,21 +166,9 @@ type rpcProxy struct {
}

// Start creates initial connection to lnd.
func (p *rpcProxy) Start() error {
func (p *rpcProxy) Start(lndConn *grpc.ClientConn) error {
👍
	// Start the main web server that dispatches requests either to the
	// static UI file server or the RPC proxy. This makes it possible to
	// unlock lnd through the UI.
	if err := g.startMainWebServer(); err != nil {
Since we start the main web server this early, we have to guard the g.restHandler from being called if it's nil. Otherwise a REST request coming in too early would panic. Or we move the call to createRESTProxy() to before this call even.
Hmm, no, creating the REST proxy before the main web server is started will probably not work since the REST proxy will attempt to connect to the main web server on creation.
That makes me wonder if we could extract the main web server into its own struct that owns the REST proxy and RPC proxy (and probably also the status server)? Then the start/stop would be a bit more abstracted as well. But not sure how large of a diff that is.
good catch!
Ok so looks like, for now, moving the createRESTProxy() call to before the startMainWebServer call does actually do the trick. Makes sense cause it has always been called in that order.
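The extraction floated in this thread could be sketched roughly like this. Every name here (mainWebServer, service, noopService, the ordering) is an assumption for illustration, not the actual lightning-terminal layout:

```go
package main

import "fmt"

// service is a minimal start/stop contract (assumed, not the real API).
type service interface {
	Start() error
	Stop() error
}

// mainWebServer sketches one struct that owns the status server, RPC
// proxy and REST proxy so their start/stop ordering lives in a single
// place (all field names hypothetical).
type mainWebServer struct {
	statusServer service
	rpcProxy     service
	restProxy    service
}

// Start brings the components up in an assumed dependency order: the
// status server first so it is always reachable, the REST proxy last
// because it dials back into the web server on creation.
func (m *mainWebServer) Start() error {
	order := []service{m.statusServer, m.rpcProxy, m.restProxy}
	for _, s := range order {
		if err := s.Start(); err != nil {
			return fmt.Errorf("component failed to start: %w", err)
		}
	}
	return nil
}

// noopService is a stand-in implementation used for illustration.
type noopService struct{ started bool }

func (n *noopService) Start() error { n.started = true; return nil }
func (n *noopService) Stop() error  { n.started = false; return nil }
```

The appeal is that Run() would only need to call mainWebServer.Start() and Stop(), rather than sequencing three components inline.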
state_server.go
Outdated
	},
}

options := []grpcweb.Option{
Do we really need to spin up our own grpc-web and gRPC server? Can't we instead have this as a sub component of the rpcProxy that already does all of this for us?
yeah good point
Just realised why I did it this way initially - the reason is that if we fail to start LND, then we never get to the registration and serving of the status server. So I think we need to re-add this actually
state_server.go
Outdated
// gRPC web requests are easy to identify. Send them to the gRPC
// web proxy.
if s.grpcWebProxy.IsGrpcWebRequest(req) ||
Again, it feels like this should just be hooked up in the right order to allow the existing rpcProxy to do this.
		"requests, lnd possibly still starting or " +
		"syncing")
}
handled, err := g.subServerMgr.ValidateMacaroon(
We'll need to make sure we don't create any authentication by-passes by accident, for example when starting up the individual daemons. Maybe adding a few extra itests around the whole startup process would be a good idea.
Force-pushed from 809bdf9 to fc97568 (Compare)
thanks for the initial review @guggero!
I have addressed most of your comments.
I just still need to add the extra documentation & also add more itests around the startup process. Will re-request once I have done that.
terminal.go
Outdated
if err != nil {
	log.Errorf("Could not start subservers: %v", err)
	log.Errorf("Could not set up LND clients: %v", err)
even for logs? thought it only mattered for error variables
Force-pushed from cb7f9ab to ec46889 (Compare)
@ellemouton, remember to re-request review from reviewers when ready
!lightninglabs-deploy mute
muting for the time being. Will pick up again once I have more room on my plate. Although would be cool to get back to this soon as I think it could make the addition of
rpc_proxy.go
Outdated
@@ -165,6 +166,9 @@ type rpcProxy struct {

	grpcServer   *grpc.Server
	grpcWebProxy *grpcweb.WrappedGrpcServer

	started bool
Could use an atomic bool alternatively.
Still getting thru the diff, but could also use an event type to allow callers to get notifications for when things are fully up vs polling.
The hasStarted thing is more for the sake of external callers who are making calls to Litd before it has fully started up.
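Both options from this thread can be sketched together: an atomic.Bool (Go 1.19+) for the polling-style flag, plus a closed channel for the event-style notification. The type and field names are assumptions, not the actual rpcProxy implementation:

```go
package main

import "sync/atomic"

// rpcProxy sketch (names assumed): an atomic.Bool replaces a
// mutex-guarded started flag, and a channel closed on startup gives
// callers an event to wait on instead of polling hasStarted.
type rpcProxy struct {
	started    atomic.Bool
	startedSig chan struct{}
}

func newRPCProxy() *rpcProxy {
	return &rpcProxy{startedSig: make(chan struct{})}
}

// markStarted flips the flag and wakes every goroutine blocked in
// WaitForStart at once by closing the channel.
func (p *rpcProxy) markStarted() {
	p.started.Store(true)
	close(p.startedSig)
}

// hasStarted is the cheap, lock-free poll.
func (p *rpcProxy) hasStarted() bool { return p.started.Load() }

// WaitForStart blocks until markStarted has been called; reads from a
// closed channel return immediately, so late callers never block.
func (p *rpcProxy) WaitForStart() { <-p.startedSig }
```

Note that markStarted must only be called once, since closing an already-closed channel panics.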
rpc_proxy.go
Outdated
	log.Infof("Handling gRPC web request: %s", req.URL.Path)
	p.grpcWebProxy.ServeHTTP(resp, req)
	if !p.hasStarted() {
		resp.WriteHeader(http.StatusServiceUnavailable)
Why this vs a gRPC status/error like we do on the lnd side?
rpc_proxy.go
Outdated
	p.grpcServer.ServeHTTP(resp, req)
	if !p.hasStarted() {
		resp.WriteHeader(http.StatusServiceUnavailable)
		_, _ = resp.Write(make([]byte, 0))
Comment above may more so apply here. Another route would be an http middleware chain (gRPC package) that checked this field and returned a response if things weren't fully active yet.
good idea! will give it a spin
rpc_proxy.go
Outdated
@@ -61,7 +56,7 @@ func (e *proxyErr) Unwrap() error {
// component.
func newRpcProxy(cfg *Config, validator macaroons.MacaroonValidator,
	superMacValidator session.SuperMacaroonValidator,
	permsMgr *perms.Manager, bufListener *bufconn.Listener) *rpcProxy {
	permsMgr *perms.Manager) *rpcProxy {
Arg passed but not used anywhere (in this commit at least).
this diff removes an arg
subserver_mgr.go
Outdated
	s.cfg.onStartError(err)
	return
}
s.integratedStarted = true
Doesn't need a mutex to protect access?
done
	return
}

go func() {
Should be tracked by a wait group.
faradayConn *grpc.ClientConn
loopConn    *grpc.ClientConn
poolConn    *grpc.ClientConn
lndConn     *grpc.ClientConn
Any reason lnd isn't also managed as a sub-server? I guess since it's sort of the "root" sub-server that all the rest depend on?
I guess since it's sort of the "root" sub-server that all the rest depend on?
Yes exactly. It is handled as a special case (see next commit) since Litd can't really continue operation if the start of or connection to LND fails.
terminal.go
Outdated
	lndGrpc *lndclient.GrpcLndServices,
	withMacaroonService bool) error {

	return poolServer.StartAsSubserver(
Re that comment about converting many of the closures to an interface: given this is just a pass-through call (same args and return value), a wrapper struct that embeds all the main daemon configs would take care of this nicely.
changed to interfaces - turned out nicely I think
Thank you for these changes @ellemouton, they'll make my life easier for integrating the taro daemon 💯
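The interface approach the thread converges on could look something like this. The method set is an assumption for illustration, not the actual SubServer interface that landed in lightning-terminal:

```go
package main

// SubServer sketches the contract each daemon (faraday, loop, pool,
// and later taro) could implement so the manager and the pass-through
// macaroon/permission calls stop needing per-daemon closures
// (hypothetical method set).
type SubServer interface {
	// Name identifies the daemon, e.g. "loop".
	Name() string
	// Remote reports whether the daemon runs out of process.
	Remote() bool
	// Start launches the daemon in integrated mode.
	Start() error
	// Stop shuts the daemon down.
	Stop() error
}

// loopSubServer is a stub implementation for illustration only.
type loopSubServer struct{ running bool }

func (l *loopSubServer) Name() string { return "loop" }
func (l *loopSubServer) Remote() bool { return false }
func (l *loopSubServer) Start() error { l.running = true; return nil }
func (l *loopSubServer) Stop() error  { l.running = false; return nil }
```

With one contract per daemon, adding a new sub-server (taro, say) becomes implementing the interface and registering it with the manager, rather than threading new closures through the config.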
rpc_proxy.go
Outdated
@@ -279,17 +298,27 @@ func (p *rpcProxy) isHandling(resp http.ResponseWriter,
	if p.grpcWebProxy.IsGrpcWebRequest(req) ||
		p.grpcWebProxy.IsGrpcWebSocketRequest(req) {

		log.Infof("Handling gRPC web request: %s", req.URL.Path)
		p.grpcWebProxy.ServeHTTP(resp, req)
		if !p.hasStarted() {
Should this function return false if p.hasStarted() is false?
See previous discussion on this: #442 (comment)
We don't want to return false here cause then Litd will pass it on to other services. We instead want to exit early.
rpc_proxy.go
Outdated
handled, subServerName := p.subServerMgr.HandledBy(requestURI)

switch {
I wonder if we can simplify this logic if it looked like:

// old checkSubSystemStarted, now called:
func (p *rpcProxy) hasRegisteredMatchingMethod(uri string) error {
	// HandledBy checks for status server, lnd, lit and all the subservers.
	system, err := p.subServerMgr.HandledBy(uri)
	if err != nil {
		return fmt.Errorf("no matching method ...: %v", err)
	}

	// getSubServerState always returns true for status + state for the
	// other subsystems.
	started, startErr := p.statusServer.getSubServerState(system)
	if !started {
		return fmt.Errorf("%s is not running: %s", system, startErr)
	}

	return nil
}
but we do want to check lit's other parts (LND and Lit) if the sub-server manager does not handle the request URI. So not sure I follow?
subserver_mgr.go
Outdated
// onStartError will be called if there is a startup error (in
// integrated mode) or if the remote connection could not be made in
// remote mode.
onStartError func(err error)
Do we plan onStartError, onStartSuccess to do more things in the future beyond setting the status for this subserver? If not we can directly set the status using subServerCfg.name
great point - removed
subserver_mgr.go
Outdated
// remote must be set if the sub-server is running remotely and so must
// not be started in integrated mode.
remote bool
here remote == (remoteCfg != nil)?
no - remoteCfg is always set by default.
subserver_mgr.go
Outdated
// StartRemoteSubServers creates connections to all the manager's sub-servers
// that are running remotely.
func (s *subServerMgr) StartRemoteSubServers() {
The func name can be more accurate if we use something like ConnectRemoteSubServers given that we are not starting anything.
good point! done!
In this commit, we remove Lit's dependency on LND's tls cert in integrated mode. With this commit, even if Lit is started in integrated mode, it will create its own tls cert to use for serving the webserver.
In this commit, a new LitService proto is added and the StopDaemon method is added to the service. This is added due to the fact that in an upcoming commit, we want Litd to continue running even if LND cannot be connected to, and so this means that Lit will no longer shut down if LND's StopDaemon method is called.
Add a `started` variable to the rpcProxy that is used to indicate if the proxy is ready to handle requests. This is because currently the webserver is dependent on the rpcProxy to start and we want to be able to start the webserver without being dependent on the rpcProxy so that it can be used to handle status requests in a future commit. So with this commit, we can now safely start the webserver earlier on and then if requests come through for the rpcProxy, an error will be displayed to the user.
Remove the responsibility of creating an LND connection from the rpcProxy and instead let the main LightningTerminal struct handle it. All the lnd-connection specific functions are also moved into their own file.
A pure refactor commit. It just splits various parts of the main Run function out into helper functions in order to more clearly show the steps taken to set up LiT.
Since all the relevant dependencies have been accounted for, we now move the starting of the main webserver to earlier on in the Run function. This is in preparation for later when the webserver will be "always-on" and will be used to serve status information about the rest of the LiT subservers.
Add the protos for a new Status server that can be used to query the status of a number of LiT's subservers.
In this commit, the various IsURI methods of the permissions manager are removed and replaced with a single IsSubServerURI method that gets passed both the URI and the subserver that should be checked. This will greatly simplify implementing a subserver manager in an upcoming commit.
Define a SubServer interface and a subServer manager
Force-pushed from ec46889 to 7edcc68 (Compare)
With this commit, we now also track the start-up state of LiT and LiT's connection to LND. These differ from the other sub-servers (loop, pool & faraday) because a failure to start LiT or LND is fatal and so should stop the rest of the start process; however, we still want the webserver to continue serving the new Status server so that the UI can query the start-up status of LND and LiT. So: if any errors occur while starting/connecting to LND, or any other errors occur while starting any of LiT's other processes, then we throw an error but we don't kill the main LiT process. The main process is only killed upon receiving a shutdown signal.
Force-pushed from 7edcc68 to 486fae3 (Compare)
I need to switch away from this for a bit now so just making a note of some issues that the current version still has:
This PR does a few things:
- Introduces a subServerMgr that handles the connections to Loop, Pool & Faraday. This will also make it easy for us to add another sub-server in future. This is also nice because it makes it easier in the next steps to treat these sub-servers, & any errors we may get from them, differently to how we would treat errors from LND or LiTd.
- Adds a Status server that is "always-on". This will allow the UI to query info about the status of any of LiT's sub-servers (including LND, Loop, Pool, Faraday and LiT itself).
- The Status server can always be reached.

TODO:
Fixes #421
Fixes #420
Fixes #290