Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xud doesn't try to connect to lnd if it's syncing #1039

Closed
kilrau opened this issue Jun 17, 2019 · 9 comments · Fixed by #1045
Closed

xud doesn't try to connect to lnd if it's syncing #1039

kilrau opened this issue Jun 17, 2019 · 9 comments · Fixed by #1045
Assignees
Labels
bug Something isn't working P1 top priority

Comments

@kilrau
Copy link
Contributor

kilrau commented Jun 17, 2019

Urgent off shoot from #684 since this makes things quite complicated for https://github.com/ExchangeUnion/xud-docker.

Issue

If LND is not synced:

Attaching to testnet_lndltc_1
lndltc_1     | 2019-06-17 13:36:33.097 [WRN] LTND: open /root/.lnd/lnd.conf: no such file or directory
lndltc_1     | 2019-06-17 13:36:33.097 [INF] LTND: Version: 0.6.1-beta commit=v0.6.1-beta, build=production, logging=default
lndltc_1     | 2019-06-17 13:36:33.097 [INF] LTND: Active chain: Litecoin (network=testnet)
lndltc_1     | 2019-06-17 13:36:33.098 [INF] CHDB: Checking for schema update: latest_version=8, db_version=8
lndltc_1     | 2019-06-17 13:36:33.108 [INF] RPCS: password gRPC proxy started at 0.0.0.0:8080
lndltc_1     | 2019-06-17 13:36:33.109 [INF] RPCS: password RPC server listening on 0.0.0.0:10009
lndltc_1     | 2019-06-17 13:36:33.109 [INF] LTND: Waiting for wallet encryption password. Use `lncli create` to create a wallet, `lncli unlock` to unlock an existing wallet, or `lncli changepassword` to change the password of an existing wallet and unlock it.
lndltc_1     | spawn lncli -n testnet -c litecoin create
lndltc_1     | Input wallet password: 
lndltc_1     | Confirm wallet password: 
lndltc_1     | 
lndltc_1     | Do you have an existing cipher seed mnemonic you want to use? (Enter y/n): n
lndltc_1     | 
lndltc_1     | Your cipher seed can optionally be encrypted.
lndltc_1     | Input your passphrase if you wish to encrypt it (or press enter to proceed without a cipher seed passphrase): 
lndltc_1     | 
lndltc_1     | Generating fresh cipher seed...
lndltc_1     | 
lndltc_1     | [lncli] unable to generate seed: rpc error: code = Unknown desc = wallet already exists
lndltc_1     | spawn lncli -n testnet -c litecoin unlock
lndltc_1     | Input wallet password: 12345678
lndltc_1     | 
lndltc_1     | 2019-06-17 13:36:37.325 [INF] LNWL: Opened wallet
lndltc_1     | 
lndltc_1     | lnd successfully unlocked!
lndltc_1     | lnd --rpclisten=0.0.0.0:10009 --listen=0.0.0.0:9735 --restlisten=0.0.0.0:8080 $@
lndltc_1     | 2019-06-17 13:36:37.386 [INF] LTND: Primary chain is set to: litecoin
lndltc_1     | 2019-06-17 13:36:37.391 [INF] LTND: Initializing litecoind backed fee estimator
lndltc_1     | 2019-06-17 13:36:37.391 [INF] LNWL: Started listening for bitcoind block notifications via ZMQ on tcp://litecoind:29332
lndltc_1     | 2019-06-17 13:36:37.391 [INF] LNWL: Started listening for bitcoind transaction notifications via ZMQ on tcp://litecoind:29333
lndltc_1     | 2019-06-17 13:36:37.392 [DBG] LNWL: Using minimum fee rate of 253 sat/kw
lndltc_1     | 2019-06-17 13:36:38.518 [INF] LNWL: The wallet has been unlocked without a time limit
lndltc_1     | 2019-06-17 13:36:38.623 [INF] LTND: LightningWallet opened
lndltc_1     | 2019-06-17 13:36:38.630 [WRN] LNWL: Unable to synchronize wallet to chain: did not find a suitable birthday block with a timestamp greater than 2019-06-07 18:15:05 +0000 UTC
lndltc_1     | 2019-06-17 13:36:38.634 [INF] HSWC: Restoring in-memory circuit state from disk
lndltc_1     | 2019-06-17 13:36:38.641 [INF] HSWC: Payment circuits loaded: num_pending=0, num_open=0
lndltc_1     | 2019-06-17 13:36:38.649 [INF] LTND: Channel backup proxy channel notifier starting
lndltc_1     | 2019-06-17 13:36:38.650 [INF] ATPL: Instantiating autopilot with cfg: (*lnd.autoPilotConfig)(0xc00003b2c0)({
lndltc_1     |  Active: (bool) false,
lndltc_1     |  Heuristic: (map[string]float64) (len=1) {
lndltc_1     |   (string) (len=12) "preferential": (float64) 1
lndltc_1     |  },
lndltc_1     |  MaxChannels: (int) 5,
lndltc_1     |  Allocation: (float64) 0.6,
lndltc_1     |  MinChannelSize: (int64) 20000,
lndltc_1     |  MaxChannelSize: (int64) 16777215,
lndltc_1     |  Private: (bool) false,
lndltc_1     |  MinConfs: (int32) 0
lndltc_1     | })
lndltc_1     | 
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: SignRPC
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: WalletKitRPC
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: AutopilotRPC
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: ChainRPC
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: InvoicesRPC
lndltc_1     | 2019-06-17 13:36:38.650 [DBG] LTND: Populating dependencies for sub RPC server: RouterRPC
lndltc_1     | 2019-06-17 13:36:38.650 [INF] RPCS: RPC server listening on 0.0.0.0:10009
lndltc_1     | 2019-06-17 13:36:38.650 [INF] RPCS: gRPC proxy started at 0.0.0.0:8080
lndltc_1     | 2019-06-17 13:36:38.651 [INF] LTND: Waiting for chain backend to finish sync, start_height=0

xud does not connect to lnd and also does not try to reconnect:

      "LTC",
      {
        "error": "lnd is not available",
        "chainsList": [],
        "blockheight": 0,
        "urisList": [],
        "version": "",
        "alias": ""
      }

Restarting xud while lnd is syncing ("synced_to_chain": false)does not make it connect.
Restarting xud after after lnd is synced ("synced_to_chain": true) makes it connect correctly.

Tested with BTC & LTC on testnet.

How it should be

xud connects to lnd, recognizes that it's "synced_to_chain": false and prints waiting for sync in xucli getinfo. Regularly pulls "synced_to_chain" value (every 10s) until true and then enables lnd. No need for reading block height or any other sync status for now.

@kilrau kilrau added bug Something isn't working P1 top priority labels Jun 17, 2019
@kilrau
Copy link
Contributor Author

kilrau commented Jun 17, 2019

Here another log showing both lnd's up, but xud not connecting to them after fully synced (fixed by restarting xud):

> lndbtc-lncli getinfo
{
	"version": "0.6.1-beta commit=v0.6.1-beta",
	"identity_pubkey": "03aa43ec9de129585386f6b06bb2a0843647a8d7db352c58ed0ad5de7d208761d4",
	"alias": "lndbtc",
	"num_pending_channels": 0,
	"num_active_channels": 0,
	"num_inactive_channels": 0,
	"num_peers": 0,
	"block_height": 1544126,
	"block_hash": "000000000000018189fbf24e84b26b93723174b3af67bf37989d61ce48623a87",
	"best_header_timestamp": 1560783986,
	"synced_to_chain": false,
	"testnet": true,
	"chains": [
		{
			"chain": "bitcoin",
			"network": "testnet"
		}
	],
	"uris": null
}
> lndltc-lncli getinfo
{
	"version": "0.6.1-beta commit=v0.6.1-beta",
	"identity_pubkey": "030b6d7156e26660bc2cdeefdc6ab238ca5a337bc1c3553d89d052ceda2e55b81b",
	"alias": "lndltc",
	"num_pending_channels": 0,
	"num_active_channels": 0,
	"num_inactive_channels": 0,
	"num_peers": 0,
	"block_height": 71559,
	"block_hash": "b580c77953eddaf2fae5116418f1327c8d5a49b3886310aa385d10f4622557af",
	"best_header_timestamp": 1528900917,
	"synced_to_chain": false,
	"testnet": true,
	"chains": [
		{
			"chain": "litecoin",
			"network": "testnet"
		}
	],
	"uris": null
}
> getinfo
{
  "version": "1.0.0-alpha.11",
  "nodePubKey": "030c74ebef2068dfa48c8cae1a33b670528cf10d636d11f8f9ee61e89f02aba93e",
  "urisList": [
    "030c74ebef2068dfa48c8cae1a33b670528cf10d636d11f8f9ee61e89f02aba93e@0.0.0.0:8885"
  ],
  "numPeers": 0,
  "numPairs": 1,
  "orders": {
    "peer": 0,
    "own": 0
  },
  "lndMap": [
    [
      "BTC",
      {
        "error": "lnd is not available",
        "chainsList": [],
        "blockheight": 0,
        "urisList": [],
        "version": "",
        "alias": ""
      }
    ],
    [
      "LTC",
      {
        "error": "lnd is not available",
        "chainsList": [],
        "blockheight": 0,
        "urisList": [],
        "version": "",
        "alias": ""
      }
    ]
  ],
  "raiden": {
    "error": "connect ECONNREFUSED 172.18.0.2:5001",
    "address": "",
    "channels": 0,
    "version": ""
  }
}

@ghost
Copy link

ghost commented Jun 18, 2019

Can you also please include xud.log?

@sangaman
Copy link
Collaborator

When I tweak the code to always treat synced_to_chain as false, I get reconnection attempts as expected:

18/06/2019 04:32:30.245 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:30.249 [LND] info: LTC: trying to verify connection to lnd at localhost:10001
18/06/2019 04:32:30.261 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:30.264 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:30.266 [LND] info: LTC: LndClient status: OutOfSync
18/06/2019 04:32:30.268 [LND] warn: LTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:32.527 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:32.539 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:32.541 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:35.224 [LND] info: LTC: trying to verify connection to lnd at localhost:10001
18/06/2019 04:32:35.238 [LND] info: LTC: LndClient status: OutOfSync
18/06/2019 04:32:35.241 [LND] warn: LTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:35.264 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:35.269 [LND] info: LTC: trying to verify connection to lnd at localhost:10001
18/06/2019 04:32:35.277 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:35.279 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:35.284 [LND] info: LTC: LndClient status: OutOfSync
18/06/2019 04:32:35.286 [LND] warn: LTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:37.543 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:37.555 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:37.558 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:40.241 [LND] info: LTC: trying to verify connection to lnd at localhost:10001
18/06/2019 04:32:40.254 [LND] info: LTC: LndClient status: OutOfSync
18/06/2019 04:32:40.256 [LND] warn: LTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:40.279 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:40.286 [LND] info: LTC: trying to verify connection to lnd at localhost:10001
18/06/2019 04:32:40.292 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:40.294 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:40.299 [LND] info: LTC: LndClient status: OutOfSync
18/06/2019 04:32:40.301 [LND] warn: LTC: lnd is out of sync with chain, retrying in 5000 ms
18/06/2019 04:32:42.559 [LND] info: BTC: trying to verify connection to lnd at localhost:10002
18/06/2019 04:32:42.562 [LND] info: BTC: LndClient status: OutOfSync
18/06/2019 04:32:42.562 [LND] warn: BTC: lnd is out of sync with chain, retrying in 5000 ms

Seeing the log file output you got would probably help.

@kilrau
Copy link
Contributor Author

kilrau commented Jun 18, 2019

System:
Linux K-Yoga 4.18.0-22-generic #23~18.04.1-Ubuntu SMP Thu Jun 6 08:37:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux, xud-docker latest master, Docker version 18.09.6, build 481bc77, docker-compose version 1.24.0, build 0aa59064

Reproduced this again, here the important points in time with relevant log parts each:

t0 = start xud, lndbtc and bitcoind container at the same time.

  "lndMap": [
    [
      "BTC",
      {
        "error": "lnd is not available",
        "chainsList": [],
        "blockheight": 0,
        "urisList": [],
        "version": "",
        "alias": ""
      }
    ],

makes sense because lndbtc is still l "lndMap": [
[
"BTC",
{
"error": "",
"channels": {
"active": 0,
"inactive": 0,
"pending": 0
},
"chainsList": [],
"blockheight": 1549278,
"urisList": [],
"version": "0.6.1-beta commit=v0.6.1-beta",
"alias": "lndbtc"
}
],
ocked:
> lndbtc-lncli getinfo [lncli] Wallet is encrypted. Please unlock using 'lncli unlock', or set password using 'lncli create' if this is the first time starting lnd.

t1 = unlock lndbtc, shows "synced_to_chain": true

> lndbtc-lncli getinfo
{
	"version": "0.6.1-beta commit=v0.6.1-beta",
	"identity_pubkey": "03aa43ec9de129585386f6b06bb2a0843647a8d7db352c58ed0ad5de7d208761d4",
	"alias": "lndbtc",
	"num_pending_channels": 0,
	"num_active_channels": 0,
	"num_inactive_channels": 0,
	"num_peers": 3,
	"block_height": 1548791,
	"block_hash": "0000000000078bb518ffe01525036dc9678d6e0c15eda941f752cd48d98aabd1",
	"best_header_timestamp": 1560863348,
	"synced_to_chain": true,
	"testnet": true,
	"chains": [
		{
			"chain": "bitcoin",
			"network": "testnet"
		}
	],
	"uris": null
}

t2 = xucli getinfo again, still

  "lndMap": [
    [
      "BTC",
      {
        "error": "lnd is not available",
        "chainsList": [],
        "blockheight": 0,
        "urisList": [],
        "version": "",
        "alias": ""
      }
    ],

xud.log shows repetitions of:

xud_1        | 18/06/2019 13:18:53.159 [LND] info: trying to verify connection to lnd with uri: lndltc:10009
xud_1        | 18/06/2019 13:18:53.161 [LND] info: LndClient status: Disconnected
xud_1        | 18/06/2019 13:18:53.161 [LND] error: could not verify connection to lnd at lndltc:10009, error: {"code":12,"metadata":{"_internal_repr":{}},"details":"unknown service lnrpc.Lightning"},
xud_1        |           retrying in 5000 ms
xud_1        | 18/06/2019 13:18:54.279 [LND] info: trying to verify connection to lnd with uri: lndbtc:10009
xud_1        | 18/06/2019 13:18:54.281 [LND] info: LndClient status: Disconnected
xud_1        | 18/06/2019 13:18:54.281 [LND] error: could not verify connection to lnd at lndbtc:10009, error: {"code":12,"metadata":{"_internal_repr":{}},"details":"unknown service lnrpc.Lightning"},
xud_1        |           retrying in 5000 ms

so indeed it's retrying to connect every 5s, but fails because of "code":12,"metadata":{"_internal_repr":{}},"details":"unknown service lnrpc.Lightning"

t3 = docker-compose restart xud, lndbtc gets correctly recognized

  "lndMap": [
    [
      "BTC",
      {
        "error": "",
        "channels": {
          "active": 0,
          "inactive": 0,
          "pending": 0
        },
        "chainsList": [],
        "blockheight": 1549278,
        "urisList": [],
        "version": "0.6.1-beta commit=v0.6.1-beta",
        "alias": "lndbtc"
      }
    ],

xud.log simply shows lndbtc ConnectionVerified in the beginning, nothing thereafter:

xud_1        | 18/06/2019 13:21:28.681 [GLOBAL] info: config loaded
xud_1        | 18/06/2019 13:21:28.855 [DB] info: connected to database /root/.xud/xud.db
xud_1        | 18/06/2019 13:21:28.877 [RAIDEN] info: RaidenClient status: Disconnected
xud_1        | 18/06/2019 13:21:28.877 [RAIDEN] info: trying to verify connection to raiden with uri: raiden:5001
xud_1        | 18/06/2019 13:21:28.900 [RAIDEN] error: Error: connect ECONNREFUSED 172.18.0.3:5001
xud_1        |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1106:14)
xud_1        | 18/06/2019 13:21:28.901 [RAIDEN] error: could not verify connection to raiden at raiden:5001, retrying in 5000 ms
xud_1        | 18/06/2019 13:21:28.904 [LND] info: LndClient status: Disconnected
xud_1        | 18/06/2019 13:21:28.904 [LND] info: trying to verify connection to lnd with uri: lndbtc:10009
xud_1        | 18/06/2019 13:21:28.908 [LND] info: LndClient status: Disconnected
xud_1        | 18/06/2019 13:21:28.908 [LND] info: trying to verify connection to lnd with uri: lndltc:10009
xud_1        | 18/06/2019 13:21:28.922 [LND] info: LndClient status: OutOfSync
xud_1        | 18/06/2019 13:21:28.922 [LND] error: lnd at lndltc:10009 is out of sync with chain, retrying in 5000 ms
xud_1        | 18/06/2019 13:21:28.926 [LND] info: LndClient status: ConnectionVerified
xud_1        | 18/06/2019 13:21:28.928 [GLOBAL] info: Local nodePubKey is 030c74ebef2068dfa48c8cae1a33b670528cf10d636d11f8f9ee61e89f02ab

Summary

So you are right @sangaman , the 5s pull mechanism is there, it just fails because of "code":12,"metadata":{"_internal_repr":{}},"details":"unknown service lnrpc.Lightning" unless xud is restarted.

What one notices is, that after xud was restarted, lndbtc gets correctly recognized as out of sync,
xud_1 | 18/06/2019 13:21:28.922 [LND] info: LndClient status: OutOfSync
whereas before it was also "code":12,"metadata":{"_internal_repr":{}},"details":"unknown service lnrpc.Lightning". My guess would be it got sth to do with lnd not being ready/unlocked when xud starts pulling.

@kilrau
Copy link
Contributor Author

kilrau commented Jun 18, 2019

I think what would be nice to fix with this:

  • change the logging to [LNDBTC] and [LNDLTC] since it's hard to tell apart
  • change state in getinfo to waiting for sync if LndClient status: OutOfSync. ATM it just says lnd is not available which could be anything.

@sangaman
Copy link
Collaborator

sangaman commented Jun 18, 2019

As far as I understand it the unknown service lnrpc.Lightning error results from starting xud while it's running the WalletUnlocker service rather than the Lightning service, so it's odd that we'd see this as a result of being out of sync. @kilrau do you think it's possible you started xud before unlocking lnd? That would make sense of this issue for me, although it is still something we should handle gracefully.

Edit: Never mind, I think the post above explains the steps in enough detail, I'm still just confused as to why it still says unknown service after lnd has been unlocked.

@kilrau
Copy link
Contributor Author

kilrau commented Jun 18, 2019

@kilrau do you think it's possible you started xud before unlocking lnd? That would make sense of this issue for me, although it is still something we should handle gracefully.

Yes xud was started at the same time as lnd and yes lnd was not unlocked yet when starting xud was started, see t0. This issue is about handling this gracefully.

@sangaman
Copy link
Collaborator

It looks like the grpc package is doing some weird caching where, once we make a Lightning call that's unimplemented because the wallet is currently locked, it always comes back as unimplemented even after we unlock the wallet. Recreating the client with new LightningClient() doesn't help, but restarting the entire process does. I might look into opening an issue in the gprc node repo, but for now I'll try making a WalletUnlocker call first to determine whether the wallet is locked or unlocked.

@sangaman
Copy link
Collaborator

It looks like the solution is actually to call grpc.closeClient() which clears that caching issue I mentioned above, I'll have a PR soon.

sangaman added a commit that referenced this issue Jun 19, 2019
This fixes an issue where attempting a `GetInfo` call to lnd while lnd
is locked results in an `UNIMPLEMENTED` grpc error that persists even
after lnd has been unlocked and the `GetInfo` call becomes available. We
close grpc clients after connection verification attempts so that we
have a fresh start for future attempts

Fixes #1039.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P1 top priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants