Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xud doesn't refresh "lnd is WaitingUnlock" state #1090

Closed
1 task
kilrau opened this issue Jul 10, 2019 · 16 comments · Fixed by #1224
Closed
1 task

xud doesn't refresh "lnd is WaitingUnlock" state #1090

kilrau opened this issue Jul 10, 2019 · 16 comments · Fixed by #1224
Assignees
Labels
bug Something isn't working lightning Lightning network & lnd integration P2 mid priority

Comments

@kilrau
Copy link
Contributor

kilrau commented Jul 10, 2019

How it is

Docker-setup: All testnet clients ready & synced:
t0: restart lndltc, unlock lndltc via lncli unlock
t1: now lndltc works and shows "synced_to_chain": true. But xucli getinfo shows lndtlc "error": "lnd is WaitingUnlock",which is not true ([lncli] Wallet is already unlocked)
t2: restarted xud container - fixed.

How it should be

xud pulls unlock state in 5s interval, same as other states from lnd to automatically detect when it was unlocked successfully.

TODO

  • fix xud pulling unlock state in 5s interval, same as other states from lnd to automatically detect when it was unlocked successfully
@kilrau kilrau added the bug Something isn't working label Jul 10, 2019
@sangaman
Copy link
Collaborator

This is a really bizarre issue. I can reproduce it locally reliably. Starting off with lnd locked and then unlocking lnd is fine. But following the steps here, where you first start with lnd unlocked, then stop it, then start it again while locked, then unlock it, results in the status within xud getting stuck on locked.

The bizarre part is that the moment I enter a breakpoint in VS code, the error goes away, even if I never hit the break point. On the very next connection attempt, it will properly show that lnd is no longer locked. I can put in log statements that show that it thinks the GetInfo lnd call is still unimplemented, even when it's not. I ensured that I'm properly closing the grpc client between attempts (that was the solution to a similar issue from earlier). This might be a bug with the node grpc implementation, so I may open an issue there.

@kilrau
Copy link
Contributor Author

kilrau commented Jul 23, 2019

As discussed, please open an issue in node grpc. @sangaman

We can track the status in here.

@sangaman
Copy link
Collaborator

Issue is open here: grpc/grpc-node#993

@sangaman
Copy link
Collaborator

It looks like this may be fixed when using a heartbeat to detect when lnd has gone offline - it could also be related to using newer versions of lnd. I'll open a PR that closes this issue - the issue no longer occurs for me when I try to reproduce it in that PR.

sangaman added a commit that referenced this issue Sep 12, 2019
This commit shortens the interval for the outbound capacity check timer
from 60 to 3 seconds and sets the client as disconnected any time a
call fails due to an unreachable server. This makes the capacity checks
act like a heartbeat, checking that the server is reachable every few
seconds even in the absence of any other activity.

Previously, we had used a dummy server -> client streaming call and
listened for the `error` event on the lnd side, however with newer
versions of lnd and grpc this is no longer a reliable way to tell when
lnd has gone down.

This also resolves an issue where lnd would get stuck in the
`WaitingUnlock` state if it is stopped while xud is running and comes
back online in the locked state.

Closes #1090.
sangaman added a commit that referenced this issue Sep 12, 2019
This commit shortens the interval for the outbound capacity check timer
from 60 to 3 seconds and sets the client as disconnected any time a
call fails due to an unreachable server. This makes the capacity checks
act like a heartbeat, checking that the server is reachable every few
seconds even in the absence of any other activity.

Previously, we had used a dummy server -> client streaming call and
listened for the `error` event on the lnd side, however with newer
versions of lnd and grpc this is no longer a reliable way to tell when
lnd has gone down.

This also resolves an issue where lnd would get stuck in the
`WaitingUnlock` state if it is stopped while xud is running and comes
back online in the locked state.

Closes #1090.
@sangaman sangaman added the lightning Lightning network & lnd integration label Sep 13, 2019
@ghost
Copy link

ghost commented Oct 3, 2019

I'm still experiencing similar issue with latest changes with a fresh docker environment during the 2nd start of the environment.

mainnet > getinfo -j
{
  "version": "1.0.0-testnet.1-c559965-dirty",
  ...
  "lndMap": [
    [  
      "BTC",
      {                  
        "status": "lnd is WaitingUnlock",
        ...
      }
    ],
    [  
      "LTC",
      {
        "status": "lnd is WaitingUnlock",
        ...
      }
    ]
  ],
  ...
}

xud seems to be stuck in a loop of trying to verify lnd connections

logs -f xud
xud_1        | 03/10/2019 09:29:42.644 [LND-LTC] info: wallet unlocked
xud_1        | 03/10/2019 09:29:42.669 [LND-BTC] info: wallet unlocked
xud_1        | 03/10/2019 09:29:42.671 [RPC] info: gRPC server completed shutdown
xud_1        | 03/10/2019 09:29:42.697 [P2P] info: Connecting to mainnet XU network
...
xud_1        | 03/10/2019 09:41:02.636 [LND-BTC] info: trying to verify connection to lnd at lndbtc:10009
xud_1        | 03/10/2019 09:41:02.637 [LND-LTC] info: trying to verify connection to lnd at lndltc:10009

After about ~20 minutes the xucli getinfo reported lnds as active.

@sangaman
Copy link
Collaborator

sangaman commented Oct 3, 2019

xud keeps printing those "trying to verify connection" messages? And just to be clear, you observed this after starting xud while lnds were in a locked state, and then performing the unlock command?

@ghost
Copy link

ghost commented Oct 3, 2019

xud keeps printing those "trying to verify connection" messages? And just to be clear, you observed this after starting xud while lnds were in a locked state, and then performing the unlock command?

Yes, this can be reproduced with a fresh testnet/mainnet docker environment.

After the first launch execute down and then start the environment again with bash xud.sh -> xucli unlock.

@sangaman
Copy link
Collaborator

sangaman commented Oct 3, 2019

Thanks I will look into it.

@kilrau kilrau reopened this Oct 3, 2019
@sangaman
Copy link
Collaborator

sangaman commented Oct 3, 2019

I can't reproduce running in my dev environment against a local lnd instanc. @karl will share full logs if this comes up again, which hopefully will shed some light on how lnd went from connected status (which is required for the unlock calls to work) to disconnected.

@kilrau
Copy link
Contributor Author

kilrau commented Oct 9, 2019

I can't reproduce running in my dev environment against a local lnd instanc

It's easily reproduced by running xud-docker @sangaman

@kilrau kilrau closed this as completed Oct 9, 2019
@kilrau kilrau reopened this Oct 9, 2019
@kilrau kilrau added the P2 mid priority label Oct 9, 2019
@kilrau kilrau added P1 top priority and removed P2 mid priority labels Oct 9, 2019
@kilrau
Copy link
Contributor Author

kilrau commented Oct 31, 2019

Could you reproduce @sangaman ? If not, I can demonstrate it on my machine.

@kilrau kilrau added P2 mid priority and removed P1 top priority labels Nov 7, 2019
@kilrau
Copy link
Contributor Author

kilrau commented Nov 12, 2019

Also experienced by @belboo :

LND dropped dead (possibly due to me trying out the whole dictionary of commands on it :slight_smile: ). After docker restart the lndbtc-cli responded ok, allowing me to unlock the LND wallet. xucli, on both status request and getinfo was saying that the LND was still locked.

@kilrau
Copy link
Contributor Author

kilrau commented Dec 4, 2019

I tested several combinations this morning and can't reproduce this. Is your issue still on? @erkarl

@kilrau
Copy link
Contributor Author

kilrau commented Dec 4, 2019

Otherwise, let's close.

@belboo
Copy link

belboo commented Dec 4, 2019

Never had the issue since on our side!

@kilrau
Copy link
Contributor Author

kilrau commented Dec 4, 2019

Thanks @belboo ! Closing for now.

@kilrau kilrau closed this as completed Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working lightning Lightning network & lnd integration P2 mid priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants