Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

cfheaders syncing is fragile #1369

Open
brandoncurtis opened this issue Jan 16, 2019 · 1 comment
Open

cfheaders syncing is fragile #1369

brandoncurtis opened this issue Jan 16, 2019 · 1 comment
Labels
scope: sync issues relating to the sync process type: bug 🐛 Something isn't working

Comments

@brandoncurtis
Copy link

brandoncurtis commented Jan 16, 2019

Description

Zap fails to sync its cfheaders, preventing its use.

Expected Behavior

Zap syncs its cfheaders without issues.

Actual Behavior

Sync has been a problem before (see #903, #825), but it appears to be broken in the newest v0.3.3 release.

Despite the fact that Zap provides two btcd peers to sync from, Zap appears to lock onto one of these peers for cfheaders. If this peer is having issues, sync stalls:

2019-01-16 13:41:42.484 [INF] BTCN: Waiting for more block headers, then will start cfheaders sync from height 0...
2019-01-16 13:41:42.933 [INF] BTCN: Starting cfheaders sync for filter_type=0
2019-01-16 13:41:42.933 [DBG] BTCN: Getting filter checkpoints up to height=1000007, hash=00000000001ccb893d8a1f25b70ad173ce955e5f50124261bbbc50379a612ddf
2019-01-16 13:41:45.933 [INF] BTCN: Fetching set of checkpointed cfheaders filters from height=0, hash=21584579b7eb08997773e5aeff3a7f932700042d0ed2a6129012b7d7ae81b750
2019-01-16 13:41:45.933 [INF] BTCN: Starting to query for cfheaders from checkpoint_interval=0
2019-01-16 13:41:45.933 [INF] BTCN: Attempting to query for 2 cfheader batches
2019-01-16 13:41:45.933 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 13:41:45.933 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 13:41:46.037 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 13:41:46.038 [DBG] BTCN: Got cfheaders from height=1 to height=1000, prev_hash=21584579b7eb08997773e5aeff3a7f932700042d0ed2a6129012b7d7ae81b750
2019-01-16 13:41:48.934 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)

2019-01-16 13:47:52.681 [DBG] BTCN: Received inv (block 00000000000000fa973c3cd60c1d2772146a42bd249e5c62a8241e6a3de8dc70) from 35.196.85.107:18333 (outbound)
2019-01-16 13:47:52.682 [DBG] BTCN: Sending getheaders (locator 000000000000002a80392ce4c5c35ff5b5cb7038b2123c89ce214b788f6489ec, stop 00000000000000fa973c3cd60c1d2772146a42bd249e5c62a8241e6a3de8dc70) to 35.196.85.107:18333 (outbound)
2019-01-16 13:47:52.737 [DBG] BTCN: Received headers (num 1) from 35.196.85.107:18333 (outbound)
2019-01-16 13:47:52.737 [INF] BTCN: Processed 94706 blocks in the last 4m19.62s (height 1452714, 2019-01-16 13:47:47 -0700 MST)

2019-01-16 13:53:42.588 [DBG] BTCN: Received ping from 35.196.85.107:18333 (outbound)
2019-01-16 13:53:42.588 [DBG] BTCN: Sending pong to 35.196.85.107:18333 (outbound)
2019-01-16 13:53:42.634 [DBG] BTCN: Received pong from 35.231.86.179:18333 (outbound)
2019-01-16 13:53:42.635 [DBG] BTCN: Received pong from 35.196.85.107:18333 (outbound)
2019-01-16 13:53:42.982 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 13:53:45.982 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 13:53:48.983 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 13:53:51.983 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 13:53:54.983 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)

After restarting zap-desktop:

2019-01-16 14:08:13.772 [DBG] BTCN: Getting filter checkpoints up to height=1452715, hash=000000000000003cbe2dfa5ca0acf7c6debce8ebc2c26ae4d771ec685cb7e901
2019-01-16 14:08:13.772 [DBG] BTCN: Sending getcfcheckpt to 35.196.85.107:18333 (outbound)
2019-01-16 14:08:13.772 [DBG] BTCN: Sending getcfcheckpt to 35.231.86.179:18333 (outbound)
2019-01-16 14:08:13.927 [DBG] BTCN: Received cfcheckpt from 35.196.85.107:18333 (outbound)
2019-01-16 14:08:13.927 [DBG] BTCN: Received unhandled message of type cfcheckpt from 35.196.85.107:18333 (outbound)
2019-01-16 14:08:16.773 [INF] BTCN: Fetching set of checkpointed cfheaders filters from height=1000, hash=20fdcc13d0563fac7b86e070d5e818a5f68974e4f35855e5937b1ca04e3b3522
2019-01-16 14:08:16.773 [INF] BTCN: Starting to query for cfheaders from checkpoint_interval=1
2019-01-16 14:08:16.787 [INF] BTCN: Attempting to query for 1451 cfheader batches
2019-01-16 14:08:16.787 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 14:08:16.787 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:08:16.889 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:08:16.890 [DBG] BTCN: Got cfheaders from height=2001 to height=3000, prev_hash=e04af40a72bc9313aada57f850c04179a353ca36fc2736ad1f41ea768300abae
2019-01-16 14:08:16.890 [DBG] BTCN: Got response for headers at height=2001, only at height=1000, stashing
2019-01-16 14:08:16.890 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:08:17.283 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:08:17.286 [DBG] BTCN: Got cfheaders from height=3001 to height=4000, prev_hash=fec8c802ed7b9251d03af464d5e2d3f71276402974c968b66b81ff5786a64b36
2019-01-16 14:08:17.286 [DBG] BTCN: Got response for headers at height=3001, only at height=1000, stashing

After restart, Zap is querying the other (non-broken) btcd peer and is receiving headers. However, Zap believes these headers to be non-contiguous and so it stashes these newly-retrieved headers.

2019-01-16 14:10:00.728 [DBG] BTCN: Got cfheaders from height=951001 to height=952000, prev_hash=ec95d17ddead78935b2d006909cf84ac84b373d91fe9a39e23a4ae1d482b30cd
2019-01-16 14:10:00.728 [DBG] BTCN: Got response for headers at height=951001, only at height=1000, stashing
2019-01-16 14:10:00.728 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:10:00.816 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:10:00.818 [DBG] BTCN: Got cfheaders from height=952001 to height=953000, prev_hash=8de8bf8c25f79b923ea3aa1f05c088da2f41b2caa18b11d9717dd92fda28823d
2019-01-16 14:10:00.818 [DBG] BTCN: Got response for headers at height=952001, only at height=1000, stashing
2019-01-16 14:10:00.819 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:10:00.910 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:10:00.913 [DBG] BTCN: Got cfheaders from height=953001 to height=954000, prev_hash=2a7433369691cb2412c457a4bf923263666575492e7c2911098721a5ba882efa
2019-01-16 14:10:00.913 [DBG] BTCN: Got response for headers at height=953001, only at height=1000, stashing

Subsequent restarts suffer the same fate, syncing from (nearly) the beginning and stashing everything that's retrieved:

2019-01-16 14:40:57.605 [DBG] BTCN: Getting filter checkpoints up to height=1452717, hash=00000000000000d82f976e6f325e99e9af84e9a36bfbced61a41708da68d2696
2019-01-16 14:40:57.606 [DBG] BTCN: Sending getcfcheckpt to 35.196.85.107:18333 (outbound)
2019-01-16 14:40:57.606 [DBG] BTCN: Sending getcfcheckpt to 35.231.86.179:18333 (outbound)
2019-01-16 14:40:57.766 [DBG] BTCN: Received cfcheckpt from 35.196.85.107:18333 (outbound)
2019-01-16 14:40:57.766 [DBG] BTCN: Received unhandled message of type cfcheckpt from 35.196.85.107:18333 (outbound)
2019-01-16 14:41:00.606 [INF] BTCN: Fetching set of checkpointed cfheaders filters from height=1000, hash=20fdcc13d0563fac7b86e070d5e818a5f68974e4f35855e5937b1ca04e3b3522
2019-01-16 14:41:00.607 [INF] BTCN: Starting to query for cfheaders from checkpoint_interval=1
2019-01-16 14:41:00.619 [INF] BTCN: Attempting to query for 1451 cfheader batches
2019-01-16 14:41:00.619 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:41:00.619 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)
2019-01-16 14:41:00.892 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:41:00.895 [DBG] BTCN: Got cfheaders from height=1001 to height=2000, prev_hash=20fdcc13d0563fac7b86e070d5e818a5f68974e4f35855e5937b1ca04e3b3522
2019-01-16 14:41:00.895 [DBG] BTCN: Using offset 0 for initial filter header range (new prev_hash=20fdcc13d0563fac7b86e070d5e818a5f68974e4f35855e5937b1ca04e3b3522)
2019-01-16 14:41:00.903 [DBG] BTCN: Writing filter headers up to height=2000, hash=0000000005bdbddb59a3cd33b69db94fa67669c41d9d32751512b5d7b68c71cf, new_tip=e04af40a72bc9313aada57f850c04179a353ca36fc2736ad1f41ea768300abae
2019-01-16 14:41:00.906 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:41:00.991 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:41:00.993 [DBG] BTCN: Got cfheaders from height=3001 to height=4000, prev_hash=fec8c802ed7b9251d03af464d5e2d3f71276402974c968b66b81ff5786a64b36
2019-01-16 14:41:00.993 [DBG] BTCN: Got response for headers at height=3001, only at height=2000, stashing
2019-01-16 14:41:00.993 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 14:41:01.082 [DBG] BTCN: Received cfheaders from 35.196.85.107:18333 (outbound)
2019-01-16 14:41:01.084 [DBG] BTCN: Got cfheaders from height=4001 to height=5000, prev_hash=cbbe9381295ff3ef3de121466e569637f84a272cc4e5f01225dc99fd0c761923
2019-01-16 14:41:01.084 [DBG] BTCN: Got response for headers at height=4001, only at height=2000, stashing

Despite the fact that these headers are being stashed and Zap is not successfully syncing, the Zap UI continues to show the user that sync is going fine:

screenshot from 2019-01-16 14-35-08

Possible Fix

Taking a look at the logs, it looks like Zap hardcodes two btcd peers that it uses to sync block and cf headers:

2019-01-16 13:41:45.933 [DBG] BTCN: Sending getcfheaders to 35.196.85.107:18333 (outbound)
2019-01-16 13:41:45.933 [DBG] BTCN: Sending getcfheaders to 35.231.86.179:18333 (outbound)

testnet1-btcd.zaphq.io 35.196.85.107
testnet2-btcd.zaphq.io 35.231.86.179

When these hardcoded peers break, Zap breaks.

This commit indicates that "Latest version of lnd now supports connecting to and syncing from multiple btcd hosts": 92a7854

To make the syncing process more robust, could you add a BTCD host peering whitelist? It would include your two hosted nodes, and other reputable organizations and individuals could PR additions to the list.

It looks like some additional sync logic may be needed to switch between peers when one fails, and to ensure that retrieved cfheaders are contiguous and that the UI properly reflects the sync status.

Steps to Reproduce

Make a new local wallet in Zap v0.3.3. Attempt to sync.

Your Environment

  • Zap Desktop version: v0.3.3 built from source
  • Operating System and version: Linux Mint 19.1
@mrfelton mrfelton added type: bug 🐛 Something isn't working scope: sync issues relating to the sync process labels Jan 16, 2019
@mrfelton
Copy link
Member

mrfelton commented Jan 16, 2019

Thank you for the detailed report @brandoncurtis .

Indeed, there was an issue with the testnet2-btcd.zaphq.io btcd node which had stalled and was not processing new blocks for some reason. I have now restarted this node and it is back in sync again, so the issue should at least be mitigated for the time being.

We're in a bit of a sticky position currently as btcd is the only node implementation that supports neutrino, and there are not a huge number of these nodes out there. Hopefully, in the not too distant future this will be supported in bitcoind and the set of potential sync candidates will increase significantly. Together with that, improvements in lnd will hopefully make the overall sync process more robust.

I agree that if one of the nodes becomes unavailable then it should fallback to another, but I think this needs to be handled at a lower level - in lnd rather than in zap. All we are doing is providing a UI on top of lnd and that is where all of the sync functionality lives.

We can probably do something to detect this issue though and better communicate to the user what is happening.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
scope: sync issues relating to the sync process type: bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants