Commit a116413

docs: add detailed sync troubleshooting guide
Add specialized guide for Base node sync issues based on community reports (base#127, base#251, base#369, base#413, base#419, base#433). Includes:
- 4 sync scenarios with Check/Action solutions
- Hardware anti-patterns (RAID-5, NAS, SATA warnings)
- Reth performance optimization
- Monitoring commands and quick reference
1 parent a1357f2 commit a116413

1 file changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
---
sidebarTitle: Sync Troubleshooting
title: Detailed Sync Troubleshooting
---

This guide provides detailed solutions for common Base node synchronization issues, based on community reports (GitHub issues #127, #251, #369, #413, #419, #433).

## Quick Diagnostic Commands

```bash
# Check sync status (op-node JSON-RPC; fields are returned under .result)
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}' \
  http://localhost:7545 | jq '.result'

# Check current block (result is hex-encoded)
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545

# Check peer count (result is hex-encoded)
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545
```
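
`eth_blockNumber` and `net_peerCount` return hex quantities such as `"0x1b4"`, which are awkward to compare by eye. A minimal helper for decoding them might look like the sketch below; the name `hex_to_dec` is hypothetical and not part of the Base node tooling.

```shell
# hex_to_dec HEX
# Convert a JSON-RPC hex quantity such as 0x1b4 to decimal.
# printf accepts a 0x prefix for %d in both bash and POSIX sh.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Example against a running node (requires curl and jq):
#   hex_to_dec "$(curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
#     http://localhost:8545 | jq -r '.result')"
```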

---

## Detailed Sync Scenarios

### Node Consistently Behind (12+ Hours)

- **Issue**: The node falls further behind over time; the gap keeps growing.
- **Check**: L1 RPC rate limiting:
  ```bash
  docker compose logs node | grep -i "rate limit\|429"
  ```
- **Check**: Measure lag:
  ```bash
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}' \
    http://localhost:7545 \
    | jq '.result | {lag_hours: ((.head_l1.timestamp - .current_l1.timestamp) / 3600)}'
  ```
- **Root Cause**: The L1 RPC endpoint has insufficient throughput or is rate limiting requests.
- **Action**: Upgrade the L1 RPC provider:
  - Free tiers (Infura/Alchemy) are insufficient for Base nodes
  - Recommended: Alchemy Growth (~$199/mo), QuickNode (~$49/mo), or a self-hosted L1 node
  - Update `OP_NODE_L1_ETH_RPC` and `OP_NODE_L1_BEACON` in `.env.mainnet`
  - Restart: `docker compose down && docker compose up -d`
- **Verify**: Monitor improvement (the two block numbers should converge):
  ```bash
  watch -n 10 'curl -s -X POST -H "Content-Type: application/json" --data "{\"jsonrpc\":\"2.0\",\"method\":\"optimism_syncStatus\",\"params\":[],\"id\":1}" http://localhost:7545 | jq ".result | .current_l1.number, .head_l1.number"'
  ```
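
The lag arithmetic above can also be done outside `jq`, which is convenient when the two timestamps come from logs or a dashboard. The sketch below assumes the inputs are the `head_l1.timestamp` and `current_l1.timestamp` values in Unix seconds; the function names and the 12-hour default threshold (taken from this scenario's title) are illustrative.

```shell
# lag_hours HEAD_TS CURRENT_TS
# Print how far current_l1 trails head_l1, in hours with two decimals.
lag_hours() {
  awk -v head="$1" -v cur="$2" 'BEGIN { printf "%.2f\n", (head - cur) / 3600 }'
}

# is_far_behind HEAD_TS CURRENT_TS [THRESHOLD_HOURS]
# Succeed when the lag is at or above the threshold (default: 12 hours).
is_far_behind() {
  awk -v head="$1" -v cur="$2" -v limit="${3:-12}" \
    'BEGIN { exit !((head - cur) / 3600 >= limit) }'
}
```

For example, `lag_hours 1700043200 1700000000` prints `12.00`, and `is_far_behind` with the same arguments succeeds.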

### Node Completely Stuck (No Progress)

- **Issue**: Block height has not increased for 1+ hours; `eth_syncing` returns `false` even though the node is behind.
- **Check**: Block progression:
  ```bash
  # Record current block, wait 60 seconds, check again
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545
  ```
- **Check**: P2P connectivity (should be 10+ peers; the result is hex, so `0xa` = 10):
  ```bash
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
    http://localhost:8545
  ```
- **Check**: Port 30303 accessibility:
  ```bash
  sudo netstat -tulpn | grep 30303
  # If not listening, check firewall
  ```
- **Root Cause**: Corrupted database, P2P issues, or a lost L1/L2 connection.
- **Action** (try in order):
  1. Simple restart: `docker compose restart`
  2. Open the P2P port if the peer count is 0:
     ```bash
     sudo ufw allow 30303/tcp
     sudo ufw allow 30303/udp
     ```
  3. If still stuck, consider snapshot restoration (see [Snapshots](/base-chain/node-operators/snapshots)).
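
The "record, wait, check again" step above boils down to comparing two hex block numbers. A minimal sketch of that comparison, assuming both samples are raw `eth_blockNumber` results; `block_delta` and `is_stalled` are hypothetical helper names.

```shell
# block_delta FIRST_HEX SECOND_HEX
# Print how many blocks the node advanced between two eth_blockNumber samples.
block_delta() {
  echo $(( $(printf '%d' "$2") - $(printf '%d' "$1") ))
}

# is_stalled FIRST_HEX SECOND_HEX
# Succeed when the second sample is not ahead of the first.
is_stalled() {
  [ "$(block_delta "$1" "$2")" -le 0 ]
}
```

For instance, `block_delta 0x10 0x1a` prints `10`, while `is_stalled 0x10 0x10` succeeds because the height did not move.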

### Extremely Slow Initial Sync

- **Issue**: Syncing at < 100 blocks/second, taking weeks instead of days.
- **Check**: Storage type:
  ```bash
  lsblk -d -o NAME,ROTA,TYPE,SIZE,MODEL
  # ROTA: 0 = SSD/NVMe (good), 1 = HDD (too slow)
  ```
- **Check**: Disk performance:
  ```bash
  sudo hdparm -t /dev/nvme0n1  # replace with your device; should show > 1000 MB/s
  ```
- **Check**: RAID configuration (RAID-5/6 can cause a roughly 10x slowdown):
  ```bash
  cat /proc/mdstat
  ```
- **Check**: Disk I/O during sync:
  ```bash
  iostat -x 1 5
  # %util > 90% and await > 50ms = disk bottleneck
  ```
- **Root Cause**: Hardware bottleneck: SATA SSD (3-5x slower than NVMe), RAID-5/6 (roughly 10x penalty), or network-attached storage.
- **Action**:
  - **Critical**: If using RAID-5/6, migrate to RAID-0, RAID-10, or a single NVMe
  - **Critical**: If using network storage (NAS/iSCSI), migrate to local NVMe
  - Consider using a snapshot to skip initial sync (see [Snapshots](/base-chain/node-operators/snapshots))
  - Upgrade to NVMe SSD if using SATA
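
The `iostat` rule of thumb above (`%util` over 90 and `await` over 50 ms) can be written as a small check. This is only a sketch: the two values are assumed to be read off the `iostat -x` output by hand or by your own parsing, and `disk_verdict` is a hypothetical name.

```shell
# disk_verdict UTIL_PERCENT AWAIT_MS
# Apply the thresholds from this guide: both must be exceeded to flag the disk.
disk_verdict() {
  awk -v util="$1" -v wait_ms="$2" 'BEGIN {
    if (util > 90 && wait_ms > 50) print "bottleneck"; else print "ok"
  }'
}
```

A busy but responsive disk, such as `disk_verdict 95 10`, is reported as `ok`, since high utilization alone does not indicate a bottleneck under this rule.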

### Reth-Specific Slow Sync

- **Issue**: Reth syncs slower than expected, with low resource utilization.
- **Check**: Current peer count:
  ```bash
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
    http://localhost:8545
  # Result is hex; should be 30-100 (0x1e-0x64) for fast sync
  ```
- **Root Cause**: Reth is not configured with performance flags.
- **Action**: Add performance flags to `.env.mainnet`:
  ```bash
  # Edit .env.mainnet
  ADDITIONAL_ARGS=--full --max-outbound-peers=100 --max-inbound-peers=30

  # For systems with 32GB+ RAM, use this line instead:
  # ADDITIONAL_ARGS=--full --max-outbound-peers=100 --max-inbound-peers=30 --max-cache-size=16384
  ```
- **Action**: Restart to apply changes:
  ```bash
  docker compose down
  docker compose up -d
  ```
- **Verify**: Check that the flags were applied:
  ```bash
  docker compose logs execution | grep "Starting reth"
  ```
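
Before restarting, it can help to confirm that the active (uncommented) `ADDITIONAL_ARGS` line in the env file actually carries the flag you meant to add. The helper below is a sketch; `env_has_flag` is a hypothetical name, and the file path is whatever env file your setup uses.

```shell
# env_has_flag ENV_FILE FLAG
# Succeed when an uncommented ADDITIONAL_ARGS line contains FLAG verbatim.
# The ^ anchor skips lines that start with "# ADDITIONAL_ARGS=".
env_has_flag() {
  grep '^ADDITIONAL_ARGS=' "$1" | grep -F -q -e "$2"
}

# Example:
#   env_has_flag .env.mainnet --max-outbound-peers=100 && echo "flag present"
```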

---

## Hardware Anti-Patterns

### Storage Configurations to Avoid

- **RAID-5 / RAID-6**: Parity calculations impose a write penalty of roughly 10x. Migrate to RAID-0, RAID-10, or a single NVMe.
  - Check: `cat /proc/mdstat`

- **Network-Attached Storage (NAS/iSCSI)**: Network latency kills sync performance. Use local NVMe only.
  - Check: `df -h | grep reth-data`

- **SATA SSD**: 3-5x slower than NVMe. Acceptable for testing, not for production.
  - Test speed: `sudo hdparm -t /dev/sda` (should be > 500 MB/s for SATA, > 2000 MB/s for NVMe)
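
The `hdparm` thresholds in the last bullet can be turned into a quick classifier. This sketch assumes you pass in the MB/sec figure that `hdparm -t` reports; the function name and the label strings are illustrative, and only the 500 and 2000 MB/s cut-offs come from this guide.

```shell
# classify_throughput MB_PER_SEC
# Map a sequential read speed onto the thresholds listed above.
classify_throughput() {
  awk -v mbps="$1" 'BEGIN {
    if (mbps > 2000) print "nvme-class"
    else if (mbps > 500) print "sata-class"
    else print "below-sata"
  }'
}
```

A result of `below-sata` suggests a spinning disk, a degraded array, or a network mount rather than a healthy local SSD.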

### Recommended Configuration

- **Storage**: Local NVMe SSD (PCIe Gen3/4)
- **RAM**: 32GB+ for Reth with large cache
- **CPU**: 8+ cores recommended
- **L1 RPC**: Paid tier or self-hosted (free tiers insufficient)

---

## Monitoring Commands

```bash
# Calculate blocks synced per minute
get_block() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | jq -r '.result' | xargs printf "%d"
}
BLOCK1=$(get_block); sleep 60; BLOCK2=$(get_block)
echo "Blocks/min: $((BLOCK2 - BLOCK1))"

# Check hours behind (op-node JSON-RPC; fields are under .result)
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"optimism_syncStatus","params":[],"id":1}' \
  http://localhost:7545 | jq '.result | ((.head_l1.timestamp - .current_l1.timestamp) / 3600)'

# Watch sync progress
watch -n 5 'curl -s -X POST -H "Content-Type: application/json" --data "{\"jsonrpc\":\"2.0\",\"method\":\"eth_blockNumber\",\"params\":[],\"id\":1}" http://localhost:8545 | jq -r ".result" | xargs printf "%d\n"'

# Container resources
docker stats --no-stream

# Recent errors
docker compose logs --since 1h | grep -i error | tail -20
```
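
A blocks-per-minute figure is easier to interpret as a catch-up estimate. The sketch below assumes the chain tip keeps advancing while you sync, at a default of 30 blocks per minute (this presumes a 2-second block time, which is an assumption to adjust if it does not match your network). `catchup_minutes` is a hypothetical helper and uses integer arithmetic only.

```shell
# catchup_minutes GAP_BLOCKS SYNC_RATE_PER_MIN [TIP_RATE_PER_MIN]
# Estimate minutes until the node reaches the tip, or print "never" when the
# node is not syncing faster than the tip advances.
catchup_minutes() {
  cm_tip=${3:-30}   # assumed tip growth in blocks/min (2-second blocks)
  if [ "$2" -le "$cm_tip" ]; then
    echo "never"
  else
    echo $(( $1 / ($2 - cm_tip) ))
  fi
}

# Example: 27000 blocks behind, syncing 300 blocks/min
#   catchup_minutes 27000 300
```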

---

## Quick Reference

| Task | Command |
|------|---------|
| Check sync status | `curl -X POST ... optimism_syncStatus ... http://localhost:7545 \| jq '.result'` |
| Current block | `curl -X POST ... eth_blockNumber ...` |
| Peer count | `curl -X POST ... net_peerCount ...` |
| Exec logs | `docker compose logs -f execution` |
| Node logs | `docker compose logs -f node` |
| Restart | `docker compose restart` |
| Disk I/O | `iostat -x 1 5` |
| RAID config | `cat /proc/mdstat` |
| Disk speed | `sudo hdparm -t /dev/nvme0n1` |

---

## Related Issues

This guide addresses issues reported in:
- [#127](https://github.com/base-org/node/issues/127) - Node 12+ hours behind
- [#251](https://github.com/base-org/node/issues/251) - Intermittent slow sync
- [#369](https://github.com/base-org/node/issues/369) - RAID-5 performance issues
- [#413](https://github.com/base-org/node/issues/413) - op-reth slow sync
- [#419](https://github.com/base-org/node/issues/419) - Node stuck/unsynced
- [#433](https://github.com/base-org/node/issues/433) - Snapshot issues

For general troubleshooting, see [Node Troubleshooting](/base-chain/node-operators/troubleshooting).
