Dev Docs: Document Blocks-First IBD & Orphan Blocks
This commit provides a detailed overview of the current block download
method, which I've retroactively named blocks-first for parallelism with
headers-first.

New And Significantly Revised:

* New Initial Block Download (IBD) section (h3) with Blocks-First
  subsection (h4)

* New Orphan Block subsection (under Blocks Broadcasting) describing
  orphan blocks and how they're handled under blocks-first. Also
  includes a simple illustration of the difference between orphan blocks
  and stale blocks. Thanks to luke-jr for his s/orphan block/stale
  block/ commit a couple months ago---that made this commit much easier.

Edits:

* Cleaned up a couple cases missed by previous s/orphan/stale/ commit
  because they used past tense (orphaned).

* Mentioned direct RPC changes introduced by headers-first pull in the
  RPC docs. TODO'd those sections to ensure we provide updated examples
  once 0.10 is released.

* In P2P reference section, mentioned that a `block` message can be sent
  unsolicited by miners.

* Mentioned that `getheaders` and `headers` were added in protocol
  version 31800.

* Moved a few internal links around and added a few new internal links.

Administrivia:

* Started adding "TODOv0.10" in HTML comments to places that need to be
  updated when 0.10 is released so that I can easily git grep for that
  tag later.
harding committed Dec 18, 2014
1 parent 46f897f commit ab4234d
Showing 26 changed files with 1,051 additions and 18 deletions.
17 changes: 17 additions & 0 deletions _autocrossref.yaml
@@ -30,13 +30,18 @@ block:
block chain:
block-chain: block chain
block header:
block headers: block header
block height:
'`block` message': block message
'`block` messages': block message
block reward:
block time:
block version:
blocks: block
blocks-first: blocks-first sync
blocks-first sync:
blocks-first IBD: blocks-first sync
bloom filter:
broadcast:
broadcasts: broadcast
broadcasting:
@@ -113,6 +118,8 @@ HD protocol:
'`headers` messages': headers message
high-priority transaction: high-priority transactions
high-priority transactions:
IBD: initial block download
initial block download:
inputs: input
input:
intermediate certificate:
@@ -183,6 +190,8 @@ op codes: op code
'`op_hash160`': op_hash160
'`op_return`': op_return
'`op_verify`': op_verify
orphan block:
orphan blocks: orphan block
outpoint:
outpoints: outpoint
outputs: output
@@ -212,6 +221,7 @@ pki:
'`point()`': point function
'`pong` message': pong message
'`pong` messages': pong message
previous block header hash:
private key:
private keys: private key
proof of work:
@@ -222,6 +232,7 @@ protocol version 106: section protocol versions
protocol version 209: section protocol versions
protocol version 311: section protocol versions
protocol version 31402: section protocol versions
protocol version 31800: section protocol versions
protocol version 60000: section protocol versions
protocol version 60001: section protocol versions
protocol version 60002: section protocol versions
@@ -263,6 +274,10 @@ script hash:
secp256k1:
sequence number:
sequence numbers: sequence number
serialized block:
serialized blocks:
serialized transaction: raw format
serialized transactions: raw format
SIGHASH: signature hash
'`SIGHASH_ANYONECANPAY`': shacp
'`SIGHASH_ALL`': sighash_all
@@ -420,9 +435,11 @@ Bitcoin Core 0.1.6:
Bitcoin Core 0.2.9:
Bitcoin Core 0.3.11:
Bitcoin Core 0.3.15:
Bitcoin Core 0.3.18:
Bitcoin Core 0.6.0:
Bitcoin Core 0.6.1:
Bitcoin Core 0.7.0:
Bitcoin Core 0.8.0:
Bitcoin Core 0.9.0:
Bitcoin Core 0.9.3:
Bitcoin Core 0.10:
4 changes: 2 additions & 2 deletions _includes/guide_block_chain.md
@@ -181,8 +181,8 @@ the fork stronger than the other side.
Assuming a fork only contains valid
blocks, normal peers always follow the most difficult chain
to recreate and throw away [stale blocks][stale block]{:#term-stale-block}{:.term} belonging to shorter forks.
(Stale blocks are also sometimes called orphans or orphan blocks, but
those terms are also used for blocks without a known parent block.)
(Stale blocks are also sometimes called orphans or orphan blocks<!--noref-->, but
those terms are also used for true orphan blocks without a known parent block.)

[Long-term forks][long-term fork]{:#term-long-term-fork}{:.term} are possible if different miners work at cross-purposes,
such as some miners diligently working to extend the block chain at the
205 changes: 202 additions & 3 deletions _includes/guide_p2p_network.md
@@ -140,6 +140,180 @@ In order to maintain a connection with a peer, nodes by default will send a mess

{% endautocrossref %}

### Initial Block Download
{% include helpers/subhead-links.md %}

{% autocrossref %}

Before a full node can validate unconfirmed transactions and
recently-mined blocks, it must download and validate all blocks from
block 1 (the block after the hardcoded genesis block) to the current tip
of the best block chain. This is the Initial Block Download (IBD) or
initial sync.

Although the word "initial" implies this method is only used once, it
can also be used any time a large number of blocks need to be
downloaded, such as when a previously-caught-up node has been offline
for a long time. In this case, a node can use the IBD method to download
all the blocks which were produced since the last time it was online.

Bitcoin Core uses the IBD method any time the last block on its local
best block chain has a block header time more than 24 hours in the past.
Bitcoin Core 0.10 will also perform IBD if its local best block chain is
more than 144 blocks lower than its local best headers chain (that is,
the local block chain is more than about 24 hours in the past).
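
The two triggers above can be sketched as a small predicate. This is only an illustration of the rule as stated here, not Bitcoin Core's actual code; the function name `needs_ibd` and its parameters are hypothetical, and the headers-chain check applies only to the 0.10 behavior described above.

```python
import time

# Thresholds taken from the paragraph above.
MAX_TIP_AGE_SECONDS = 24 * 60 * 60  # 24 hours
MAX_BLOCKS_BEHIND = 144             # about 24 hours of blocks at ~10 minutes each


def needs_ibd(tip_time, tip_height, best_header_height, now=None):
    """Return True if either IBD trigger described above applies.

    tip_time           -- block header time of the local best block (Unix time)
    tip_height         -- height of the local best block chain
    best_header_height -- height of the local best headers chain (0.10 only)
    now                -- current Unix time (defaults to the system clock)
    """
    if now is None:
        now = int(time.time())
    tip_is_old = (now - tip_time) > MAX_TIP_AGE_SECONDS
    far_behind_headers = (best_header_height - tip_height) > MAX_BLOCKS_BEHIND
    return tip_is_old or far_behind_headers
```

For a node without a headers chain (0.9.3 and earlier), passing `best_header_height` equal to `tip_height` leaves only the 24-hour check in effect.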

{% endautocrossref %}

#### Blocks-First
{% include helpers/subhead-links.md %}

{% autocrossref %}

Bitcoin Core (up until version [0.9.3][bitcoin core 0.9.3]) uses a
simple initial block download (IBD) method we'll call *blocks-first*.
The goal is to download the blocks from the best block chain in sequence.

The first time a node is started, it only has a single block in its
local best block chain---the hardcoded genesis block (block 0). This
node chooses a remote peer, called the sync node, and sends it the
`getblocks` message illustrated below.

![First GetBlocks Message Sent During IBD](/img/dev/en-ibd-getblocks.svg)

In the header hashes field of the `getblocks` message, this new node
sends the header hash of the only block it has, the genesis block
(6fe2...0000 in internal byte order). It also sets the stop hash field
to all zeroes to request a maximum-size response.
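
As a rough sketch, the payload of that first `getblocks` message could be assembled as shown below. The field order (version, hash count, header hashes, stop hash) follows the P2P reference as I understand it; the protocol version value and all function names are assumptions for illustration, and the message header that wraps the payload on the wire is omitted.

```python
import struct

# Genesis block header hash in RPC byte order (the order block explorers show).
GENESIS_HASH_RPC = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"

# Assumption: the protocol version the requesting node advertises.
PROTOCOL_VERSION = 70002


def to_internal_byte_order(rpc_hex):
    """Reverse the bytes of an RPC-order hash to get internal byte order."""
    return bytes.fromhex(rpc_hex)[::-1]


def compact_size(n):
    """Encode a count as a Bitcoin compactSize unsigned integer."""
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def build_getblocks_payload(header_hashes, stop_hash=bytes(32)):
    """Build a getblocks payload; an all-zero stop hash asks for a full reply."""
    payload = struct.pack("<I", PROTOCOL_VERSION)
    payload += compact_size(len(header_hashes))
    payload += b"".join(header_hashes)
    payload += stop_hash
    return payload


genesis_internal = to_internal_byte_order(GENESIS_HASH_RPC)
first_getblocks = build_getblocks_payload([genesis_internal])
```

Here `genesis_internal.hex()` begins with `6fe2` and ends with `0000`, matching the abbreviated hash in the text.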

Upon receipt of the `getblocks` message, the sync node takes the first
(and only) header hash and searches its local best block chain for a
block with that header hash. It finds that block 0 matches, so it
replies with 500 block inventories (the maximum response to a
`getblocks` message) starting from block 1. It sends these inventories
in the `inv` message illustrated below.

![First Inv Message Sent During IBD](/img/dev/en-ibd-inv.svg)

Inventories are unique identifiers for information on the network. Each
inventory contains a type field and the unique identifier for an
instance of the object. For blocks, the unique identifier is a hash of
the block's header.

The block inventories appear in the `inv` message in the same order they
appear in the block chain, so this first `inv` message contains
inventories for blocks 1 through 500. (For example, the hash of block 1
is 4860...0000 as seen in the illustration above.)
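
To make the inventory layout concrete, here is a minimal sketch that encodes and decodes inventory entries. It assumes the usual wire layout of a 4-byte little-endian type followed by a 32-byte identifier, with type 2 meaning a block; the helper names are hypothetical, and the entry count that prefixes a real `inv` payload is left out.

```python
import struct

MSG_BLOCK = 2  # inventory type identifier for a block

# Header hash of block 1 in RPC byte order.
BLOCK_1_HASH_RPC = "00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048"


def encode_inventory(inv_type, identifier):
    """Serialize one inventory: type field followed by the 32-byte identifier."""
    if len(identifier) != 32:
        raise ValueError("identifier must be exactly 32 bytes")
    return struct.pack("<I", inv_type) + identifier


def decode_inventories(data):
    """Split concatenated 36-byte entries into (type, identifier) tuples."""
    if len(data) % 36 != 0:
        raise ValueError("data length must be a multiple of 36 bytes")
    entries = []
    for offset in range(0, len(data), 36):
        (inv_type,) = struct.unpack_from("<I", data, offset)
        entries.append((inv_type, data[offset + 4:offset + 36]))
    return entries


# Internal byte order is the RPC-order hash with its bytes reversed.
block_1_internal = bytes.fromhex(BLOCK_1_HASH_RPC)[::-1]
entry = encode_inventory(MSG_BLOCK, block_1_internal)
```

The identifier portion of `entry` starts with `4860` and ends with `0000`, the same abbreviation used for block 1 above.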

The IBD node uses the received inventories to request 128 blocks from
the sync node in the `getdata` message illustrated below.

![First GetData Message Sent During IBD](/img/dev/en-ibd-getdata.svg)

It's important to blocks-first nodes that the blocks be requested and
sent in order because each block header references the header hash of
the preceding block. That means the IBD node can't fully validate a
block until its parent block has been received. Blocks that can't be
validated because their parents haven't been received are called orphan
blocks; a subsection below describes them in more detail.

Upon receipt of the `getdata` message, the sync node replies with each
of the blocks requested. Each block is put into serialized block format
and sent in a separate `block` message. The first `block` message sent
(for block 1) is illustrated below.

![First Block Message Sent During IBD](/img/dev/en-ibd-block.svg)

The IBD node downloads each block, validates it, and then requests the
next block it hasn't requested yet, maintaining a queue of up to 128
blocks to download. When it has requested every block for which it has
an inventory, it sends another `getblocks` message to the sync node
requesting the inventories of up to 500 more blocks. This second
`getblocks` message contains multiple header hashes as illustrated
below:

![Second GetBlocks Message Sent During IBD](/img/dev/en-ibd-getblocks2.svg)

Upon receipt of the second `getblocks` message, the sync node takes the
first listed header hash and searches its local best block chain for a
block with that header hash. If it finds a block with that hash, it
replies with 500 block inventories starting with the following block.
But if it doesn't find a block with that hash, it takes the next header
hash from the `getblocks` message and searches its block chain for that
hash. If that hash matches, it will reply with 500 block inventories
starting with the block after the matching one. But, again, if it
doesn't find that hash, it will proceed to check the next hash in the
message (and so on until it runs out of hashes in the message). If the
last hash in the message (besides the stop hash) doesn't match, it
assumes the only block the two nodes have in common is block 0 and so it
sends an `inv` starting with block 1 (the same `inv` message seen
several illustrations above).
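
The search the sync node performs can be sketched as follows. This is a simplified model rather than Bitcoin Core's implementation: the best block chain is represented as a plain list of header hashes indexed by height, the function names are hypothetical, and the stop hash is ignored.

```python
MAX_INVENTORIES = 500  # maximum response to a getblocks message


def first_inventory_height(locator_hashes, height_by_hash):
    """Return the height of the first block to list in the inv reply.

    Hashes are tried in the order they appear in the getblocks message.
    If none match, only block 0 is assumed to be shared, so start at block 1.
    """
    for header_hash in locator_hashes:
        height = height_by_hash.get(header_hash)
        if height is not None:
            return height + 1
    return 1


def inventories_to_send(locator_hashes, best_chain):
    """Return up to 500 header hashes following the first matching hash."""
    height_by_hash = {h: height for height, h in enumerate(best_chain)}
    start = first_inventory_height(locator_hashes, height_by_hash)
    return best_chain[start:start + MAX_INVENTORIES]
```

For example, with a 1,000-block chain and a message listing an unknown hash followed by the hash of block 10, the reply starts at block 11 and contains 500 entries.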

This repeated search allows the sync node to send useful inventories even if
the IBD node's local block chain forked from the sync node's local block
chain. This fork detection becomes increasingly useful the closer the
IBD node gets to the tip of the block chain.

When the IBD node receives the second `inv` message, it will request
those blocks using `getdata` messages. The sync node will respond with
`block` messages. Then the IBD node will request more inventories with
another `getblocks` message---and the cycle will repeat until the IBD
node is synced to the tip of the block chain. At that point, the node
will accept blocks sent through the regular block broadcasting described
in a later subsection.

{% endautocrossref %}

##### Blocks-First Advantages & Disadvantages
{:.no_toc}
{% include helpers/subhead-links.md %}

{% autocrossref %}

The primary advantage of blocks-first IBD is its simplicity. The primary
disadvantage is that the IBD node relies on a single sync node for all
of its downloading. This has several implications:

* **Speed Limits:** All requests are made to the sync node, so if the
sync node has limited upload bandwidth, the IBD node will have slow
download speeds. Note: if the sync node goes offline, Bitcoin Core
will continue downloading from another node---but it will still only
download from a single sync node at a time.

* **Download Restarts:** The sync node can send a non-best (but
otherwise valid) block chain to the IBD node. The IBD node won't be
able to identify it as non-best until the initial block download nears
completion, forcing the IBD node to restart its block chain download
over again from a different node. Bitcoin Core ships with several
block chain checkpoints at various block heights selected by
developers to help an IBD node detect that it is being fed an
alternative block chain history---allowing the IBD node to restart
its download earlier in the process.

* **Disk Fill Attacks:** Closely related to the download restarts, if
the sync node sends a non-best (but otherwise valid) block chain, the
chain will be stored on disk, wasting space and possibly filling up
the disk drive with useless data.

* **High Memory Use:** Whether maliciously or by accident, the sync node
can send blocks out of order, creating orphan blocks which can't be
validated until their parents have been received and validated.
Orphan blocks are stored in memory while they await validation,
which may lead to high memory use.

All of these problems are addressed in part or in full by the
headers-first IBD method used in Bitcoin Core 0.10.

**Resources:** The table below summarizes the messages mentioned
throughout this subsection. The links in the message field will take you
to the reference page for that message.

| **Message** | [`getblocks`][getblocks message] | [`inv`][inv message] | [`getdata`][getdata message] | [`block`][block message]
| **From→To** | IBD→Sync | Sync→IBD | IBD→Sync | Sync→IBD
| **Payload** | One or more header hashes | Up to 500 block inventories (unique identifiers) | One or more block inventories | One serialized block

{% endautocrossref %}

### Block Broadcasting
{% include helpers/subhead-links.md %}

@@ -151,6 +325,31 @@ New blocks are also discovered as miners publish their found blocks, and these m

{% endautocrossref %}

#### Orphan Blocks
{% include helpers/subhead-links.md %}

{% autocrossref %}

Blocks-first nodes may download orphan blocks---blocks whose previous
block header hash field refers to a block header this node
hasn't seen yet. In other words, orphan blocks have no known parent
(unlike stale blocks, which have known parents but which aren't part of
the best block chain).

![Difference Between Orphan And Stale Blocks](/img/dev/en-orphan-stale-definition.svg)

When a blocks-first node downloads an orphan block, it will not validate
it. Instead, it will send a `getblocks` message to the node which sent
the orphan block. The broadcasting node will respond with an `inv` message
containing inventories of any blocks the downloading node is missing (up
to 500). The downloading node will request those blocks with a `getdata`
message, and the broadcasting node will send each of them in a separate
`block` message. The downloading node will validate those blocks, and once the
parent of the former orphan block has been validated, it will validate
the former orphan block.
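
The bookkeeping behind this can be sketched with a small in-memory pool. It is only an illustration: the class and method names are hypothetical, blocks are reduced to a pair of hashes, actual validation is assumed to succeed, and the `getblocks` request for the missing parents is not modeled.

```python
from collections import defaultdict


class OrphanPool:
    """Hold blocks whose parent is unknown until that parent is validated."""

    def __init__(self, validated_hashes):
        self.validated = set(validated_hashes)
        # Maps a missing parent's hash to the orphans waiting on it.
        self.orphans_by_parent = defaultdict(list)

    def receive(self, block_hash, parent_hash):
        """Process a received block.

        Returns the list of block hashes validated as a result, which is
        empty when the block is an orphan and has to wait in memory.
        """
        if parent_hash not in self.validated:
            self.orphans_by_parent[parent_hash].append(block_hash)
            return []
        accepted = []
        pending = [block_hash]
        while pending:
            current = pending.pop()
            self.validated.add(current)  # real validation would happen here
            accepted.append(current)
            # Any former orphans waiting on this block can now be validated.
            pending.extend(self.orphans_by_parent.pop(current, []))
        return accepted
```

Receiving blocks 2 and 3 before block 1 leaves both waiting in the pool; once block 1 arrives, all three are validated in chain order.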

{% endautocrossref %}

### Transaction Broadcasting
{% include helpers/subhead-links.md %}

@@ -179,13 +378,13 @@ unconfirmed transactions tend to slowly disappear from the network as
peers restart or as they purge some transactions to make room in memory
for others.

Transactions which are mined into blocks that are later orphaned may be
Transactions which are mined into blocks that later become stale blocks may be
added back into the memory pool. These re-added transactions may be
re-removed from the pool almost immediately if the replacement blocks
include them. This is the case in Bitcoin Core, which removes orphaned
include them. This is the case in Bitcoin Core, which removes stale
blocks from the chain one by one, starting with the tip (highest block).
As each block is removed, its transactions are added back to the memory
pool. After all of the orphaned blocks are removed, the replacement
pool. After all of the stale blocks are removed, the replacement
blocks are added to the chain one by one, ending with the new tip. As
each block is added, any transactions it confirms are removed from the
memory pool.
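
The remove-then-add sequence described in this paragraph can be sketched as below. It is a toy model, not Bitcoin Core's code: each block is just a list of transaction identifiers, the memory pool is a set, and the function name `reorganize` is hypothetical.

```python
def reorganize(chain, mempool, stale_count, replacement_blocks):
    """Swap the last `stale_count` blocks of `chain` for `replacement_blocks`.

    chain              -- list of blocks, tip last; each block is a list of txids
    mempool            -- set of unconfirmed txids, modified in place
    replacement_blocks -- new blocks in chain order, ending with the new tip
    """
    # Remove blocks one by one, starting with the tip, and return their
    # transactions to the memory pool.
    for _ in range(stale_count):
        block = chain.pop()
        mempool.update(block)
    # Add the replacement blocks one by one; any transaction a block
    # confirms leaves the memory pool again.
    for block in replacement_blocks:
        chain.append(block)
        mempool.difference_update(block)
```

A transaction that appears in both a removed block and a replacement block is re-added and then re-removed almost immediately, as the text notes.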
2 changes: 1 addition & 1 deletion _includes/ref_block_chain.md
@@ -21,7 +21,7 @@ serialized header format part of the consensus rules.
| Bytes | Name | Data Type | Description
|-------|---------------------|-----------|----------------
| 4 | version | uint32_t | The [block version][]{:#term-block-version}{:.term} number indicates which set of block validation rules to follow. See the list of block versions below.
| 32 | previous block hash | char[32] | A SHA256(SHA256()) hash in internal byte order of the previous block's header. This ensures no previous block can be changed without also changing this block's header.
| 32 | [previous block header hash][]{:#term-previous-block-header-hash}{:.term} | char[32] | A SHA256(SHA256()) hash in internal byte order of the previous block's header. This ensures no previous block can be changed without also changing this block's header.
| 32 | merkle root hash | char[32] | A SHA256(SHA256()) hash in internal byte order. The merkle root is derived from the hashes of all transactions included in this block, ensuring that none of those transactions can be modified without modifying the header. See the [merkle trees section][section merkle trees] below.
| 4 | time | uint32_t | The [block time][]{:#term-block-time}{:.term} is a Unix epoch time when the miner started hashing the header (according to the miner). Must be greater than or equal to the median time of the previous 11 blocks. Full nodes will not accept blocks with headers more than two hours in the future according to their clock.
| 4 | nBits | uint32_t | An encoded version of the target threshold this block's header hash must be less than or equal to. See the nBits format described below.