Initial draft, comments welcome.
Draft version 0.0.30
BUIP093: Graphene Relay Proposers: George Bissias <gbiss@cs.umass.edu>, Brian Levine <levine@cs.umass.edu> Created: 2018-07-26 Status: Draft
Graphene is a protocol for efficiently relaying blocks across the peer-to-peer network.
Relaying blocks across the peer-to-peer (P2P) network using the least amount of bandwidth and latency has a number of advantages for the operation of any cryptocurrency. Blocks that can be relayed using less bandwidth propagate more quickly, which can increase synchronization among peers and reduce forks in the chain. Using less bandwidth to relay a block also allows greater participation by peers who are behind limited-bandwidth links and routes. Finally, an efficient mechanism for relaying blocks can allow maximum block size to increase, sustaining a larger number of transactions per second overall.
This specification is a stand-alone description of Graphene, but is based heavily on previous work by Ozisik, Andresen, Bissias, Houmansadr, and Levine [1].
In the Bitcoin Cash Network, blocks are relayed from a peer possessing the block (the sender) to a peer requesting it (the receiver). The core idea behind Graphene is for the sender to avoid sending information that is already held by the receiver. Blocks are comprised of a header that is previously unknown to the receiver, and a collection of transactions that are likely to be already within the receiver's mempool. Therefore, to save bandwidth, Graphene attempts to avoid resending the actual transactions and transaction IDs. Instead, Graphene makes use of two set reconciliation data structures for expressing the list of transactions: Bloom filters [4] and Invertible Bloom Lookup Tables (IBLTs) [2,3]. The combination of the two data structures is more efficient than using either alone to express the transaction list, and it is typically more efficient than constructing a list from shortened transaction IDs.
An overview of Graphene is presented in Figure 1, which depicts the following steps between two peers that have confirmed that they are running the same version of the protocol.
- A sender informs the receiver that a new block is available using an inv message.
- If the block is not already known to the receiver, she responds with a get_grblk message that includes a count of transactions in her mempool, m. (Note: the count should also include the number of transactions in the receiver's orphan pool, but for brevity we will henceforth assume that the mempool includes transactions in the orphan pool as well).
- The sender creates a Bloom filter S of all transaction IDs in the block along with an IBLT I containing just the "cheap hash" (first 8 bytes of the hash interpreted as an unsigned little endian 64 bit integer) of each transaction ID. Additionally, any full transactions likely missing from the receiver (such as the coinbase for the block) are collected into additional transactions list V. If there does not exist a protocol-defined canonical transaction ordering, then the sender also creates a list of transaction ranks R, listed lexicographically by ID, which indicates (with log(n) bits for each ID) the intended order for each transaction ID. The sender assembles S, I, V, R, and the block header into a grblk message, which he sends to the receiver.
- The receiver begins by aggregating all locally known transaction IDs into set T, which is comprised of those found in V (see above) as well as her mempool (plus orphan pool). She then uses Bloom filter S to filter transaction IDs from T. Any transaction that appears to be in S is added to her own IBLT, I' . She then performs an IBLT subtraction operation [3] on I and I' to decode the set of transaction IDs in the symmetric difference between the two sets. From this subtraction operation, she may learn either the set of false positive IDs F that falsely passed through S or the set of IDs M that are in the block but missing from T. The subtraction operation either succeeds or fails. There are two types of failure; until later versions of this protocol, recovery from these failures requires a fall-back method:
- Success: The IBLT subtraction succeeds and the set M is empty. All transactions in the block are possessed by the receiver.
- Success: The IBLT subtraction succeeds and the set M is non-empty. Missing transactions must be requested. Here the receiver requests the missing transactions using a get_grblktx message, to which the receiver responds with a grblktx message containing the full transactions indicated by M.
- IBLT Decode failure: The IBLT subtraction operation fails entirely. In this case the receiver cannot be certain of the complete set of transaction IDs in the block. This failure occurs when the receiver does not possess many transactions in the block, but can also happen infrequently (e.g., 1/240 blocks) due to the probabilistic nature of IBLTs. In the current version of this protocol, she must request a fail-over block such as XThin or a full block.
- IBLT Checksum failure, detected when the sender returns too few transactions with the grblktx message: IBLT subtraction succeeds, but returns an erroneous transaction ID due to faulty IBLT checksum. In this scenario, the receiver will issue a get_grblktx message for the erroneous transaction ID, for which the sender will not return a transaction. When the receiver detects that not all transactions have been sent, she will assume a checksum error has occurred and request a fail-over block (e.g., a full or XThin block).
- If IBLT subtraction succeeds (and missing transactions are received), and if no checksum error occurs, then the receiver will be left with an unordered set of transactions that are in the block. (Note that at this stage the receiver is certain to have the actual transactions, not just their IDs).
- The receiver places the transactions in a Merkle tree, which is validated against the root stated in the block header. The order of transactions in the Merkle tree is either determined by the network's protocol-defined canonical ordering or by the specific rank information R included in grblk.
The intuition behind Graphene's use of Bloom filters and IBLTs is as follows, presented as optional reading.
Let us consider several options for relaying a block. The first option is to simply list each 32-byte transaction ID. If our block contains n=2000 transactions, then the total cost is 64,000 bytes. Next, we realize that the chances of an accidental collision among 32-bytes IDs is almost nil, and so our second option is to limit each transaction ID to its first 8 bytes. If our block contains n=2000 transactions, then the total cost is already down to 16,000 bytes.
Our third option is to use Bloom filters, which are an incredibly useful probabilistic data structure for determining whether items are members of a given set. In this case, our set is all transactions in the sender's block (actually, the set of transaction IDs); and the items are the transactions IDs in the receiver's mempool (recall from above that the mempool is assumed to include transactions from the orphan pool as well). A Bloom filter has two special characteristics. First, it has no false negatives. That is, if a Bloom filter tells us that a transaction ID is not in the set, then it definitely is not in the set. Second, a Bloom filters does have false positives. That is, if a Bloom filter tells us that a transaction ID is in the set, then it might not be in the set. We can set the Bloom filter's false positive rate (FPR) to whatever we like. There is an important trade-off though: if the FPR is low, and the Bloom filter is therefore accurate, then it will also be larger in terms of bytes. If we don't mind some wrong answers about what transaction IDs are in the block, then the Bloom filter will be smaller.
How much space is required to relay blocks using Bloom filters? For example, we could set the FPR of the Bloom Filter to f=1/m. In that case, when the receiver checks each of the m transaction IDs in her mempoool, we can expect that the Bloom filter will wrongly state that f*m=(1/m)*m=1 transaction is in the block on average. To make matters worse, we won't know which transaction is the wrong one; as a result of the extra transaction, the Merkle root won't validate. We can try to fix this problem by lowering the FPR of the filter. For example, if we set the FPR to f=1/(144m), then we can expect that the filter will permit a wrong answer only about once every 144 blocks relayed (i.e., only about once a day). But keep in mind, this accuracy will cost us in bytes. The size of a Bloom filter with n items inserted and a false positive rate of f=1/144m is well known to be -n*ln(f)/ln2(2) = n*ln(1/(144m))/(8ln2(2)) bytes. For example, for a block with n=2000 transactions and a mempool of m=6000 transactions total, the Bloom filter will be about 7,113 bytes. That's an improvement over our first and second options, but we can do better.
A fourth option is presented by Invertible Bloom Lookup Tables (IBLTs) [2], another useful probabilistic data structure. They are designed to allow us to discover the symmetric difference between two sets of items. For example, we can create an IBLT of all transactions IDs that are in the sender's block, and then create another IBLT of the transactions in the receiver's mempool. A subtraction [3] of the first IBLT from the second will tell us exactly which transactions in the mempool are not in the block. Given this functionality, one can use IBLTs alone [7] to relay the block from sender to receiver, but unfortunately it is not an efficient approach. The size in bytes of an IBLT increases linearly with the size of the symmetric difference recovered from it. An IBLT uses about 17 bytes per transaction ID that is part of the symmetric difference, and the overhead of an IBLT (in bytes) is about 140% (this value can vary significantly for small symmetric differences, and can be set optimally using techniques described in the Recovery section below). So if the mempool is 2000 transactions larger than the block (i.e., the symmetric difference is 2000), then the sender's IBLT will be about (1.4*2000)*17=47,600 bytes. Not our best option so far.
Our fifth and best solution is a combination of both data structures. First, we pass all transactions in the receiver's mempool through a Bloom filter of the sender's block; however, we allow a good number of false positives, which results in a small Bloom filter. We clean up any mistakes made by the Bloom filter with an IBLT also sent by the sender. The symmetric difference is now quite small: it's equal to number of false positives that were produced by our Bloom filter. There is a trade-off: we can make the Bloom filter larger (more accurate), which results in a smaller IBLT; or we can make the IBLT larger (able to correct more mistakes), which results in a smaller Bloom filter. Graphene picks the parameters of both data structures together so that the summed size is optimally small (using techniques described in the Recovery section below). For example, for n=2000 and m=6000, a sender computes that an IBLT that can recover a=27 items and a Bloom filter of n items with a FPR of f=0.00675 is minimal. In our test implementation (again assuming the IBLT overhead of 140%) this results in a 3,244-byte total based on a 643-byte IBLT and a 2601-byte Bloom filter, which is about 1/5 the size of sending 8-bytes per transaction. If a canonical transaction order is not defined, an expression of the transaction ordering must also be sent, which increases the total by 2,750 bytes to 5,994 bytes (which is about 38% of the cost of sending 8-bytes per transaction). The IBLT will fail to decode about once every 240 blocks.
Graphene maintains this size advantage as block size grows. For example, for a block of n=10,000 transactions, listing 8-bytes of each transaction ID would be 80,000 bytes. With a mempool of m=30,000 transactions, Graphene's cost is 14,482 bytes (or 31,091 bytes when including ordering information).
Graphene introduces four new messages to the network protocol, which are indicated by the following command strings: get_grblk, grblk, get_grblktx, and grblktx. For each Graphene message, the associated command string must be entered in the "command" field of the message.
The get_grblk message (suggested NetMsgType implementation name: GET_GRAPHENE) transmits a single serialized CMempoolInfo data structure. It is used to both signal the desire to receive a Graphene block and to communicate the number of transactions in the receiver's mempool (plus orphan pool).
The grblk message (suggested NetMsgType implementation name: GRAPHENEBLOCK) transmits a single serialized CGrapheneBlock data structure. In the absence of failure or after successful recovery of items from set M (see above), the message contains sufficient information for the receiver to reconstruct the full block.
The get_grblktx (suggested NetMsgType implementation name: GET_GRAPHENETX) message transmits a single serialized CRequestGrapheneBlockTx. This message is sent in the event that the IBLT subtraction operation succeeded and revealed a non-empty set of missing transaction IDs M.
The grblktx (suggested NetMsgType implementation name: GRAPHENETX) message transmits a single serialized CGrapheneBlockTx. This message is sent in response to get_grblktx; it contains full transactions corresponding to any cheap hashes in CRequestGrapheneBlockTx that also appear in the block.
Graphene introduces several new data structures to the network protocol: CMemPoolInfo, CGrapheneBlock, CIblt CGrapheneBlockTx, and CRequestGrapheneBlockTx.
All data structures use the standard Bitcoin serialization format: they use little-endian for integers; vector encoding is Bitcoin standard (compact int length, vector values); and other specific use of the Bitcoin standard "compact int" encoding is noted in the tables below. For brevity, encoding that is identical to the same data structure in existing messages is omitted from our descriptions. Except for CGrapheneBlock, the data structures are comprised of relatively simple C++ constructs, which we detail below. CGrapheneBlock contains CGrapheneSet, which itself contains Bloom filter and IBLT data structures, denoted CBloomFilter and CIblt, respectively. We describe each of these complex structures separately.
CGrapheneBlock is the fundamental data structure for Graphene block propagation. Assuming no unrecoverable failures arise, and combined with any missing transactions recovered from the sender, this data structure contains all information necessary to reconstruct the full block. It is sent as part of the grblk message.
Field Name | Type | Size | Purpose |
header | Block header | 80 bytes | The header of the block being provided, encoded as per the header portion of the BLOCK message (NOT including the nTx field of the HEADERS message) |
vAdditionalTxs | vector<CTransaction> | variable | Transactions that the receiver is probably missing. Standard transaction Bitcoin serialization is used. |
nBlockTxs | uint64_t | 8 bytes | Number of Transactions in the block |
grapheneSet | CGrapheneSet | variable | Encapsulates Graphene set reconciliation logic |
The vAdditionalTxs field should include at least the coinbase transaction, since it is not possible for it to be in the receiver's mempool.
CGrapheneSet contains all information critical to the Graphene set reconciliation process, absent any block-specific details. It is the only non-standard data structure in CGrapheneBlock.
Field Name | Type | Size | Purpose |
ordered | uint8_t | 1 byte | 1 if order is important in the set and 0 otherwise; other values are reserved for future use |
nReceiverUniverseItems | uint64_t | 8 bytes | The number of transactions in the receiver's mempool |
encodedRank | vector<unsigned char> | ordered * ceil(n log(n))/8 bytes | Order information for items in set |
setFilter | CBloomFilter | variable | Bloom filter containing items in set |
setIblt | CIblt | variable | IBLT containing items in set |
If the ordered field has value 1, then it is assumed that ordering information cannot be automatically inferred (e.g. there exists no canonical transaction ordering). In this case, the encodedRank field must encode the order of transactions. The order should be encoded by the sender as follows. First, vector L is formed by listing transactions in ascending lexicographical order according to their 32-byte ID. Second, each ID in L is visited in order (lowest index to highest) and its desired rank (distance from the first position in the list) is added to vector R. Third, we populate boolean vector B by visiting each rank r in the order it appears in R, interpreting r as a binary number with ceil(log(n)) bits, and adding each bit to B (lowest order bits first). Finally, the contents of field encodedRank are formed by interpreting consecutive groups of 8 bits from B as little-endian integers and packing them into a vector of ceil(n * ceil(log(n)) / 8) unsigned 8-bit integers (thus, the first bit from B will occupy the lowest order bit of the first integer added to encodedRank).
CBloomFilter is Bitcoin's standard Bloom filter implementation with one adjustment, described below. It is used as part of the set reconciliation process in CGrapheneSet to filter out transactions from T that do not belong to the block.
Field Name | Type | Size | Purpose |
vData | vector<unsigned char> | variable | Bit array for filter |
isFull | bool | 1 byte | True if every bit in vData is set to 1 |
isEmpty | bool | 1 byte | True if every bit in vData is set to 0 |
nHashFuncs | unsigned int | 2, 4 bytes | Number of hash functions used in filter |
nTweak | unsigned int | 2, 4, bytes | Additive offset for hash function inputs |
nFlags | unsigned char | 1 byte | Defines behavior for transaction insertion |
The standard Bloom filter implementation does not always honor the false positive rate (FPR) requested during initialization. This behavior is problematic for Graphene because the FPR determines the number of erroneous transactions removed from IBLT I, which affects the decode rate for I. For this reason, it is required that nDesiredSize and nHashFuncs be defined slightly differently than in the standard CBloomFilter implementation, as follows.
nDesiredSize = (unsigned int)(ceil(-1 / LN2SQUARED * nElements * log(nFPRate) / 8));
nHashFuncs = (unsigned int)max(MIN_N_HASH_FUNC, int(vData.size() * 8 / nElements * LN2));
CIblt is an IBLT implementation that is part of the CGrapheneSet reconciliation process. In the absence of a reconciliation failure, CIblt will generate false positive list F and missing transaction cheap hashes M.
Field Name | Type | Size | Encoding Details | Purpose |
version | uint64_t | 8 bytes | Compact size | Version bits |
n_hash | uint8_t | 1 byte | Standard | Number of hash functions used |
is_modified | bool | 1 byte | Standard | True if any items have been inserted into the IBLT |
hashTable | vector<HashTableEntry> | variable | Standard | Data cells for IBLT |
The Graphene protocol requires that IBLT I' , created by the receiver, be subtracted from IBLT I, originating from the sender. In order for this operation to succeed, it is critical that the IBLTs use the same quantity of hash functions, have the same number of cells, and that hash function i uses the same function and seed for both I and I' . Although the sender may use complex optimization techniques to determine the number of cells and hash functions for I, the receiver should simply copy those values provided that they are reasonably sized. The ith hash function is the MurmurHash3 with seed i.
HashTableEntry implements a single cell of the CIblt data structure.
Field Name | Type | Size | Purpose |
count | uint32_t | 4 bytes | Number of items |
keySum | uint64_t | 8 bytes | XOR of all keys |
keyCheck | uint32_t | 4 bytes | Error checking for keySum |
valueSum | vector<uint8_t> | variable | XOR of all values |
CMemPoolInfo provides a count of transactions in the receiver's mempool (plus orphan pool) and corresponds to the variable m in the description above. It is sent as part of the get_grblk message.
Field Name | Type | Size | Purpose |
nTx | uint64_t | 8 bytes | Number of transactions that appear in the union of the receiver's mempool and orphan pool |
CRequestGrapheneBlockTx is used by the receiver to request missing transactions M according to their cheap hashes. It is sent as part of the get_grblktx message.
Field Name | Type | Size | Purpose |
blockhash | uint256 | 32 bytes | The hash of the block corresponding to missing transactions |
setCheapHashesToRequest | set<uint64_t> | variable | Cheap hashes of missing transactions requested by receiver |
CGrapheneBlockTx is returned as part of the grblktx message. It contains full transactions corresponding to the cheap hashes included in CRequestGrapheneBlockTx and is returned by the sender in response to a get_grblktx message.
Field Name | Type | Size | Purpose |
blockhash | uint256 | 32 bytes | The hash of the block corresponding to missing transactions |
vMissingTx | vector<CTransaction> | variable | Missing transactions requested by receiver |
The Bitcoin P2P protocol has a message versioning facility and Graphene version should be included in that facility.
Graphene uses IBLTs, which are a probabilistic data structure that have a tunable but non-zero failure rate during normal operation. The IBLT used in Graphene should be tuned to fail once every 240 blocks relayed or fewer for all sizes of IBLT used.
Upon failure, a fall-back option must be used. Currently, the only option is to fall back to a full block or an XTreme Thin (XThin) Block if that protocol is also enabled. Future versions of this specification will specify a more efficient in-protocol recovery mechanism.
Graphene is efficient when the parameters of the Bloom Filter and IBLT are set correctly by the sender. Here we present two methods of setting these parameters. Any method for parameterizing the Bloom Filter and IBLT can be selected by the sender, but it is possible discover the optimal choice resulting in the smallest possible serialized Graphene block. The first method is simpler to implement, but it also less efficient and will result in a slightly lower decode rate than desired.
1. Let m be the size of the receiver's mempool, and let n be the number of transactions in the block. Let t be the average size in bytes per item required to recover a items from the IBLT (i.e., total IBLT table size in bytes divided by a). For example, in our implementation, t=17 bytes, approximately. Then the IBLT should be parameterized to recover a=n/(ct) items, where c=8ln2(2). The Bloom filter should be set to have a false positive rate (FPR) of f=a/(m-n). The derivation of these values is presented in the Appendix of this specification.
2. The sizes of the Bloom filter and IBLT can be significantly smaller if a (brute force) linear search is used to parameterize the two data structures. Simply put, for each integer value of a from 1 to m: (1) determine the size of the IBLT from its actual implementation, taking into account serialization costs; (2) determine the size in bytes of the Bloom filter for a false positive rate of f=a/(m-n) when n items are inserted from its implementation. The value of a that has the minimum sum cost in bytes is selected.
We also note that the IBLT decode rate varies with the number of recovered items a. In order to ensure a consistent decode rate (1 out of 240 blocks, for example), it is necessary to modify the number of hash functions and overhead used when constructing the IBLT. This, in turn, changes the value of t, which affects the accuracy of method 1 above. Determining the combination of hash function quantity and overhead that yields the smallest IBLT for a given decode rate is an orthogonal optimization problem. We have released a stand-alone implementation of IBLTs in C++ (with a Python wrapper) and a script to determine such values.
A simple comparison of Graphene to related work is as follows. A block holds n=2000 transactions, which the receiver holds in its mempool along with 4000 other transactions; in other words m=2000+4000=6000.
Using Graphene, the sender sends a block announcement, and the receiver responds with a get_grblk message, which includes the value m. The sender computes that an IBLT that can recover a=27 items and a Bloom filter of n items with a FPR of f=a/(m-n)=27/(6000-2000)=0.00675 is minimal. In our test implementation, this results in 3,244 bytes total, the sum of a 643-byte IBLT and a 2,601-byte Bloom filter. Without a canonical transaction order, an expression of the transaction ordering must also be sent, increasing the total by 2,750 bytes to 5,994 bytes. (The receiver's IBLT is not sent over the network.)
XTreme Thin Blocks [5, 8, 9] has the receiver start by sending a 3,956-byte Bloom Filter of the mempool with an FPR of f=1/m=1/2000=0.0005, and 8-bytes for each of the n=2000 transactions. The total is therefore 3956+8*2000= 19,956.
Compact Blocks [6] would send over just the 6 bytes for each of the n=2000 transactions, for a total of 6*2000= 12,000.
The above is a simplistic comparison, as the actual operation of all three protocols is more involved. For example, XTreme Thin Blocks v12.1 and above performs "Bloom Filter Targeting", which significantly reduces the size of the Bloom filter it uses.
The size of all these approaches grows linearly with block size, but Graphene grows more slowly. As the example shows, Graphene would benefit significantly from a standardized, canonical ordering of transactions in blocks, which has been proposed by others for separate benefits.
An 8-byte (64-bit) transaction ID results in a very low probability of collision. Let b be the size in bits of the shortened transaction IDs. The probability of collision in this "Birthday Attack" scenario is well-known to be approximated by 1-exp(-m(m-1)/2**(b+1)). For example, for a mempool of m=10,000,000 transactions, the probability of collision using b=64 bits is approximately 0.0000027.
In order to ensure protocol compatibility among different implementations, we clarify proper peer behavior in the following two potentially ambiguous scenarios.
1. A receiver peer should ban any other peer that sends any improperly formatted Graphene message.
2. A receiver peer is never banned by a potential sender for requesting a Graphene block, even if the receiver was never initially sent an inv message for the requested block by the potential sender.
Older clients remain fully compatible and interoperable after this change.
https://github.com/BitcoinUnlimited/BitcoinUnlimited/tree/release
Graphene makes the assumption that the mempool of the receiver already contains the transactions in the block. To relax this assumption, future versions of this specification will ask the sender to do the following between Steps 2 and 3. For each transaction in the block, the sender determines if, for this receiver, he has previously sent or received an INV message. For any transactions where that is not the case, the entire transaction is sent to the receiver as part of the vAdditionalTxs data structure.
As noted above, future versions of this protocol will include a more efficient method of recovery.
[1] OZISIK, A. P. , ANDRESEN, G., BISSIAS G., HOUMANSADR, A. , and LEVINE, B. N. Graphene: A New Protocol for Block Propagation Using Set Reconciliation. In Proc. of International Workshop on Cryptocurrencies and Blockchain Technology (ESORICS Workshop), (Sept 2017). (2 page pdf summary)
[2] GOODRICH, M., and MITZENMACHER, M. Invertible bloom lookup tables. In Proc. Annual Allerton Conf. on Communications, Control, and Computing (Sept 2011), pp. 792–799. (pdf)
[3] EPPSTEIN, D., GOODRICH, M. T., UYEDA, F., and VARGHESE, G. What’s the Difference?: Efficient Set Reconciliation Without Prior Context. In Proc. ACM SIGCOMM (2011). (pdf)
[4] BLOOM, B. H. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13, 7 (July 1970), 422–426. (wikipedia)
[5] TSCHIPPER, P. BUIP 010 Xtreme Thinblocks. Jan 2016.
[6] CORALLO, M. BIP 152: Compact block relay. April 2016.
[7] ANDRESEN, G. O(1) Block Propagation. August 2014.
[8] CLIFFORD, A., RIZUN, P.R., SUISANI, A., STONE, A., and TSCHIPPER, P. Towards Massive On-Chain Scaling: Presenting Our Block Propagation Results With Xthin. May 2016.
[9] TSCHIPPER, P. Detailed Protocol Design for Xtreme Thin blocks Apr 2017..
The values for the simple (but less efficient) method of minimizing Graphene's Bloom filter and IBLT are derived as follows. Throughout, we assume that there are n transactions in the block, m in the receiver mempool, and x in the intersection between block and receiver mempool (note that the current Bitcoin Unlimited implementation assumes x = n-1). We derive the size of both Bloom filter and IBLT in terms of a, the number of transactions that must be recovered from the IBLT.
First, the size of a Bloom filter in bytes, TBF(a) is given by
TBF(a) = -n ln(f(a)) / (8 ln2(2)) bytes,
where f(a) is the Bloom filter false positive rate defined as
f(a) = a / (m - x).
Next, let TI(a) be the size of the IBLT. As specified above, there are 17 bytes per cell and in order to ensure a high decode rate, each item recovered from the IBLT requires approximately 1.4 cells. Thus, the typical size of an IBLT that is tuned to recover a transactions is
TI(a) = 1.4 * 17 * a bytes.
It follows then that the aggregate size of the Bloom filter and IBLT is
T(a) = TBF(a) + TI(a) = -n ln(a/(m-x))/(8ln2(2)) + 1.4 * 17 * a.
Taking the derivative of this equation with respect to a, setting it equal to 0, and solving it for a, we see that T(a) is minimized when
a = n / (1.4 * 8 * 17 * ln2(2)).
We are grateful for insightful feedback from Andrew Stone, Peter Tschipper, Andrea Suisani, Awemany, and Peter Rizun.
This document is placed in the public domain.