diff --git a/swarm/docs/sw^3/bandwidth.rst b/swarm/docs/sw^3/bandwidth.rst
index 7531d5d52800..27d003c3e176 100644
--- a/swarm/docs/sw^3/bandwidth.rst
+++ b/swarm/docs/sw^3/bandwidth.rst
@@ -145,5 +145,5 @@ By default nodes will store all chunks forwarded as the response to a retrieve r
 These lookup results are worth storing because repeated requests for the same chunk can be served from the node's local storage without the need to "purchase" the chunk again from others. This strategy implicitly takes care of auto-scaling the network. Chunks originating from retrieval traffic will fill up the local storage, adjusting redundancy to use the maximum dedicated disk/memory capacity of all nodes. A preference to store frequently retrieved chunks results in higher redundancy aligned with current usage. All else being equal, the more redundant a chunk, the fewer forwarding hops are expected for its retrieval, thereby reducing expected latency as well as network traffic for popular content.
 
-Whereas retrieval compensation may prove sufficient for keeping the network in a relatively healthy state in terms of latency, from a resilience point of view, extra incentives are needed. We turn to this problem now.
+Whereas retrieval compensation may prove sufficient for keeping the network in a relatively healthy state in terms of latency, from a resilience point of view more work is needed. We may need additional redundancy to be resilient against partial network outages, and we need extra incentives to ensure the long-term availability of content even when it is accessed rarely. In the following two sections we discuss these problems in turn.
diff --git a/swarm/docs/sw^3/index.rst b/swarm/docs/sw^3/index.rst
index d9dc4fb8abc1..e73882baf8bd 100644
--- a/swarm/docs/sw^3/index.rst
+++ b/swarm/docs/sw^3/index.rst
@@ -16,6 +16,7 @@ Welcome to the swarm documentation!
    :maxdepth: 4
 
    bandwidth
+   erasure
    storage
    swap
    parameters
diff --git a/swarm/docs/sw^3/storage.rst b/swarm/docs/sw^3/storage.rst
index 8272f6df6937..a2037a20c9e3 100644
--- a/swarm/docs/sw^3/storage.rst
+++ b/swarm/docs/sw^3/storage.rst
@@ -116,76 +116,7 @@ Assuming all chunks of the original file are different this yields a potential
 
 .. rubric:: Footnotes
 .. [#] We also explored the possibility that the degree of redundancy is subsumed under local replication. Local replicas are instances of a chunk stored by nodes in a close neighbourhood. If that particular chunk is crucial to the reconstruction of the content, the swarm is much more vulnerable to chunk loss or latency due to attacks. This is because if the storers of the replicas are close, infiltrating the storers' neighbourhood can be done with as many nodes as there are chunk types (as opposed to as many as there are chunk replicas). If there is a cost to sybil attacks, this brings the cost down by a factor of n, where n is the number of replicas. We concluded that local replication is important for resilience in case of intermittent node dropouts; however, it is inferior to other solutions for expressing the level of security desired by the owner.
 
-Luckily there are far more economical ways to encode a file redundantly. In what follows we spell out our proposal for a loss-tolerant swarm hash.
-
-Loss-tolerant Merkle Trees
-----------------------------------------------------------
-
-Recall that each node (except possibly the last one on each level) has 128 children, each of which represents the root hash of a subtree or, at the last level, a 4096 byte span of the file.
-Let us now suppose that we divide our file into 100 equally sized pieces, and then add 28 more parity check pieces using a Reed-Solomon code, so that now any 100 of the 128 pieces are sufficient to reconstruct the file. On the next level up, the chunks are composed of the hashes of the first hundred data chunks and the 28 hashes of the parity chunks. Let us take the first 100 of these and add an additional 28 parity chunks, such that any 100 of the resulting 128 chunks are sufficient to reconstruct the original 100 chunks. And so on, on every level.
-In terms of availability, every subtree is equally important to every other subtree at the same level. The resulting data structure is not a balanced tree, since on every level :math:`i` the last 28 chunks are parity leaf chunks while the first 100 are branching nodes encoding a subtree of depth :math:`i-1` redundantly.
-
-In practice, of course, data chunks are still preferred over the parity chunks in order to avoid the CPU overhead of reconstruction. This data structure preserves its merkle properties and can still be used for partial integrity checks.
-
-
-The Cauchy-Reed-Solomon (henceforth CRS) scheme is a systematic erasure code whereby any :math:`m` out of :math:`n` fixed-size pieces are sufficient to reconstruct the original data blob of size :math:`m` pieces, with a storage overhead of :math:`n-m` pieces [#]_ . Given the :math:`m` pieces of the original blob, the CRS scheme provides a method to inflate it to size :math:`n` by supplementing it with :math:`n-m` so-called parity pieces. With that done, assuming :math:`p` is the probability of losing one piece and all :math:`n` pieces are stored independently, the probability of losing the original content is of the order of :math:`p^{n-m+1}`: exponential in the number of parity pieces, while the extra storage is only linear. These properties are preserved if we apply the coding to every level of a swarm chunk tree.
-
-.. rubric:: Footnotes
-.. [#] There are open source libraries for Reed-Solomon and Cauchy-Reed-Solomon coding. See https://www.usenix.org/legacy/event/fast09/tech/full_papers/plank/plank_html/, https://www.backblaze.com/blog/reed-solomon/, http://rscode.sourceforge.net/.
-
-Assume we fix :math:`n=128`, the branching factor of the swarm hash (chunker).
-The chunker algorithm then proceeds the following way when splitting the document:
-
-0. Set the input to the data blob.
-1. Read the input one 4096 byte chunk at a time. Count the chunks by incrementing :math:`i`.
-   If fewer than 4096 bytes are left in the input, pad the last fraction to 4096 bytes.
-2. Repeat 1 until there is no more data or :math:`i \mod m = 0`.
-3. If there is no more data, add padding of :math:`j` chunks such that :math:`(i+j) \mod m = 0`.
-4. Use the CRS scheme on the last :math:`m` chunks to produce :math:`128-m` parity chunks, resulting in a total of 128 chunks.
-5. Record the hashes of the 128 chunks, concatenated to form the next 4096 byte chunk of the next level.
-6. If there is more data, repeat from 1; otherwise
-7. if the next level data blob is larger than 4096 bytes, set the input to this blob and repeat from 1;
-8. otherwise remember the blob as the root chunk.
-
-
-Benefits of CRS merkle tree
--------------------------------------
-
-This per-level m-of-n Cauchy-Reed-Solomon erasure code introduced into the swarm chunk tree not only ensures file availability, but also offers further benefits: increased resilience and ways to speed up retrieval.
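
The following is a minimal, illustrative sketch of a single level of the splitting procedure described above. It is not the swarm chunker itself, and it rests on assumptions that are not part of the text: the hypothetical ``encodeLevel`` helper, the use of the klauspost/reedsolomon Go library as a stand-in for a Cauchy-Reed-Solomon codec, and sha256 as a stand-in for swarm's chunk hash.

.. code-block:: go

    package main

    import (
        "bytes"
        "crypto/sha256"
        "fmt"

        "github.com/klauspost/reedsolomon"
    )

    const (
        chunkSize = 4096
        branches  = 128 // n, the branching factor of the swarm hash
        dataCount = 100 // m, data chunks per group (user-chosen CRS parameter)
    )

    // encodeLevel consumes one level of input in groups of m data chunks,
    // appends n-m parity chunks to every group, and returns the concatenated
    // hashes of each group of 128 chunks, i.e. the chunks of the next level up.
    func encodeLevel(input []byte) ([]byte, error) {
        enc, err := reedsolomon.New(dataCount, branches-dataCount)
        if err != nil {
            return nil, err
        }
        var nextLevel bytes.Buffer
        for len(input) > 0 {
            shards := make([][]byte, branches)
            // steps 1-3: read up to m data chunks, zero-padding the last group
            for i := 0; i < dataCount; i++ {
                shards[i] = make([]byte, chunkSize)
                n := copy(shards[i], input)
                input = input[n:]
            }
            // step 4: allocate and compute the n-m parity chunks for this group
            for i := dataCount; i < branches; i++ {
                shards[i] = make([]byte, chunkSize)
            }
            if err := enc.Encode(shards); err != nil {
                return nil, err
            }
            // step 5: the 128 chunk hashes form the next 4096 byte chunk
            for _, chunk := range shards {
                h := sha256.Sum256(chunk)
                nextLevel.Write(h[:])
            }
        }
        return nextLevel.Bytes(), nil
    }

    func main() {
        blob := make([]byte, 1<<20) // 1 MiB of example (zero) data
        next, err := encodeLevel(blob)
        if err != nil {
            panic(err)
        }
        fmt.Printf("next level: %d bytes\n", len(next))
    }

Note that 128 hashes of 32 bytes each add up to exactly one 4096 byte chunk, which is why the encoding composes cleanly level by level.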
-
-
-All chunks are created equal
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-A tree encoded as suggested above has the same redundancy at every node [#]_ . This means that chunks nearer to the root are no longer more important than chunks nearer to the leaves. Every node has an m-of-128 redundancy level, and no chunk apart from the root chunk is more important than any other chunk.
-
-.. rubric:: Footnotes
-.. [#] If the file size is not an exact multiple of 4096 bytes, then the last chunk at every level will actually have an even higher redundancy than the rest.
-
-
-A problem that immediately presents itself is the following: if nodes are compensated only for serving chunks, then less popular chunks are less profitable and more likely to be deleted; therefore, if users only download the 100 data chunks and never request the parity chunks, then these are more likely to get deleted and ultimately not be available when they are finally needed.
-
-Another approach would be to use non-systematic coding. A systematic code is one in which the data remains intact and we add extra parity data, whereas in a non-systematic code we replace all data with parity data such that (in our example) all 128 pieces are really created equal. While the symmetry of this approach is appealing, it leads to forced decoding and thus to high CPU usage even in normal operation, and it also prevents us from streaming files from the swarm.
-
-Luckily the problem is solved by the automated audit scheme, which audits the integrity of all chunks and does not distinguish between data and parity chunks.
-
-Self healing
-^^^^^^^^^^^^^^^^^^^^^^
-
-Any client downloading a file from the swarm can detect if a chunk has been lost. The client can reconstruct the file from the parity data (or reconstruct the parity data from the file) and resync this data into the swarm. That way, even if a large fraction of the swarm is wiped out simultaneously, the network should heal organically, and the encouraged default client behavior is to repair any damage detected.
-
-Improving latency of retrievals
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Alpha is the name the original Kademlia gives to the number of peers in a Kademlia bin that are queried simultaneously during a lookup. The original Kademlia paper sets alpha=3. This is impractical for Swarm because the peers do not report back with new addresses as they would in pure Kademlia, but instead forward all queries to their peers. Swarm is coded in this way to make use of semi-stable, longer-term devp2p connections. Setting alpha to anything greater than 1 thus increases the amount of network traffic substantially, setting up an exponential cascade of forwarded lookups (though it would soon collapse back down onto the target of the lookup).
-However, setting alpha=1 has its own downsides. For instance, lookups can stall if they are forwarded to a dead node, and even if all nodes are live, there could be large delays before a query completes. The practice of setting alpha=2 in swarm is designed to speed up file retrieval, and clients are configured to accept chunks from the first/fastest forwarding connection to be established.
-In an erasure coded setting we can, in a sense, have the best of both worlds. The default behavior should be to set alpha=1, i.e. to query only one peer for each chunk lookup, but, crucially, to issue lookup requests not just for the data chunks but for the parity chunks as well.
-The client then could accept the first :math:`m` of every 128 chunks queried, gaining some of the benefits of faster retrieval that redundant lookups provide without a whole exponential cascade.
-
-
-Improving resilience in case of a non-saturated Kademlia table
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
+Luckily there are far more economical ways to encode a file redundantly. In particular, the erasure coded, loss-tolerant merkle tree discussed in the previous section allows the user to choose their own level of guaranteed data redundancy and security. From here on we assume that the user applied CRS encoding when splitting their content and has therefore expressed their desired degree of redundancy in the CRS parameters. The price they pay for this is the increased number of chunks they need to pay storage for, without adding any complexity to storage distribution and pricing.
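
As a rough, back-of-the-envelope illustration of this trade-off (not taken from the original text), consider the running example of :math:`m = 100` data chunks per group of :math:`n = 128`, and assume each chunk is lost independently with probability :math:`p`. The storage overhead is the ratio :math:`n/m`, while a group becomes unrecoverable only if more than :math:`n-m` of its chunks are lost:

.. math::

   \text{overhead} = \frac{n}{m} = \frac{128}{100} = 1.28,
   \qquad
   P(\text{group unrecoverable}) = \sum_{k=n-m+1}^{n} \binom{n}{k} p^{k} (1-p)^{n-k}
   \approx \binom{128}{29} p^{29}
   \quad \text{for small } p.

Roughly 28% more chunks to pay storage for thus buys a per-group failure probability of the order of :math:`p^{29}`, consistent with the :math:`p^{n-m+1}` estimate used in the previous section.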