-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snappy compression for DEVp2p #706
Conversation
Sup? I have been pinged. The beacon has been lit. |
Feedback welcome :) |
👍 |
👍
I think that sounds like a reasonable thing to have, yes. |
@holiman Any suggestion on a message size limit? |
Oh, no I don't know enough about the layer to make a good suggestion. |
Can I get a sense for how this would effect light clients? Would light clients be required to implement this compression/decompression as well? For Context: We're aiming to have Py-EVM be primary used as a light client and thus are trying to not require any system dependencies. So far we've been able to skirt around all of them, but this would require us to create our own pure python implementation of snappy (as I can't find any existing ones that seem well supported). |
One concern that I have is that it can reveal size information about the payload if the application protocol is padding messages and trusting on the devp2p layer for encryption. If the compression algorithm is able to compress the added padding it can make the padding useless. Whisper comes to mind but I believe that whisper does padding and encryption on the application level before pushing it to the network layer. So it probably isn't a problem for whisper. It can also be mitigated by choosing a padding algorithm that the compression algorithm cannot compress efficiently. |
Even with random noise padding there are attacks to try to compromise the encryption :( |
Is there a reason to use snappy over lz4? It seems that it has a slightly better compression rate, while also offering better speed. I think it was also quite a bit faster for decompression, but I couldn't find any reference for that in my quick search to confirm. |
@pipermerriam This would lie in the devp2p protocol, on top of which all other protocols (including light client) runs. As such, yes. There would also be the possibility to make compression optional and enable it only if both sides signal via a flag in the protocol handshake. The issue I see is that "optional features" leads to a openssl-type pandorra's box where there are so many different modes of operation to mix and match, that it was extremely hard to make a properly secure combo. So my suggestion would be to enforce it and not allow wiggle-room for implementers to write suboptimal code. @bas-vk I don't think devp2p should aim to provide darkness itself, rather to be a performant general purpose transport. If Whisper encrypts all its messages, Snappy should not be able to compress it at all, so it should not be able to interfere. IMHO an applciation layer protocol should not rely on the underlying transport being one kind or another for security. I can imagine eventually having gateways into/out of Ethereum from other protocols, where it won't be devp2p, and app protocols shouldn't break because of it. @awishformore I haven't done extensive benchmarks or comparisons between different compression algorithms, but Snappy seems to be fast enough (a single i7 core with the Go implementation can encode at 350MB/s text files (a lot of work, but a lot of gain too) and 15.7GB/s jpegs (non compressible)). At these rates compression speed doesn't seem to matter compared to network bandwidth limitations. The other benefit of Snappy is that it has widely available and battle-tested implementations for just about any programming language out there: https://google.github.io/snappy/. As such, clients have ready tools out of the box that can be incorporated without any effort. I'm happy to consider other compression algos if it can beat Snappy at both operation speed and implementation support. |
@mightypenguin I would not consider the underlying devp2p transport a secure element. Purely because these higher level protocols (eth, les, shh, bzz) may run on top of any other transport too, so there's no guarantee that somewhere in the chain the content won't fly in plain-text. As such, if an application protocol needs security, it needs to take care of it itself and not rely on undernets. |
Would it make sense to enforce 16 MB limit for the decompressed size on the devp2p level? That's easier than implementing lazy decompressing (but also may be used together with it) |
@gumb0 that would indeed be the most conservative solution as it would keep all current limits in place, just performing better transport wise. |
@karalabe just FYI, on the original repository, you can see at https://github.com/lz4/lz4 that it beats Snappy by around 2x for decompression speed. As far as I'm aware of, there are native implementations in Go, C/C++, C#, Java (probably some others) and bindings for pretty much any other language. |
I can't seem to find pure Go implementation apart from https://github.com/pierrec/lz4, which I don't find reliable enough to trust given that there are no benchmarks vs. Snappy having been developed officially by Google. |
I'd also prefer Snappy over Lz4, since snappy is well maintained and battle-tested. |
It seems Snappy has a pure Rust and Javascript implementations too (not sure about the speed), so that's a plus. |
I'd suggest making this optional on the subprotocol level. During handshake nodes would exchange an extra bit with each capability that tells if they want compression for that capability. If both parties agree on that bit the subprotocol session will be compressed. Some subprotocols, such as parity snapshot download and whisper already exchange compressed or encrypted data, so they would benefit from disabling devp2p compression. |
@karalabe are you suggesting to also use the Snappy framing format? (Which has support for different chunks, including uncompressed chunks, for incompressible blocks). I think the framing format is quite useless otherwise, because it has a header and a checksum, all of which are already provided by devp2p I assume. |
@axic No, I'm suggesting just plain compression of the message payloads. |
@karalabe perhaps you could make this clear and also link to the spec https://github.com/google/snappy/blob/master/format_description.txt Also worth mentioning in the "16Mb limit section" that the decompressed length is the first value in the stream:
@arkpar it seems that up to 4GB of literal data can be stored in a chunk in Snappy, should the compressor detect it is incompressible, it would store it as a literal.
|
@arkpar I don't think the complexity is worth the performance save.
The current proposal is easy to add to any client, elegant in that it's immediately useful for all subprotocols and backward compatible with old clients. If you think your suggestion can be done in a similar way I'd be curious to see a reference implementation, but as I highlighted above, I don't think it can be done without a coordinated overhaul of the devp2p protocol once again. |
I agree, and think the extra engineering effort of selective compression isn't worth it; compression is fast and widely useful, and far simpler if just implemented across the board. |
After having a day to think about this I'm going to formally withdraw any objections to this being required or suggestion that we make it optional. Snappy seems well supported and we'll figure out a way to work around the dependency if needed. |
Simple snappy test vectors would be convenient but I can't find any. This is what I'm getting for input "Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project." with xerial/snappy-java: 51F05057696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F70656469612070726F6A6563742E And in framed stream mode (x-snappy-framed): FF060000734E6150705901550000609A375657696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F70656469612070726F6A6563742E Note that the example compression on the Snappy Wikipedia page is invalid: CA02F04257696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F093FF08170726F6A6563742E00000000000000000000000000 |
Indeed, I want to add a few tests before merging the EIP, hence why I
didn't yet. Will try today, though no promise.
…On Sep 23, 2017 12:12, "Evan Saulpaugh" ***@***.***> wrote:
Snappy test vectors would be convenient but I can't find any. This is what
I'm getting for "Wikipedia is a free, web-based, collaborative,
multilingual encyclopedia project." with snappy-java.
51F05057696B697065646961206973206120667265652C207765622D6261
7365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775
616C20656E6379636C6F70656469612070726F6A6563742E
And in framed stream mode (x-snappy-framed):
FF060000734E6150705901550000609A375657696B697065646961206973
206120667265652C207765622D62617365642C20636F6C6C61626F726174
6976652C206D756C74696C696E6775616C20656E6379636C6F7065646961
2070726F6A6563742E
Note that the example compression on the Snappy Wikipedia page is invalid:
CA02F04257696B697065646961206973206120667265652C207765622D62
617365642C20636F6C6C61626F7261746976652C206D756C74696C696E67
75616C20656E6379636C6F093FF08170726F6A6563742E00000000000000000000000000
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#706 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAH6GdUk3TS2oLaIQVEUUjHpUoyFOFk_ks5slMtggaJpZM4PQC_1>
.
|
@esaulpaugh @axic I've added a note that no framing is used whatsoever due to devp2p already being message based. Also I've provided 2 test vector encodings of a block RLP generated with Go and Python, along with verification utilities in both languages. |
EIPS/eip-devp2p-snappy.md
Outdated
@@ -0,0 +1,192 @@ | |||
## Preamble | |||
|
|||
EIP: <to be assigned> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EIP: 706
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few minor changes and spelling corrections.
EIPS/eip-devp2p-snappy.md
Outdated
## Preamble | ||
|
||
EIP: <to be assigned> | ||
Title: devp2p snappy compression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEVp2p Snappy Compression
EIPS/eip-devp2p-snappy.md
Outdated
Author: Péter Szilágyi <peter@ethereum.org> | ||
Type: Standard Track | ||
Category: Networking | ||
Status: Draft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status: Final
EIPS/eip-devp2p-snappy.md
Outdated
Created: 2017-09-07 | ||
|
||
## Abstract | ||
The base networking protocol (devp2p) used by Ethereum currently does not employ any form of compression. This results in a massive amount of bandwidth wasted in the entire network, making both initial sync as well as normal operation slower and laggyer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slower and laggier
EIPS/eip-devp2p-snappy.md
Outdated
|
||
However, most of this data (blocks, transactions) are heavily compressable. By enabling compression at the message payload level, we can reduce the previous numbers to 1.01GB upload / 13.46GB download on the main network, and 46.21MB upload / 463.65MB download on the test network. | ||
|
||
The motivation behind doing this at the devp2p level (opposed to eth for example) is that it would enable compression for all sub-protocols (eth, les, bzz) seamlessly, reducing any complexity those protocols might incur in trying to individually optimize for data traffic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEVp2p
EIPS/eip-devp2p-snappy.md
Outdated
|
||
### Avoiding DOS attacks | ||
|
||
Currently a devp2p message length is limited to 24 bits, amoutning to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blidly decompress messages, since they may get significantly larger than 16MB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
amounting spelling fix.
blindy spelling fix.
EIPS/eip-devp2p-snappy.md
Outdated
|
||
Currently a devp2p message length is limited to 24 bits, amoutning to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blidly decompress messages, since they may get significantly larger than 16MB. | ||
|
||
However, Snappy is capable of calculating the decompressed size of an input message without inflating it in memory. This can be used to discard any messages which decompress above some threshold. **The proposal is to use the same limit (16MB) as the threshold for decompressed messages.** This retains the same guarantees that the current devp2p protocol does, so there won't be surprises in application level protocols. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
current DEVp2p
EIPS/eip-devp2p-snappy.md
Outdated
|
||
**Alternative solutions to data compression that have been brought up and discarded are:** | ||
|
||
Extend protocol `xyz` to support compressed messages versus doing it at devp2p level: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEVp2p
EIPS/eip-devp2p-snappy.md
Outdated
|
||
Don't explicitly limit the decompressed message size, only the compressed one: | ||
|
||
* **Pro**: Allows larger messages to traverse through devp2p. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEVp2p
EIPS/eip-devp2p-snappy.md
Outdated
* **Con**: Needs lazy decompression to allow size limitations without DOS. | ||
|
||
## Backwards Compatibility | ||
This proposal is fully backward compatible. Clients upgrading to the proposed devp2p protocol version `5` should still support skipping the compression step for connections that only advertise version `4` of the devp2p protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEVp2p
@Souptacular PTAL |
EIPS/eip-devp2p-snappy.md
Outdated
|
||
Currently a DEVp2p message length is limited to 24 bits, amounting to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blindy decompress messages, since they may get significantly larger than 16MB. | ||
|
||
However, Snappy is capable of calculating the decompressed size of an input message without inflating it in memory. This can be used to discard any messages which decompress above some threshold. **The proposal is to use the same limit (16MB) as the threshold for decompressed messages.** This retains the same guarantees that the current DEVp2p protocol does, so there won't be surprises in application level protocols. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you extend Snappy is capable of calculating the decompressed size of an input message without inflating it in memory with (taken from the spec):
The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
stored as a little-endian varint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
* The handshake message is **never** compressed, since it is needed to negotiate the common version. | ||
* Snappy framing is **not** used, since the DEVp2p protocol already message oriented. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add here (or the "references" section) the link to the reference spec: https://github.com/google/snappy/blob/master/format_description.txt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a references section to the end.
|
||
* The handshake message is **never** compressed, since it is needed to negotiate the common version. | ||
* Snappy framing is **not** used, since the DEVp2p protocol already message oriented. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also would mention as a feature that should be no concern about compressing already compressed data since Snappy can store uncompressed literals up to 4GB in size should the compressor choose so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note.
Also if it was agreed to include this in Constantinople it would make sense changing the README too. |
@axic Addressed all your comments. Btw, you don't need a hard fork to enable this update. DEVp2p v4 continues to operate without Snappy, and v5 with Snappy. The two can coexist within a node. PTAL. |
Changes have been applied and the EIP was agreed upon in Core dev meeting 25: ethereum/pm#23 |
Ping @fjl @Arachnid @vbuterin @Souptacular @arkpar @holiman.
Open question: Should we enforce compression for all connections running devp2p v5, or should we add a capabilities bitset instead which can be used to signal current (and potentially future) compression capabilities and let clients agree on common ones? The non obvious thing here is the extra complexity if multiple conflicting capabilities match (e.g. multiple compression algos, which one do we chose).
Update: