zstd: expose header decoder as public API mimicking zstd -lv foo.zst
#237
Comments
It could be added, but it would not be super reliable.
My suggestion in general would be to keep the size outside the compression part. |
I work with externally supplied archives, without a side-channel for supplying size. It is true that the presence of the value is not guaranteed, but when it is there, it is there. At present I am doing an abomination like this:
Replacing the exec() abomination with reliable code you've already written would be a plus. Please reconsider! 😉 |
:cryingbear: indeed. A quite reliable way would be to parse all block headers of a stream. It would of course require reading through the stream, but blocks would not need to be decompressed, so it should be possible to do at IO speeds. I am not a big fan of returning the frame header values, since they are unreliable when multiple frames are present. |
I apologize for being imprecise - I was actually not directly suggesting that. I was more asking about the possibility of a more general-ish method that would internally use the part I linked. Something that works only when the input buffer is a regular file or somesuch. For reference |
Note that there is precedent on this already:
|
Agree. I would like something that always works no matter how the file was compressed, but it could optionally allow unreliable numbers. It could be something like:
// UncompressedSize returns the uncompressed size of the provided stream.
// If scanBlocks is true, all blocks of the stream will be scanned.
// While very fast, this requires reading the entire stream.
// If onlyUseHeader is true, any size value present in the header is returned.
// If no header size is set, blocks will be scanned if scanBlocks is set.
func UncompressedSize(r io.Reader, scanBlocks, onlyUseHeader bool) (int64, error) |
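For readers following along, the header-only path of the proposal can be sketched with the standard library alone. This is a minimal sketch based on the Zstandard frame format (RFC 8878), not the library's implementation: `frameContentSize` and the error variables are hypothetical names, it looks only at the first frame, and it does not handle skippable frames.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// zstdMagic is the Zstandard frame magic number, stored little-endian.
const zstdMagic = 0xFD2FB528

var (
	errShort   = errors.New("frame header truncated")
	errNotZstd = errors.New("not a zstd frame")
)

// frameContentSize is a hypothetical helper (not a library API).
// It parses the frame header at the start of b and returns the declared
// Frame_Content_Size. ok is false when the header omits the field.
func frameContentSize(b []byte) (size uint64, ok bool, err error) {
	if len(b) < 5 {
		return 0, false, errShort
	}
	if binary.LittleEndian.Uint32(b) != zstdMagic {
		return 0, false, errNotZstd
	}
	desc := b[4]                    // Frame_Header_Descriptor
	fcsFlag := desc >> 6            // bits 7-6
	singleSegment := desc&0x20 != 0 // bit 5
	dictFlag := desc & 0x03         // bits 1-0

	pos := 5
	if !singleSegment {
		pos++ // Window_Descriptor is present
	}
	pos += [4]int{0, 1, 2, 4}[dictFlag] // Dictionary_ID width

	fcsLen := [4]int{0, 2, 4, 8}[fcsFlag]
	if fcsFlag == 0 && singleSegment {
		fcsLen = 1 // single-segment frames always carry a size
	}
	if fcsLen == 0 {
		return 0, false, nil // size not recorded in the header
	}
	if len(b) < pos+fcsLen {
		return 0, false, errShort
	}
	f := b[pos : pos+fcsLen]
	switch fcsLen {
	case 1:
		size = uint64(f[0])
	case 2:
		size = uint64(binary.LittleEndian.Uint16(f)) + 256 // 2-byte form is offset by 256
	case 4:
		size = uint64(binary.LittleEndian.Uint32(f))
	case 8:
		size = binary.LittleEndian.Uint64(f)
	}
	return size, true, nil
}
```

A caller would pass the first 18 bytes of the stream (the largest possible frame header) and fall back to another strategy when `ok` is false. As noted in the thread, the value covers only the first frame, so it can be misleading for multi-frame streams.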
@klauspost I love it! |
It shouldn't take a crazy amount of time to do, but I can't make specific promises. |
No worries at all. As you can see, I have a (horrid) workaround in place for this. I will rip it out when your version lands. Thank you for considering my request! |
Actually now that I think about this - the "lighter" option will have to consume the first header in the stream anyway, so further decompression will fail. I wonder if there is a way to keep some state in an instance to make the following possible:
|
I know that would be convenient, but it would also mean a lot of housekeeping and scanning blocks will be impossible since it would have to buffer the entire stream. So if you want the size you must be able to reset the stream yourself. |
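Resetting the stream on the caller's side can be as small as the sketch below, assuming the input is seekable (a regular file, for instance). `peekAndRewind` and `exampleRewind` are hypothetical names used only for illustration, and the sample data is a plain string rather than a real zstd stream.

```go
package main

import (
	"io"
	"strings"
)

// maxFrameHeader is the largest possible zstd frame header:
// 4 (magic) + 1 (descriptor) + 1 (window) + 4 (dictionary ID) + 8 (content size).
const maxFrameHeader = 18

// peekAndRewind is a hypothetical helper. It reads up to n bytes from the
// current position of rs and then seeks back, so a later decoder sees the
// stream from the same position.
func peekAndRewind(rs io.ReadSeeker, n int) ([]byte, error) {
	start, err := rs.Seek(0, io.SeekCurrent)
	if err != nil {
		return nil, err
	}
	buf := make([]byte, n)
	got, err := io.ReadFull(rs, buf)
	// A short stream is not an error here; return what was available.
	if err != nil && err != io.ErrUnexpectedEOF && err != io.EOF {
		return nil, err
	}
	if _, err := rs.Seek(start, io.SeekStart); err != nil {
		return nil, err
	}
	return buf[:got], nil
}

// exampleRewind shows that the full stream is still readable after peeking.
func exampleRewind() (peeked, full string, err error) {
	r := strings.NewReader("header-bytes-then-payload")
	head, err := peekAndRewind(r, 6)
	if err != nil {
		return "", "", err
	}
	rest, err := io.ReadAll(r)
	if err != nil {
		return "", "", err
	}
	return string(head), string(rest), nil
}
```

In practice one would peek `maxFrameHeader` bytes, inspect them, and then hand the same reader to the decoder. For non-seekable inputs, `bufio.Reader.Peek` offers a similar non-consuming read, limited to the buffer size.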
This is fair, and I agree with your justification. However in this case the proposed API Moreover, at least for TLDR: it seems that the following would be both simpler and more robust: |
@ribasushi I looked into this. Unfortunately compressed blocks don't have their uncompressed size available and the sequences must be decoded, which is quite expensive. I will still add a function to get the frame header and the first block. |
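The limitation is visible in the block header layout itself. The following sketch decodes the 3-byte block header as described in the Zstandard format specification (RFC 8878); the type and method names are hypothetical and not taken from the library. For raw and RLE blocks the header gives the regenerated size directly, but for compressed blocks it gives only the compressed size.

```go
package main

import "errors"

// blockHeader is a hypothetical representation of a zstd Block_Header.
type blockHeader struct {
	Last bool   // Last_Block flag (bit 0)
	Type uint8  // Block_Type (bits 1-2): 0 raw, 1 RLE, 2 compressed, 3 reserved
	Size uint32 // Block_Size (bits 3-23)
}

// parseBlockHeader decodes the first 3 bytes of b as a little-endian header.
func parseBlockHeader(b []byte) (blockHeader, error) {
	if len(b) < 3 {
		return blockHeader{}, errors.New("block header truncated")
	}
	v := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16
	return blockHeader{
		Last: v&1 != 0,
		Type: uint8(v>>1) & 3,
		Size: v >> 3,
	}, nil
}

// regeneratedSize reports the uncompressed size when the header alone
// determines it. For compressed blocks known is false: Size is then the
// compressed length, and the sequences would have to be decoded.
func (h blockHeader) regeneratedSize() (n uint32, known bool) {
	switch h.Type {
	case 0, 1: // raw and RLE blocks
		return h.Size, true
	default:
		return 0, false
	}
}

// payloadLen returns how many bytes follow the header in the stream,
// which is what a scanner needs in order to skip to the next block.
func (h blockHeader) payloadLen() uint32 {
	if h.Type == 1 {
		return 1 // an RLE block stores a single byte to repeat
	}
	return h.Size
}
```

A scanner built on this can hop from block to block at IO speed, but it can only sum sizes for streams made entirely of raw and RLE blocks, which matches the conclusion above.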
As per subject it would be awesome if
compress/zstd/framedec.go
Lines 183 to 206 in 98b287b
zstd
binary or to decompress a stream and count the bytes 🙀