Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ZstdInputStream #1

Open
wants to merge 3 commits into
base: preview-master
Choose a base branch
from

Conversation

arnej27959
Copy link
Owner

preview of PR

@arnej27959 arnej27959 requested a review from jonmv July 5, 2022 08:13
@arnej27959
Copy link
Owner Author

@jonmv can we go through this together at some point? i have some thoughts but very uncertain about best direction.

@jonmv
Copy link
Collaborator

jonmv commented Jul 5, 2022

Of course! Anything I should look at to prepare? I'm not familiar with zstd compression, so perhaps a whitepaper of sorts?

@arnej27959
Copy link
Owner Author

look at the simple (non-streaming) decompressor in this library first:

https://github.com/arnej27959/aircompressor/blob/preview-master/src/main/java/io/airlift/compress/zstd/ZstdFrameDecompressor.java

for the format itself, this is quite readable:

https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md

inputPosition = 0;
}
else {
int newSize = (inputBuffer.length * 2 + size) & BUFFER_SIZE_MASK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be 0 if the new size is otherwise too small, which can happen if the constructor specifying an input buffer size is used, with a small argument. I also didn't check for the possibility of overflow.

int magic = bb.getInt();
if ((magic >= 0x184D2A50) && (magic <= 0x184D2A5F)) {
inputPosition += SIZE_OF_INT;
skipBytes = bb.getInt();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be interpreted as an unsigned int.

}
else {
skipBytes -= inputAvailable();
inputPosition = 0;
Copy link
Collaborator

@jonmv jonmv Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing update to evictedInput? Perhaps a method for decreasing input position (and incrementing evicted counter) is a good idea?

outputPosition = 0;
outputEnd = 0;
if (singleSegmentFlag) {
windowSize = (int) curHeader.contentSize;
Copy link
Collaborator

@jonmv jonmv Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contentSize is an unsigned long, but we can't reliably use that as window size.

ensureOutputSpace(windowSize);
}
else {
windowSize = curHeader.windowSize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, windowSize could also overflow an (unsigned) int.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The library already enforces windowSize < 1 << 23, so I guess this could be used here as well.

{
if (outputSpace() < size) {
check(outputAvailable() == 0, "logic error");
if (windowSize * 4 + size < outputPosition) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflow?

int newSize = (outputBuffer.length
+ windowSize * 4
+ size
+ DEFAULT_BUFFER_SIZE) & BUFFER_SIZE_MASK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overflow?

blockDecompressor = null;
if (contentChecksumFlag) {
check(inputAvailable() >= SIZE_OF_INT, "missing checksum data");
inputPosition += SIZE_OF_INT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally skipped for now?

jonmv and others added 2 commits July 5, 2022 14:48
* Pick out bits with binary literals, for readability
* Simplify making room for output
* Read block headers with binary literals
* A couple of comments, and access static method through class
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants