Skip to content

[SPARK-54571][CORE] Use LZ4 safeDecompressor#54496

Closed
Yicong-Huang wants to merge 1 commit intoapache:masterfrom
Yicong-Huang:SPARK-54571/fix/use-lz4-safe-decompressor
Closed

[SPARK-54571][CORE] Use LZ4 safeDecompressor#54496
Yicong-Huang wants to merge 1 commit intoapache:masterfrom
Yicong-Huang:SPARK-54571/fix/use-lz4-safe-decompressor

Conversation

@Yicong-Huang
Copy link
Contributor

@Yicong-Huang Yicong-Huang commented Feb 26, 2026

What changes were proposed in this pull request?

Switch LZ4CompressionCodec.compressedInputStream to use lz4Factory.safeDecompressor() instead of lz4Factory.fastDecompressor().

Since Spark upgraded to at.yawk.lz4:lz4-java:1.10.0 (via #53327), LZ4BlockInputStream now has a Builder API that accepts LZ4SafeDecompressor. This PR uses that API:

LZ4BlockInputStream.newBuilder()
  .withDecompressor(lz4Factory.safeDecompressor())
  .withChecksum(xxHashFactory.newStreamingHash32(defaultSeed).asChecksum)
  .withStopOnEmptyBlock(false)
  .build(s)

This is a follow-up to #53290, which was blocked because LZ4BlockInputStream in lz4-java 1.8.0 did not accept LZ4SafeDecompressor. The lz4-java 1.10.0 Builder API unblocks this change.

Why are the changes needed?

As noted by the lz4-java maintainer (@yawkat) in #53290, as of lz4-java 1.8.1, safeDecompressor is substantially faster than fastDecompressor. The recommended migration path with proper performance is to use safeDecompressor via the 1.10.0+ Builder API.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@dbtsai
Copy link
Member

dbtsai commented Feb 26, 2026

Do you have benchmark? Thanks.

@pan3793
Copy link
Member

pan3793 commented Feb 26, 2026

You guys might have missed #53454

@Yicong-Huang
Copy link
Contributor Author

You guys might have missed #53454

Oops, yeah sorry we have missed your PR. This was brought into sight again due to performance regression. I will close this one and let's move our discussion in your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants