@pitrou (Member) commented Oct 28, 2025

Rationale for this change

  1. Add more compression codecs to the seed corpus.
  2. Tweak the fuzz target to make fuzzing slightly faster (about 30% locally, according to my measurements), which will allow testing more mutations per day.

Are these changes tested?

Not specifically by CI, but hopefully they will make fuzzing more efficient.

Are there any user-facing changes?

No.

@pitrou (Member, Author) commented Oct 28, 2025

@github-actions crossbow submit fuzz

@pitrou marked this pull request as ready for review October 28, 2025 15:55
@pitrou requested a review from wgtmac as a code owner October 28, 2025 15:55
@github-actions commented

Revision: efa842a

Submitted crossbow builds: ursacomputing/crossbow @ actions-9baaa59a9e

Task: test-build-cpp-fuzz
Status: GitHub Actions

@pitrou requested a review from adamreeve October 28, 2025 16:01

@pitrou (Member, Author) commented Oct 28, 2025

@github-actions crossbow submit fuzz

@github-actions commented

Revision: 4bbdd6a

Submitted crossbow builds: ursacomputing/crossbow @ actions-001f65930b

Task: test-build-cpp-fuzz
Status: GitHub Actions

@pitrou (Member, Author) commented Oct 29, 2025

@wgtmac @EnricoMi I would welcome a review on this.

@EnricoMi (Contributor) commented

Nice! What about the other compression codecs (LZO, BZ2, LZ4_*)?

@pitrou (Member, Author) commented Oct 29, 2025

These are all the compression codecs that Parquet C++ supports. BZ2 is not in the Parquet format, and LZO is closed source.

@EnricoMi (Contributor) commented

> These are all the compression codecs that Parquet C++ supports. BZ2 is not in the Parquet format, and LZO is closed source.

Fine with me, was just wondering since they are defined:

struct Compression {
  /// \brief Compression algorithm
  enum type {
    UNCOMPRESSED,
    SNAPPY,
    GZIP,
    BROTLI,
    ZSTD,
    LZ4,
    LZ4_FRAME,
    LZO,
    BZ2,
    LZ4_HADOOP
  };
};

@EnricoMi (Contributor) left a comment

LGTM!

github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Oct 29, 2025

@pitrou (Member, Author) commented Oct 29, 2025

> Fine with me, was just wondering since they are defined:

Well, some of them are supported by other parts of Arrow. For example, you could read a BZ2-compressed CSV file.

@pitrou merged commit 42f27ab into apache:main Oct 29, 2025
41 of 43 checks passed
@pitrou removed the "awaiting committer review" label Oct 29, 2025
@pitrou deleted the gh47978-pq-seed-corpus-codecs branch October 29, 2025 16:31