-
Notifications
You must be signed in to change notification settings - Fork 308
HPCC-34656 Allow BlockCompressed nodes to have configurable compression #20423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-34656 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements configurable compression for BlockCompressed index nodes, transitioning from legacy compression to a more flexible block-based approach. The changes enable the HybridIndexCompressor to use BlockCompressed nodes for leaf nodes while maintaining configurable compression methods.
Key changes:
- Modified HybridIndexCompressor to use BlockCompressed nodes for leaf compression
- Added configurable compression method support to BlockCompressed nodes with COMPRESS_METHOD_ZSTDS as default
- Removed legacy uncompressed pathways and simplified compression logic
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
system/jlib/jlzw.hpp | Added explicit byte type to CompressionMethod enum |
system/jhtree/keybuild.cpp | Updated HybridIndexCompressor to use BlockCompressedIndexCompressor for leaf nodes |
system/jhtree/jhblockcompressed.hpp | Added configurable compression support and new BlockCompressedIndexCompressor class |
system/jhtree/jhblockcompressed.cpp | Simplified compression logic and removed legacy uncompressed pathways |
56c2da5
to
de9de84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpmcmu looks good.
A few items that can be cleaned up.
I would be tempted to leave the compress options to a different PR, and at the same time add logic to reuse the compressor for multiple nodes. (See inplace compressor logic which does that.)
Deleteing the expandedPayload logic can be done as part of the clean up, or as a separate PR.
Other items to do in later PRs include potentially adding a compensating delay, and looking at other potential changes to the node format.
system/jhtree/jhblockcompressed.cpp
Outdated
|
||
keyLen = _keyHdr->getMaxKeyLength(); | ||
keyCompareLen = _keyHdr->getNodeKeyLength(); | ||
if (hdr.nodeType == NodeBranch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be deleted - always a leaf node.
system/jhtree/jhblockcompressed.hpp
Outdated
|
||
protected: | ||
//The following are used for faking payload events | ||
mutable CriticalSection expandPayloadCS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment above the critical section where it was originally defined to see why it was not a member.
I think I would actually prefer to delete this and the expandedPayload member, and delete the logical associated with it. It doesn't really help to fake the events, and it increases the complication/memory size.
system/jhtree/jhblockcompressed.hpp
Outdated
CompressionMethod method = translateToCompMethod(option, COMPRESS_METHOD_NONE); | ||
if (method != COMPRESS_METHOD_NONE) | ||
{ | ||
compressionMethod = method; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include some of the logic from the inplace compression to support compression options, both here in "value", and also name="compressopt"
system/jhtree/jhblockcompressed.cpp
Outdated
hdr.keyBytes += sizeof(compressionMethod); | ||
|
||
//Adjust the fixed key size to include the fileposition field which is written by writekey. | ||
bool rowCompressed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used I think
system/jhtree/jhblockcompressed.cpp
Outdated
size32_t fixedKeySize = isVariable ? 0 : keyLen + sizeof(offset_t); | ||
|
||
ICompressHandler * handler = queryCompressHandler(compressionMethod); | ||
compressor.open(keyPtr, maxBytes-hdr.keyBytes, handler, "", isVariable, fixedKeySize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow options to be passed -possibly in a follow on PR...
@ghalliday pushed up code review changes, I went ahead and added the compressorOptions to this PR but I haven't added in the option to reuse the compressor yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpmcmu please fix the one issue, and squash.
system/jhtree/jhblockcompressed.cpp
Outdated
if (hdr.nodeType == NodeBranch) | ||
keyLen = keyHdr->getNodeKeyLength(); | ||
|
||
keyLen = keyHdr->getNodeKeyLength(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong line - you should have kept .
keyLen = _keyHdr->getMaxKeyLength();
it is always a leaf
- Updated hybrid index compressor to use blockcompressed for leafs - Modified BlockCompressed nodes to make compression configurable - Removed uncompressed pathways from BlockCompressed Signed-off-by: James McMullan James.McMullan@lexisnexis.com
Jirabot Action Result: |
Signed-off-by: James McMullan James.McMullan@lexisnexis.com
Type of change:
Checklist:
Smoketest:
Testing: