ORC: Add compression properties #4273
Conversation
return new Context(stripeSize, blockSize);
static String lookupValue(Configuration conf, OrcConf orcConf, String icebergPropertyKey) {
Something is wrong if this is looking up an Iceberg table property like ORC_STRIPE_SIZE_BYTES in a Hadoop Configuration. Iceberg settings should be pulled from table properties, not from Hadoop.
I don't think that we can merge this until we fix the ORC configuration PR that recently went in. See #3810 (comment) for more details. The configuration context should not use Hadoop Configuration.
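For illustration, a minimal sketch of that direction, reading the setting from the Iceberg write config map rather than a Hadoop Configuration; the key string and default value below are placeholders, not the PR's actual constants:

```java
import java.util.Map;

class OrcContextSketch {
  // Illustrative key and default; the PR's real constants live in TableProperties.
  private static final String ORC_STRIPE_SIZE_BYTES = "write.orc.stripe-size-bytes";
  private static final long ORC_STRIPE_SIZE_BYTES_DEFAULT = 64L * 1024 * 1024;

  // Resolve the stripe size from the Iceberg write config map (table properties plus
  // write options) instead of consulting a Hadoop Configuration.
  static long stripeSizeBytes(Map<String, String> config) {
    String value = config.getOrDefault(
        ORC_STRIPE_SIZE_BYTES, String.valueOf(ORC_STRIPE_SIZE_BYTES_DEFAULT));
    return Long.parseLong(value);
  }
}
```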
Force-pushed from a93bda2 to d54c377.
Updated based on #4291.
String codecAsString = config.getOrDefault(OrcConf.COMPRESS.getAttribute(), ORC_COMPRESSION_DEFAULT);
codecAsString = config.getOrDefault(ORC_COMPRESSION, codecAsString);
String compressionStrategy = config.getOrDefault(OrcConf.COMPRESSION_STRATEGY.getAttribute(),
    ORC_COMPRESSION_STRATEGY_DEFAULT);
compressionStrategy = config.getOrDefault(ORC_COMPRESSION_STRATEGY, compressionStrategy);
Why not follow the same config parsing approach as the config keys above? I'm suggesting using the generic PropertyUtil.propertyAsString.
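For illustration, a minimal sketch of that suggestion; the ORC_COMPRESSION key and default value below are placeholders, not the PR's actual constants:

```java
import java.util.Map;
import org.apache.iceberg.util.PropertyUtil;

class OrcCompressionConfigSketch {
  // Placeholder values; the PR defines the real key and default in TableProperties.
  private static final String ORC_COMPRESSION = "write.orc.compression-codec";
  private static final String ORC_COMPRESSION_DEFAULT = "zlib";

  // Resolve the codec through the generic helper instead of chained getOrDefault calls.
  static String codecAsString(Map<String, String> config) {
    return PropertyUtil.propertyAsString(config, ORC_COMPRESSION, ORC_COMPRESSION_DEFAULT);
  }
}
```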
Also, please use the CompressionStrategy enum to validate that the configured value is valid (as suggested in this comment: https://github.com/apache/iceberg/pull/4273/files#r836994947).
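A rough sketch of that validation using ORC's OrcFile.CompressionStrategy enum (the helper and class names here are hypothetical):

```java
import java.util.Locale;
import org.apache.orc.OrcFile.CompressionStrategy;

class CompressionStrategySketch {
  // valueOf rejects anything other than SPEED or COMPRESSION, so an invalid
  // configured value fails fast instead of being passed through silently.
  static CompressionStrategy validateStrategy(String strategyAsString) {
    return CompressionStrategy.valueOf(strategyAsString.toUpperCase(Locale.ENGLISH));
  }
}
```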
Preconditions.checkArgument(vectorizedRowBatchSize > 0, "VectorizedRow batch size must be > 0");
return new Context(stripeSize, blockSize, vectorizedRowBatchSize);
String codecAsString = config.getOrDefault(OrcConf.COMPRESS.getAttribute(), ORC_COMPRESSION_DEFAULT);
codecAsString = config.getOrDefault(ORC_COMPRESSION, codecAsString);
I would suggest using a similar approach to what Parquet did: try to parse the string into a CompressionKind so that we can validate the value in the right place.
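For example, a sketch of that approach, parsing the configured string into ORC's CompressionKind enum so an unsupported codec name fails where the config is read (helper name and error message are illustrative):

```java
import java.util.Locale;
import org.apache.orc.CompressionKind;

class CompressionCodecSketch {
  // Parse the configured codec name into ORC's CompressionKind enum so that an
  // unsupported value is rejected where the config is read, with a clear message.
  static CompressionKind toCompressionKind(String codecAsString) {
    try {
      return CompressionKind.valueOf(codecAsString.toUpperCase(Locale.ENGLISH));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unsupported compression codec: " + codecAsString, e);
    }
  }
}
```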
String codecAsString = config.getOrDefault(DELETE_ORC_COMPRESSION, dataContext.codecAsString());
String compressionStrategy = config.getOrDefault(DELETE_ORC_COMPRESSION_STRATEGY,
    dataContext.compressionStrategy());
Nit: please use the same approach to parse the codec and the compression-strategy.
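A sketch of what that could look like for the delete path, reusing the hypothetical helpers from the sketches above; the delete-specific key strings are assumptions:

```java
import java.util.Map;
import org.apache.orc.CompressionKind;
import org.apache.orc.OrcFile.CompressionStrategy;

class DeleteOrcConfigSketch {
  // Assumed delete-specific keys; the PR's real constants live in TableProperties.
  private static final String DELETE_ORC_COMPRESSION = "write.delete.orc.compression-codec";
  private static final String DELETE_ORC_COMPRESSION_STRATEGY = "write.delete.orc.compression-strategy";

  // Fall back to the data context's values, then run both settings through the same
  // parsing/validation helpers used on the data path.
  static CompressionKind deleteCodec(Map<String, String> config, String dataCodec) {
    return CompressionCodecSketch.toCompressionKind(
        config.getOrDefault(DELETE_ORC_COMPRESSION, dataCodec));
  }

  static CompressionStrategy deleteStrategy(Map<String, String> config, String dataStrategy) {
    return CompressionStrategySketch.validateStrategy(
        config.getOrDefault(DELETE_ORC_COMPRESSION_STRATEGY, dataStrategy));
  }
}
```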
I just left several comments. I also think we should add unit tests, as this PR did, to avoid regressions.
Updated based on comments.
@@ -82,19 +93,29 @@ public void testOrcTableProperties() throws Exception {
DynFields.builder().hiddenImpl(writer.getClass(), "conf").build(writer);
Configuration configuration = confField.get();
Assert.assertEquals(String.valueOf(blockSizeBytes), configuration.get(OrcConf.BLOCK_SIZE.getAttribute()));
Assert.assertEquals(String.valueOf(stripeSizeBytes), configuration.get(OrcConf.STRIPE_SIZE.getAttribute()));
Changed this to use OrcConf to get the value directly, so there is no need to know the property key details.
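For illustration, a sketch of that assertion style, assuming the test's Configuration and size variables (the wrapper class exists only to keep the snippet self-contained):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.orc.OrcConf;
import org.junit.Assert;

class OrcConfAssertionSketch {
  // OrcConf reads its own attribute out of the Configuration, so the assertions do
  // not need to spell out the underlying ORC property key strings.
  static void assertSizes(Configuration configuration, long stripeSizeBytes, long blockSizeBytes) {
    Assert.assertEquals(blockSizeBytes, OrcConf.BLOCK_SIZE.getLong(configuration));
    Assert.assertEquals(stripeSizeBytes, OrcConf.STRIPE_SIZE.getLong(configuration));
  }
}
```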
@@ -53,12 +56,20 @@
@Test
public void testOrcTableProperties() throws Exception {
Long stripeSizeBytes = 32L * 1024 * 1024;
Long blockSizeBytes = 128L * 1024 * 1024;
Changed Long to long because OrcConf.XXX#getLong returns a long, and I got "Ambiguous method call. Both assertEquals(Object, Object) in Assert and assertEquals(long, long) in Assert match" when using both Long and long as parameters for Assert#assertEquals.
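A small illustration of the ambiguity described here (not code from the PR):

```java
import org.junit.Assert;

class AssertEqualsAmbiguitySketch {
  static void demo() {
    long actual = 128L * 1024 * 1024;      // e.g. what OrcConf.BLOCK_SIZE.getLong(conf) returns
    // Long boxedExpected = 128L * 1024 * 1024;
    // Assert.assertEquals(boxedExpected, actual);  // flagged as ambiguous: both
    //                                              // assertEquals(Object, Object) and assertEquals(long, long) match
    long expected = 128L * 1024 * 1024;    // keeping both sides primitive selects assertEquals(long, long)
    Assert.assertEquals(expected, actual);
  }
}
```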
Thanks all for your review!
This PR adds ORC compression-codec and compression-strategy properties to TableProperties. @openinx @hililiwei would you help review this? Thanks!
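For reference, a sketch of the kind of TableProperties constants described above; the exact key strings and defaults are assumptions for illustration, not copied from the PR:

```java
class OrcTablePropertiesSketch {
  // Data-file write settings (assumed key names and defaults).
  static final String ORC_COMPRESSION = "write.orc.compression-codec";
  static final String ORC_COMPRESSION_DEFAULT = "zlib";
  static final String ORC_COMPRESSION_STRATEGY = "write.orc.compression-strategy";
  static final String ORC_COMPRESSION_STRATEGY_DEFAULT = "speed";

  // Delete-file overrides that fall back to the data-file settings when unset.
  static final String DELETE_ORC_COMPRESSION = "write.delete.orc.compression-codec";
  static final String DELETE_ORC_COMPRESSION_STRATEGY = "write.delete.orc.compression-strategy";
}
```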