-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC, Hive: Fix column projection when reading ORC files and orc.force.positional.evolution
is set to true
on the default configuration
#2111
Conversation
…e.positional.evolution` is set to `true` on the default configuration
If I understand correctly, the id-based projection requests fields by name and if the projection type is changed in the Hive conf, that causes the wrong projection. Right? If so, then I agree with the fix to set this in the conf that we use for readers. |
mr/src/test/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandlerTestUtils.java
Show resolved
Hide resolved
@@ -106,7 +115,9 @@ public void start(int poolSize) { | |||
|
|||
TServerSocket socket = new TServerSocket(0); | |||
int port = socket.getServerSocket().getLocalPort(); | |||
this.hiveConf = newHiveConf(port); | |||
initConf(conf, port); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: if initConf
returned the configuration, this could be one line still: this.hiveConf = initConf(conf, port);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the change the configuration was created in newHiveConf()
. The new method changes values in the conf parameter. The change meant to emphasize that we are not creating a new conf, only changing values in the one got in the parameter list.
Shall we move back to create a new config and initialize that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If conf is passed in, I think the change to initConf
makes sense. I just think the patch could be slightly smaller. Not a blocker, though.
After this change (HIVE-20129 Revert to position based schema evolution for orc tables) in Hive the Iceberg ORC column projection is not working as expected.
The effect:
I think it would be good to turn off this config for our Iceberg ORC readers as we expect it to be false anyway and other systems using Iceberg might need it to be set differently.
The patch contains 3 groups of changes:
orc.force.positional.evolution
totrue
in the tests - this causesTestHiveIcebergStorageHandlerLocalScan.testColumnSelection()
to failorc.force.positional.evolution
tofalse
inORC.ReadBuilder
to override the configuration values to the expected one - tis fixes the test failures