Description
Setting the table schema in the global job conf in HiveIcebergStorageHandler#configureJobConf() (see #1557) introduced a problem: whichever table calls this function last ends up having its schema stored in the global conf, overwriting all the previous ones.
This causes issues when multiple tables are used (e.g. joins) in Tez. HiveIcebergSerde#initialize() takes the schema from the conf when it is present, so with multiple tables every table ends up using the schema of the "winner" (the table that set its schema last in the configureJobConf() step above), which leads to failures.
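The last-writer-wins behaviour can be illustrated with a small, self-contained model. This is a hypothetical sketch, not Iceberg or Hive code: a plain `HashMap` stands in for the global job conf, `configureTable` and `serdeSchema` are simplified stand-ins for configureJobConf() and HiveIcebergSerde#initialize(), and the key name `example.table.schema` is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the problem: all names and the key are illustrative,
// not the real Iceberg/Hive classes or config constants.
public class SharedConfOverwrite {
    static final String SCHEMA_KEY = "example.table.schema"; // hypothetical key

    // Stand-in for configureJobConf(): every table writes into the ONE shared conf.
    static void configureTable(Map<String, String> globalConf, String tableSchema) {
        globalConf.put(SCHEMA_KEY, tableSchema);
    }

    // Stand-in for SerDe initialization: prefers the shared conf value when present,
    // falling back to the table's own properties only if the conf has no schema.
    static String serdeSchema(Map<String, String> globalConf, Map<String, String> tableProps) {
        String fromConf = globalConf.get(SCHEMA_KEY);
        return fromConf != null ? fromConf : tableProps.get(SCHEMA_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> globalConf = new HashMap<>();
        Map<String, String> ordersProps = new HashMap<>();
        ordersProps.put(SCHEMA_KEY, "orders(id, amount)");
        Map<String, String> customersProps = new HashMap<>();
        customersProps.put(SCHEMA_KEY, "customers(id, name)");

        // Both sides of a join are configured against the same global conf.
        configureTable(globalConf, ordersProps.get(SCHEMA_KEY));
        configureTable(globalConf, customersProps.get(SCHEMA_KEY));

        // The orders SerDe now sees the customers schema: the last writer won.
        System.out.println(serdeSchema(globalConf, ordersProps));    // customers(id, name)
        System.out.println(serdeSchema(globalConf, customersProps)); // customers(id, name)
    }
}
```

Both lines print the customers schema, which mirrors the failure mode described above for multi-table queries.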
Furthermore, even for Hive 1.1, this approach would only partially fix the issues outlined in #1557: when working with multiple tables, the same overwriting of the schema in the global conf can occur when the schema is retrieved here. (The only problem scenario laid out in #1557 was a single-table ORDER BY query.)
As a result, I propose reverting this change. The proper fix should investigate why the table schema property is missing from the config object here in Hive 1.1. Possibly Hive 1 lacks a config-merge step, so the properties from jobInputProperties are never overlaid on top of the global job conf.
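The kind of overlay step suspected to be missing could look roughly like the following. This is a hypothetical sketch, not Hive's actual merge logic: a `HashMap` again models the global job conf, a `java.util.Properties` object models one table's jobInputProperties, and the `overlay` helper and key names are invented for illustration. The point is that each table gets its own copy of the conf with its properties layered on top, so no table overwrites another's schema and the shared conf is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of a per-table config overlay; not the real Hive merge code.
public class PerTableConfOverlay {
    // Copies the global conf and overlays one table's properties on top of it.
    static Map<String, String> overlay(Map<String, String> globalConf, Properties tableProps) {
        // Start from a copy so the shared global conf is never mutated.
        Map<String, String> merged = new HashMap<>(globalConf);
        // Table-specific properties take precedence over global values.
        for (String key : tableProps.stringPropertyNames()) {
            merged.put(key, tableProps.getProperty(key));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> globalConf = new HashMap<>();
        globalConf.put("example.shared.setting", "x"); // hypothetical key

        Properties orders = new Properties();
        orders.setProperty("example.table.schema", "orders(id, amount)");
        Properties customers = new Properties();
        customers.setProperty("example.table.schema", "customers(id, name)");

        Map<String, String> ordersConf = overlay(globalConf, orders);
        Map<String, String> customersConf = overlay(globalConf, customers);

        System.out.println(ordersConf.get("example.table.schema"));         // orders(id, amount)
        System.out.println(customersConf.get("example.table.schema"));      // customers(id, name)
        System.out.println(globalConf.containsKey("example.table.schema")); // false
    }
}
```

Copying before overlaying trades a little memory per table for isolation: each SerDe reads a conf that contains its own schema, which is the behaviour the shared-conf approach cannot guarantee.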