[Bug] Hive DDL and paimon schema mismatched exception is thrown when showing the table's schema with the timestamp type #5501
Conversation
The UT case failed.
Fixed, please review again. Thanks.
@zhuangchong help cc, thanks.
Regarding the type validation failure for timestamp LTZ, I think the key problem is perhaps not the validation failure itself. To summarize, creating a table with Spark SQL when the metastore is HMS involves two main steps: Spark types are first converted to Paimon types, and Paimon types are then converted to Hive types:

// org.apache.paimon.hive.HiveTypeUtils.PaimonToHiveTypeVisitor
@Override
public TypeInfo visit(LocalZonedTimestampType localZonedTimestampType) {
    return LocalZonedTimestampTypeUtils.hiveLocalZonedTimestampType();
}

// org.apache.paimon.hive.LocalZonedTimestampTypeUtils
public static TypeInfo hiveLocalZonedTimestampType() {
    try {
        Class<?> typeInfoFactoryClass =
                Class.forName("org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory");
        Field field = typeInfoFactoryClass.getField("timestampLocalTZTypeInfo");
        return (TypeInfo) field.get(null);
    } catch (Exception e) {
        return TypeInfoFactory.timestampTypeInfo;
    }
}

In PR #4571, this determines the timestamp LTZ type based on the Hive runtime version. Therefore, I think the key problem is to find out why the reflective lookup falls back here. Maybe you can test this in org.apache.paimon.spark.sql.DDLWithHiveCatalogTest in paimon-spark-3.5.
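To see which branch that reflection takes in a given environment, a standalone probe along these lines can help (my sketch, not Paimon code; it only assumes a Hive serde jar on the classpath):

// Probe which TypeInfo the reflective lookup resolves to at runtime.
public class LtzProbe {
    public static void main(String[] args) throws Exception {
        try {
            Class<?> factory =
                    Class.forName("org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory");
            Object ltz = factory.getField("timestampLocalTZTypeInfo").get(null);
            // Hive 3.x: the field exists, so paimon maps LTZ correctly.
            System.out.println("LTZ supported: " + ltz);
        } catch (NoSuchFieldException e) {
            // Hive 2.x: the field is missing, so hiveLocalZonedTimestampType()
            // falls back to plain timestamp.
            System.out.println("No timestampLocalTZTypeInfo: Hive runtime < 3");
        }
    }
}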
First, thank you for your previous response. The two-step process you mentioned about Spark table creation is correct. The first step, converting Spark types to Paimon types, works properly. However, an issue occurs during the second step, when Paimon types are converted to Hive types. As shown in the code snippet you referenced:

public static TypeInfo hiveLocalZonedTimestampType() {
    try {
        Class<?> typeInfoFactoryClass =
                Class.forName("org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory");
        Field field = typeInfoFactoryClass.getField("timestampLocalTZTypeInfo");
        return (TypeInfo) field.get(null);
    } catch (Exception e) {
        // Exception occurs: java.lang.NoSuchFieldException: timestampLocalTZTypeInfo
        return TypeInfoFactory.timestampTypeInfo;
    }
}

The code itself is correct. It attempts to retrieve the timestampLocalTZTypeInfo field via reflection, but Spark's built-in Hive client is based on Hive 2.x, whose TypeInfoFactory does not declare that field. The lookup therefore throws NoSuchFieldException and falls back to plain timestamp. When using a native Hive 3 client to create Paimon tables, the field exists and the correct timestamp with local time zone type is used. Therefore, a more practical solution for Paimon users, as implemented in my PR, is to relax the type validation checks in Paimon's code. This adjustment allows the system to gracefully handle Spark's Hive 2.x limitations while maintaining compatibility with Hive 3 environments.
@Jack1007 Thanks for your explanation! It seems like a compatibility problem between Hive 2 and Hive 3. Could you please split this PR into two PRs, one for type validation and one for the time offset? And create a new issue to describe the time offset problem in detail, as the handling of time offsets in Hive is complex.
OK, this PR will focus on fixing the timestamp type validation issue. I will create a new PR specifically to address and explain the time zone offset problems along with the proposed resolution steps.
I have split this PR into two separate ones. This PR retains only the code fixes for Hive timestamp type validation, while the other PR specifically addresses and explains the time zone offset issue with the timestamp with local time zone type. The other PR can be found here: #5571. @LsomeYeah Please review this PR again. Thank you!
if (schemaFieldTypeInfos
        .get(i)
        .getTypeName()
        .equals("timestamp with local time zone")) {
    // Hive timestamp is compatible with paimon timestamp
    // with local time zone
    LOG.info(
            "Hive DDL and paimon schema mismatched, but Hive timestamp is compatible with paimon timestamp with local time zone");
    continue;
}
It does not check whether the name of the paimon field is equal to the name of the hive field. And it does not verify whether the two types are compatible (e.g., schemaFieldTypeInfo is ltz, but hiveFieldTypeInfo is int, which is not compatible). We can allow ltz in paimon to be compatible with timestamp and ltz in hive.
Indeed, there was a logical flaw in this section. I have adjusted the if conditions to maintain the validity of higher-level checks while enforcing stricter field validation: it now requires matching field names and restricts compatibility solely to Hive's timestamp and Paimon's timestamp with local time zone types.
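For reference, a minimal sketch of the tightened condition (identifier names here are illustrative, not the exact ones in the Paimon code):

// Illustrative helper, not the actual paimon method: a Hive/Paimon field
// pair is treated as compatible only when the field names match and the
// types are exactly Hive "timestamp" vs. paimon "timestamp with local time zone".
static boolean isLtzCompatible(
        String hiveFieldName, String hiveTypeName,
        String paimonFieldName, String paimonTypeName) {
    return hiveFieldName.equalsIgnoreCase(paimonFieldName)
            && "timestamp".equals(hiveTypeName)
            && "timestamp with local time zone".equals(paimonTypeName);
}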
+1
+1
Purpose
To fix issue #5450
This PR addresses three issues related to Hive3 compatibility with Paimon tables:
Type Validation Failure for Timestamp LTZ
Hive3 stores the timestamp with local time zone type as timestamp in its metastore, while Paimon explicitly uses the type name timestamp with local time zone during type conversion. Although Hive3 natively supports this type, the metastore's type name mismatch causes validation failures. We resolve this by allowing type compatibility validation to bypass the naming inconsistency.
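To make the mismatch observable, a small read-back of the metastore's recorded column types can be used (a sketch using the standard Hive metastore client API; default.t is a placeholder table name):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;

// Prints the column types the metastore actually recorded. For a column
// affected by this issue, the printed type is "timestamp" rather than
// "timestamp with local time zone".
public class ShowHmsTypes {
    public static void main(String[] args) throws Exception {
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        for (FieldSchema col : client.getTable("default", "t").getSd().getCols()) {
            System.out.println(col.getName() + " -> " + col.getType());
        }
        client.close();
    }
}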
-8-Hour Timezone Offset in Query Results
After fixing the first issue, Hive3 successfully executes DDL operations but returns timestamp values with an incorrect -8-hour offset for the Beijing timezone. The root cause lies in the PaimonTimestampLocalTZObjectInspector.getPrimitiveJavaObject method (in the paimon-hive-connector-3.1 module), which fails to apply the session timezone during deserialization. According to the Parquet specification, timestamp with local time zone fields store UTC timestamps and require conversion to the local timezone during processing. We fix this by ensuring proper UTC-to-session-timezone conversion in the deserialization logic.
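A minimal sketch of that read-path conversion (standalone illustration; the actual fix is in PaimonTimestampLocalTZObjectInspector.getPrimitiveJavaObject, and toSessionTime is a made-up name):

import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Paimon stores LTZ values as absolute instants (UTC); rendering them
// requires shifting into the session time zone, e.g. Asia/Shanghai.
static ZonedDateTime toSessionTime(long epochMillis, ZoneId sessionZone) {
    return Instant.ofEpochMilli(epochMillis).atZone(sessionZone);
}

For example, 2024-01-01T00:00:00Z rendered in Asia/Shanghai becomes 2024-01-01T08:00+08:00; skipping this shift is exactly what produced the -8-hour offset.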
+8-Hour Offset in SparkSQL Queries After INSERT
When inserting Beijing local time into a Hive3-created table with a timestamp with local time zone field, querying via SparkSQL reveals an 8-hour offset. This is caused by Hive incorrectly treating timestamp data as local time during serialization, violating the timestamp with local time zone semantics. Specifically, the PaimonTimestampLocalTZObjectInspector.convert method erroneously converts Hive time data to a local Timestamp type instead of adhering to UTC. The fix involves converting the Hive time data to a UTC timestamp before storing it in Paimon’s timestamp type, ensuring alignment with the expected UTC-based storage semantics.
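A matching sketch of the write-path fix (again illustrative; the real change is in PaimonTimestampLocalTZObjectInspector.convert, and toUtcInstant is a made-up name):

import java.sql.Timestamp;
import java.time.Instant;
import java.time.ZoneId;

// The Hive value carries wall-clock fields; anchoring them in the session
// time zone recovers the absolute instant (UTC-based by definition) before
// it is stored into paimon's timestamp representation.
static Instant toUtcInstant(Timestamp hiveLocalValue, ZoneId sessionZone) {
    return hiveLocalValue.toLocalDateTime().atZone(sessionZone).toInstant();
}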
References:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#instant-semantics-timestamps-normalized-to-utc
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#timestamp_ltz-type
https://spark.apache.org/docs/latest/sql-ref-datatypes.html#TimestampType
https://hive.apache.org/docs/latest/different-timestamp-types_103091503/
Linked issue: #5450
Tests
API and Format
no
Documentation
no