[MINOR] Fix logical type issue for timestamp columns #17601

linliu-code · 2025-12-15T20:34:33Z

Change Logs

PR #9743 added more schema evolution functionality and schema processing. However, it relied on the InternalSchema system for operations such as fixing null ordering, reordering columns, and adding columns. At the time, InternalSchema had only a single Timestamp type, and when converting back to Avro, that type was assumed to be micros. As a result, if the schema provider had any millis columns, the processed schema labeled those columns as micros.
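To make the failure mode concrete, here is a minimal sketch of the lossy round trip, assuming a toy model in which schemas are plain dicts mapping column names to Avro logical-type names. The names `avro_to_internal`, `internal_to_avro`, `round_trip`, and `INTERNAL_TIMESTAMP` are hypothetical and are not Hudi's InternalSchema or converter APIs; they only mimic the behavior described above, where both precisions collapse into one Timestamp type that is converted back as micros.

```python
from __future__ import annotations

# Toy model only: schemas are dicts of column name -> Avro logical type name.
AVRO_TS_MILLIS = "timestamp-millis"
AVRO_TS_MICROS = "timestamp-micros"
INTERNAL_TIMESTAMP = "TIMESTAMP"  # hypothetical single internal timestamp type


def avro_to_internal(logical_type: str) -> str:
    """Collapse both Avro timestamp precisions into one internal type."""
    if logical_type in (AVRO_TS_MILLIS, AVRO_TS_MICROS):
        return INTERNAL_TIMESTAMP  # the original precision is lost here
    return logical_type


def internal_to_avro(internal_type: str) -> str:
    """Convert back to Avro, assuming every internal timestamp is micros."""
    if internal_type == INTERNAL_TIMESTAMP:
        return AVRO_TS_MICROS
    return internal_type


def round_trip(columns: dict[str, str]) -> dict[str, str]:
    """Run every column's logical type through the internal representation."""
    return {name: internal_to_avro(avro_to_internal(t)) for name, t in columns.items()}


source_schema = {"event_ts": AVRO_TS_MILLIS, "ingest_ts": AVRO_TS_MICROS}
print(round_trip(source_schema))
# {'event_ts': 'timestamp-micros', 'ingest_ts': 'timestamp-micros'}
```

In this model the millis column comes back labeled as micros even though nothing touched its data, which is the mislabeling the rest of this PR repairs.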

PR #13711, which updates column stats with better support for logical types, fixed these schema issues as well as additional issues with handling and converting timestamps during ingestion.

This PR adds functionality to the Spark and Hive readers and writers to automatically repair affected tables.
After switching to the 1.1 binary, the affected columns will undergo evolution from timestamp-micros to timestamp-millis. This would normally be a lossy evolution that is not supported, but it is safe here because the data is actually still timestamp-millis; it is only mislabeled as micros in the Parquet and table schemas.
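The difference between a genuine micros-to-millis evolution and the relabeling used here can be sketched as follows. This assumes timestamps are stored as a raw epoch-based long, as with Avro's `timestamp-millis` and `timestamp-micros` logical types; the helper names `decode`, `convert_micros_to_millis`, and `relabel_micros_as_millis` are hypothetical. A true conversion divides by 1000 and drops sub-millisecond digits, while relabeling leaves the stored value untouched and only changes how it is interpreted.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode(raw: int, unit: str) -> datetime:
    """Interpret a raw epoch-based long using the given precision label."""
    if unit == "millis":
        return EPOCH + timedelta(milliseconds=raw)
    if unit == "micros":
        return EPOCH + timedelta(microseconds=raw)
    raise ValueError(f"unsupported unit: {unit}")


def convert_micros_to_millis(raw_micros: int) -> int:
    """A genuine precision change: lossy, sub-millisecond digits are dropped."""
    return raw_micros // 1000


def relabel_micros_as_millis(raw: int) -> int:
    """The repair case: the stored value is already millis, so keep it as-is."""
    return raw


# An epoch-millis value (2023-11-14T22:13:20.123Z) stored under a micros label.
mislabeled = 1_700_000_000_123
print(decode(mislabeled, "micros"))  # 1970-01-20 16:13:20.000123+00:00
print(decode(relabel_micros_as_millis(mislabeled), "millis"))  # 2023-11-14 22:13:20.123000+00:00
print(decode(convert_micros_to_millis(mislabeled), "millis"))  # 1970-01-20 16:13:20+00:00
```

Decoding the mislabeled value as micros lands in January 1970, which illustrates why reading these columns as micros gives wrong values. Applying a real micros-to-millis conversion would make things worse, whereas relabeling recovers the intended instant without rewriting the stored value.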

Impact

When reading from a Hudi table with the Spark or Hive reader, if the table schema has a column as millis but the data schema has it as micros, we assume the column is affected and read it as a millis value instead of a micros value. This correction is also applied to all readers used by the default write paths. As a table is rewritten, its Parquet files become correct. A table's latest snapshot can be fixed immediately by writing one commit with the 1.1 binary and then clustering the entire table.
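The read-side rule described above can be sketched as a small decision function. This is an illustrative model, not the actual Spark or Hive reader code: `effective_read_unit` and `read_timestamp` are hypothetical names, and units are simplified to the strings "millis" and "micros". A column is treated as affected only when the table schema says millis and the file (data) schema says micros; otherwise the file's own label is trusted.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def effective_read_unit(table_unit: str, file_unit: str) -> str:
    """Choose how to interpret a file's raw timestamp values.

    Table schema says millis but the file says micros: assume the column is
    affected and the data is really millis. Otherwise trust the file's label.
    """
    if table_unit == "millis" and file_unit == "micros":
        return "millis"
    return file_unit


def read_timestamp(raw: int, table_unit: str, file_unit: str) -> datetime:
    """Decode one raw value using the corrected unit."""
    unit = effective_read_unit(table_unit, file_unit)
    if unit == "millis":
        return EPOCH + timedelta(milliseconds=raw)
    if unit == "micros":
        return EPOCH + timedelta(microseconds=raw)
    raise ValueError(f"unsupported unit: {unit}")


raw = 1_700_000_000_123
old_file = read_timestamp(raw, table_unit="millis", file_unit="micros")  # mislabeled file
new_file = read_timestamp(raw, table_unit="millis", file_unit="millis")  # rewritten file
print(old_file == new_file)  # True
```

In this model the same raw value decodes to the same instant whether it comes from an old mislabeled file or from a file rewritten with the correct millis label, so a table can be read consistently while rewrites such as clustering gradually replace the affected files.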

Risk Level

High. Extensive testing was done and functional tests were added.

Documentation Update

#14100

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

…4161)
Co-authored-by: Jonathan Vexler <=>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Vamsi <vamsi@onehouse.ai>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Lin Liu <linliu.code@gmail.com>

hudi-bot · 2025-12-24T00:55:43Z

CI report:

5ef5773 UNKNOWN
1d2d706 UNKNOWN
4bbecb7 UNKNOWN
73e1942 UNKNOWN
ffcc9ca UNKNOWN
4c5a493 UNKNOWN
408cc29 UNKNOWN
0c4e026 UNKNOWN
8583da1 UNKNOWN
0c7b7b9 Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

linliu-code and others added 13 commits November 21, 2025 09:36

Fix logical timestamp issue

c817b81

Disable spark2.4 for now

9ab9a1e

Fix CI issue

564bce7

Disable spark-scala tests for spark2.x to 3.3

b9c9333

Remove the unnecessary file

4401c99

Handle AvroSchemaConverterWithTimestampNTZ

5d8d587

Fix validation and integration test failures

c620f31

Remove support from spark3.2

3f2a439

Disable NTZ convert and see what happens

9e5dab1

Fix the CI issues by using maven flags

5ad7baf

Skip compiling for spark <= 3.1

2069166

Remove spark3.2 for NTZ support

1f02a88

linliu-code changed the base branch from master to branch-0.x December 15, 2025 20:34

linliu-code force-pushed the branch-0.x-with-logic_types_fix branch from ac2916a to 5ef5773 Compare December 15, 2025 20:40

github-actions bot added the size:XL PR with lines of changes > 1000 label Dec 15, 2025

linliu-code force-pushed the branch-0.x-with-logic_types_fix branch 12 times, most recently from 408cc29 to 0c4e026 Compare December 16, 2025 02:51

Fix cherry-pick error

79c4a88

linliu-code force-pushed the branch-0.x-with-logic_types_fix branch from 0c4e026 to 79c4a88 Compare December 16, 2025 03:22

Add more changes

8559d6a

linliu-code force-pushed the branch-0.x-with-logic_types_fix branch from 98f9a13 to 8583da1 Compare December 24, 2025 00:09

Fix more issues

0c7b7b9

linliu-code force-pushed the branch-0.x-with-logic_types_fix branch from 8583da1 to 0c7b7b9 Compare December 24, 2025 00:18

linliu-code marked this pull request as ready for review December 24, 2025 00:19
