fix: [HUDI-8401] Unifying Partial Update modes and Merge Into Partial Update Encoding #17604

PavithranRick · 2025-12-16T01:02:19Z

Describe the issue this Pull Request addresses

This PR unifies Partial Update Modes defined via table properties with partial update encoding used by Spark SQL MERGE INTO, and cleans up inconsistencies in how partial updates are handled across the write path.

Specifically, prior to this change:

Partial update handling logic was fragmented.
Merge Into partial update encoding was only supported for Spark records.
Table property–driven partial update modes and Merge Into encoding were not unified.

This work was originally based on PR #13540 from Lin, with additional fixes and refinements applied.

Summary and Changelog

This PR introduces a revised and unified design for partial update handling across all record formats and write paths.

Key changes:

Unified Partial Update Modes from table properties with Merge Into partial update encoding.
Fixed all partial update handling within BufferedRecordMergerFactory.
Extended partial update encoding (Merge Into support) to all record formats (previously only Spark records were supported).

Revised PartialUpdateMode design:

Possible values for table property hoodie.table.partial.update.mode:

IGNORE_DEFAULTS
FILL_UNAVAILABLE
KEEP_VALUES

There is no default value for PartialUpdateMode. For tables without partial update requirements, this property may be absent.

Note on KEEP_VALUES:

This value may not be explicitly present in table properties.
It will take effect when partial columns are encoded via Spark SQL MERGE INTO.
When invoking BufferedRecordMerger for merging, this mode is expected to be set accordingly.
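
As a rough illustration of what KEEP_VALUES is expected to do, here is a minimal sketch. It assumes that KEEP_VALUES means "columns absent from the incoming partial record keep the value from the older record", and it models records as plain maps of top-level fields. The class and method names are hypothetical and are not part of Hudi.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of KEEP_VALUES-style merging; not Hudi code. */
final class KeepValuesMergeSketch {

  /**
   * Starts from a copy of the older full record and overwrites only the
   * columns that the newer partial record actually carries.
   * Assumption: records are flat maps; nested fields are not handled here.
   */
  static Map<String, Object> merge(Map<String, Object> olderFull,
                                   Map<String, Object> newerPartial) {
    Map<String, Object> merged = new LinkedHashMap<>(olderFull);
    merged.putAll(newerPartial);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> older = new LinkedHashMap<>();
    older.put("id", 1);
    older.put("name", "a");
    older.put("price", 10.0);

    // A MERGE INTO that only sets "price" would carry just that column.
    Map<String, Object> update = new LinkedHashMap<>();
    update.put("price", 12.5);

    System.out.println(merge(older, update));
  }
}
```

In this sketch a column that is present in the partial record with a null value still overwrites the older value; only columns that are missing entirely are kept.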

BufferedRecordMergerFactory changes:

BufferedRecordMergerFactory.create() now accepts:

enablePartialEncoding (boolean): indicates whether partial (vs full) record merging is used.
Option<PartialUpdateMode>: defines the merge semantics when partial encoding is enabled.

If enablePartialEncoding is true, the provided PartialUpdateMode will be honored.

Interaction between table property and Merge Into encoding:

Merge Mode	Table Property: hoodie.table.partial.update.mode	Writer	BufferedRecordMerger: enablePartialEncoding	BufferedRecordMerger: Option<PartialUpdateMode>
event time	not set	spark-ds	false	Option.empty()
event time	not set	Merge Into	true	KEEP_VALUES
event time	IGNORE_DEFAULTS	spark-ds	true	IGNORE_DEFAULTS
event time	IGNORE_DEFAULTS	Merge Into	true	IGNORE_DEFAULTS
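
The four rows above can be read as a small resolution rule. The sketch below is one possible reading, not the actual implementation: it assumes the table property wins whenever it is set (the table only shows this for IGNORE_DEFAULTS), uses `java.util.Optional` in place of Hudi's own `Option`, and all class and method names are hypothetical.

```java
import java.util.Optional;

/** Hypothetical sketch of the table above; not the actual Hudi code path. */
final class PartialMergeConfigSketch {

  enum PartialUpdateMode { IGNORE_DEFAULTS, FILL_UNAVAILABLE, KEEP_VALUES }

  /** The two values handed to the merger factory in each table row. */
  static final class MergerArgs {
    final boolean enablePartialEncoding;
    final Optional<PartialUpdateMode> mode;

    MergerArgs(boolean enablePartialEncoding, Optional<PartialUpdateMode> mode) {
      this.enablePartialEncoding = enablePartialEncoding;
      this.mode = mode;
    }
  }

  /**
   * @param tableMode   value of hoodie.table.partial.update.mode, if set
   * @param isMergeInto true when the writer is Spark SQL MERGE INTO
   */
  static MergerArgs resolve(Optional<PartialUpdateMode> tableMode, boolean isMergeInto) {
    if (tableMode.isPresent()) {
      // Rows 3 and 4: the table property is honored for both writers.
      return new MergerArgs(true, tableMode);
    }
    if (isMergeInto) {
      // Row 2: no table property, partial columns encoded by MERGE INTO.
      return new MergerArgs(true, Optional.of(PartialUpdateMode.KEEP_VALUES));
    }
    // Row 1: full-record merging.
    return new MergerArgs(false, Optional.empty());
  }

  public static void main(String[] args) {
    MergerArgs a = resolve(Optional.empty(), true);
    System.out.println(a.enablePartialEncoding + " " + a.mode);
  }
}
```

Whether FILL_UNAVAILABLE follows the same precedence as IGNORE_DEFAULTS is an assumption here, since the table does not list it.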

Pending:

Until the fix for https://issues.apache.org/jira/browse/HUDI-9638 lands, these new PartialUpdateModes do not take effect in end-to-end functional tests.
We also need to add more tests directly against the BufferedRecordMerger class.

Impact

Cleans up the PartialUpdateMode feature and unifies Merge Into partial update encoding with the other PartialUpdateModes from table config.

Risk Level

medium

Documentation Update

The config description must be updated if new configs are added or the default value of the configs are changed.

Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

yihua

Have we done any performance benchmarks? I remember there was concern around the performance impact.

the-other-tim-brown · 2025-12-22T00:58:00Z

hudi-common/src/main/java/org/apache/hudi/common/table/read/KeepValuesPartialMergingUtils.java

+ * Class to assist with merging two versions of the record that may contain partial updates using
+ * {@link org.apache.hudi.common.table.PartialUpdateMode#KEEP_VALUES} mode.
+ */
+public class KeepValuesPartialMergingUtils<T> {


Nitpick: Utils typically don't have any state. Let's name this something like PartialMergerWithKeepValues

the-other-tim-brown · 2025-12-22T01:00:11Z

hudi-common/src/main/java/org/apache/hudi/common/table/read/KeepValuesPartialMergingUtils.java

+  private static final Map<HoodieSchema, Map<String, Integer>>
+      FIELD_NAME_TO_ID_MAPPING_CACHE = new ConcurrentHashMap<>();
+  private static final Map<Pair<Pair<HoodieSchema, HoodieSchema>, HoodieSchema>, HoodieSchema>
+      MERGED_SCHEMA_CACHE = new ConcurrentHashMap<>();


This cache can just grow over the life of the application. I think we can create a single instance for each BufferedRecordMerger and then make this an instance variable
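
One way to read the suggestion above is sketched below, under stated assumptions: a list of field names stands in for HoodieSchema, and the class and method names are hypothetical. The cache is an instance field, so it is released together with the object that owns it instead of living in a static map for the life of the JVM.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of an instance-scoped cache; not the code in this PR. */
final class InstanceScopedCacheSketch {

  // Instance field rather than static: eligible for garbage collection
  // once the owning merger instance is no longer referenced.
  private final Map<List<String>, Map<String, Integer>> fieldNameToIdCache =
      new ConcurrentHashMap<>();

  /** Returns field name to position, computing it once per distinct schema. */
  Map<String, Integer> fieldPositions(List<String> schemaFields) {
    return fieldNameToIdCache.computeIfAbsent(schemaFields, fields -> {
      Map<String, Integer> positions = new HashMap<>();
      for (int i = 0; i < fields.size(); i++) {
        positions.put(fields.get(i), i);
      }
      return positions;
    });
  }

  /** Number of distinct schemas cached by this instance. */
  int cachedSchemaCount() {
    return fieldNameToIdCache.size();
  }
}
```

Each merger would create its own instance, so entries cached while reading one file group do not accumulate across unrelated reads.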

the-other-tim-brown · 2025-12-22T01:02:16Z

hudi-common/src/main/java/org/apache/hudi/common/table/read/KeepValuesPartialMergingUtils.java

+    Object[] fieldVals = new Object[fields.size()];
+    int idx = 0;
+    List<HoodieSchemaField> mergedSchemaFields = mergedSchema.getFields();
+    for (HoodieSchemaField mergedSchemaField : mergedSchemaFields) {


What is the expected behavior for nested fields?

hudi-bot · 2025-12-23T23:37:34Z

CI report:

5f062b3 Azure: FAILURE

nsivabalan added 3 commits July 24, 2025 19:14

Fixing PartialUpdateMode

4ee4e1b

Fixing Partial Update Modes and Unifying merge into Partial encoding …

3edf908

…and Table property for PartialUpdateMode

Cleaning up code

2ac8fe4

github-actions bot added the size:XL PR with lines of changes > 1000 label Dec 16, 2025

Pavithran Ravichandiran added 3 commits December 16, 2025 11:28

Merge branch 'master' of ssh://github.com/apache/hudi into fixPartial…

13ef6b1

…UpdateMode

HUDI-8401 - rebase with master fixes

a1d2f6d

HUDI-8401 - Changed partial encoding reader->recordcontext; avro -> h…

9489435

…oodieSchema

PavithranRick requested review from nsivabalan and the-other-tim-brown December 18, 2025 20:33

PavithranRick marked this pull request as ready for review December 18, 2025 20:37

HUDI-8401 - CI Test failure fixes around avro reader record context

dae446e

yihua reviewed Dec 20, 2025

the-other-tim-brown reviewed Dec 22, 2025

Merge branch 'master' of ssh://github.com/apache/hudi into fixPartial…

5f062b3

…UpdateMode
