Skip to content

Conversation

@voonhous
Copy link
Member

@voonhous voonhous commented Dec 26, 2025

Describe the issue this Pull Request addresses

This PR refactors the hudi-spark-datasource/hudi-spark and hudi-spark-datasource/hudi-spark-common modules to reduce boilerplate code by leveraging Project Lombok annotations. Specifically, it replaces explicit Logger instantiation, manual getter/setter methods, and empty constructors with their equivalent Lombok annotations (@Slf4j, @Getter, @Setter,@NoArgsConstructor, @AllArgsConstructor, @Data, @Value, @ToString).

This improves code readability and maintainability without altering the runtime logic.

Summary and Changelog

This change introduces the Lombok dependency to the hudi-spark-datasource/hudi-spark and hudi-spark-datasource/hudi-spark-common modules and refactors several classes to utilize Lombok annotations.

  • Added Lombok annotations wherever possible to hudi-spark-datasource/hudi-spark and hudi-spark-datasource/hudi-spark-common modules.

Impact

  • Public API: None.
  • User Experience: No visible change for end-users.
  • Performance: No impact (compile-time code generation).
  • Code Health: Reduces lines of code and standardizes logging/accessor patterns.

Risk Level

none

(This is a pure refactoring change involving standard library annotations; no business logic was modified.)

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@voonhous voonhous force-pushed the lombokify-hudi-spark branch from e60079a to 33ce24e Compare December 26, 2025 03:33
@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 26, 2025
@voonhous voonhous force-pushed the lombokify-hudi-spark branch from 33ce24e to d1d8220 Compare December 26, 2025 03:45
@voonhous voonhous changed the title refactor: Add Lombok annotations to hudi-spark refactor: Add Lombok annotations to hudi-spark,hudi-spark-common Dec 26, 2025
@voonhous voonhous force-pushed the lombokify-hudi-spark branch from 33b949e to 34a619b Compare December 26, 2025 04:18
@voonhous voonhous requested a review from yihua December 26, 2025 04:34
@apache apache deleted a comment from hudi-bot Dec 26, 2025
@apache apache deleted a comment from hudi-bot Dec 26, 2025
Copy link
Contributor

@CTTY CTTY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There are some places where we can apply parameterized logging, would be good if we can address them

* Note: Explicit setters are used instead of Lombok's @Setter annotation because this class is accessed from Scala code
* (RunBootstrapProcedure.scala). Since Scala compilation happens before Java compilation in the Maven build lifecycle,
* Lombok-generated methods would not be visible to the Scala compiler, causing compilation errors.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

this.props = this.propsFilePath == null || this.propsFilePath.isEmpty() ? buildProperties(this.configs)
: readConfig(fs.getConf(), new StoragePath(this.propsFilePath), this.configs).getProps(true);
LOG.info("Starting data import with configs : " + props.toString());
log.info("Starting data import with configs : " + props.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can apply parameterized logging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
this.bootstrapConfig = builder.build();
LOG.info("Created bootstrap executor with configs : " + bootstrapConfig.getProps());
log.info("Created bootstrap executor with configs : " + bootstrapConfig.getProps());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can apply parameterized logging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

writer.save(tablePath); // ultimately where the dataset will be placed
String commitInstantTime1 = HoodieDataSourceHelpers.latestCommit(fs, tablePath);
LOG.info("First commit at instant time :" + commitInstantTime1);
log.info("First commit at instant time :" + commitInstantTime1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can apply parameterized logging to some places in this class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

private DataFrameWriter<Row> updateHiveSyncConfig(DataFrameWriter<Row> writer) {
if (enableHiveSync) {
LOG.info("Enabling Hive sync to " + hiveJdbcUrl);
log.info("Enabling Hive sync to " + hiveJdbcUrl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

FileSystem fs = FileSystem.get(jssc.hadoopConfiguration());
String commitInstantTime1 = HoodieDataSourceHelpers.latestCommit(fs, tablePath);
LOG.info("Commit at instant time :" + commitInstantTime1);
log.info("Commit at instant time :" + commitInstantTime1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

waitTillNCommits(fs, numExpCommits, 180, 3);
commitInstantTime2 = HoodieDataSourceHelpers.listCommitsSince(fs, tablePath, commitInstantTime1).stream().sorted().findFirst().get();
LOG.info("Second commit at instant time :" + commitInstantTime2);
log.info("Second commit at instant time :" + commitInstantTime2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can apply parameterized logging to some places in this class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// Ensure all commits were synced to the Metadata Table
HoodieTableMetaClient metadataMetaClient = createMetaClient(metadataTableBasePath);
LOG.warn("total commits in metadata table " + metadataMetaClient.getActiveTimeline().getCommitsTimeline().countInstants());
log.warn("total commits in metadata table " + metadataMetaClient.getActiveTimeline().getCommitsTimeline().countInstants());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can apply parameterized logging to some places in this class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@voonhous voonhous merged commit 93932fc into apache:master Dec 27, 2025
138 of 141 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants