Fix instant file name path retrieval in Hudi Active Timeline #18213

atezs82 · 2023-07-10T13:19:06Z

Description

When submitting even a very simple query for certain types of Hudi tables (stored on S3) we were constantly receiving the following error:

SQL Error [65536]: Query failed (#20230710_093123_00000_bqjz2): No scheme for file system location: path/to/hudi/table/within/bucket/.hoodie

Stacktrace taken from the Trino coordinator:

...
Caused by: java.lang.IllegalArgumentException: No scheme for file system location: path/to/hudi/table/within/bucket/.hoodie
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:218)
	at io.trino.filesystem.Location.of(Location.java:74)
	at io.trino.plugin.hudi.timeline.HudiActiveTimeline.getInstantFileNamePath(HudiActiveTimeline.java:88)
	at io.trino.plugin.hudi.timeline.HudiActiveTimeline.getInstantDetails(HudiActiveTimeline.java:71)
...

The issue could be reproduced for this table always, on the other hand, other tables were working just fine. We were using a simple SQL query select * from test_catalog.test_schema.table limit 1;, but more complex queries were also failing due to this error.

This PR proposes a solution for the problem by slightly modifying the way Trino handles the Hudi timeline internally. Albeit the solution is simple, I have also added some unit test coverage about the fix. Please let me know your thoughts about it (I needed to include a Java mock library into Maven, this might not be desirable from overall project perspective).

Additional context and related issues

At some point during its operation, the Hudi connector tries to update ("reset") its internal state of the Hudi filesystem if "replaced instants" (files) are present on the Hudi timeline. This operation involves accessing the Hudi table's metadata folder on the object store. When this path segment is created, the corresponding io.trino.filesystem.Location class is created in a manner that it did not contain the scheme, just the path (see the code change below). This is forbidden in the code of that class (since it has to adhere to the format scheme://[userInfo@]host[:port][/path]), therefore the mentioned exception was thrown (refer to the Javadoc of the Location class for more information).

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Fixed reading Hudi tables with replaced instants.

cla-bot · 2023-07-10T13:19:08Z

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Attila Tóth.
This is most likely caused by a git client misconfiguration; please make sure to:

check if your git client is configured with an email to sign commits git config --list | grep email
If not, set it up using git config --global user.email email@example.com
Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot · 2023-07-10T13:21:10Z

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Attila Tóth.
This is most likely caused by a git client misconfiguration; please make sure to:

check if your git client is configured with an email to sign commits git config --list | grep email
If not, set it up using git config --global user.email email@example.com
Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot · 2023-07-10T13:24:17Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

ebyhr · 2023-07-11T01:43:17Z

plugin/trino-hudi/pom.xml

@@ -374,6 +374,13 @@
            <scope>test</scope>
        </dependency>

+        <dependency>
+            <groupId>org.mockito</groupId>
+            <artifactId>mockito-core</artifactId>


We basically don't use mockito in tests. Please fix BaseHudiConnectorTest or TestHudiSparkCompatibility (or both).

Ok, I'll fix that and append the existing suites.

The test in trino-hudi is ready. I have created a simple copy of the already existing test table stock_ticks_cow in which the Hudi timeline has replacecommit actions, that is present in the patchset. I have confirmed that without the code change the test is also failing exactly with the above mentioned exception. Please let me know what do you think about this solution/direction.

Still need some time to familiarize myself with the product tests and add a test case there as well for this fix.

I extended TestHudiSparkCompatibility in the latest patchset, verifying that it also fails with the No scheme for file system location: exception above. Modified the setup method as well to have a default value for S3_BUCKET if that is not specified externally. Please let me know what do you think about this solution in overall.

Signing of the CLA is underway, I will send it in the upcoming days.

@ebyhr Are you OK with the proposed changes? Thanks for taking a look again at this!

cla-bot · 2023-07-11T08:28:26Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot · 2023-07-12T13:25:35Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot · 2023-07-12T20:48:34Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

atezs82 · 2023-07-24T08:04:17Z

Hey @ebyhr! Thanks for looking at this change! I'm ready to work on this more if needed - what do you think about the latest state of the PR? I have already sent the signed CLA on the 13th of July already.

.../test/resources/hudi-testing-data/stock_ticks_cow_clustered/.hoodie/20230712105030944.commit

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

ebyhr · 2023-08-01T13:36:23Z

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

+        createNonPartitionedTableWithReplaceCommit(tableName, COW_TABLE_TYPE);
+
+        try {
+            onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName);


Please verify the SELECT result.

Done - what I want to see here though is that the query passes. If it does, then replace commits were parsed in the timeline successfully. Please let me know if you thought about some additional verification here.

Please verify the rows using containsOnly instead of hasAnyRows.

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/timeline/HudiActiveTimeline.java

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

atezs82 · 2023-08-04T15:42:00Z

@ebyhr Thanks for all the comments and suggestions! I tried to answer them accordingly, please let me know what do you think. I have also verified that tests were failing if the code change in HudiActiveTimeline was not in place.

ebyhr · 2023-08-07T22:29:38Z

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/timeline/HudiActiveTimeline.java

-        return Location.of(fileName.contains(SCHEMA_COMMIT_ACTION) ? metaClient.getSchemaFolderName() : metaClient.getMetaPath().path()).appendPath(fileName);
+        Location metaPath = metaClient.getMetaPath();
+        if (fileName.contains(SCHEMA_COMMIT_ACTION)) {
+            return metaPath.appendPath(HudiTableMetaClient.SCHEMA_FOLDER_NAME);


Is .appendPath(fileName) missing? Do existing tests cover this condition?

I have added the appendPath(fileName) to the schema commit path of the condition, it was clearly missing from there.

In order to test this, we need a second table with at least one schemacommit action in the Hudi timeline. Will work on preparing that as well in this patchset.

Please note my comment below regarding this.

ebyhr · 2023-08-07T22:31:18Z

plugin/trino-hudi/src/test/resources/hudi-testing-data/stock_ticks_cow_clustered/README.md

@@ -0,0 +1,74 @@
+# Context


Thanks for adding this README file! Is it possible to create the table with SQL? We usually use SQL when creating the resource likes plugin/trino-delta-lake/src/test/resources.

Do you mean Spark SQL instead of this minimal Spark pipeline?

Changed the README.md file accordingly. When the schemacommit test is ready, I will also add a README.md there.

Please note my comment below regarding this.

ebyhr · 2023-08-07T22:34:00Z

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

+        createNonPartitionedTableWithReplaceCommit(tableName, COW_TABLE_TYPE);
+
+        try {
+            onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName);


Please verify the rows using containsOnly instead of hasAnyRows.

atezs82 · 2023-08-11T13:25:50Z

Regarding the comment

Please verify the rows using containsOnly instead of hasAnyRows.

The code has been changed accordingly, but the test table contains 4 rows instead of 2. Will analyze the root cause of that as well. For now the test has this but I would like to understand the root cause of this.

atezs82 · 2023-08-16T20:30:56Z

I merged the two test tables to one (now that one has replacecommits and schemacommits as well). Please let me know what do you think about this current approach (eg. is the number of files and their sizes acceptable?). Moreover, shall we keep the schemacommit content even though it does not seem to force the logic to enter the branch in question (see @ebyhr's question about this above)?

During the creation of the testcases, I also noticed a different fault when handling replacecommits - test records were duplicated in a seemingly incorrect fashion. (This is the same issue.) I investigated this a bit further and found out that there is a problem with JSON deserialization in HudiReplaceCommitMetadata. Since the tests looked very strange due to this (for two test records I needed to assert 4), I also took the liberty to propose a fix for this in this PR. What do you think about this?

If you agree with having the schemacommit tests and this fix I will write a detailed README.md about how did I create the merged test table. (It is relatively simple, but might not be completely possible in Spark SQL though, since schema evolution in Hudi is still experimental and some parameters need to be set on the Spark Context beforehand.)

Otherwise, I can also work on restoring the test suite to contain just the replacecommit test - that was proven to reproduce the exception we have experienced. (Probably that might be a simpler approach.) I can also further dig into why does that branch does not get called, but I will need more time to understand the code regarding this (it operates with direct method calls and method references as well internally in HudiActiveTimeline/HudiDefaultTimeline). Please advise how shall I proceed with this.

Thanks in advance for looking at this once again!

atezs82 · 2023-08-17T06:44:04Z

One more addition: it bugged me so I quickly looked at the option of altering the needed configuration for the schemacommit tests from product tests. Turns out that eg. SET spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension results in the exception org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.extensions, so this might not be that easy to do from there (also it would be not desirable to modify a commonly used test SparkContext from a single test like this).

atezs82 · 2023-08-17T18:41:03Z

After spending some time on analyzing the code further about the schema commit support in HudiActiveTimeline I suspect that this feature is not yet in use. The only real place from where HudiActiveTimeline::getInstantDetails is called from is HudiTableFileSystemView::resetFileGroupsReplaced, but even there, right before the method is called, replace commits are filtered out right before when HudiDefaultTimeline::getCompletedReplaceTimeline is called - so getInstantDetails is called only with this commit action type.

In theory future users of the Hudi timeline interface might benefit from this feature (get proper filenames for schema commits as well, which reside under the .schema subfolder within the Hudi metadata folder), but for now it seems it is not used for any feature the connector exposes. I have also checked and seen that this was never called from any unit tests of trino-hudi.

Given this, I would suggest to remove the schema tests altogether, and fall back to just the replace commit tests, which clearly reproduced the original issue. That would make the test system simpler, we can just use the product tests if needed to test this (so that a large number of files do not need to be added to trino-hudi), and if we want to test in trino-hudi we do not need complicated Spark SQL, we can use the same method which is currently in use in the product tests.

@electrum @ebyhr What is your opinion about this?

ebyhr · 2023-08-28T00:33:50Z

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/model/HudiReplaceCommitMetadata.java

-    // for ser/deser
-    public HudiReplaceCommitMetadata()
+    @JsonCreator(mode = JsonCreator.Mode.PROPERTIES)
+    public HudiReplaceCommitMetadata(@JsonProperty("partitionToReplaceFileIds") final Map<String, List<String>> partitionToReplaceFileIds,


Move the 1st argument to next line and remove final.

public HudiReplaceCommitMetadata( @JsonProperty("partitionToReplaceFileIds") Map<String, List<String>> partitionToReplaceFileIds, @JsonProperty("compacted") Boolean compacted) {

Could you add a commit body to explain why this change is needed to "Fix replace commit handling in Hudi Active Timeline"?

The commit message body now has a brief reasoning about this change.

ebyhr · 2023-08-28T00:41:27Z

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/model/HudiReplaceCommitMetadata.java

+        this.partitionToReplaceFileIds = partitionToReplaceFileIds;
+        this.compacted = compacted;


Are these fields nulalble?

They are not. If eg. partitionToReplaceFileIds become null this code will throw an NPE (from HudiActiveTimeline::reetFileGroupsReplaced):

// get replace instant mapping for each partition, fileId return replaceMetadata.getPartitionToReplaceFileIds().entrySet().stream() .flatMap(entry -> entry.getValue().stream().map(fileId -> Map.entry(new HudiFileGroupId(entry.getKey(), fileId), instant)));

Then, it would be better to add ImmutableMap.copyOf & requireNonNull and change Boolean to boolean

this.partitionToReplaceFileIds = ImmutableMap.copyOf(requireNonNull(partitionToReplaceFileIds, "partitionToReplaceFileIds is null"));

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/table/HudiTableFileSystemView.java

ebyhr · 2023-08-28T00:41:35Z

plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/testing/ResourceHudiTablesInitializer.java

@@ -155,6 +155,7 @@ private static void copyDir(Path srcDir, Path dstDir)
    public enum TestingTable
    {
        HUDI_NON_PART_COW(nonPartitionRegularColumns()),
+        HUDI_NON_PART_COW_CL_SC(schemaChangedNonPartitionedRegularColumns()),


What's the meaning of CL and SC?

CL means "clustered", while SC means "schema change" in this context.

I don't think code readers can understand the meaning. Please avoid abbreviations.
https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#avoid-abbreviations

Ok, I will remove the schema change tests altogether as requested anyways, and will change this as well in the next commit.

I have removed the abbreviation, the table is called HUDI_NON_PART_COW_CLUSTERING (where COW is a well-known Hudi concept).

plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/testing/ResourceHudiTablesInitializer.java

ebyhr · 2023-08-28T00:41:41Z

...ces/hudi-testing-data/hudi_non_part_cow_cl_sc/.hoodie/.schema/20230815074720567.schemacommit

@@ -0,0 +1 @@
+{"schemas":[{"max_column_id":13,"version_id":20230815074720567,"type":"record","fields":[{"id":0,"name":"_hoodie_commit_time","optional":true,"type":"string","doc":""},{"id":1,"name":"_hoodie_commit_seqno","optional":true,"type":"string","doc":""},{"id":2,"name":"_hoodie_record_key","optional":true,"type":"string","doc":""},{"id":3,"name":"_hoodie_partition_path","optional":true,"type":"string","doc":""},{"id":4,"name":"_hoodie_file_name","optional":true,"type":"string","doc":""},{"id":5,"name":"rowId","optional":true,"type":"string"},{"id":6,"name":"partitionId","optional":true,"type":"string"},{"id":7,"name":"preComb","optional":true,"type":"long"},{"id":8,"name":"name","optional":true,"type":"string"},{"id":9,"name":"versionId","optional":true,"type":"string"},{"id":10,"name":"toBeDeletedStr","optional":true,"type":"string"},{"id":11,"name":"intToLong","optional":true,"type":"int"},{"id":12,"name":"longToInt","optional":true,"type":"long"},{"id":13,"name":"description","optional":true,"type":"string"}]},{"max_column_id":12,"version_id":0,"type":"record","fields":[{"id":0,"name":"_hoodie_commit_time","optional":true,"type":"string","doc":""},{"id":1,"name":"_hoodie_commit_seqno","optional":true,"type":"string","doc":""},{"id":2,"name":"_hoodie_record_key","optional":true,"type":"string","doc":""},{"id":3,"name":"_hoodie_partition_path","optional":true,"type":"string","doc":""},{"id":4,"name":"_hoodie_file_name","optional":true,"type":"string","doc":""},{"id":5,"name":"rowId","optional":true,"type":"string"},{"id":6,"name":"partitionId","optional":true,"type":"string"},{"id":7,"name":"preComb","optional":true,"type":"long"},{"id":8,"name":"name","optional":true,"type":"string"},{"id":9,"name":"versionId","optional":true,"type":"string"},{"id":10,"name":"toBeDeletedStr","optional":true,"type":"string"},{"id":11,"name":"intToLong","optional":true,"type":"int"},{"id":12,"name":"longToInt","optional":true,"type":"long"}]}]}


#18213 (comment)

Let's remove these file and use product test instead.

I have removed the tests for the schema commits.

ebyhr · 2023-08-28T00:41:54Z

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

+
+        try {
+            assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + ";"))
+                    .containsOnly(ImmutableList.of(


nit: ImmutableList.of is redundant.

codope · 2023-09-13T10:29:07Z

@ebyhr @mosabua This PR is critical and affects Hudi table with clustering. I see most comments are addressed. Any chance of getting this merged sooner and released in the next version?

atezs82 · 2023-09-13T10:48:27Z

Please let me know if any other things need to be done here. We are using a patched Trino image, but we would also would like to upgrade to an official Trino release having this fix soon. Thanks in advance!

codope

The change look good. Thanks @atezs82 for fixing the issue.
Please also include me as a reviewer for any change in Hudi connector.

ebyhr · 2023-09-15T00:15:53Z

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/model/HudiReplaceCommitMetadata.java

+        this.partitionToReplaceFileIds = partitionToReplaceFileIds;
+        this.compacted = compacted;


Then, it would be better to add ImmutableMap.copyOf & requireNonNull and change Boolean to boolean

this.partitionToReplaceFileIds = ImmutableMap.copyOf(requireNonNull(partitionToReplaceFileIds, "partitionToReplaceFileIds is null"));

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

ebyhr · 2023-09-15T00:20:27Z

plugin/trino-hudi/src/test/resources/hudi-testing-data/hudi_non_part_cow_clustered/README.md

+CREATE TABLE hudi_non_part_cow
+USING hudi
+OPTIONS (
+hoodie.table.name = 'hudi_non_part_cow',
+hoodie.datasource.write.recordkey.field = 'rowId',
+hoodie.datasource.write.precombine.field = 'preComb',
+hoodie.table.type = 'COPY_ON_WRITE'
+)
+LOCATION 'file:///<<< EXISTING TABLE LOCATION >>>'
+CREATE TABLE hudi_non_part_cow_clustered
+USING hudi
+OPTIONS (
+   hoodie.table.name = 'hudi_non_part_cow_clustered',
+   hoodie.datasource.write.recordkey.field = 'rowId',
+   hoodie.datasource.write.precombine.field = 'preComb',
+   hoodie.table.type = 'COPY_ON_WRITE',
+   hoodie.clustering.inline = 'true',
+   hoodie.clustering.inline.max.commits = '1'
+)
+LOCATION 'file:///<<< NEW TABLE LOCATION >>>'
+AS SELECT * FROM hudi_non_part_cow


So, why do we need this resource files instead of product tests? I think we can simply execute this CTAS statement in product tests?

This was needed so that this special behavior (replace commit handling) is also tested in the Hudi connector, but I also agree on that this is still a bit more complex than it should be. I removed this test altogether from the Hudi connector and let the product tests test this behavior as well in the last changeset.

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

ebyhr · 2023-09-15T00:24:21Z

The commit body is too long. Please follow https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

atezs82 · 2023-09-16T18:28:46Z

Thanks for the review and the suggestions!

I have shortened/slightly reformatted the commit message to contain the changes requested in #18213 (comment), but still have a shorter subject line (with a longer body explaining the exact changes done). Please let me know if that is acceptable.

Regarding #18213 (comment), I have implemented the requested changes, on the other hand, changing the compacted field to boolean from Boolean also had some implications on the overriden equals and hashCode methods within the HudiReplaceCommitMetadata class. I have modified those methods to contain the partitionToReplaceFileIds field, since it was missing from both methods. Please let me know if this change is acceptable here or shall we place it in an other PR.

...rino-product-tests/src/main/java/io/trino/tests/product/hudi/TestHudiSparkCompatibility.java

plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/model/HudiReplaceCommitMetadata.java

When reading Hudi tables with replace commits, the location of the Hudi commit file was assembled incorrectly in the Hudi Active Timeline logic. The Hudi replace commit metadata model object was also patched to deserialize Hudi metadata properly. This was needed so that Trino users (and the tests for the active timeline) do not see duplicated records for replace commits.

mosabua · 2023-09-20T23:07:19Z

Thank you @ebyhr for all your work on this.

Also fyi @codope

codope · 2023-09-21T07:29:13Z

Thanks a lot @mosabua @ebyhr @atezs82 for helping fix this critical issue!

atezs82 · 2023-09-21T08:09:27Z

I also would like to thank @ebyhr for all the work he put into reviewing this change.

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 32a55c9 to 88124ab Compare July 10, 2023 13:21

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 88124ab to 60d2776 Compare July 10, 2023 13:24

github-actions bot added the hudi Hudi connector label Jul 10, 2023

ebyhr reviewed Jul 11, 2023

View reviewed changes

ebyhr assigned atezs82 Jul 11, 2023

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 60d2776 to 2046013 Compare July 11, 2023 08:28

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 2046013 to 410a077 Compare July 12, 2023 13:25

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 410a077 to 4f5a8b4 Compare July 12, 2023 20:48

github-actions bot added the tests:hive label Jul 12, 2023

atezs82 removed their assignment Jul 17, 2023

ebyhr reviewed Aug 1, 2023

View reviewed changes

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from 4f5a8b4 to f5c27f8 Compare August 4, 2023 11:11

cla-bot bot added the cla-signed label Aug 4, 2023

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch 3 times, most recently from 8f53bae to b99ac1a Compare August 4, 2023 13:05

atezs82 assigned ebyhr Aug 4, 2023

ebyhr removed their assignment Aug 4, 2023

ebyhr reviewed Aug 7, 2023

View reviewed changes

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch 2 times, most recently from 5c41bdc to eaac554 Compare August 11, 2023 13:18

atezs82 mentioned this pull request Aug 20, 2023

[HUDI] Can not query with table have replacecommit #18188

Open

ebyhr reviewed Aug 28, 2023

View reviewed changes

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch 3 times, most recently from aa1edd5 to e01c2c7 Compare September 4, 2023 07:12

codope approved these changes Sep 13, 2023

View reviewed changes

martint requested a review from electrum September 14, 2023 20:19

ebyhr reviewed Sep 15, 2023

View reviewed changes

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch from e01c2c7 to 64cc81f Compare September 16, 2023 18:23

ebyhr reviewed Sep 19, 2023

View reviewed changes

atezs82 force-pushed the fix-hudi-instant-file-name-fetch branch 2 times, most recently from 7082301 to ba70065 Compare September 19, 2023 09:17

ebyhr force-pushed the fix-hudi-instant-file-name-fetch branch from ba70065 to 7eee37b Compare September 19, 2023 11:59

ebyhr force-pushed the fix-hudi-instant-file-name-fetch branch from 7eee37b to 802b7c5 Compare September 20, 2023 08:56

ebyhr merged commit ea4277d into trinodb:master Sep 20, 2023

github-actions bot added this to the 427 milestone Sep 20, 2023

colebow mentioned this pull request Sep 25, 2023

Add Trino 427 release notes #19023

Merged

hashhar mentioned this pull request Oct 4, 2023

No schema for file system location error in Hudi connector #18313

Closed

codope mentioned this pull request Oct 26, 2023

Fixes data duplication when Hudi Table contains a replace commit #18032

Closed

		this.partitionToReplaceFileIds = partitionToReplaceFileIds;
		this.compacted = compacted;

		@@ -0,0 +1 @@
		{"schemas":[{"max_column_id":13,"version_id":20230815074720567,"type":"record","fields":[{"id":0,"name":"_hoodie_commit_time","optional":true,"type":"string","doc":""},{"id":1,"name":"_hoodie_commit_seqno","optional":true,"type":"string","doc":""},{"id":2,"name":"_hoodie_record_key","optional":true,"type":"string","doc":""},{"id":3,"name":"_hoodie_partition_path","optional":true,"type":"string","doc":""},{"id":4,"name":"_hoodie_file_name","optional":true,"type":"string","doc":""},{"id":5,"name":"rowId","optional":true,"type":"string"},{"id":6,"name":"partitionId","optional":true,"type":"string"},{"id":7,"name":"preComb","optional":true,"type":"long"},{"id":8,"name":"name","optional":true,"type":"string"},{"id":9,"name":"versionId","optional":true,"type":"string"},{"id":10,"name":"toBeDeletedStr","optional":true,"type":"string"},{"id":11,"name":"intToLong","optional":true,"type":"int"},{"id":12,"name":"longToInt","optional":true,"type":"long"},{"id":13,"name":"description","optional":true,"type":"string"}]},{"max_column_id":12,"version_id":0,"type":"record","fields":[{"id":0,"name":"_hoodie_commit_time","optional":true,"type":"string","doc":""},{"id":1,"name":"_hoodie_commit_seqno","optional":true,"type":"string","doc":""},{"id":2,"name":"_hoodie_record_key","optional":true,"type":"string","doc":""},{"id":3,"name":"_hoodie_partition_path","optional":true,"type":"string","doc":""},{"id":4,"name":"_hoodie_file_name","optional":true,"type":"string","doc":""},{"id":5,"name":"rowId","optional":true,"type":"string"},{"id":6,"name":"partitionId","optional":true,"type":"string"},{"id":7,"name":"preComb","optional":true,"type":"long"},{"id":8,"name":"name","optional":true,"type":"string"},{"id":9,"name":"versionId","optional":true,"type":"string"},{"id":10,"name":"toBeDeletedStr","optional":true,"type":"string"},{"id":11,"name":"intToLong","optional":true,"type":"int"},{"id":12,"name":"longToInt","optional":true,"type":"long"}]}]}

Fix instant file name path retrieval in Hudi Active Timeline #18213

Fix instant file name path retrieval in Hudi Active Timeline #18213

Uh oh!

Conversation

atezs82 commented Jul 10, 2023

Description

Additional context and related issues

Release notes

Uh oh!

cla-bot bot commented Jul 10, 2023

Uh oh!

cla-bot bot commented Jul 10, 2023

Uh oh!

cla-bot bot commented Jul 10, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cla-bot bot commented Jul 11, 2023

Uh oh!

cla-bot bot commented Jul 12, 2023

Uh oh!

cla-bot bot commented Jul 12, 2023

Uh oh!

atezs82 commented Jul 24, 2023

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

atezs82 commented Aug 4, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atezs82 Aug 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atezs82 Aug 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atezs82 Aug 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

atezs82 commented Aug 11, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atezs82 commented Aug 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

atezs82 commented Aug 17, 2023

Uh oh!

atezs82 commented Aug 17, 2023

Uh oh!

atezs82 Aug 11, 2023 •

edited

Loading

atezs82 Aug 11, 2023 •

edited

Loading

atezs82 Aug 11, 2023 •

edited

Loading

atezs82 commented Aug 11, 2023 •

edited

Loading

atezs82 commented Aug 16, 2023 •

edited

Loading