[PECOBLR-1131] Fix incorrect refetching of expired CloudFetch links when using Thrift protocol. #1066

tejassp-db · 2025-11-08T06:10:16Z

Description

The Thrift protocol has an orientation field with the values FETCH_NEXT, FETCH_PRIOR or FETCH_FIRST. This field is always set to FETCH_NEXT, so expired links were refetched from the wrong position. To fetch from a particular chunk index, the Thrift protocol requires the start row offset to be set. Both the chunk index and the start row offset are available from the expired links, so this change uses the start row offset to refetch the links over Thrift.
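A minimal sketch of the approach, assuming simplified stand-in types: Orientation, ExpiredLink, FetchRequest and RefetchRequests are hypothetical and are not the generated Thrift classes or the driver's own code. It only shows the core idea: the orientation stays FETCH_NEXT and the position comes from the start row offset recorded on the expired link.

```java
import java.util.OptionalLong;

/** Hypothetical stand-in for the Thrift fetch orientation values. */
enum Orientation {
  FETCH_NEXT,
  FETCH_PRIOR,
  FETCH_FIRST
}

/** Hypothetical view of an expired CloudFetch link. */
final class ExpiredLink {
  final long chunkIndex;
  final long startRowOffset;

  ExpiredLink(long chunkIndex, long startRowOffset) {
    this.chunkIndex = chunkIndex;
    this.startRowOffset = startRowOffset;
  }
}

/** Hypothetical, simplified fetch request. */
final class FetchRequest {
  final Orientation orientation;
  // Empty: continue from the server-side cursor. Present: start at this row.
  final OptionalLong startRowOffset;

  FetchRequest(Orientation orientation, OptionalLong startRowOffset) {
    this.orientation = orientation;
    this.startRowOffset = startRowOffset;
  }
}

final class RefetchRequests {
  private RefetchRequests() {}

  /** Builds a request positioned at the first row of the expired link. */
  static FetchRequest forExpiredLink(ExpiredLink link) {
    // A chunk index cannot be addressed directly over Thrift, so the row offset
    // identifies the position while the orientation stays FETCH_NEXT.
    return new FetchRequest(Orientation.FETCH_NEXT, OptionalLong.of(link.startRowOffset));
  }
}
```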

Testing

This fix is tested with an integration test that validates that the correct links are fetched when refetching from a given chunk index and start row offset pair. Unit tests also validate correct client behaviour when the server returns unexpected responses.

Additional Notes to the Reviewer

I also made some changes to the validation of the results; I have commented on them within the PR.


tejassp-db · 2025-11-08T06:11:48Z

src/main/java/com/databricks/jdbc/common/util/DatabricksThriftUtil.java

-        .setExternalLink(chunkInfo.getFileLink())
-        .setChunkIndex(chunkIndex)
-        .setExpiration(Long.toString(chunkInfo.getExpiryTime()));
+            .setExternalLink(chunkInfo.getFileLink())


@jayantsing-db Setting additional fields in ExternalLink. Is this an issue?

I don't see any issue.
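For context, a hedged sketch of the kind of mapping under discussion. It assumes the additional field includes the link's start row offset, while the chunk index and the string expiration mirror the lines removed in the diff. ThriftResultLink, ExternalLinkSketch and LinkMapper are hypothetical simplified stand-ins, not the driver's or the generated Thrift types.

```java
/** Hypothetical, simplified view of a Thrift result link. */
final class ThriftResultLink {
  final String fileLink;
  final long expiryTime;
  final long startRowOffset;

  ThriftResultLink(String fileLink, long expiryTime, long startRowOffset) {
    this.fileLink = fileLink;
    this.expiryTime = expiryTime;
    this.startRowOffset = startRowOffset;
  }
}

/** Hypothetical, simplified driver-side link. */
final class ExternalLinkSketch {
  String externalLink;
  long chunkIndex;
  String expiration;
  long rowOffset;
}

final class LinkMapper {
  private LinkMapper() {}

  static ExternalLinkSketch createExternalLink(ThriftResultLink src, long chunkIndex) {
    ExternalLinkSketch link = new ExternalLinkSketch();
    link.externalLink = src.fileLink;
    link.chunkIndex = chunkIndex;
    link.expiration = Long.toString(src.expiryTime);
    // Keeping the row offset lets an expired link be refetched from the right row later.
    link.rowOffset = src.startRowOffset;
    return link;
  }
}
```

Setting an extra field like this is additive, so existing readers of the link that ignore it should be unaffected.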

tejassp-db · 2025-11-08T06:13:35Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java

-      boolean fetchMetadata)
-      throws DatabricksHttpException {
-    String statementId = StatementId.loggableStatementId(operationHandle);
-    verifySuccessStatus(responseStatus, context, statementId);


@jayantsing-db Checking the validity of a previously received response is not the responsibility of this function. Moved it to outside this function at the point where the response is received.

tejassp-db · 2025-11-08T06:14:48Z

src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkLinkDownloadService.java

+    T chunk = chunkIndexToChunksMap.get(chunkIndex);
+    if (chunk == null) {
+      // Should never happen.
+      throw new IllegalStateException("Chunk not found in map for index " + chunkIndex + ". "


@jayantsing-db Is this the correct exception to be thrown here?

jayantsing-db

Added some comments/questions.

jayantsing-db · 2025-11-10T06:20:53Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java

        resultSet = response.getDirectResults().getResultSet();
        resultSet.setResultSetMetadata(response.getDirectResults().getResultSetMetadata());
      } else {
+        verifySuccessStatus(


I think we already verify the response at line 215 with checkResponseForErrors(response). That should suffice.

Should this be moved to after pollTillOperationFinished and before the if-else block?

verifySuccessStatus takes a response that has already been checked by checkResponseForErrors(response). Can this check be removed entirely?

jayantsing-db · 2025-11-10T06:24:11Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java

-        resultSet =
-            getResultSetResp(response.getStatus(), operationHandle, "getStatementResult", -1, true);
+
+        verifySuccessStatus(


I think this should be just after line 404 before we try to access the operation state in response. Previously the check occurred in the function getResultSetResp only as a side-effect of reusing the function. Since we are now more deliberate with this check, it makes more sense to put it after line 404.

jayantsing-db · 2025-11-10T06:27:36Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java

-      String context,
-      int maxRowsPerBlock,
-      boolean fetchMetadata)
+  private TFetchResultsResp executeRequest(TFetchResultsReq request)


nit: can this be renamed to executeFetchRequest?

jayantsing-db · 2025-11-10T06:32:51Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

+      throw new DatabricksSQLException(error, DatabricksDriverErrorCode.INVALID_STATE);
+    }
+
+    // Subsequent fetches fetch from the next set of rows.


Why do we not need the offset for subsequent fetches? After the first request, do you expect the server-side iterator to be reset to the desired offset? I am not sure the reset actually happens and will check on this.

The server does reset the iterator, and this behaviour is tested in ThriftCloudFetchIntegrationTests. Maybe we should confirm with the server team once?

Good point! I was checking the code on this one too: from the server code, the iterator does appear to be stateful, so the implementation is indeed correct.

https://github.com/databricks-eng/runtime/blob/f2e44cd5250c1523b06b3d075ac00ea34b2f5027/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/CloudStoreFetchIterator.scala#L106

https://github.com/databricks-eng/runtime/blob/f2e44cd5250c1523b06b3d075ac00ea34b2f5027/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/CloudStoreBasedResultHandler.scala#L117C27-L117C40
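The conclusion of this thread can be illustrated with a toy model, assuming the server keeps a per-operation cursor that an explicit start offset repositions and that a plain FETCH_NEXT advances. FakeCursorServer is a hypothetical in-memory stand-in, not the server implementation linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical in-memory model of a stateful, per-operation link iterator. */
final class FakeCursorServer {
  private final long[] chunkStartOffsets; // first row of each chunk, ascending
  private final int linksPerFetch;
  private int cursor = 0; // position of the next chunk to return

  FakeCursorServer(long[] chunkStartOffsets, int linksPerFetch) {
    this.chunkStartOffsets = chunkStartOffsets.clone();
    this.linksPerFetch = linksPerFetch;
  }

  /** Returns the start offsets of the next batch of links. */
  List<Long> fetch(OptionalLong startRowOffset) {
    if (startRowOffset.isPresent()) {
      // An explicit offset repositions the cursor; an unknown offset leaves it past the end.
      cursor = 0;
      while (cursor < chunkStartOffsets.length
          && chunkStartOffsets[cursor] != startRowOffset.getAsLong()) {
        cursor++;
      }
    }
    // Without an offset, the cursor continues from where the previous fetch stopped.
    List<Long> batch = new ArrayList<>();
    while (batch.size() < linksPerFetch && cursor < chunkStartOffsets.length) {
      batch.add(chunkStartOffsets[cursor++]);
    }
    return batch;
  }

  boolean hasMoreRows() {
    return cursor < chunkStartOffsets.length;
  }
}
```

In this model only the first request needs the offset; later requests without one continue from the repositioned cursor, which is the stateful behaviour the reviewers describe.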


samikshya-db

Thanks for the awesome fix, @tejassp-db! Also congrats on your first PR 🎉

LGTM overall, added some minor comments

samikshya-db · 2025-11-13T08:20:56Z

src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkLinkDownloadService.java

+    T chunk = chunkIndexToChunksMap.get(chunkIndex);
+    if (chunk == null) {
+      // Should never happen.
+      throw new IllegalStateException(


Let's throw DatabricksValidationException here, since we push telemetry with these internal exceptions.

DatabricksValidationException is linked to the INPUT_VALIDATION_ERROR error code. Can you suggest an alternative exception?

We could create a new exception such as DatabricksInvalidStateException, but I guess just passing INVALID_STATE to DatabricksException lgtm.
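A hedged sketch of the agreed approach: look the chunk up and fail loudly with an invalid-state error code rather than an input-validation one. ChunkRegistry, InvalidStateException and ErrorCode are hypothetical stand-ins for the map in ChunkLinkDownloadService and the driver's exception and error-code types; the exception is modelled as unchecked here for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical subset of driver error codes. */
enum ErrorCode {
  INVALID_STATE,
  INPUT_VALIDATION_ERROR
}

/** Hypothetical exception carrying an error code for telemetry. */
final class InvalidStateException extends RuntimeException {
  final ErrorCode code;

  InvalidStateException(String message, ErrorCode code) {
    super(message);
    this.code = code;
  }
}

final class ChunkRegistry<T> {
  private final Map<Long, T> chunkIndexToChunksMap = new ConcurrentHashMap<>();

  void put(long chunkIndex, T chunk) {
    chunkIndexToChunksMap.put(chunkIndex, chunk);
  }

  T require(long chunkIndex) {
    T chunk = chunkIndexToChunksMap.get(chunkIndex);
    if (chunk == null) {
      // Should never happen: this indicates a driver bug, not bad user input.
      throw new InvalidStateException(
          String.format("Chunk not found in map for index %d", chunkIndex),
          ErrorCode.INVALID_STATE);
    }
    return chunk;
  }
}
```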

samikshya-db · 2025-11-13T08:48:52Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

+        "public Collection<ExternalLink> getResultChunks(statementId = {"
+            + statementId
+            + "}, chunkIndex = {"
+            + chunkIndex


nit: use String.format across this file

samikshya-db · 2025-11-13T09:07:21Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

+              + externalLinks.get(0).getRowOffset()
+              + " context="
+              + context;
+      throw new DatabricksSQLException(error, DatabricksDriverErrorCode.INVALID_STATE);


nit : can we change this to DatabricksValidationSQLException

DatabricksValidationSQLException has error code INPUT_VALIDATION_ERROR, but this is an unexpected/invalid state. Does any other error code fit this use case?

INVALID_STATE lgtm
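A minimal sketch of the exact start-row-offset check discussed in this thread, assuming the first returned link must begin exactly at the requested row. OffsetValidator is hypothetical and uses the standard IllegalStateException to stay self-contained, whereas the PR throws DatabricksSQLException with DatabricksDriverErrorCode.INVALID_STATE; treating an empty response as an error is an assumption of this sketch.

```java
import java.util.List;

final class OffsetValidator {
  private OffsetValidator() {}

  /**
   * Checks that a refetch returned links starting exactly at the requested row.
   *
   * @param linkRowOffsets start row offsets of the returned links, in server order
   */
  static void validateFirstLink(
      long requestedStartRowOffset, List<Long> linkRowOffsets, String context) {
    if (linkRowOffsets.isEmpty()) {
      throw new IllegalStateException(
          String.format(
              "No links returned for startRowOffset=%d context=%s",
              requestedStartRowOffset, context));
    }
    long actual = linkRowOffsets.get(0);
    if (actual != requestedStartRowOffset) {
      // A near match is still wrong: the chunk boundaries would not line up.
      throw new IllegalStateException(
          String.format(
              "Expected startRowOffset=%d but first link starts at %d context=%s",
              requestedStartRowOffset, actual, context));
    }
  }
}
```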

samikshya-db · 2025-11-13T09:09:22Z

src/test/java/com/databricks/jdbc/integration/e2e/ThriftCloudFetchIntegrationTests.java

+import org.junit.jupiter.api.Test;
+
+/** Integration test to test CloudFetch link refetching using Thrift client. */
+public class ThriftCloudFetchIntegrationTests {


Do we run these tests on the repo automatically? If not, can you add that too? Or you can add fake service tests.

Need to figure this out. I have also added fake service tests.

OK, in that case this can be removed.

samikshya-db · 2025-11-13T09:42:27Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

+      throw new DatabricksSQLException(error, DatabricksDriverErrorCode.INVALID_STATE);
+    }
+
+    // Subsequent fetches fetch from the next set of rows.


Good point! I was checking the code on this one too: from the server code, the iterator does appear to be stateful, so the implementation is indeed correct.

https://github.com/databricks-eng/runtime/blob/f2e44cd5250c1523b06b3d075ac00ea34b2f5027/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/CloudStoreFetchIterator.scala#L106

https://github.com/databricks-eng/runtime/blob/f2e44cd5250c1523b06b3d075ac00ea34b2f5027/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/CloudStoreBasedResultHandler.scala#L117C27-L117C40

samikshya-db · 2025-11-13T09:59:44Z

Can you also fix the failing tests on the PR?

nikhilsuri-db · 2025-11-21T07:26:24Z

src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java

+  /**
+   * Returns the chunk provider for testing purposes.
+   *
+   * @return the chunk provider wrapped in Optional


The mention of Optional in the doc comment does not match the return type.

nikhilsuri-db · 2025-11-21T07:37:33Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java

    return response;
  }

+  private TFetchResultsReq createFetchResultsReqWithDefaults(TOperationHandle operationHandle) {


Thanks for abstracting it in a single method

nikhilsuri-db · 2025-11-21T07:41:43Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

-    AtomicInteger index = new AtomicInteger(0);
-    do {
-      fetchResultsResp = thriftAccessor.getResultSetResp(getOperationHandle(statementId), context);
+    AtomicLong index = new AtomicLong(chunkIndex);


Not related to your PR, but why do we have an AtomicLong inside a method context? Isn't this just overhead?

The value is modified inside a lambda, and Java only allows lambdas to capture final or effectively final local variables, so a mutable holder such as AtomicLong is needed.
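A small illustration of both points in this diff: the counter starts at the requested chunk index rather than zero, and it lives in an AtomicLong because a plain long local could not be incremented inside the forEach lambda. ChunkNumbering and its string-based links are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class ChunkNumbering {
  private ChunkNumbering() {}

  /** Tags each fetched link with a chunk index, numbering from firstChunkIndex. */
  static List<String> number(List<String> fetchedLinks, long firstChunkIndex) {
    List<String> numbered = new ArrayList<>();
    // A local "long index" would not compile here: lambdas cannot mutate captured locals.
    AtomicLong index = new AtomicLong(firstChunkIndex);
    fetchedLinks.forEach(link -> numbered.add(index.getAndIncrement() + ":" + link));
    return numbered;
  }
}
```

A single-element long[] holder would also work; AtomicLong mainly makes the intent explicit, and its cost is small for a per-call counter.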


jayantsing-db · 2025-11-25T11:46:03Z

src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java

+    while (fetchResultsResp.hasMoreRows) {
+      fetchResultsResp = thriftAccessor.getResultSetResp(getOperationHandle(statementId));
      fetchResultsResp
          .getResults()
          .getResultLinks()
          .forEach(
              resultLink ->
                  externalLinks.add(createExternalLink(resultLink, index.getAndIncrement())));
-    } while (fetchResultsResp.hasMoreRows);
-    if (chunkIndex < 0 || externalLinks.size() <= chunkIndex) {
-      String error = String.format("Out of bounds error for chunkIndex. Context: %s", context);
-      LOGGER.error(error);
-      throw new DatabricksSQLException(error, DatabricksDriverErrorCode.INVALID_STATE);
    }


@tejassp-db, I fixed the link download service so that it now correctly triggers the link refresh on expiry in the case of Thrift (the deadlock issue). However, this API still has a while loop. For a sufficiently large extract this will again take fifteen minutes, and we would be forced to refresh again. Instead of fetching all links, can we have an API that sends just one fetch request starting from a chunk index, like SEA? The link download service will automatically call that API with the appropriate chunk index.

Fixed this. @jayantsing-db Can you please check if any assumptions break with this change?
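A hedged sketch of the revised shape of getResultChunks: one fetch per call, starting at the requested row, with links numbered from the requested chunk index and no loop over hasMoreRows. The caller, standing in for the link download service, asks again with the next chunk index when it needs more. SingleFetchLinkClient, Link and the fetchOnce function are hypothetical; the real method signature and the service's scheduling are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

final class SingleFetchLinkClient {
  /** Hypothetical link: chunk index plus the row it starts at. */
  static final class Link {
    final long chunkIndex;
    final long rowOffset;

    Link(long chunkIndex, long rowOffset) {
      this.chunkIndex = chunkIndex;
      this.rowOffset = rowOffset;
    }
  }

  // Hypothetical transport: one fetch request from a row offset, returning link start offsets.
  private final LongFunction<List<Long>> fetchOnce;
  int requestsIssued = 0;

  SingleFetchLinkClient(LongFunction<List<Long>> fetchOnce) {
    this.fetchOnce = fetchOnce;
  }

  /** Issues exactly one fetch; the caller decides whether and when to ask for more. */
  List<Link> getResultChunks(long chunkIndex, long startRowOffset) {
    requestsIssued++;
    List<Link> links = new ArrayList<>();
    long index = chunkIndex; // number links from the requested chunk index
    for (long offset : fetchOnce.apply(startRowOffset)) {
      links.add(new Link(index++, offset));
    }
    return links;
  }
}
```

Because the method never loops on hasMoreRows, a single call is far less likely to outlive the link expiry window than a loop that fetches every remaining link.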


tejassp-db added 9 commits November 4, 2025 15:50

[PECOBLR-1131] Fetch Thrift links from a start offset.

e2000ad

In Thrift server the CloudFetch links cannot be fetched by chunk index. Changing code to fetch CloudFetch links from a start row offset.

[PECOBLR-1131] Fix chunk index in link creation.

84e129d

- Fix chunk index value in chunk creation, do not start from zero.
- Fix assertions for the fetched links.

[PECOBLR-1131] Add integration tests.

7d198ec

Add integration tests to check refetch of links works correctly.

[PECOBLR-1131] Fix chunk fetch validation.

581b98d

Fix validation and error handling. Check for exact startRowOffset match.

[PECOBLR-1131] Unit tests for result fetch errors.

fbedf44

Add unit tests for thrift result fetch error paths.

[PECOBLR-1131] Revert visibility of a method.

58eaec5

Revert getNumRows visibility to protected.

[PECOBLR-1131] Throw exception when chunk absent.

60fc546

Throw exception when chunk is missing for a chunkIndex.

[PECOBLR-1131] Format as per conventions.

0794307

Merge branch 'main' into PECOBLR-1131

0e633e3

tejassp-db requested review from gopalldb, jayantsing-db, nikhilsuri-db and samikshya-db November 8, 2025 06:10

tejassp-db self-assigned this Nov 8, 2025

tejassp-db commented Nov 8, 2025


[PECOBLR-1131] Apply spotless formatter.

070069e

jayantsing-db reviewed Nov 10, 2025


tejassp-db and others added 2 commits November 10, 2025 14:12

[PECOBLR--1131] Minor fixes.

4d79420

- rename executeRequest to executeFetchRequest
- moved response status check to after the call

Merge branch 'main' into PECOBLR-1131

13b33a0

samikshya-db approved these changes Nov 13, 2025


nikhilsuri-db reviewed Nov 21, 2025


tejassp-db and others added 3 commits November 25, 2025 10:21

Merge branch 'main' into PECOBLR-1131

51fca9e

[PECOBLR-1131] Minor fixes.

ece4152

- Fix method doc.
- Use String.format to format a debug line.

[PECOBLR-1131] Add fake integration tests.

fdb10bd

Add fake integration tests to validate cloud fetch links re-fetch from a chunk index and start row offset works.

jayantsing-db reviewed Nov 25, 2025


tejassp-db added 2 commits November 28, 2025 16:07

[PECOBLR-1131] Fix failing unit tests.

eca4f03

[PECOBLR-1131] Fix tests.

25b7e5d

tejassp-db added 5 commits November 28, 2025 17:03

[PECOBLR-1131] Fix formatting issues.

933fbfc

PECOBLR-1131 Add Github actions for thrift integ tests.

6bd358f

PECOBLR-1131 Merge branch 'main' into PECOBLR-1131

ba249e9

PECOBLR-1131 Do not fetch all links in getResultChunks.

c649d57

Do not fetch all links starting at index in getResultChunks in thrift link fetch. Mimic behaviour of sdk client.

Merge branch 'main' into PECOBLR-1131

44453e8
