Skip to content

Conversation

nishkarsh-db
Copy link
Collaborator

@nishkarsh-db nishkarsh-db commented Aug 4, 2025

Description

Implemented request identification by explicitly passing the request type (e.g. THRIFT_METADATA, UC_VOLUME_PUT_INTO) (an enum) in the DatabricksHttpClient execute function, for all clients (Thrift, UCVolume, DBFS, Telemetry, AWS)

Testing

The old tests are successfully executing. New tests are yet to be added.

@nishkarsh-db nishkarsh-db changed the title successfully passing config in the execute function for Thrift Client… Successfull identification of request type in the execute function Aug 8, 2025
@nishkarsh-db nishkarsh-db marked this pull request as ready for review August 8, 2025 10:23
@nishkarsh-db nishkarsh-db changed the title Successfull identification of request type in the execute function [PECOBLR-450] Successfull identification of request type in the execute function Aug 8, 2025
* Helper method to set HTTP request configuration for volume operations based on operation type.
* Only works with Thrift client connections; SDK clients use their own retry logic.
*/
private void setVolumeOperationConfig(HTTPRequestType config) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to setVolumeOperationRequestType

}
} catch (Exception e) {
// Catch all exceptions (SQLException, NullPointerException, etc.)
// This ensures volume operations continue to work even if HTTP config setup fails
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the retry logic work in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wont work in this case, the request type will be NOT_SET which will be treated as non idempotent.


String listFilesSQLQuery = createListQuery(catalog, schema, volume, folder);

setVolumeOperationConfig(HTTPRequestType.UC_VOLUME_LIST);
Copy link
Collaborator

@shivam2680 shivam2680 Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move this inside try block of createStatement. This indicates you are update request type just before sending request.

Applies to the entire class


boolean volumeOperationStatus = false;

setVolumeOperationConfig(HTTPRequestType.UC_VOLUME_GET_TO);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is UC_VOLUME_GET_TO? should this be UC_VOLUME_GET?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

if (isAllowedInputStreamForVolumeOperation) {
responseStream = databricksHttpClient.execute(httpGet);
responseStream =
databricksHttpClient.execute(httpGet, HTTPRequestType.DBFS_VOLUME_EXECUTE_GET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use same request types across UcVolumeClient and DBFSVolumeClient

Comment on lines 13 to 20
UC_VOLUME_LIST,
UC_VOLUME_SHOW_VOLUMES,
UC_VOLUME_GET_TO,
UC_VOLUME_PUT_INTO,
UC_VOLUME_REMOVE,
DBFS_VOLUME_EXECUTE_GET,
DBFS_VOLUME_EXECUTE_PUT,
DBFS_VOLUME_EXECUTE_DELETE,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we are having redundant enums for dbfs and uc volume clients? Please unify


// enumerates the types of HTTP requests

public enum RequestIdempotency {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we dont need this now, altough a similar enum will be needed in Retry Handler. WIll delete this file.

@Override
public CloseableHttpResponse execute(HttpUriRequest request, boolean supportGzipEncoding)
public void setCurrentRequestType(HTTPRequestType currentHTTPRequestType) {
if (this.currentHTTPRequestType == HTTPRequestType.NOT_SET) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention in comments, in what case NOT_SET becomes important. Document the UC volumes example we discussed


TCancelOperationResp cancelOperation(TCancelOperationReq req) throws DatabricksHttpException {
try {
setHttpRequestType(HTTPRequestType.THRIFT_CANCEL_OPERATION);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant setting of request type. you've already set the request type in getThriftResponse

Comment on lines -167 to -169
.setDefaultRequestConfig(makeRequestConfig(connectionContext.getSocketTimeout()))
.setRetryHandler(retryHandler)
.addInterceptorFirst(retryHandler);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't remove this until the new retry mechanism is ready. Please undo this chage

Comment on lines -159 to -162
DatabricksHttpRetryHandler retryHandler =
type.equals(HttpClientType.COMMON)
? new DatabricksHttpRetryHandler(connectionContext)
: new UCVolumeHttpRetryHandler(connectionContext);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we can rid of HttpClientType now that we have request type.


@Override
public CloseableHttpResponse execute(
HttpUriRequest request, HTTPRequestType config, boolean supportGzipEncoding)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintain consistent naming for params. requestType instead of config

Comment on lines 94 to 96
public void setHttpRequestType(HTTPRequestType config) {
this.httpClient.setCurrentRequestType(config);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintain consistent naming for params.

request.setHeader("Content-Encoding", "gzip");
}

currentHTTPRequestType = HTTPRequestType.NOT_SET;
Copy link
Collaborator

@shivam2680 shivam2680 Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state management feels a bit weird. Why are you updating to NOT_SET here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in setRequestType we are setting only when its previously NOT_SET

// Execute the request and handle the response
long httpRequestStartTime = System.currentTimeMillis();
try (CloseableHttpResponse response = httpClient.execute(request)) {
HTTPRequestType currentRequestConfig = httpClient.getCurrentRequestType();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to currentRequestType

context.fetchAndSetFlagsFromServer(httpClientMock, request);
assertTrue(context.isFeatureEnabled(FEATURE_FLAG_NAME));
verify(httpClientMock).execute(request);
verify(httpClientMock).execute(eq(request), isA(HTTPRequestType.class));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we are not specifying the actual HTTPRequestType across tests?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New tests for request Type verification are going to be added anyways, so didn't want to change the existing tests behaviour.

package com.databricks.jdbc.common;

public enum HTTPRequestType {
NOT_SET,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTPRequestType should be UNKNOWN instead of NOT_SET (I guess the weird nomenclature is because you're actually using RequestType as a state for the http client).

VOLUME_DELETE,
AUTH,
TELEMETRY_PUSH,
OTHER
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this used for?

private final IDatabricksConnectionContext connectionContext;
private IdleConnectionEvictor idleConnectionEvictor;
private CloseableHttpAsyncClient asyncClient;
private HTTPRequestType currentHTTPRequestType;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treating this as a class level property makes this class thread unsafe, this is a request level property and the implementation must be in accordance

public void setCurrentRequestType(HTTPRequestType currentHTTPRequestType) {
/* The NOT_SET check is needed while executing UC volume requests, as they go through Thrift ExecuteStatement,
So, this function will be called 2 times (from UC Volume Client as well as Thrift Client). In order to avoid that conflict,
this check is required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a hacky way of doing things to bypass the request type being set at the class level

REQUEST_IDEMPOTENCY_MAP.put(HTTPRequestType.VOLUME_PUT, RequestIdempotency.NON_IDEMPOTENT);
REQUEST_IDEMPOTENCY_MAP.put(HTTPRequestType.NOT_SET, RequestIdempotency.NON_IDEMPOTENT);
REQUEST_IDEMPOTENCY_MAP.put(HTTPRequestType.OTHER, RequestIdempotency.NON_IDEMPOTENT);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we define this mapping as a part of HTTPRequestType enum

throws DatabricksHttpException {

HttpContext httpContext = new BasicHttpContext();
initializeRetryState(httpContext);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we maintain the retry counts/duration in the http context? what's the benefit of that?

Thread.currentThread().interrupt();
throw new RuntimeException("Thread interrupted during retry", e);
} catch (RuntimeException e) {
throwHttpException(e, request);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw RuntimeException for InterruptedException and HttpException for RuntimeException? what's the logic here?


public interface IRetryStrategy {
/* Tells whether a given HTTP status code is retriable or not */
boolean isStatusCodeRetriable(int statusCode, IDatabricksConnectionContext connectionContext);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should instead just do isRetriable, it should be oblivious to clients on why we chose to retry (status code, etc.), additionally, we should only retry if we have 503 + retry-after header, all those checks should be done by this class, take a look at: https://github.com/databricks/databricks-sdk-java/blob/d1de7b8a9a397fcf34e93120e43d8bc2f459d004/databricks-sdk-java/src/main/java/com/databricks/sdk/core/retry/RetryStrategy.java


// Last attempt check
if (attempt > MAX_RETRIES) {
return null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we return null? should we just not return the last response?

HttpStatus.SC_REQUEST_TOO_LONG,
HttpStatus.SC_REQUEST_URI_TOO_LONG,
HttpStatus.SC_UNSUPPORTED_MEDIA_TYPE,
HttpStatus.SC_REQUESTED_RANGE_NOT_SATISFIABLE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also have non retryable exceptions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants