@azkrishpy
Contributor

@azkrishpy azkrishpy commented Oct 29, 2025

Issue #, if available:
#806
Description of changes:
Exposes S3 Request Metrics from the CRT S3 Client.

  • S3RequestMetrics: Captures timing, request/response info, and error details for each S3 request attempt
  • onTelemetry() callback: New method in S3MetaRequestResponseHandler invoked after each request completes
  • ErrorType classification: Categorizes errors (SUCCESS, THROTTLING, SERVER_ERROR, CONFIGURED_TIMEOUT, IO, OTHER)
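To illustrate the idea behind the classification, here is a minimal sketch of how such categories could be derived. The enum constants are the ones listed above; the `ErrorTypeSketch` name, the `classify` helper, its parameters, and the status-code rules are all assumptions made for illustration and are not taken from the PR's implementation.

```java
// Hypothetical sketch: the constants mirror the PR description, but the
// rules below are illustrative assumptions, not the CRT's actual logic.
enum ErrorTypeSketch {
    SUCCESS, THROTTLING, SERVER_ERROR, CONFIGURED_TIMEOUT, IO, OTHER;

    /**
     * Classifies one request attempt.
     *
     * @param httpStatus HTTP status code, or 0 if no response was received
     * @param timedOut   true if a configured timeout fired (assumed input)
     * @param ioFailure  true if the attempt failed at the socket/TLS level (assumed input)
     */
    static ErrorTypeSketch classify(int httpStatus, boolean timedOut, boolean ioFailure) {
        if (timedOut) {
            return CONFIGURED_TIMEOUT;
        }
        if (ioFailure) {
            return IO;
        }
        if (httpStatus >= 200 && httpStatus < 300) {
            return SUCCESS;
        }
        // Assumption: 429 and 503 are treated as throttling responses.
        if (httpStatus == 429 || httpStatus == 503) {
            return THROTTLING;
        }
        if (httpStatus >= 500 && httpStatus < 600) {
            return SERVER_ERROR;
        }
        return OTHER;
    }
}
```

A caller could switch on the returned constant to decide, for example, whether to count the attempt toward a throttling alarm.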

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@azkrishpy
Contributor Author

Java SDK users can write a custom MetricPublisher that bridges CRT metrics to the SDK's MetricCollection:

public class CrtMetricPublisher implements MetricPublisher {
    private final MetricPublisher delegate;

    public CrtMetricPublisher(MetricPublisher delegate) {
        this.delegate = delegate;
    }

    public void publishCrtMetrics(S3RequestMetrics crtMetrics) {
        MetricCollection collection = MetricCollection.builder()
            .name("S3Request")
            .creationTime(Instant.now())
            .putMetric(CoreMetric.API_CALL_DURATION, 
                Duration.ofNanos(crtMetrics.getApiCallDurationNs()))
            .putMetric(CoreMetric.SERVICE_ID, crtMetrics.getServiceId())
            .putMetric(CoreMetric.OPERATION_NAME, crtMetrics.getOperationName())
            .putMetric(CoreMetric.RETRY_COUNT, crtMetrics.getRetryCount())
            .putMetric(CoreMetric.AWS_REQUEST_ID, crtMetrics.getAwsRequestId())
            .putMetric(CoreMetric.AWS_EXTENDED_REQUEST_ID, crtMetrics.getAwsExtendedRequestId())
            .putMetric(CoreMetric.BACKOFF_DELAY_DURATION, 
                Duration.ofNanos(crtMetrics.getBackoffDelayDurationNs()))
            .putMetric(CoreMetric.SERVICE_CALL_DURATION, 
                Duration.ofNanos(crtMetrics.getServiceCallDurationNs()))
            .putMetric(CoreMetric.SIGNING_DURATION, 
                Duration.ofNanos(crtMetrics.getSigningDurationNs()))
            .build();

        delegate.publish(collection);
    }

    @Override
    public void publish(MetricCollection metricCollection) {
        delegate.publish(metricCollection);
    }

    @Override
    public void close() {
        delegate.close();
    }
}

Usage

CrtMetricPublisher metricPublisher = new CrtMetricPublisher(
    LoggingMetricPublisher.create());

S3MetaRequestResponseHandler responseHandler = new S3MetaRequestResponseHandler() {
    @Override
    public void onTelemetry(S3RequestMetrics metrics) {
        metricPublisher.publishCrtMetrics(metrics);
    }

    @Override
    public void onFinished(S3FinishedResponseContext context) {
        // Handle completion
    }
};

S3MetaRequestOptions options = new S3MetaRequestOptions()
    .withMetaRequestType(MetaRequestType.GET_OBJECT)
    .withHttpRequest(httpRequest)
    .withResponseHandler(responseHandler);

try (S3MetaRequest request = client.makeMetaRequest(options)) {
    // Request executes, metrics published via onTelemetry callback
}

@azkrishpy
Contributor Author

The following metrics are not exposed by the CRT for Java, since the SDK already has this information:

  • CredentialsFetchDuration
  • EndpointResolveDuration
  • MarshallingDuration
  • ServiceEndpoint

return this.extendedRequestId;
}

public String getAwsRequestId() {
Contributor

For multipart/ranged get requests, which request is this for? The first request? Same question for other metrics such as service call duration

Contributor Author

For both. Request metrics are made available for every request that communicates with S3. For a multipart request, each part gets its own metrics data; for ranged GETs, each GET gets its own metrics object.

CRT retries request-level failures (upload part, ranged GET) up to 5 times, and one of the metrics, API call duration, by definition requires aggregation across attempts. API call duration is reported as zero for failed attempts, while the successful final attempt carries the correctly aggregated duration.

The rest of the metrics are unique to each request. Does this help?
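As a rough sketch of what this means for a consumer, the following shows one way to fold per-request metrics into a summary for the whole meta request. `AttemptMetricsSketch` and `MetaRequestSummarySketch` are hypothetical stand-ins invented for this example; they are not the PR's `S3RequestMetrics` class, and the field names are assumptions.

```java
/** Hypothetical stand-in for one per-request metrics object. */
final class AttemptMetricsSketch {
    final int partNumber;
    final boolean succeeded;
    final long apiCallDurationNs; // zero for failed attempts, per the comment above

    AttemptMetricsSketch(int partNumber, boolean succeeded, long apiCallDurationNs) {
        this.partNumber = partNumber;
        this.succeeded = succeeded;
        this.apiCallDurationNs = apiCallDurationNs;
    }
}

/** Accumulates the metrics objects delivered for a single meta request. */
final class MetaRequestSummarySketch {
    private int attempts;
    private int failedAttempts;
    private long summedApiCallDurationNs;

    // synchronized on the assumption that callbacks may arrive on different threads.
    synchronized void record(AttemptMetricsSketch metrics) {
        attempts++;
        if (!metrics.succeeded) {
            failedAttempts++;
        }
        // Failed attempts contribute zero, so only the final successful
        // attempt of each part adds to the sum.
        summedApiCallDurationNs += metrics.apiCallDurationNs;
    }

    synchronized int attempts() {
        return attempts;
    }

    synchronized int failedAttempts() {
        return failedAttempts;
    }

    /** Sum over parts; parts run in parallel, so this is not wall-clock time. */
    synchronized long summedApiCallDurationNs() {
        return summedApiCallDurationNs;
    }
}
```

Because parts execute in parallel, the summed duration is a measure of total work rather than elapsed time for the meta request.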

}

/**
* Invoked to report telemetry of partial execution of meta request.
Contributor

What do you mean by partial execution? When is this invoked?

Contributor Author

When multiple parallelized requests are created for a single meta request, each request gets its own response with metrics. Even though the meta request as a whole may not have finished executing, an individual request can already be complete, so customers receive the metric data for that request at that point. This is what I meant by partial execution of the meta request.
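The ordering described here can be sketched with a small simulation. Everything in this block is hypothetical: `MetaRequestHandlerSketch` only mimics the shape of the callbacks named in this PR with simplified signatures, and `MetaRequestSimulatorSketch` runs sequentially for determinism, whereas real part requests complete in parallel and in no guaranteed order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler shape; not the real CRT interface or its signatures.
interface MetaRequestHandlerSketch {
    void onTelemetry(String requestLabel);

    void onFinished();
}

/** Records callbacks in arrival order so the sequence can be inspected. */
final class RecordingHandlerSketch implements MetaRequestHandlerSketch {
    final List<String> events = new ArrayList<>();

    @Override
    public void onTelemetry(String requestLabel) {
        events.add("telemetry:" + requestLabel);
    }

    @Override
    public void onFinished() {
        events.add("finished");
    }
}

final class MetaRequestSimulatorSketch {
    /** Simulates a meta request that is split into {@code parts} requests. */
    static void run(int parts, MetaRequestHandlerSketch handler) {
        for (int part = 1; part <= parts; part++) {
            // Each completed request reports its metrics while the
            // meta request as a whole is still in progress.
            handler.onTelemetry("part-" + part);
        }
        // The meta request completes only after all of its requests.
        handler.onFinished();
    }
}
```

Running the simulator with three parts yields three telemetry events followed by a single finished event, which is the sense in which telemetry reports "partial execution".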

Contributor

Gotcha, can we update the documentation to make it more clear?

/**
* Metrics collected upon completion of an S3 Request
*/
public class S3RequestMetrics {
Contributor

Contributor Author

@azkrishpy azkrishpy Dec 3, 2025

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/metrics-list.html

The docs did not list HTTP metrics as being delivered to users, so I omitted them. If they are needed, it is an easy change: the members are still held privately within the metrics object, and I can expose additional public methods to access them. Please let me know if this is an issue. Thanks for the review!

Contributor

Yeah, if we can add HTTP metrics, that'd be great. They are super helpful for troubleshooting purposes

3 participants