Skip to content

Conversation

@ShreyasSinha
Copy link

  1. Adding RequestBody for MPU which internally uses RewindableContent.
  2. Implementing checksum related functions in RewindableContent.
    Refer: Support XML MPU in GCS Java SDK #3348

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: storage Issues related to the googleapis/java-storage API. labels Oct 23, 2025
Comment on lines 188 to 203
String getMd5() throws IOException {
try (SeekableByteChannel in = Files.newByteChannel(path, StandardOpenOption.READ)) {
in.position(readOffset);
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] buffer = new byte[8192];
ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
while (in.read(byteBuffer) != -1) {
byteBuffer.flip();
md.update(byteBuffer);
byteBuffer.clear();
}
return Base64.getEncoder().encodeToString(md.digest());
} catch (NoSuchAlgorithmException e) {
throw new IOException(e);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to send MD5 to gcs, or can we send just the crc32c?

There's not much use in computing both.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do without MD5 for now.
Will add it later if we get a strong pushback from Salesforce.

Comment on lines 334 to 341
com.google.common.hash.Hasher crc32cHasher = Hashing.crc32c().newHasher();
for (ByteBuffer buffer : buffers) {
crc32cHasher.putBytes(buffer.duplicate());
}
int crc32cInt = crc32cHasher.hash().asInt();
ByteBuffer bb = ByteBuffer.allocate(4);
bb.putInt(crc32cInt);
return Base64.getEncoder().encodeToString(bb.array());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use our existing crc32c infrastructure and options.

Suggested change
com.google.common.hash.Hasher crc32cHasher = Hashing.crc32c().newHasher();
for (ByteBuffer buffer : buffers) {
crc32cHasher.putBytes(buffer.duplicate());
}
int crc32cInt = crc32cHasher.hash().asInt();
ByteBuffer bb = ByteBuffer.allocate(4);
bb.putInt(crc32cInt);
return Base64.getEncoder().encodeToString(bb.array());
GuavaHasher hasher;
{
Hasher defaultHasher = Hasher.defaultHasher();
if (defaultHasher instanceof NoOpHasher) {
return null;
} else {
hasher = Hasher.enabled();
}
}
Crc32cLengthKnown cumulative = Crc32cValue.zero();
for (ByteBuffer buffer : buffers) {
cumulative = cumulative.concat(hasher.hash(buffer::duplicate));
}
return Utils.crc32cCodec.encode(cumulative.getValue());

Comment on lines 124 to 131
String getMd5() throws IOException {
return "";
}

@Override
String getCrc32c() throws IOException {
return "";
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values aren't correct. Empty bytes do have md5 and crc32c values that aren't empty string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP changes. It was a draft PR.

Comment on lines 62 to 64
abstract String getMd5() throws IOException;

abstract String getCrc32c() throws IOException;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best place for this logic to live. Especially in the case of the File based one, the entire file would have to be read from disk before processing could proceed. gRPC has streaming checksum support, and JSON should be getting it soon as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me raise this PR along with UploadParts changes only, it would make it easierin terms of review.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64; // Changed from com.google.api.client.util.Base64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is new, not changed from a previous value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP changes. It was a draft PR.

Preconditions.checkArgument(
offset <= totalLength,
"provided offset must be less than or equal to totalLength (%s <= %s)",
"provided offset must be less than orequal to totalLength (%s <= %s)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this please

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP changes. It was a draft PR.

Comment on lines 523 to 527
assertThrows(
HttpResponseException.class,
() ->
multipartUploadHttpRequestManager.sendAbortMultipartUploadRequest(
endpoint, request, httpStorageOptions));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this change, please revert here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP changes. It was a draft PR.

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Oct 28, 2025
public UploadPartResponse uploadPart(UploadPartRequest request, RequestBody requestBody) {
return retrier.run(
retryAlgorithmManager.idempotent(),
() -> httpRequestManager.sendUploadPartRequest(uri, request, requestBody),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to do dirty state tracking, and to actually rewind the content before a followup invocation. See

AtomicBoolean dirty = new AtomicBoolean(false);
return retrier.run(
() -> {
if (dirty.getAndSet(true)) {
ResumableOperationResult<@Nullable StorageObject> query = query();
long persistedSize = query.getPersistedSize();
if (contentRange.endOffsetEquals(persistedSize) || query.getObject() != null) {
return query;
} else {
task.rewindTo(persistedSize);
}
}
return task.call();
},
Decoder.identity());
for how this is done for json resumable uploads.

Additionally, this should pass the RewindableContent to sendUploadPartRequest so that this method manages the lifecycle of it and integrates correctly with the retries.

return false;
}

abstract String getCrc32c();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: return Crc32cLengthKnown and let the consumer of this value own the logic of encoding the value for the respected use.

For example, in grpc, crc32c values are ints not base64 encoded 4-byte big-endian representation of an unsigned 32-bit integer.

Comment on lines +155 to +157
private void addHeadersForUploadPart(RequestBody requestBody, HttpHeaders headers) {
headers.put("x-goog-hash", "crc32c=" + requestBody.getContent().getCrc32c());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: one-line logic-less helper methods don't help

Is there a plan to add more in here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the googleapis/java-storage API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants