@clintropolis (Member) commented Sep 18, 2025

Description

This PR adds the capability to store segments in S3 without compressing them with zip, similar to the 'local' deep storage option. This is mainly for experimentation at this point, but I went ahead and documented it just in case anyone else wants to experiment 🤷

When `druid.storage.zip` is false, the load spec stores the key prefix to use instead of the exact object location. The pullers, killers, and movers check for a path that ends with `/`; when they find one, they do a list operation, iterate over the results, and apply their operation to each listed object.
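The prefix dispatch described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the PR's code: a sorted `NavigableMap` of object keys stands in for an S3 bucket (a range scan plays the role of the list call), the keys are made up, and `PrefixDispatchSketch`, `resolveKeys`, and `forEachObject` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Sketch only: a sorted map of keys stands in for an S3 bucket; names are hypothetical. */
public class PrefixDispatchSketch {

  /** Returns every object key that a pull/kill/move operation should touch for a load-spec key. */
  public static List<String> resolveKeys(NavigableMap<String, byte[]> bucket, String loadSpecKey) {
    if (!loadSpecKey.endsWith("/")) {
      // Zipped layout: the load spec names exactly one object.
      return List.of(loadSpecKey);
    }
    // Unzipped layout: the load spec is a prefix, so "list" every key under it.
    List<String> keys = new ArrayList<>();
    for (String key : bucket.tailMap(loadSpecKey, true).keySet()) {
      if (!key.startsWith(loadSpecKey)) {
        break; // keys are sorted, so no later key can share the prefix either
      }
      keys.add(key);
    }
    return keys;
  }

  /** Applies one operation to each resolved key; the key list is a copy, so the op may mutate the bucket. */
  public static void forEachObject(NavigableMap<String, byte[]> bucket, String loadSpecKey, Consumer<String> op) {
    for (String key : resolveKeys(bucket, loadSpecKey)) {
      op.accept(key);
    }
  }

  public static void main(String[] args) {
    NavigableMap<String, byte[]> bucket = new TreeMap<>();
    bucket.put("segments/seg1/meta.smoosh", new byte[0]);
    bucket.put("segments/seg1/00000.smoosh", new byte[0]);
    bucket.put("segments/seg2/segment.zip", new byte[0]);

    // "Kill" every object belonging to the unzipped segment.
    forEachObject(bucket, "segments/seg1/", bucket::remove);
    System.out.println(bucket.keySet());
  }
}
```

Treating the zipped case as a one-element list lets each operation share a single loop for both layouts. A real S3 listing is paginated, which this sketch ignores.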

```java
    );
  }
  catch (AmazonServiceException e) {
    if (S3Utils.ERROR_ENTITY_TOO_LARGE.equals(S3Utils.getS3ErrorCode(e))) {
```
Contributor

This whole block is duplicated; it would be good to dedupe it.
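One way to dedupe it, sketched under assumptions: route each S3 call through a single helper that owns the entity-too-large check, so every call site passes only its upload as a lambda. The exception class, the `"EntityTooLarge"` value assumed for `S3Utils.ERROR_ENTITY_TOO_LARGE`, and the rethrow in the too-large branch are hypothetical stand-ins; the thread does not show what the duplicated branch actually does.

```java
/** Sketch only: every name here is a hypothetical stand-in for the real S3/Druid types. */
public class DedupeSketch {

  /** Hypothetical stand-in for AmazonServiceException that only carries an S3 error code. */
  public static class ServiceException extends RuntimeException {
    public final String errorCode;

    public ServiceException(String errorCode) {
      super("S3 error: " + errorCode);
      this.errorCode = errorCode;
    }
  }

  /** Assumed to match S3Utils.ERROR_ENTITY_TOO_LARGE (S3's "EntityTooLarge" error code). */
  public static final String ERROR_ENTITY_TOO_LARGE = "EntityTooLarge";

  /** One S3 call, for example an upload of the zip file or of a single unzipped file. */
  @FunctionalInterface
  public interface S3Call<T> {
    T run();
  }

  /** Runs the call and applies the shared entity-too-large handling in exactly one place. */
  public static <T> T withEntityTooLargeHandling(String key, S3Call<T> call) {
    try {
      return call.run();
    }
    catch (ServiceException e) {
      if (ERROR_ENTITY_TOO_LARGE.equals(e.errorCode)) {
        // Hypothetical handling: rethrow with the offending key for context.
        throw new IllegalStateException("Object [" + key + "] is too large for a single S3 upload", e);
      }
      throw e; // any other error passes through unchanged
    }
  }

  public static void main(String[] args) {
    // Both push paths (zip and no-zip) would call the same helper.
    System.out.println(withEntityTooLargeHandling("seg/meta.smoosh", () -> "uploaded"));
    try {
      withEntityTooLargeHandling("seg/00000.smoosh", () -> {
        throw new ServiceException(ERROR_ENTITY_TOO_LARGE);
      });
    }
    catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```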

```java
final URI uri = objectLocation.toUri(S3StorageDruidModule.SCHEME);
final ByteSource byteSource = getByteSource(uri);
final File outFile = new File(outDir, Paths.get(objectLocation.getPath()).getFileName().toString());
outFile.createNewFile();
```

Check notice (Code scanning / CodeQL): Ignored error status of call

Method `getSegmentFiles` ignores the exceptional return value of `File.createNewFile`.
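A minimal sketch of honoring that return value, assuming a pre-existing file at the target path should be treated as an error (for example, two listed keys that `getFileName()` flattens to the same name). `CreateFileSketch` and `createOutFile` are hypothetical names, not code from the PR.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Sketch only: hypothetical helper around File.createNewFile. */
public class CreateFileSketch {

  /**
   * Creates outDir/fileName and fails loudly instead of ignoring createNewFile's result.
   * Assumption: an existing file here signals a problem, such as a name collision.
   */
  public static File createOutFile(File outDir, String fileName) throws IOException {
    final File outFile = new File(outDir, fileName);
    // createNewFile returns false (rather than throwing) when the file already exists.
    if (!outFile.createNewFile()) {
      throw new IOException("File [" + outFile + "] already exists");
    }
    return outFile;
  }

  public static void main(String[] args) throws IOException {
    File outDir = Files.createTempDirectory("segment").toFile();
    System.out.println(createOutFile(outDir, "meta.smoosh").exists());
    try {
      createOutFile(outDir, "meta.smoosh"); // second attempt at the same name
    }
    catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If the code that follows opens the file through a `FileOutputStream`, which creates a missing file on its own, dropping the `createNewFile()` call is another option.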
Comment on lines +93 to +103
```java
private static final DataSegment DATA_SEGMENT_1_NO_ZIP = new DataSegment(
    "test",
    Intervals.of("2015-04-12/2015-04-13"),
    "1",
    ImmutableMap.of("bucket", TEST_BUCKET, "key", KEY_1 + "/"),
    null,
    null,
    NoneShardSpec.instance(),
    0,
    1
);
```

Check notice (Code scanning / CodeQL, test): Deprecated method or constructor invocation

Invoking `DataSegment.DataSegment` should be avoided because it has been deprecated.
Comment on lines +105 to +115
```java
private static final DataSegment DATA_SEGMENT_2_NO_ZIP = new DataSegment(
    "test",
    Intervals.of("2015-04-13/2015-04-14"),
    "1",
    ImmutableMap.of("bucket", TEST_BUCKET, "key", KEY_2 + "/"),
    null,
    null,
    NoneShardSpec.instance(),
    0,
    1
);
```

Check notice (Code scanning / CodeQL, test): Deprecated method or constructor invocation

Invoking `DataSegment.DataSegment` should be avoided because it has been deprecated.
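The two no-zip fixtures differ only in interval and key, and what makes them no-zip is the trailing `/` on the load spec key. A hypothetical helper (not part of the PR) could build that load spec in one place, assuming the same two-entry `bucket`/`key` map the fixtures use; it does not address the deprecated constructor itself.

```java
import java.util.Map;

/** Sketch only: hypothetical helper for the no-zip test fixtures' load spec. */
public class NoZipLoadSpecSketch {

  /**
   * Builds the two-entry S3 load spec used by the no-zip fixtures.
   * The trailing "/" marks the key as a prefix to list, not a single object.
   */
  public static Map<String, Object> noZipLoadSpec(String bucket, String keyPrefix) {
    final String key = keyPrefix.endsWith("/") ? keyPrefix : keyPrefix + "/";
    return Map.of("bucket", bucket, "key", key);
  }

  public static void main(String[] args) {
    System.out.println(noZipLoadSpec("test-bucket", "test/2015-04-12/1").get("key"));
  }
}
```

Each fixture could then pass `noZipLoadSpec(TEST_BUCKET, KEY_1)` (or `KEY_2`) in place of the inline map.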
```java
object1.setBucketName(bucket);
object1.setKey(keyPrefix + "meta.smoosh");
object1.getObjectMetadata().setLastModified(new Date(0));
object1.setObjectContent(new FileInputStream(tmpFile));
```

Check warning (Code scanning / CodeQL, test): Potential input resource leak

This `FileInputStream` is not always closed on method exit.
```java
object2.setBucketName(bucket);
object2.setKey(keyPrefix + "00000.smoosh");
object2.getObjectMetadata().setLastModified(new Date(0));
object2.setObjectContent(new FileInputStream(tmpFile));
```

Check warning (Code scanning / CodeQL, test): Potential input resource leak

This `FileInputStream` is not always closed on method exit.
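A minimal sketch of one way to avoid the leak these warnings describe, assuming the fixture file is small: read it into memory once and hand each `S3Object` a `ByteArrayInputStream`, which holds no file handle, instead of a `FileInputStream`. `StreamLeakSketch` and `contentOf` are hypothetical names; for larger files, closing the streams (or the `S3Object`s) in a `finally` block would be the alternative.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

/** Sketch only: hypothetical test helper that avoids leaving file handles open. */
public class StreamLeakSketch {

  /**
   * Reads the whole file up front and returns an in-memory stream, so no file handle
   * stays open even if the caller never closes the stream. Only sensible for small files.
   */
  public static InputStream contentOf(File file) throws IOException {
    return new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
  }

  public static void main(String[] args) throws IOException {
    File tmpFile = File.createTempFile("segment", ".smoosh");
    tmpFile.deleteOnExit();
    Files.write(tmpFile.toPath(), new byte[]{1, 2, 3});
    try (InputStream in = contentOf(tmpFile)) {
      System.out.println(in.readAllBytes().length);
    }
  }
}
```

In the test, `object1.setObjectContent(contentOf(tmpFile))` would replace `object1.setObjectContent(new FileInputStream(tmpFile))`, and likewise for `object2`.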
@clintropolis merged commit 5d26ebb into apache:master on Sep 19, 2025 (62 checks passed)
@clintropolis deleted the s3_zip_no_zip branch on September 19, 2025 at 10:49
@cecemei added this to the 35.0.0 milestone on Oct 21, 2025

3 participants