This sample demonstrates how to unzip a zip file stored in an S3 bucket without loading the whole zip file into client memory or onto disk. It uses SharpZipLib's ZipInputStream to extract the zip entries one at a time.
For each zip entry, a temporary file is created if the entry is larger than 50 MB; otherwise the entry is buffered in memory. The extracted entry, whether on disk or in memory, is then uploaded with the AWS TransferUtility.
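The per-entry buffering decision can be sketched as below. This is a minimal illustration, not the project's code: the helper name CreateEntryBuffer is hypothetical, and treating an unknown size (-1, which SharpZipLib's ZipEntry.Size reports when the size is not known up front) as "large" is an assumption made here to stay on the safe side.

```csharp
using System;
using System.IO;

// Threshold from the description: entries larger than 50 MB go to a temp file.
const long SpillThresholdBytes = 50L * 1024 * 1024;

// Hypothetical helper (not part of S3StreamUnzip): picks the buffer for one entry.
// A size of -1 means "unknown", so it is treated like a large entry and spooled to disk.
Stream CreateEntryBuffer(long entrySize)
{
    if (entrySize >= 0 && entrySize <= SpillThresholdBytes)
    {
        return new MemoryStream();
    }

    // DeleteOnClose removes the temporary file when the stream is disposed.
    string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    return new FileStream(path, FileMode.CreateNew, FileAccess.ReadWrite,
        FileShare.None, 81920, FileOptions.DeleteOnClose);
}

using (Stream small = CreateEntryBuffer(10L * 1024 * 1024))
using (Stream large = CreateEntryBuffer(200L * 1024 * 1024))
using (Stream unknown = CreateEntryBuffer(-1))
{
    Console.WriteLine(small.GetType().Name);   // MemoryStream
    Console.WriteLine(large.GetType().Name);   // FileStream
    Console.WriteLine(unknown.GetType().Name); // FileStream
}
```

Spooling large entries to a self-deleting temp file keeps peak memory bounded by the threshold rather than by the largest entry in the archive.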
NOTE: Inspired by the Java-based S3 stream unzip project https://github.com/nejckorasa/s3-stream-unzip
NOTE: If the process runs in a memory-constrained environment, consider using the workstation GC configuration. For more details on GC types, see Configure GC types.
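To confirm which GC flavour the process actually runs with after changing the configuration (for example, setting the ServerGarbageCollector MSBuild property to false, or "System.GC.Server": false in runtimeconfig.json), a small check such as the following can be logged at startup. It is only a diagnostic sketch; DescribeGc is a hypothetical name.

```csharp
using System;
using System.Runtime;

// Reports whether server or workstation GC is active, plus the current latency mode.
string DescribeGc() =>
    $"{(GCSettings.IsServerGC ? "Server" : "Workstation")} GC, latency mode {GCSettings.LatencyMode}";

Console.WriteLine(DescribeGc());
```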
- Maintains the folder structure of the zip file.
- Extracted output can be written to a separate bucket.
- Returns the list of object keys stored in the output bucket.
- Fully asynchronous.
- Uses the AWS S3 TransferUtility to parallelize uploads.
- If processing of any zip entry fails, the whole operation stops.
NOTE: Unzipping may fail if the zip file was created with the Windows built-in zip utility. Compressing from Windows Explorer produces Deflate64 entries when the size exceeds 2 GB, and SharpZipLib does not support Deflate64 because it is a non-standard compression method.
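One way to detect this case early is to inspect the compression method in the zip's first local file header; per the PKWARE ZIP specification (APPNOTE.TXT), the method is a little-endian 16-bit field at offset 8, and value 9 means Deflate64. The sketch below assumes the first 30 bytes of the object have already been fetched (for example with a ranged GET); it only checks the first entry, and ReadFirstEntryMethod is a hypothetical helper.

```csharp
using System;
using System.Buffers.Binary;

const uint LocalFileHeaderSignature = 0x04034b50; // "PK\x03\x04"
const ushort MethodDeflate64 = 9;                  // APPNOTE.TXT compression method 9

// Returns the first entry's compression method, or null if the bytes are not a local file header.
ushort? ReadFirstEntryMethod(ReadOnlySpan<byte> header)
{
    if (header.Length < 30 ||
        BinaryPrimitives.ReadUInt32LittleEndian(header) != LocalFileHeaderSignature)
    {
        return null;
    }
    // Layout: signature (4) | version needed (2) | flags (2) | method (2) | ...
    return BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8, 2));
}

// Build a synthetic 30-byte header whose method field says Deflate64.
byte[] sample = new byte[30];
BinaryPrimitives.WriteUInt32LittleEndian(sample, LocalFileHeaderSignature);
BinaryPrimitives.WriteUInt16LittleEndian(sample.AsSpan(8), MethodDeflate64);

ushort? method = ReadFirstEntryMethod(sample);
Console.WriteLine(method == MethodDeflate64
    ? "First entry uses Deflate64; SharpZipLib cannot extract it."
    : $"First entry method: {method}");
```

Every entry has its own local header, so a full check would need to inspect each entry as it is reached in the stream.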
dotnet S3StreamUnzip.dll <<inputbucket>> <<input_zip_object_key>> <<outputbucket>> <<output_dir_prefix>> [<<s3_service_url>>]
Sample usage with a local MinIO server:
dotnet S3StreamUnzip.dll myinputbucket sample.zip outputBucket demo_prefix http://127.0.0.1:9000
Sample usage with AWS S3. Credentials and region are taken from the .aws profile:
dotnet S3StreamUnzip.dll myinputbucket sample.zip outputBucket demo_prefix
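The argument order above can be captured by a small parser like the following. It is a hypothetical sketch (ParseArgs is not taken from the project) that accepts four required arguments plus the optional service URL, which is left null when the default AWS endpoint should be used.

```csharp
#nullable enable
using System;

// Hypothetical parser mirroring the usage line; the real entry point may validate differently.
(string InputBucket, string InputKey, string OutputBucket, string OutputPrefix, string? ServiceUrl)
    ParseArgs(string[] argv)
{
    if (argv.Length is < 4 or > 5)
    {
        throw new ArgumentException(
            "usage: S3StreamUnzip <inputbucket> <input_zip_object_key> <outputbucket> <output_dir_prefix> [<s3_service_url>]");
    }
    // The fifth argument is optional; null means "use the AWS default endpoint".
    return (argv[0], argv[1], argv[2], argv[3], argv.Length == 5 ? argv[4] : null);
}

var parsed = ParseArgs(new[] { "myinputbucket", "sample.zip", "outputBucket", "demo_prefix", "http://127.0.0.1:9000" });
Console.WriteLine($"{parsed.InputKey} -> {parsed.OutputBucket}/{parsed.OutputPrefix} via {parsed.ServiceUrl ?? "AWS default endpoint"}");
```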
NOTE: The AWS TransferUtility configuration is currently not exposed to the caller; if needed, the API could be extended to accept it. The default TransferUtilityConfig values are:
- ConcurrentServiceRequests: 10
- NumberOfUploadThreads: 10
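using System;
using Amazon.S3;
using Microsoft.Extensions.Logging;
// Also import the namespace that declares S3UnzipManager.

// Command-line arguments, in the order shown in the usage line.
string inputBucketName = args[0];
string inputZipObjectKey = args[1];
string outputBucketName = args[2];
string outputPrefix = args[3];
string? serviceUrl = args.Length > 4 ? args[4] : null;

// logger
using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.ClearProviders();
    builder.AddConsole();
    builder.AddFilter("S3StreamUnzip", LogLevel.Debug); // use LogLevel.Information for less output
});
ILogger s3StreamLogger = loggerFactory.CreateLogger("S3StreamUnzip");

// s3 client: credentials and region come from the .aws profile;
// an explicit service URL (e.g. local MinIO) overrides the endpoint.
var config = new AmazonS3Config();
if (serviceUrl is not null)
{
    config.ServiceURL = serviceUrl;
    config.ForcePathStyle = true; // MinIO typically expects path-style requests
}
using var s3Client = new AmazonS3Client(config);

// call the api asynchronously instead of blocking with GetAwaiter().GetResult()
var unzipManager = new S3UnzipManager(s3Client, s3StreamLogger);
try
{
    var listOfExtractedObjectKeys = await unzipManager.UnzipUsingCSharpziplib(
        inputBucketName, inputZipObjectKey, outputBucketName, outputPrefix);
    foreach (var objectKey in listOfExtractedObjectKeys)
    {
        Console.WriteLine(objectKey);
    }
}
catch (Exception exp)
{
    Console.WriteLine(exp.ToString());
}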
NOTE: Unzipped files in the output bucket follow the folder structure of the zip file, so make sure that the entry paths and names follow the
S3 object key naming guidelines.
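A sketch of turning a zip entry name into an object key under the output prefix is shown below. The helper ToObjectKey is hypothetical; it normalizes backslash separators to '/', returns null for directory entries (which have no content to upload), and enforces the documented S3 limit of 1024 bytes of UTF-8 per key. It does not check the other character recommendations in the naming guidelines.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper: maps a zip entry name to an S3 key, keeping the folder structure.
string? ToObjectKey(string prefix, string entryName)
{
    // Some Windows tools store '\' separators; S3 "folders" are '/'-delimited key prefixes.
    string name = entryName.Replace('\\', '/').TrimStart('/');
    if (name.Length == 0 || name.EndsWith('/'))
    {
        return null; // directory entry: nothing to upload
    }

    string key = string.IsNullOrEmpty(prefix) ? name : prefix.TrimEnd('/') + "/" + name;

    // S3 object keys are limited to 1024 bytes of UTF-8.
    if (Encoding.UTF8.GetByteCount(key) > 1024)
    {
        throw new ArgumentException($"Key for '{entryName}' exceeds 1024 UTF-8 bytes.");
    }
    return key;
}

Console.WriteLine(ToObjectKey("demo_prefix", @"docs\2023\report.pdf")); // demo_prefix/docs/2023/report.pdf
```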