
Application for unzipping a zip file stored in an S3 bucket without loading the whole zip file into client memory or onto disk


madhub/S3StreamUnzip


Overview

This sample demonstrates unzipping a zip file stored in an S3 bucket without loading the whole zip file into client memory or onto disk. It uses SharpZipLib's ZipInputStream to extract the zip entries one at a time.

For each zip entry found, a temporary file is created if the entry is larger than 50 MB; otherwise the entry is buffered in memory. AWS TransferUtility then uploads the extracted entry from the temporary file or from memory.
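The per-entry buffering rule can be sketched with only the .NET base library. This is a minimal sketch, not the project's code: EntryBuffer, ShouldUseTempFile and Create are hypothetical names, the threshold is assumed to be 50 MiB (50 * 1024 * 1024 bytes), and a negative size is assumed to mean "size not known yet" (for example, an entry whose sizes are only written after its data), in which case the sketch conservatively uses disk.

```csharp
using System;
using System.IO;

// Hypothetical helper: decides where one extracted zip entry is buffered before upload.
public static class EntryBuffer
{
    // Assumption: the README says "50 MB"; 50 MiB is used here.
    public const long DefaultThresholdBytes = 50L * 1024 * 1024;

    // True when the entry should be spilled to a temporary file.
    // A negative size is treated as "unknown" (an assumption), and unknown sizes go to disk.
    public static bool ShouldUseTempFile(long entrySize, long thresholdBytes = DefaultThresholdBytes)
        => entrySize < 0 || entrySize > thresholdBytes;

    // Creates a readable, writable, seekable buffer for one entry.
    public static Stream Create(long entrySize, long thresholdBytes = DefaultThresholdBytes)
    {
        if (!ShouldUseTempFile(entrySize, thresholdBytes))
        {
            // Pre-size the in-memory buffer; checked() throws if a custom
            // threshold lets an entry exceed int.MaxValue bytes.
            return new MemoryStream(checked((int)entrySize));
        }

        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        // DeleteOnClose removes the temporary file when the stream is disposed.
        return new FileStream(path, FileMode.CreateNew, FileAccess.ReadWrite,
                              FileShare.None, 81920, FileOptions.DeleteOnClose);
    }
}
```

In a streaming loop, the entry's bytes would be copied from the ZipInputStream into this buffer, the buffer rewound (Position = 0), and then handed to the uploader. Disposing the buffer after the upload also removes any temporary file, so a failed upload does not leave files behind.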

NOTE: Inspired by the Java-based s3-stream-unzip project https://github.com/nejckorasa/s3-stream-unzip

NOTE: If the process runs in a memory-constrained environment, consider using the workstation GC configuration. For more details on GC types, see Configure GC types.

Features

  1. Maintains the folder structure of the zip file.
  2. Unzipped output can be written to a separate bucket.
  3. Returns the list of object keys stored in the output bucket.
  4. Fully asynchronous.
  5. Uses AWS S3 TransferUtility to parallelize the uploads.
  6. If processing of any zip entry fails, the whole operation stops.
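The parallel, fail-fast behaviour in items 4 to 6 can be illustrated with a small generic helper. According to this README, TransferUtility does the actual parallel uploading; the sketch below is only a stand-in written against a hypothetical upload delegate (uploadAsync), and its default limit of 10 concurrent uploads is an assumption chosen to mirror the ConcurrentServiceRequests default listed under "Using the C# API".

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedUploader
{
    // Uploads every key with at most maxConcurrency uploads in flight.
    // uploadAsync is a hypothetical stand-in for the real S3 upload call.
    // If any upload fails, no new uploads are started and the failure is rethrown.
    public static async Task<List<string>> UploadAllAsync(
        IEnumerable<string> keys,
        Func<string, CancellationToken, Task> uploadAsync,
        int maxConcurrency = 10) // assumed to mirror ConcurrentServiceRequests = 10
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);
        using var failed = new CancellationTokenSource();
        var uploaded = new List<string>();
        var running = new List<Task>();

        async Task UploadOneAsync(string key)
        {
            try
            {
                await uploadAsync(key, failed.Token);
                lock (uploaded) { uploaded.Add(key); }
            }
            catch
            {
                failed.Cancel(); // signal "stop the whole operation"
                throw;
            }
            finally
            {
                throttle.Release(); // free the slot for the next entry
            }
        }

        foreach (string key in keys)
        {
            try
            {
                // Waits for a free slot; throws once a failure has been signalled.
                await throttle.WaitAsync(failed.Token);
            }
            catch (OperationCanceledException)
            {
                break;
            }
            running.Add(UploadOneAsync(key));
        }

        // Rethrows the failure if any upload faulted.
        await Task.WhenAll(running);
        return uploaded;
    }
}
```

A SemaphoreSlim caps the number of in-flight uploads, and a shared CancellationTokenSource implements "stop on first failure": once any upload throws, no further uploads start and the original exception surfaces from Task.WhenAll (uploads that end as cancelled do not replace it).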

NOTE:
Unzipping may fail for archives created with the built-in Windows zip utility. Compressing from Windows Explorer uses Deflate64 when the size exceeds 2 GB, and SharpZipLib does not support Deflate64 because it is not a standard compression type.
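One way to detect this before streaming is to read the compression method from the zip's first local file header. Per the ZIP specification (PKWARE APPNOTE), method 8 is Deflate and method 9 is Deflate64, and the method is a little-endian 16-bit field at byte offset 8 of the 30-byte fixed part of the header. This is a minimal sketch under those assumptions; ZipMethodProbe is a hypothetical name, and it inspects only the first entry, so a complete check would need to examine every entry (for example via the central directory).

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class ZipMethodProbe
{
    // Compression method codes from the ZIP APPNOTE specification.
    public const ushort Stored = 0;
    public const ushort Deflate = 8;
    public const ushort Deflate64 = 9;

    private const uint LocalFileHeaderSignature = 0x04034b50; // "PK\x03\x04"

    // Reads the local file header at the start of the stream and returns its
    // compression method, or null if the stream does not start with one.
    public static ushort? ReadFirstEntryMethod(Stream zip)
    {
        // Fixed part of a local file header: 30 bytes; the method is at offset 8.
        byte[] header = new byte[30];
        int read = 0;
        while (read < header.Length)
        {
            int n = zip.Read(header, read, header.Length - read);
            if (n == 0) return null; // stream too short
            read += n;
        }

        if (BinaryPrimitives.ReadUInt32LittleEndian(header) != LocalFileHeaderSignature)
            return null;

        return BinaryPrimitives.ReadUInt16LittleEndian(header.AsSpan(8, 2));
    }
}
```

Against S3, the first 30 bytes of the object could be fetched with a ranged GET rather than downloading the whole archive, and the job rejected up front if the result is ZipMethodProbe.Deflate64.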

Sample usage:

dotnet S3StreamUnzip.dll <<inputbucket>> <<input_zip_object_key>> <<outputbucket>> <<output_dir_prefix>> [<<s3_service_url>>]

Sample usage with a local MinIO server:

dotnet S3StreamUnzip.dll myinputbucket sample.zip outputbucket demo_prefix http://127.0.0.1:9000

Sample usage with AWS S3 (credentials and region are taken from the .aws profile):

dotnet S3StreamUnzip.dll myinputbucket sample.zip outputbucket demo_prefix
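The positional arguments above map onto a small settings object. The following is a hypothetical sketch (UnzipArgs and Parse are illustrative names, not taken from the project) showing that the fifth argument is optional and that, when it is absent, the default AWS endpoint is used.

```csharp
using System;

// Hypothetical settings object mirroring the documented command-line arguments.
public sealed record UnzipArgs(
    string InputBucket,
    string InputZipKey,
    string OutputBucket,
    string OutputPrefix,
    string? ServiceUrl) // null: use AWS S3 with the default profile
{
    private const string Usage =
        "usage: dotnet S3StreamUnzip.dll <inputbucket> <input_zip_object_key> " +
        "<outputbucket> <output_dir_prefix> [<s3_service_url>]";

    public static UnzipArgs Parse(string[] args)
    {
        if (args.Length < 4 || args.Length > 5)
            throw new ArgumentException(Usage);

        // The optional fifth argument selects a custom endpoint such as a local MinIO server.
        string? serviceUrl = args.Length == 5 ? args[4] : null;
        if (serviceUrl != null && !Uri.TryCreate(serviceUrl, UriKind.Absolute, out _))
            throw new ArgumentException($"Invalid S3 service URL: {serviceUrl}");

        return new UnzipArgs(args[0], args[1], args[2], args[3], serviceUrl);
    }
}
```

When ServiceUrl is set, it would be assigned to AmazonS3Config.ServiceURL; for MinIO, path-style addressing (AmazonS3Config.ForcePathStyle = true) is usually needed as well.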

Using the C# API

NOTE: The AWS TransferUtility configuration is currently not exposed to the caller; if needed, the API can be extended to allow configuring it. The default TransferUtilityConfig values are:
ConcurrentServiceRequests: 10
NumberOfUploadThreads: 10

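using System;
using Amazon.S3;
using Microsoft.Extensions.Logging;
// Also add a using for the namespace that declares S3UnzipManager.

// Placeholder values; replace them with your own bucket names and keys.
string inputBucketName = "myinputbucket";
string inputZipObjectKey = "sample.zip";
string outputBucketName = "outputbucket";
string outputPrefix = string.Empty; // key prefix for the extracted objects

// logger
using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddFilter("S3StreamUnzip", LogLevel.Debug); // use LogLevel.Information for less output
    builder.ClearProviders();
    builder.AddConsole();
});
var s3StreamLogger = loggerFactory.CreateLogger("S3StreamUnzip");

// s3 client (credentials and region come from the default AWS profile)
var config = new AmazonS3Config();
// For a local MinIO server, point the client at it instead:
// config.ServiceURL = "http://127.0.0.1:9000";
// config.ForcePathStyle = true;
using var s3Client = new AmazonS3Client(config);

// call the api (top-level statements allow await directly)
var unzipManager = new S3UnzipManager(s3Client, s3StreamLogger);
try
{
    var listOfExtractedObjectKeys = await unzipManager.UnzipUsingCSharpziplib(
        inputBucketName, inputZipObjectKey, outputBucketName, outputPrefix);
    foreach (var objectKey in listOfExtractedObjectKeys)
    {
        Console.WriteLine(objectKey);
    }
}
catch (Exception exp)
{
    Console.WriteLine(exp.ToString());
}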

NOTE: Unzipped files in the output bucket follow the folder structure of the zip file, so make sure the paths and file names inside the zip follow the
S3 object key naming guidelines.
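A sketch of how an entry name could be turned into an output key and checked against those guidelines: it joins the output prefix and the entry path with "/", skips directory entries, enforces the 1024-byte UTF-8 limit on object keys, and tests for the "safe characters" listed in the S3 guidelines (letters, digits, and ! - _ . * ' ( ) plus / as the folder delimiter). S3KeyBuilder is a hypothetical helper; in particular, the backslash normalization and the prefix/entry join are assumptions about how keys are formed, not documented behaviour of this project.

```csharp
using System;
using System.Linq;
using System.Text;

public static class S3KeyBuilder
{
    private const int MaxKeyBytes = 1024;            // S3 limit on the UTF-8 length of a key
    private const string SafeSpecialChars = "!-_.*'()/";

    // Joins the output prefix with a zip entry path, keeping the folder structure.
    // Returns null for directory entries, which have no data to upload.
    // Assumption: backslashes in entry names are treated as folder separators.
    public static string? BuildKey(string prefix, string entryName)
    {
        string name = entryName.Replace('\\', '/').TrimStart('/');
        if (name.Length == 0 || name.EndsWith('/')) return null;

        string key = string.IsNullOrEmpty(prefix) ? name : prefix.TrimEnd('/') + "/" + name;

        if (Encoding.UTF8.GetByteCount(key) > MaxKeyBytes)
            throw new ArgumentException($"Object key exceeds {MaxKeyBytes} bytes: {key}");
        return key;
    }

    // True when every character is in the "safe characters" set from the S3 guidelines.
    public static bool UsesOnlySafeCharacters(string key) =>
        key.All(c => (c < 128 && char.IsLetterOrDigit(c)) || SafeSpecialChars.IndexOf(c) >= 0);
}
```

Keys that fail UsesOnlySafeCharacters are still legal in S3, but they may need URL encoding or cause trouble with some tools, so logging a warning for them is one reasonable choice.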
