Performance gap between GCS Python Client and boto3 for object downloads #1458

Open
@dreamtalen

Description

Environment details

  • GCP VM instance: c4a-highcpu-72
  • OS type and version: Linux, 6.1.0-31-cloud-arm64
  • Python version: Python 3.11.2
  • pip version: pip 23.0.1
  • google-cloud-storage version: 2.19.0 & 3.1.0

Steps to reproduce

When comparing download performance between the Google Cloud Storage Python client and boto3 (the AWS SDK), we observed that the GCS client is significantly slower (about 50% slower) than boto3 when downloading the same objects stored in a GCS bucket.

GCS Client Implementation (Two methods tested)

  1. Using blob.download_as_bytes():
     blob = bucket.blob(key)
     data = blob.download_as_bytes()
  2. Using blob.open() (10%-30% faster than method 1, but still 50% slower than boto3):
     with blob.open("rb") as f:
         data = f.read()
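
For anyone reproducing this, the two GCS methods can be wrapped as interchangeable functions so the same harness can time either one. This is only a sketch: the helper names (`make_bucket`, `download_as_bytes`, `download_via_open`) are hypothetical and not taken from the benchmark scripts, and `make_bucket` assumes application default credentials are available.

```python
def make_bucket(bucket_name):
    """Return a Bucket handle (assumes google-cloud-storage is installed
    and application default credentials are configured)."""
    # Imported lazily so the download helpers below carry no hard dependency.
    from google.cloud import storage

    return storage.Client().bucket(bucket_name)


def download_as_bytes(bucket, key):
    """Method 1: fetch the whole object with a single download call."""
    return bucket.blob(key).download_as_bytes()


def download_via_open(bucket, key):
    """Method 2: read the object through the file-like reader."""
    with bucket.blob(key).open("rb") as f:
        return f.read()
```

Both functions take the same `(bucket, key)` arguments and return `bytes`, so they can be swapped without changing the surrounding benchmark code.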

boto3 Implementation

response = s3_client.get_object(Bucket=bucket_name, Key=key)
data = response['Body'].read()
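
The snippet above does not show how `s3_client` is created. A minimal sketch of one way to point boto3 at a GCS bucket follows, assuming the client goes through the Cloud Storage XML API interoperability endpoint with HMAC keys. The endpoint, `region_name="auto"`, and the helper names are assumptions for illustration and may differ from the configuration used in the benchmark.

```python
# Assumed: the Cloud Storage XML API endpoint used for S3 interoperability.
GCS_S3_ENDPOINT = "https://storage.googleapis.com"


def make_s3_client(hmac_access_id, hmac_secret):
    """Build a boto3 S3 client that talks to GCS (assumes boto3 is installed
    and that an HMAC key exists for the account)."""
    import boto3  # imported lazily; only needed when a real client is built

    return boto3.client(
        "s3",
        region_name="auto",
        endpoint_url=GCS_S3_ENDPOINT,
        aws_access_key_id=hmac_access_id,
        aws_secret_access_key=hmac_secret,
    )


def download_with_boto3(s3_client, bucket_name, key):
    """Fetch the whole object into memory, as in the snippet above."""
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    return response["Body"].read()
```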

Performance Results

Using ThreadPoolExecutor(max_workers=16), I measured the following average throughput when downloading 1000 objects of 64 MB each from a GCS bucket into memory:

  • boto3 get_object(): 12 Gbps
  • GCS download_as_bytes(): 3.2 Gbps in 2.19.0 and 4.2 Gbps in 3.1.0
  • GCS blob.open(): 4.5 Gbps in both 2.19.0 and 3.1.0
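
To make the measurement method concrete, here is a rough sketch of how aggregate throughput figures like these could be computed. It is not the actual benchmark script: `to_gbps` and `measure_throughput` are hypothetical names, `download_fn` stands for any one-argument callable that returns the object's bytes, and 1 Gbps is taken as 10^9 bits per second.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def to_gbps(total_bytes, seconds):
    """Convert bytes transferred and elapsed seconds to gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


def measure_throughput(download_fn, keys, max_workers=16):
    """Download every key concurrently and return aggregate throughput in Gbps."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Keep only the length of each payload rather than the payload itself.
        total_bytes = sum(len(data) for data in pool.map(download_fn, keys))
    elapsed = time.perf_counter() - start
    return to_gbps(total_bytes, elapsed)
```

For scale, if 64 MB is read as 64 × 10^6 bytes, the full set is 512 × 10^9 bits, so 12 Gbps corresponds to roughly 43 seconds of wall time and 4.5 Gbps to roughly 114 seconds.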

Questions

  1. Is this performance gap expected?
  2. Are there any recommended optimizations or best practices for improving download performance with the GCS Python client?
  3. Are there any internal differences in how GCS handles downloads through its S3-compatible API that might explain the performance gap?

Additional Context

Benchmarking scripts are available at https://github.com/dreamtalen/boto3-benchmark/tree/main/google-cloud-storage

Thanks!

Metadata

Labels

  • api: storage (Issues related to the googleapis/python-storage API.)
  • type: question (Request for information or clarification. Not an issue.)
