Description
Environment details
- GCP VM instance: c4a-highcpu-72
- OS type and version: Linux, 6.1.0-31-cloud-arm64
- Python version: Python 3.11.2
- pip version: pip 23.0.1
- `google-cloud-storage` version: 2.19.0 & 3.1.0
Steps to reproduce
When comparing download performance between the Google Cloud Storage Python client and boto3 (AWS SDK), we observed that the GCS client is significantly slower (about 50% slower) than boto3 when downloading the same objects stored in a GCS bucket.
GCS Client Implementation (Two methods tested)
- Using `blob.download_as_bytes()`:

  ```python
  blob = bucket.blob(key)
  data = blob.download_as_bytes()
  ```
- Using `blob.open()` (10%-30% faster than method 1, but still ~50% slower than boto3):

  ```python
  with blob.open("rb") as f:
      data = f.read()
  ```
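For completeness, here is a minimal sketch of the client setup that the two snippets above assume; the bucket name and object key are placeholders, not the ones used in the benchmark:

```python
# Minimal setup assumed by the snippets above.
# BUCKET_NAME and key are placeholder values for illustration.
from google.cloud import storage

BUCKET_NAME = "my-benchmark-bucket"
key = "objects/0000.bin"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Method 1: single call that buffers the whole object in memory.
blob = bucket.blob(key)
data = blob.download_as_bytes()

# Method 2: file-like streaming reader over the same object.
with bucket.blob(key).open("rb") as f:
    data = f.read()
```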
boto3 Implementation
```python
response = s3_client.get_object(Bucket=bucket_name, Key=key)
data = response['Body'].read()
```
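For reference, a sketch of how the `s3_client` can be constructed against the GCS XML (S3-compatible) endpoint using HMAC credentials; the environment variable names below are placeholders, and the exact setup is in the linked benchmark scripts:

```python
# Sketch: boto3 client pointed at the GCS S3-compatible (XML API) endpoint.
# The HMAC credential environment variable names are placeholders.
import os

import boto3

s3_client = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id=os.environ["GCS_HMAC_ACCESS_KEY"],
    aws_secret_access_key=os.environ["GCS_HMAC_SECRET"],
)
```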
Performance Results
With `ThreadPoolExecutor(max_workers=16)`, I got the following average throughput downloading 64 MB x 1000 objects from a GCS bucket into memory (a simplified sketch of the harness follows the results):
- boto3 `get_object`: 12 Gbps
- GCS `download_as_bytes()`: 3.2 Gbps in 2.19.0 & 4.2 Gbps in 3.1.0
- GCS `blob.open()`: 4.5 Gbps in both 2.19.0 & 3.1.0
Questions
- Is this performance gap expected?
- Are there any recommended optimizations or best practices for improving download performance with the GCS Python client?
- Are there any internal differences in how downloads are handled by the GCS S3-compatible API versus the native client libraries that might explain the performance gap?
Additional Context
- We've tried various optimizations (sketched after this list), including:
  - Using `raw_download=True`
  - Configuring connection pools (Ref: https://stackoverflow.com/questions/52653409/increase-connection-pool-size)
  - Using different chunk sizes in `blob.open("rb", chunk_size=xxx)`
- The performance gap remains consistent across multiple test runs
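For concreteness, the optimizations above look roughly like this (a sketch of what we tried; the pool sizes and chunk size here are example values, not the exact ones from the benchmark runs):

```python
import requests

import google.auth
from google.auth.transport.requests import AuthorizedSession
from google.cloud import storage

# Larger urllib3 connection pool, per the Stack Overflow reference above.
# Pool sizes are example values, not tuned recommendations.
credentials, project = google.auth.default()
session = AuthorizedSession(credentials)
adapter = requests.adapters.HTTPAdapter(pool_connections=64, pool_maxsize=64)
session.mount("https://", adapter)
client = storage.Client(project=project, _http=session)

# Placeholder bucket/object names for illustration.
blob = client.bucket("my-benchmark-bucket").blob("objects/0000.bin")

# raw_download=True returns the stored bytes without decompressive transcoding.
data = blob.download_as_bytes(raw_download=True)

# Explicit chunk size when streaming via blob.open(); 16 MiB is just an example.
with blob.open("rb", chunk_size=16 * 1024 * 1024) as f:
    data = f.read()
```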
Benchmarking scripts are available at https://github.com/dreamtalen/boto3-benchmark/tree/main/google-cloud-storage
Thanks!