Environment
Version: 0.9
Binding: Python
Environment: Cloudflare R2
Bug
Edit: the problem is vacuum. It is very slow at deleting files.
I am running a cloud function to vacuum and optimize a small Delta table in Cloudflare R2. The table currently has 45 partitions (one per day), and every day I insert 288 new small files.
The function takes nearly 10 minutes to finish, which seems very slow, and I am not sure it will scale as the table grows.
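To put those numbers in perspective, here is a back-of-the-envelope sketch. It assumes writes are spread evenly over the day, that each daily `optimize()` only has that day's 288 small files to compact (so roughly 288 files become eligible for deletion per vacuum run), and that nearly all of the ~10 minutes is spent in `vacuum()`. The helper `workload_estimate` is hypothetical and exists only for this illustration.

```python
# Back-of-the-envelope numbers for the workload in this issue.
# Assumptions (not measured): writes are evenly spread over the day, each
# daily optimize() compacts only that day's small files, and almost all of
# the ~10 minute runtime is spent in vacuum().

def workload_estimate(files_per_day: int, partitions: int, runtime_s: float) -> dict:
    """Hypothetical helper: derive rough per-file figures from the issue's numbers."""
    minutes_between_writes = 24 * 60 / files_per_day
    total_small_files = files_per_day * partitions
    # Roughly one day's worth of compacted files is deleted per vacuum run.
    seconds_per_deleted_file = runtime_s / files_per_day
    return {
        "minutes_between_writes": minutes_between_writes,
        "total_small_files": total_small_files,
        "seconds_per_deleted_file": seconds_per_deleted_file,
    }

est = workload_estimate(files_per_day=288, partitions=45, runtime_s=10 * 60)
print(est["minutes_between_writes"])  # 5.0
print(est["total_small_files"])  # 12960
print(round(est["seconds_per_deleted_file"], 2))  # 2.08
```

Under these assumptions a new file lands every 5 minutes, and about 2 seconds per deleted file is far slower than a single object DELETE normally takes, which supports the suspicion that vacuum is the bottleneck.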
Here is the code I use:
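import os
from deltalake import DeltaTable

# Delta table location in the R2 bucket
delta_path = 's3://delta/scada'

# R2 is S3-compatible, so the AWS_* options point deltalake at the R2 endpoint
storage_options = {
    "Region": "us-east-1",
    "AWS_ACCESS_KEY_ID": os.environ.get("aws_access_key_id_secret"),
    "AWS_SECRET_ACCESS_KEY": os.environ.get("aws_secret_access_key_secret"),
    "AWS_ENDPOINT_URL": os.environ.get("endpoint_url_secret"),
    # No locking provider is configured, so unsafe renames are allowed explicitly
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

def compaction(request):
    dt = DeltaTable(delta_path, storage_options=storage_options)
    # Compact small files into larger ones
    dt.optimize()
    # Delete removed files older than 24 hours; the shorter-than-default
    # retention requires enforce_retention_duration=False
    dt.vacuum(retention_hours=24, dry_run=False, enforce_retention_duration=False)
    return 'done'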
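To confirm where the time goes, one option is to time each step separately. The sketch below is a small generic timer; `time_steps` is a hypothetical helper, not part of deltalake. It runs with stand-in steps so it needs no R2 credentials, and the comment shows how it might be pointed at the real table using the same 0.9-era calls as above. Because `vacuum(dry_run=True)` only returns the list of files it would delete, comparing the dry run with the real run roughly separates listing time from deletion time, and the length of that list gives the number of deletions.

```python
import time
from typing import Callable, Dict


def time_steps(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Hypothetical helper: run each step in insertion order and return its wall-clock seconds."""
    durations: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        durations[name] = time.perf_counter() - start
    return durations


if __name__ == "__main__":
    # Stand-in steps so the sketch runs without R2 credentials.
    # Against the real table one might pass, for example:
    #   {"optimize": lambda: dt.optimize(),
    #    "vacuum_dry_run": lambda: print(len(dt.vacuum(retention_hours=24, dry_run=True,
    #                                                  enforce_retention_duration=False))),
    #    "vacuum": lambda: dt.vacuum(retention_hours=24, dry_run=False,
    #                                enforce_retention_duration=False)}
    timings = time_steps({"fast": lambda: None, "slow": lambda: time.sleep(0.05)})
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f}s")
```

Timing each step in isolation keeps the measurement independent of deltalake internals, so the same helper works whichever step turns out to be slow.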