vacuum is very slow on Cloudflare R2 #1366

Closed
@djouallah

Description

Environment

0.9

Binding:
Python

Environment:
Cloudflare R2


Bug

Edit: the issue is with vacuum; it is very slow for a delete operation.

I am running a cloud function to vacuum and optimize a small Delta table in Cloudflare R2. The table currently has 45 partitions (one per day), and every day I insert 288 new small files.

The function takes nearly 10 minutes to finish, which seems very slow, and I am not sure it will scale as the table grows.

Here is the code I use:

import os

from deltalake import DeltaTable

delta_path = "s3://delta/scada"

# Credentials and the R2 endpoint are read from environment variables.
storage_options = {
    "Region": "us-east-1",
    "AWS_ACCESS_KEY_ID": os.environ.get("aws_access_key_id_secret"),
    "AWS_SECRET_ACCESS_KEY": os.environ.get("aws_secret_access_key_secret"),
    "AWS_ENDPOINT_URL": os.environ.get("endpoint_url_secret"),
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}


def compaction(request):
    # Cloud function entry point: compact small files, then delete unreferenced ones.
    dt = DeltaTable(delta_path, storage_options=storage_options)
    dt.optimize()
    dt.vacuum(retention_hours=24, dry_run=False, enforce_retention_duration=False)
    return "done"
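
To confirm that vacuum is the slow step and not optimize, each call can be timed separately. The sketch below is only an illustration: `timed` and `profile_maintenance` are hypothetical helper names, not part of deltalake. It assumes `dt` is a `DeltaTable` opened as in the code above, that `dt.optimize()` is callable as in that code (newer deltalake releases may expose this differently), and that `vacuum` returns the list of files it would delete or has deleted. The dry run lists candidates without deleting, so the gap between the two vacuum timings approximates the cost of the deletes themselves.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed(label: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn, print how long it took, and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


def profile_maintenance(dt: Any) -> Dict[str, float]:
    """Time optimize, a vacuum dry run, and the real vacuum on a DeltaTable-like object.

    Hypothetical helper: `dt` is assumed to expose optimize() and vacuum(...)
    with the same arguments used in the issue's code.
    """
    timings: Dict[str, float] = {}

    _, timings["optimize"] = timed("optimize", dt.optimize)

    # Dry run: only lists the files eligible for deletion, nothing is removed.
    candidates, timings["vacuum_dry_run"] = timed(
        "vacuum (dry run)",
        dt.vacuum,
        retention_hours=24,
        dry_run=True,
        enforce_retention_duration=False,
    )
    print(f"files eligible for deletion: {len(candidates)}")

    # Real run: lists the files again and deletes them.
    _, timings["vacuum_delete"] = timed(
        "vacuum (delete)",
        dt.vacuum,
        retention_hours=24,
        dry_run=False,
        enforce_retention_duration=False,
    )
    return timings
```

Inside `compaction`, calling `profile_maintenance(dt)` in place of the two direct calls would perform the same maintenance and print the three timings to the function's log.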

Labels: bug (Something isn't working)