write_deltalake is not creating checkpoints #1815

Closed
@yefetBenTili

Description

Delta-rs version: 0.10.0

Binding:

Environment:
Cloud provider: AWS
OS: macOS
Other:


We have a Delta Lake table on S3 with over 2 TB of data, which we write to daily using write_deltalake (writing new partitions every day with partition filters).

We noticed a significant decline in read performance after a few weeks. On further investigation, I discovered that no checkpoint files were being written. The table currently has over 4000 transaction JSON files and no checkpoint file.

As far as I know, Delta's default behavior is to write a checkpoint every 10 versions. Is there a way to enforce this or trigger it manually?

    from deltalake import write_deltalake

    write_deltalake(
        df,
        mode="overwrite",
        schema=config.persrec_history_schema,
        storage_options={"AWS_S3_ALLOW_UNSAFE_RENAME": "True"},
        partition_by=[*partition_dict.keys()],
        partition_filters=partition_filters,
    )
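For anyone hitting the same thing, here is a minimal sketch of triggering checkpoints manually from Python. It assumes the installed `deltalake` version exposes `DeltaTable.create_checkpoint()`. The helper names `maybe_checkpoint` and `checkpoint_after_write`, the `interval` default of 10, and the `force` flag are hypothetical and only for illustration.

```python
from typing import Optional


def maybe_checkpoint(table, interval: int = 10, force: bool = False) -> bool:
    """Checkpoint `table` if forced, or if its version is a multiple of `interval`.

    `table` is any object exposing version() and create_checkpoint(),
    for example a deltalake.DeltaTable (assumed API, see note above).
    Returns True when a checkpoint was requested.
    """
    version = table.version()
    due = version > 0 and version % interval == 0
    if force or due:
        table.create_checkpoint()
        return True
    return False


def checkpoint_after_write(table_uri: str, storage_options: Optional[dict] = None) -> bool:
    """Hypothetical helper meant to be called right after write_deltalake(...)."""
    # Imported lazily so maybe_checkpoint stays usable without deltalake installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri, storage_options=storage_options)
    return maybe_checkpoint(dt)
```

For a table that already has thousands of commits and no checkpoint, a one-off `maybe_checkpoint(dt, force=True)` would request a checkpoint at the current version regardless of the interval. After that, calling `checkpoint_after_write` following each daily write should keep the log from growing unbounded between checkpoints.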

Labels: bug
