Closed
Description
Delta-rs version: 0.10.0
Binding:
Environment:
Cloud provider: AWS
OS: macOS
Other:
We have a Delta Lake table on S3 with over 2 TB of data, which we write to daily using write_deltalake (writing new partitions every day with partition filters).
We noticed a significant decline in read performance after a few weeks. On further investigation, I discovered that no checkpoint files were being written. The table currently has over 4000 transaction JSON files and no checkpoint file.
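A minimal sketch of how the state of the log described above could be checked, assuming the file names under `_delta_log/` have already been listed (for example with an S3 client). The function name `summarize_delta_log` is hypothetical; the file-name patterns follow the Delta protocol's 20-digit zero-padded version convention.

```python
import re

# Commit files look like 00000000000000000042.json.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Checkpoints are single-part (<version>.checkpoint.parquet) or
# multi-part (<version>.checkpoint.<part>.<total>.parquet).
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def summarize_delta_log(file_names):
    """Count commit and checkpoint files among names from a _delta_log listing."""
    commits = []
    checkpoints = set()
    for name in file_names:
        base = name.rsplit("/", 1)[-1]  # drop any key prefix
        match = COMMIT_RE.match(base)
        if match:
            commits.append(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(base)
        if match:
            checkpoints.add(int(match.group(1)))
    return {
        "commit_count": len(commits),
        "latest_version": max(commits) if commits else None,
        "checkpoint_versions": sorted(checkpoints),
    }
```

An empty `checkpoint_versions` list alongside a large `commit_count` matches the situation described here: every read has to replay the full JSON log.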
As far as I know, Delta's default behavior is to write a checkpoint every 10 commits. Is there a way to enforce this or trigger a checkpoint manually?
write_deltalake(
    df,
    mode="overwrite",
    schema=config.persrec_history_schema,
    storage_options={"AWS_S3_ALLOW_UNSAFE_RENAME": "True"},
    partition_by=[*partition_dict.keys()],
    partition_filters=partition_filters,
)
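One possible way to trigger a checkpoint manually after a write like the one above is sketched below. It assumes the installed Python binding exposes `DeltaTable.version()` and `DeltaTable.create_checkpoint()`; the helper name `maybe_checkpoint`, the `interval` default, and the `table_uri` placeholder are hypothetical and not part of the library.

```python
def maybe_checkpoint(table, interval=10, force=False):
    """Write a checkpoint if forced or if the current version is a multiple of `interval`.

    `table` is expected to be a deltalake.DeltaTable, or any object that
    provides version() and create_checkpoint(). Returns True if a
    checkpoint was requested, False otherwise.
    """
    version = table.version()
    if force or version % interval == 0:
        table.create_checkpoint()
        return True
    return False


# Usage sketch (requires the deltalake package; table_uri is a placeholder):
#
#   from deltalake import DeltaTable
#   dt = DeltaTable(table_uri, storage_options={"AWS_S3_ALLOW_UNSAFE_RENAME": "True"})
#   maybe_checkpoint(dt, force=True)  # one-off checkpoint for the existing log
```

Calling the helper with `force=True` once would cover the existing backlog of commits; calling it without `force` after each daily write approximates the every-10-commits interval.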
Activity