Ability to append to an existing directory of parquet files with new partitions (mode=append) #18750
Labels: enhancement (New feature or an improvement of an existing feature), needs decision (Awaiting decision by a maintainer)
Description
Hey.
Spark has mode=append for writing parquet files. This is quite useful: it just adds more partitions to the folder of an existing dataset, which is great for writing in batches across multiple runs. How would you solve this in Polars? I know adding data to an existing parquet file is a whole different game, but just adding more files should be fairly straightforward, no? I suspect that simply not overwriting or deleting the existing folder structure would do the trick.
Edit
Digging into this, I realized there is already a way to do this with partitioned data, as long as the partition being written is unique / always new (e.g. by generating a run_id column).
Polars writes parquet like this, and pyarrow's default behavior is overwrite_or_ignore, so it should just add more files and ignore the existing ones. Exactly what I was looking for. I'll whip up a quick example.