Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I'm trying to use the RecordBatchWriter trait to introduce dynamism in the type of output, e.g. depending on the command line options, I'd like to write parquet, csv, json or ipc. I know that the corresponding writers all implement this trait, thanks to #4206 and #4228 by @alexandreyc.
Specifically, I was planning to use Box<dyn RecordBatchWriter> and pass it to the writing code, something like this:
let mut writer: Box<dyn RecordBatchWriter> = match args.format {
    "parquet" => Box::new(parquet::arrow::ArrowWriter::try_new(file, schema, None)?),
    "csv" => Box::new(arrow::csv::Writer::new(file)),
    ...
};
writer.write(&my_batch)?;
writer.close()?; // this is required to write the formats correctly
Unfortunately, because the close method in that trait consumes self, we can't really use this approach: close() needs to know the exact type of the writer. Calling close() on the trait object gives the compilation error "the size of dyn RecordBatchWriter cannot be statically determined". See example in Rust playground.
Describe the solution you'd like
Add a finish(&mut self) method to the trait. This is backwards-compatible, and it is an established pattern, e.g. see the parquet writers.
This will not make close() itself callable on a trait object, but will unblock usage as Box<dyn RecordBatchWriter>.
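To illustrate the idea, here is a minimal sketch that compiles without the arrow crates. BatchWriter, CsvLike, FooterLike and write_with are hypothetical stand-ins invented for this example (batches are plain strings), not the real arrow types. It only shows that a finish(&mut self) method can be called through a Box<dyn ...>, which is what the request needs:

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for RecordBatchWriter; batches are plain strings here.
trait BatchWriter {
    fn write(&mut self, batch: &str) -> io::Result<()>;
    /// The proposed addition: finalize the output without consuming the writer.
    fn finish(&mut self) -> io::Result<()>;
}

/// Toy format with no footer: finish only flushes.
struct CsvLike<W: Write> {
    out: W,
}

impl<W: Write> BatchWriter for CsvLike<W> {
    fn write(&mut self, batch: &str) -> io::Result<()> {
        writeln!(self.out, "{batch}")
    }
    fn finish(&mut self) -> io::Result<()> {
        self.out.flush()
    }
}

/// Toy format that needs a footer written exactly once (as parquet or ipc would).
struct FooterLike<W: Write> {
    out: W,
    finished: bool,
}

impl<W: Write> BatchWriter for FooterLike<W> {
    fn write(&mut self, batch: &str) -> io::Result<()> {
        writeln!(self.out, "{batch}")
    }
    fn finish(&mut self) -> io::Result<()> {
        if !self.finished {
            writeln!(self.out, "FOOTER")?;
            self.finished = true;
        }
        self.out.flush()
    }
}

/// Picks a writer at runtime and drives it only through the trait object.
fn write_with(format: &str, batches: &[&str]) -> io::Result<String> {
    let mut buf: Vec<u8> = Vec::new();
    {
        let mut writer: Box<dyn BatchWriter + '_> = match format {
            "footer" => Box::new(FooterLike { out: &mut buf, finished: false }),
            _ => Box::new(CsvLike { out: &mut buf }),
        };
        for batch in batches {
            writer.write(batch)?;
        }
        // Compiles because finish borrows the writer instead of moving it.
        writer.finish()?;
    }
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```

The caller never needs to know the concrete writer type. The finished flag is one possible way to make a repeated finish() harmless; whether the real writers should behave that way is a separate design question.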
Describe alternatives you've considered
Manually downcasting to specific types and then calling close() on them, but that doesn't seem to work, as RecordBatchWriter would need to explicitly have an as_any method or similar.
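For comparison, a rough sketch of what the downcasting route would need. Everything here is hypothetical: BatchWriter and FooterLike are stand-ins, and into_any is an invented hook that the real trait does not have. Because close consumes the writer, the hook has to hand over ownership (Box<dyn Any>) rather than a reference:

```rust
use std::any::Any;

/// Hypothetical stand-in for RecordBatchWriter; batches are plain strings here.
trait BatchWriter {
    fn write(&mut self, batch: &str);
    /// Hypothetical hook: without something like this, a boxed trait object
    /// cannot be turned back into its concrete type.
    fn into_any(self: Box<Self>) -> Box<dyn Any>;
}

/// Toy writer whose close consumes self, like the existing close(self).
struct FooterLike {
    out: String,
}

impl FooterLike {
    fn close(mut self) -> String {
        self.out.push_str("FOOTER\n");
        self.out
    }
}

impl BatchWriter for FooterLike {
    fn write(&mut self, batch: &str) {
        self.out.push_str(batch);
        self.out.push('\n');
    }
    fn into_any(self: Box<Self>) -> Box<dyn Any> {
        self
    }
}

/// The caller has to try every concrete writer type it knows about.
fn close_dyn(writer: Box<dyn BatchWriter>) -> Option<String> {
    match writer.into_any().downcast::<FooterLike>() {
        Ok(concrete) => Some(concrete.close()),
        Err(_) => None, // some other writer type; more downcast attempts would go here
    }
}
```

Even with such a hook, the calling code has to enumerate the concrete types again, which defeats most of the purpose of using a trait object.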