Arc partition values in TableSchema#19137
Conversation
alamb
left a comment
There was a problem hiding this comment.
Looks good to me -- I recommend Fields but otherwise 👍
| /// These columns are NOT present in the data files but are appended to each | ||
| /// row during query execution based on the file's location. | ||
| table_partition_cols: Vec<FieldRef>, | ||
| table_partition_cols: Arc<Vec<FieldRef>>, |
There was a problem hiding this comment.
I recommend we use https://docs.rs/arrow/latest/arrow/datatypes/struct.Fields.html (which is already basically Arc<Vec>) but can be constructed zero copy potentially from existing schemas
There was a problem hiding this comment.
I tried this but it:
- Is incompatible with your recommendation in https://github.com/apache/datafusion/pull/19137/files/fd79a3ba89e9c5d45796aa8fbb7782bf1413bde4#r2598642330 because we can't do
Arc::make_mut(&mut Fields)andFieldsdoesn't expose mutablility. - It would require changing the public API of
TableSchema::partition_fields()
I also am not that worried about construction cost (that happens once in FileSource, it's the mutliple copies for each FIleOpener that worry me.
In any case since the current change is fully backwards compatible API wise and is internally contained we can always come back and use Fields later, so I will leave that as a followup.
| @@ -140,7 +140,18 @@ impl TableSchema { | |||
| /// into [`TableSchema::with_table_partition_cols`] if you have partition columns at construction time | |||
| /// since it avoids re-computing the table schema. | |||
| pub fn with_table_partition_cols(mut self, partition_cols: Vec<FieldRef>) -> Self { | |||
There was a problem hiding this comment.
if you used Fields you could also pass in impl Into<Fields>
| self.table_partition_cols = Arc::new(partition_cols); | ||
| } else { | ||
| // Append to existing partition columns | ||
| self.table_partition_cols = Arc::new( |
There was a problem hiding this comment.
I recommend Arc::make_mut here to avoid the allocation if possible
76a5ca5 to
0200a22
Compare
0200a22 to
b981959
Compare
This should avoid some clones.
I also fixed a bug when
with_partition_valuesis called multiple times on the same instance; it now appends the partition values.