Closed
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The ordering of schema metadata is not consistent because it is stored in a HashMap. In unit tests it can be useful to verify an output against a known hash of its serialized values. When metadata is present, the serialized bytes, and therefore the hash, are not consistent between runs.
Describe the solution you'd like
Describe alternatives you've considered
We could switch from HashMap to another map implementation, or make the HashMap generic over its hasher so the user could provide the hash function. I believe the solution in the PR above is a better approach.
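To illustrate the first alternative, here is a minimal sketch using only the standard library, not arrow's API. It assumes the metadata is available as a `HashMap<String, String>`; the helper name `sorted_metadata` is hypothetical. Copying the entries into a `BTreeMap` gives iteration sorted by key, so the order no longer depends on the hasher's random seed.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper (not part of arrow): copy the metadata into a
/// BTreeMap so that iteration is always sorted by key.
fn sorted_metadata(metadata: &HashMap<String, String>) -> BTreeMap<String, String> {
    metadata
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let metadata: HashMap<String, String> = [("c", "3"), ("a", "1"), ("b", "2")]
        .into_iter()
        .map(|(k, v)| (k.to_owned(), v.to_owned()))
        .collect();

    // HashMap iteration order may differ between runs; BTreeMap order does not.
    let keys: Vec<String> = sorted_metadata(&metadata).into_keys().collect();
    println!("{keys:?}");
}
```

The trade-off is an extra copy of every key and value, which is usually acceptable for small metadata maps in tests.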
Additional context
Here is a minimal reproducible example. If you run it multiple times you will get different output.
use std::{hash::Hasher, sync::Arc};

use arrow::{array::RecordBatch, datatypes::Schema};

fn main() {
    // Schema with five metadata entries, stored internally in a HashMap.
    let schema = Arc::new(
        Schema::empty().with_metadata(
            [
                ("a".to_owned(), "1".to_owned()), //
                ("b".to_owned(), "2".to_owned()), //
                ("c".to_owned(), "3".to_owned()), //
                ("d".to_owned(), "4".to_owned()), //
                ("e".to_owned(), "5".to_owned()), //
            ]
            .into_iter()
            .collect(),
        ),
    );

    let batch = RecordBatch::new_empty(schema.clone());
    // The key order printed here can change from run to run.
    dbg!(&batch.schema().metadata().keys());

    // Serialize the empty batch to the IPC stream format.
    let mut bytes = Vec::new();
    let mut w = arrow::ipc::writer::StreamWriter::try_new(&mut bytes, &schema).unwrap();
    w.write(&batch).unwrap();
    w.finish().unwrap();

    // Hash the serialized bytes; the hash differs whenever the metadata order differs.
    let mut h = std::hash::DefaultHasher::new();
    h.write(&bytes);
    let h = h.finish();
    eprintln!("{} bytes -- h = {h:x}", bytes.len());
}
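Until the ordering is made deterministic, a test can avoid the problem for the metadata itself by hashing the entries in sorted key order. The sketch below is a workaround under stated assumptions, not a fix for the IPC writer: it uses only the standard library, the helper name `stable_metadata_hash` is hypothetical, and it hashes the metadata map rather than the serialized IPC bytes.

```rust
use std::collections::HashMap;
use std::hash::{DefaultHasher, Hasher};

/// Hypothetical test helper (not part of arrow): hash metadata entries in
/// sorted key order so the result does not depend on HashMap iteration order.
fn stable_metadata_hash(metadata: &HashMap<String, String>) -> u64 {
    let mut entries: Vec<(&String, &String)> = metadata.iter().collect();
    entries.sort(); // sorts by key, then by value

    let mut h = DefaultHasher::new();
    for (k, v) in entries {
        // Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h.write_usize(k.len());
        h.write(k.as_bytes());
        h.write_usize(v.len());
        h.write(v.as_bytes());
    }
    h.finish()
}

fn main() {
    let pairs = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")];

    // Build the same metadata with two different insertion orders.
    let forward: HashMap<String, String> = pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let reverse: HashMap<String, String> = pairs
        .iter()
        .rev()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();

    assert_eq!(stable_metadata_hash(&forward), stable_metadata_hash(&reverse));
    eprintln!("h = {:x}", stable_metadata_hash(&forward));
}
```

Note that the standard library does not guarantee the `DefaultHasher` algorithm stays the same across Rust releases, so a hash value pinned in a test may need updating after a toolchain upgrade.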