Deterministic metadata encoding #7448

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

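Metadata is stored in a HashMap, so the order in which its entries are iterated, and therefore serialized, is not consistent between runs. In unit tests it can be useful to verify an output against a known hash of its serialized bytes. When the output carries metadata, the serialized bytes change from run to run, so that hash is not stable.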

Describe the solution you'd like

#7437

Describe alternatives you've considered

We could switch from HashMap to another map implementation, or make the HashMap generic over its hasher so the user could provide a deterministic hash function. I believe the solution in the PR above is a better approach.
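For illustration, the first alternative (an ordered map in place of HashMap) can be sketched with only the standard library. This is a minimal sketch, not the approach taken in the PR above and not arrow's IPC encoding: encode, encode_sorted, and hash_bytes are hypothetical helpers, and the length-prefixed byte layout is made up for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::Hasher;

/// Hypothetical stand-in for a serializer (NOT arrow's IPC format):
/// writes length-prefixed key/value pairs in the order the iterator yields them.
fn encode<'a>(entries: impl Iterator<Item = (&'a String, &'a String)>) -> Vec<u8> {
    let mut out = Vec::new();
    for (k, v) in entries {
        out.extend_from_slice(&(k.len() as u32).to_le_bytes());
        out.extend_from_slice(k.as_bytes());
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v.as_bytes());
    }
    out
}

/// Hypothetical helper: copy the entries into a BTreeMap, which iterates in
/// ascending key order, so the encoded bytes do not depend on HashMap order.
fn encode_sorted(metadata: &HashMap<String, String>) -> Vec<u8> {
    let ordered: BTreeMap<&String, &String> = metadata.iter().collect();
    encode(ordered.into_iter())
}

/// Hash a byte buffer the same way the reproducer in this issue does.
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    let pairs = [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5")];
    let build = || -> HashMap<String, String> {
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    };
    let (m1, m2) = (build(), build());

    // Each HashMap gets its own random seed, so these two encodings may or
    // may not match even though the maps hold identical entries.
    let same = encode(m1.iter()) == encode(m2.iter());
    println!("HashMap order encodings equal: {}", same);

    // Encoding in key order is stable regardless of the seed.
    let (s1, s2) = (encode_sorted(&m1), encode_sorted(&m2));
    assert_eq!(s1, s2);
    println!("sorted hashes: {:x} {:x}", hash_bytes(&s1), hash_bytes(&s2));
}
```

The trade-off in this sketch is an extra allocation and sort per encode, in exchange for leaving the public HashMap type untouched.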

Additional context

Here is a minimal reproducible example. If you run it multiple times you will get different output.

use std::{hash::Hasher, sync::Arc};

use arrow::{array::RecordBatch, datatypes::Schema};

fn main() {
    let schema = Arc::new(
        Schema::empty().with_metadata(
            [
                ("a".to_owned(), "1".to_owned()), //
                ("b".to_owned(), "2".to_owned()), //
                ("c".to_owned(), "3".to_owned()), //
                ("d".to_owned(), "4".to_owned()), //
                ("e".to_owned(), "5".to_owned()), //
            ]
            .into_iter()
            .collect(),
        ),
    );
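    let batch = RecordBatch::new_empty(schema.clone());

    // The key order printed here can change from run to run.
    dbg!(&batch.schema().metadata().keys());

    // Serialize the empty batch, schema metadata included, to an in-memory IPC stream.
    let mut bytes = Vec::new();
    let mut w = arrow::ipc::writer::StreamWriter::try_new(&mut bytes, &schema).unwrap();
    w.write(&batch).unwrap();
    w.finish().unwrap();

    // Hash the serialized bytes so that runs can be compared at a glance.
    let mut h = std::hash::DefaultHasher::new();
    h.write(&bytes);
    let h = h.finish();

    eprintln!("{} bytes -- h = {h:x}", bytes.len());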
}

    Labels

    arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)
