
Performance of contiguous datasets #45

@fmckeogh

Description


On my contiguous dataset, any read (even of a single element) results in all 35GB being read into memory.

use hidefix::{prelude::Index, reader::ReaderExt};
use std::sync::Arc;

const PATH: &str = "/home/fm208/Downloads/pubmed/benchmark-dev-pubmed23.h5";

fn main() {
    // Index the HDF5 file and open a reader for the "train" dataset.
    let i = Arc::new(Index::index(PATH).unwrap());
    let mut r = i.reader("train").unwrap();

    // Read a single element; this still pulls the entire 35 GB dataset into memory.
    let values = r.values::<f32, _>([0..1, 0..1]).unwrap();

    panic!("{values:?}");
}

In read_to on CacheReader, self.ds.chunk_slices returns a single giant chunk covering the whole dataset; when that chunk is passed to read_chunk, the entire file is read. hdf5-metno does not have this behavior.

I'm not sure what changes would be needed in hidefix to improve this; do you have any suggestions?
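
As a rough illustration of one possible direction (not hidefix's actual API), the sketch below shows how a single-element read on a contiguous, row-major dataset could be mapped directly to a small byte range in the file instead of going through one whole-dataset chunk. The helper name contiguous_byte_range, the dataset shape, and the file offset are all hypothetical, assumed only for the example.

use std::ops::Range;

// Hypothetical helper (not part of hidefix): map a single-element selection on a
// contiguous, row-major dataset to the byte range that actually has to be read.
// `offset` is the dataset's byte offset in the file, `shape` its dimensions,
// `selection` the per-dimension index ranges, `elem_size` the element size in bytes.
fn contiguous_byte_range(
    offset: u64,
    shape: &[u64],
    selection: &[Range<u64>],
    elem_size: u64,
) -> Range<u64> {
    // Row-major strides (in elements) for each dimension.
    let mut strides = vec![1u64; shape.len()];
    let mut acc = 1u64;
    for (i, dim) in shape.iter().enumerate().rev() {
        strides[i] = acc;
        acc *= dim;
    }
    // Linear index of the first selected element.
    let first: u64 = selection
        .iter()
        .zip(&strides)
        .map(|(r, s)| r.start * s)
        .sum();
    // A single element spans exactly elem_size bytes; a real implementation
    // would extend this to whole contiguous runs of the selection.
    let start = offset + first * elem_size;
    start..start + elem_size
}

fn main() {
    // A ~35 GB f32 dataset, e.g. shape (9_000_000, 1024), assumed to start at byte 4096.
    let range = contiguous_byte_range(4096, &[9_000_000, 1024], &[0..1, 0..1], 4);
    // Only four bytes would need to be read for values[0..1, 0..1], not the whole file.
    println!("{range:?}"); // 4096..4100
}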
