File descriptor caching for value log #106

@marvin-j97

Currently, when doing a range read over a set of uncached blobs, each blob incurs its own fopen() call.
If the blobs sit in the same blob file, these repeated fopen() calls could be avoided by caching the open file descriptor.
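
A minimal sketch of the per-scan case follows. It assumes a blob is addressed by a blob file ID, an offset and a length. The `BlobHandle` type, the `read_blobs` function and the "one file per ID under `dir`" naming are all hypothetical and are not the value-log crate's API. The range read keeps the last opened `File` and only reopens when the blob file ID changes:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical pointer to a blob: which blob file it lives in, and where.
#[derive(Clone, Copy, Debug)]
pub struct BlobHandle {
    pub blob_file_id: u64,
    pub offset: u64,
    pub len: usize,
}

/// Reads the blobs in `handles` order, reusing the open `File` while
/// consecutive blobs live in the same blob file.
/// Returns the blob contents and the number of `File::open` calls made.
pub fn read_blobs(dir: &Path, handles: &[BlobHandle]) -> std::io::Result<(Vec<Vec<u8>>, usize)> {
    let mut current: Option<(u64, File)> = None;
    let mut opens = 0;
    let mut out = Vec::with_capacity(handles.len());

    for h in handles {
        // Only open a new file when the blob file ID changes.
        let needs_open = match &current {
            Some((id, _)) => *id != h.blob_file_id,
            None => true,
        };
        if needs_open {
            let file = File::open(dir.join(h.blob_file_id.to_string()))?;
            opens += 1;
            current = Some((h.blob_file_id, file));
        }

        // Reuse the (possibly just opened) descriptor for this blob.
        let (_, file) = current.as_mut().expect("file was opened above");
        file.seek(SeekFrom::Start(h.offset))?;
        let mut buf = vec![0; h.len];
        file.read_exact(&mut buf)?;
        out.push(buf);
    }

    Ok((out, opens))
}
```

This only removes repeated opens within one scan. Across scans (as in the reproduction's loop) the file is still reopened every time, so a longer-lived descriptor cache is still needed.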

Blob files should be cached the same way segment files are. This requires adjusting the descriptor table so it can store both LSM segment files and blob files (probably keyed by some kind of compound key), perhaps simply by using quick-cache. Because we want to cap file descriptor usage globally, a single descriptor cache needs to house both segment and blob files. I would recommend rewriting the DescriptorTable, because the current one is bad.
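
Below is a rough sketch of a single descriptor cache shared by segments and blob files. It assumes a hypothetical `FileKey` enum as the compound key and uses a naive least-recently-used eviction built only from the standard library. A real implementation might use quick-cache instead of the hand-rolled map. None of these names come from lsm-tree:

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::path::Path;
use std::sync::{Arc, Mutex};

/// Hypothetical compound key covering both kinds of files.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum FileKey {
    Segment { tree_id: u64, segment_id: u64 },
    Blob { vlog_id: u64, blob_file_id: u64 },
}

struct Inner {
    // Cached descriptor plus the tick of its last access.
    files: HashMap<FileKey, (Arc<File>, u64)>,
    tick: u64,
}

/// One cache shared by segments and blob files, so `capacity` bounds the
/// total number of descriptors held open by the cache.
pub struct DescriptorCache {
    capacity: usize,
    inner: Mutex<Inner>,
}

impl DescriptorCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            inner: Mutex::new(Inner { files: HashMap::new(), tick: 0 }),
        }
    }

    /// Returns the cached descriptor for `key`, or opens `path` and caches it,
    /// evicting the least recently used entry if the cache is full.
    pub fn get_or_open(&self, key: FileKey, path: &Path) -> io::Result<Arc<File>> {
        let mut inner = self.inner.lock().unwrap();
        inner.tick += 1;
        let now = inner.tick;

        if let Some((file, last_used)) = inner.files.get_mut(&key) {
            *last_used = now;
            return Ok(Arc::clone(file));
        }

        if inner.files.len() >= self.capacity {
            // Naive O(n) LRU eviction; acceptable for a sketch.
            let victim = inner
                .files
                .iter()
                .min_by_key(|(_, entry)| entry.1)
                .map(|(k, _)| *k);
            if let Some(victim) = victim {
                inner.files.remove(&victim);
            }
        }

        let file = Arc::new(File::open(path)?);
        inner.files.insert(key, (Arc::clone(&file), now));
        Ok(file)
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().files.len()
    }

    pub fn contains(&self, key: &FileKey) -> bool {
        self.inner.lock().unwrap().files.contains_key(key)
    }
}
```

Because the cache hands out `Arc<File>`, an evicted descriptor stays open until the last reader drops its handle. The cap therefore applies to descriptors held by the cache, not to a hard process-wide limit.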

Blocked by https://github.com/fjall-rs/value-log/issues/9, because the value log needs a new generic parameter through which it can acquire a file descriptor (using a compound key of ValueLogId + BlobFileId).
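
One possible shape for that generic parameter is sketched below under stated assumptions. The trait `DescriptorProvider`, the `ValueLog` struct, the `SimpleProvider` type and the `{vlog}_{blob_file}` file naming are hypothetical and are not what value-log#9 actually specifies. The value log asks its provider for a descriptor keyed by (ValueLogId, BlobFileId) instead of opening the file itself:

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical ID types for illustration.
pub type ValueLogId = u64;
pub type BlobFileId = u64;

/// Hypothetical hook through which a value log acquires blob file descriptors,
/// so that its owner can serve them from a shared cache.
pub trait DescriptorProvider {
    fn blob_file(&self, vlog: ValueLogId, blob_file: BlobFileId) -> io::Result<Arc<File>>;
}

/// The value log is generic over the provider and never calls `File::open` itself.
pub struct ValueLog<P: DescriptorProvider> {
    pub id: ValueLogId,
    pub provider: P,
}

impl<P: DescriptorProvider> ValueLog<P> {
    pub fn new(id: ValueLogId, provider: P) -> Self {
        Self { id, provider }
    }

    /// Looks up a blob file using the compound key (ValueLogId, BlobFileId).
    pub fn blob_file(&self, blob_file: BlobFileId) -> io::Result<Arc<File>> {
        self.provider.blob_file(self.id, blob_file)
    }
}

/// Minimal provider: caches descriptors per compound key (unbounded, for brevity)
/// and counts how many times a file was actually opened.
pub struct SimpleProvider {
    dir: PathBuf,
    files: Mutex<HashMap<(ValueLogId, BlobFileId), Arc<File>>>,
    pub opens: AtomicUsize,
}

impl SimpleProvider {
    pub fn new(dir: PathBuf) -> Self {
        Self { dir, files: Mutex::new(HashMap::new()), opens: AtomicUsize::new(0) }
    }
}

impl DescriptorProvider for SimpleProvider {
    fn blob_file(&self, vlog: ValueLogId, blob_file: BlobFileId) -> io::Result<Arc<File>> {
        let mut files = self.files.lock().unwrap();
        if let Some(file) = files.get(&(vlog, blob_file)) {
            return Ok(Arc::clone(file));
        }
        // Hypothetical on-disk naming: "<vlog id>_<blob file id>".
        let file = Arc::new(File::open(self.dir.join(format!("{vlog}_{blob_file}")))?);
        self.opens.fetch_add(1, Ordering::Relaxed);
        files.insert((vlog, blob_file), Arc::clone(&file));
        Ok(file)
    }
}
```

In this design, lsm-tree could implement the provider on top of its shared descriptor cache. Blob files and segment files would then count against the same limit.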

Benchmark of the current behaviour, scanning over 5 x 4 KiB blobs:

About 40% of the runtime is spent in fopen(), at roughly 1 µs per call.
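
The following is a back-of-the-envelope estimate from these numbers. It assumes one fopen() per blob, that the 40% refers to the whole scan loop, and that a cached descriptor lookup costs next to nothing:

```latex
\begin{aligned}
t_{\text{fopen}} &\approx 5 \times 1\,\mu\text{s} = 5\,\mu\text{s per scan} \\
t_{\text{scan}} &\approx \frac{5\,\mu\text{s}}{0.4} = 12.5\,\mu\text{s per scan} \\
t_{\text{scan, cached}} &\approx 12.5\,\mu\text{s} - 5\,\mu\text{s} = 7.5\,\mu\text{s per scan}
\end{aligned}
```

Under those assumptions, caching descriptors would make this scan at most about 1.7x faster (12.5 / 7.5 ≈ 1.67).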

Reproduction

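```rust
use lsm_tree::{AbstractTree, BlobCache, BlockCache};
use std::{path::Path, sync::Arc};

fn main() -> lsm_tree::Result<()> {
    // Start from an empty data directory.
    let path = Path::new(".lsmdata");
    if path.try_exists()? {
        std::fs::remove_dir_all(path)?;
    }

    let tree = lsm_tree::Config::new(path)
        .compression(lsm_tree::CompressionType::None)
        .blob_compression(lsm_tree::CompressionType::None)
        // Large block cache, so index and data blocks stay cached...
        .block_cache(Arc::new(BlockCache::with_capacity_bytes(1_000_000_000)))
        // ...but a zero-sized blob cache, so every blob read goes to its blob file.
        .blob_cache(Arc::new(BlobCache::with_capacity_bytes(0)))
        // A 1-byte threshold pushes the 4 KiB values into blob files.
        .blob_file_separation_threshold(1)
        .open_as_blob_tree()?;

    {
        // Write 5 blobs of 4 KiB each (keys 'a'..='e') and flush them to disk.
        let value = "a".repeat(4_096);

        for k in 'a'..='e' {
            tree.insert((k as u8).to_be_bytes(), &value, 0);
        }
        tree.flush_active_memtable(0)?;
    }

    // Scan all values repeatedly; each scan reads the 5 blobs from disk.
    for _ in 0..1_000_000 {
        assert_eq!(5, tree.values(None, None).count());
    }

    Ok(())
}
```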