mfuntowicz/hmll

Load and write AI models at wire speed


The Hugging Face Model Loading Library (aka HMLL) provides a set of low-level, modern, and highly efficient routines for fetching bytes from a storage device into a machine learning framework such as PyTorch or JAX.

Along with its byte-fetching capabilities, HMLL exposes functions for interacting with high-level tensor formats such as safetensors or GGUF through a consistent API.

The library currently provides fetching implementations for the following platforms:

Table 1. Backend(s) supported per platform

| Platform | Backend                   | Use case                                                               |
|----------|---------------------------|------------------------------------------------------------------------|
| Linux    | memory-mapped file (mmap) | Single-device, sequential reads                                        |
| Linux    | io_uring                  | Multi-device, scattered reads (tensor parallelism, Mixture-of-Experts) |
| macOS    | memory-mapped file (mmap) | Single-device, sequential reads                                        |
| Windows  | memory-mapped file (mmap) | Single-device, sequential reads                                        |

Getting Started

C/C++ project with CMake

The most straightforward way to integrate the library is through CMake, using the FetchContent module.

Source 1. CMakeLists.txt and main.c
include(FetchContent)
FetchContent_Declare(hmll
    GIT_REPOSITORY https://github.com/mfuntowicz/hmll
    GIT_TAG main
)
FetchContent_MakeAvailable(hmll)

# Link your target against hmll
target_link_libraries(<target_name> PRIVATE libhmll)
// main.c
#include <stdio.h>
#include <hmll/hmll.h>

int main(int argc, char **argv) {
    hmll_t ctx = {0};
    hmll_source_t src = {0};

    // Create an hmll_source
    if (hmll_check(hmll_source_open("/path/to/some/file", &src))) {
        fprintf(stderr, "Failed to open file: %s\n", hmll_strerr(ctx.error));
        return 1;
    }

    // Create an hmll loader instance, automatically choosing the right backend to use
    if (hmll_check(hmll_loader_init(&ctx, &src, 1, HMLL_DEVICE_CPU, HMLL_FETCHER_IO_URING))) {
        fprintf(stderr, "Failed to initialize loader: %s\n", hmll_strerr(ctx.error));
        return 2;
    }

    // Create the range you'd like to fetch and allocate a buffer to store the content
    hmll_range_t range = (struct hmll_range){ 500, 4000 };
    hmll_iobuf_t buffer = hmll_get_buffer_for_range(&ctx, ctx.fetcher->device, range);

    if (hmll_check(ctx.error)) {
        fprintf(stderr, "Failed to allocate destination buffer: %s\n", hmll_strerr(ctx.error));
        return 3;
    }

    // Fetch the bytes (0 here is the index of the file to fetch from)
    if (hmll_fetch(&ctx, 0, &buffer, range.start) < range.end - range.start) {
        fprintf(stderr, "Failed to fetch data: %s\n", hmll_strerr(ctx.error));
        return 4;
    }

    printf("Successfully fetched %u bytes\n", range.end - range.start);
    return 0;
}
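With the CMakeLists.txt above in place, a typical out-of-source build might look like the following; `<target_name>` is whatever executable target you declared, and the first configure run needs network access so that FetchContent can clone HMLL:

```shell
# Configure and build an out-of-source tree; FetchContent fetches HMLL
# at configure time, so the first run requires network access.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel

# Run your executable (replace <target_name> with your target)
./build/<target_name>
```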

About

HMLL - High-Performance Model Loading Library for Efficient AI Model I/O
