mfuntowicz/hmll

Load and write AI models at wire speed


The Hugging Face Model Loading Library (aka HMLL) provides a set of low-level, modern, and highly efficient routines for fetching bytes from a storage device into a machine learning framework such as PyTorch or JAX.

Along with its byte-fetching capabilities, HMLL exposes functions for interacting with high-level tensor formats such as safetensors or GGUF through a consistent API.

The library currently provides fetching implementations for the following platforms:

Table 1. Backend(s) supported per platform

| Platform | Backend                   | Use case                                                               |
|----------|---------------------------|------------------------------------------------------------------------|
| Linux    | memory-mapped file (mmap) | Single-device, sequential reads                                        |
| Linux    | io_uring                  | Multi-device, scattered reads (tensor parallelism, Mixture-of-Experts) |
| macOS    | memory-mapped file (mmap) | Single-device, sequential reads                                        |
| Windows  | memory-mapped file (mmap) | Single-device, sequential reads                                        |

Getting Started

C/C++ project with CMake

The most straightforward way to integrate the library is through CMake, using the FetchContent module.

Source 1. CMakeLists.txt and main.c
include(FetchContent)
FetchContent_Declare(hmll
    GIT_REPOSITORY https://github.com/mfuntowicz/hmll
    GIT_TAG main
)
FetchContent_MakeAvailable(hmll)

# Link your target against hmll
target_link_libraries(<target_name> PRIVATE libhmll)
// main.c
#include <stdio.h>
#include <hmll/hmll.h>

int main(int argc, char **argv) {
    hmll_t ctx = {0};
    hmll_source_t src = {0};

    // Create an hmll_source
    if (hmll_check(hmll_source_open("/path/to/some/file", &src))) {
        fprintf(stderr, "Failed to open file: %s\n", hmll_strerr(ctx.error));
        return 1;
    }

    // Create an hmll loader instance, automatically choosing the right backend to use
    if (hmll_check(hmll_loader_init(&ctx, &src, 1, HMLL_DEVICE_CPU, HMLL_FETCHER_IO_URING))) {
        fprintf(stderr, "Failed to initialize loader: %s\n", hmll_strerr(ctx.error));
        return 2;
    }

    // Create the range you'd like to fetch and allocate a buffer to store the content
    hmll_range_t range = (struct hmll_range){ 500, 4000 };
    hmll_iobuf_t buffer = hmll_get_buffer_for_range(&ctx, ctx.fetcher->device, range);

    if (hmll_check(ctx.error)) {
        fprintf(stderr, "Failed to allocate destination buffer: %s\n", hmll_strerr(ctx.error));
        return 3;
    }

    // Fetch the bytes (0 here is the index of the file to fetch from)
    if (hmll_fetch(&ctx, 0, &buffer, range.start) < range.end - range.start) {
        fprintf(stderr, "Failed to fetch data: %s\n", hmll_strerr(ctx.error));
        return 4;
    }

    printf("Successfully fetched %u bytes\n", range.end - range.start);
    return 0;
}
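With the CMakeLists.txt above in place, a typical out-of-source build might look like the following; `<target_name>` is whatever executable target you declared, and the first configure run needs network access so that FetchContent can clone HMLL:

```shell
# Configure and build an out-of-source tree; FetchContent fetches HMLL
# at configure time, so the first run requires network access.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel

# Run your executable (replace <target_name> with your target)
./build/<target_name>
```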

About

HMLL - High-Performance Model Loading Library for Efficient AI Model I/O
