LLeaves Model Compiler

Compile LightGBM models to optimized shared libraries (.so or .dylib) or object files (.o) using lleaves.

Features

Single model compilation - Compile one model at a time
Batch compilation - Compile multiple models in parallel
Flexible output formats - .so (Linux), .dylib (macOS), or .o (object files)
Auto-detect linker - Works with gcc, clang, or cc
Progress tracking - Real-time compilation status and timing
Error reporting - Detailed error messages for debugging

Installation

Using uv (Recommended)

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

Using pip

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Quick Start

Compile a Single Model

# Basic usage (creates .dylib on macOS, .so on Linux)
python main.py model.txt ./output

# Specify output format
python main.py model.txt ./output .so

# Keep intermediate .o files
python main.py model.txt ./output .so --keep-o

# Just create .o file (no linking)
python main.py model.txt ./output .o

Compile Multiple Models in Parallel

# Compile all models in a folder
python batch_compile.py ./models ./output

# Specify output format and workers
python batch_compile.py ./models ./output .so --workers 4

# Keep .o files
python batch_compile.py ./models ./output .so --keep-o

# Or use the shell script wrapper
./batch-compile.sh ./models ./output 4 .so

Usage

main.py - Single Model Compilation

Compile a single LightGBM model file to a shared library.

Syntax:

python main.py <input_file> <output_folder> [output_format] [--keep-o]

Arguments:

input_file - Path to .txt model file to compile
output_folder - Path to folder where compiled files will be saved
output_format - Output format: .dylib, .so, or .o (default: .dylib)
--keep-o - Keep .o files when creating .dylib/.so (optional)

Examples:

# Create .so file for Linux
python main.py rdt_1.txt ./output .so

# Create .dylib for macOS
python main.py rdt_1.txt ./output .dylib

# Just compile to .o (no linking)
python main.py rdt_1.txt ./output .o

# Create .so and keep .o file
python main.py rdt_1.txt ./output .so --keep-o
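
For readers who want to script around the tool, the documented syntax can be mirrored with argparse. This is only a sketch of the command-line shape described above, not the actual parsing code in main.py; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented main.py syntax."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Compile one LightGBM model with lleaves.",
    )
    parser.add_argument("input_file", help="Path to the .txt model file")
    parser.add_argument("output_folder", help="Folder for the compiled files")
    # Optional third positional; the default follows the Arguments list above.
    parser.add_argument(
        "output_format",
        nargs="?",
        default=".dylib",
        choices=[".dylib", ".so", ".o"],
        help="Output format (default: .dylib)",
    )
    parser.add_argument(
        "--keep-o",
        action="store_true",
        help="Keep the intermediate .o file when linking",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_file, args.output_folder, args.output_format, args.keep_o)
```

Placing the format before any flags, as in the examples above, keeps the optional positional unambiguous.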

batch_compile.py - Parallel Batch Compilation

Compile multiple models in parallel using multiple CPU cores.

Syntax:

python batch_compile.py <models_folder> <output_folder> [output_format] [--keep-o] [--workers N]

Arguments:

models_folder - Path to folder containing .txt model files
output_folder - Path to folder where compiled files will be saved
output_format - Output format: .dylib, .so, or .o (default: .so)
--keep-o - Keep .o files when creating .dylib/.so (optional)
--workers N - Number of parallel workers (default: CPU count)

Examples:

# Compile all models using all CPU cores
python batch_compile.py ./models ./output

# Use 4 workers
python batch_compile.py ./models ./output .so --workers 4

# Compile to .o files only (fast, no linking)
python batch_compile.py ./models ./output .o

# Keep intermediate .o files
python batch_compile.py ./models ./output .so --keep-o --workers 8
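
The parallel pattern behind a batch run can be sketched with concurrent.futures. This is an illustration under assumptions, not the code in batch_compile.py: compile_one is a placeholder for the real per-model work, and compile_all and its parameters are hypothetical names.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed
from pathlib import Path


def compile_one(model_path):
    """Placeholder for the real per-model work (for example, calling lleaves)."""
    return Path(model_path).with_suffix(".o").name


def compile_all(model_paths, worker=compile_one, workers=None,
                executor_cls=ProcessPoolExecutor):
    """Run worker on every path in parallel; collect results and errors per path."""
    workers = workers or os.cpu_count() or 1
    results, errors = {}, {}
    with executor_cls(max_workers=workers) as pool:
        futures = {pool.submit(worker, path): path for path in model_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one failed model should not stop the batch
                errors[path] = str(exc)
    return results, errors


if __name__ == "__main__":
    # Threads are enough for a dry run with the placeholder worker.
    done, failed = compile_all(["a.txt", "b.txt"], workers=2,
                               executor_cls=ThreadPoolExecutor)
    print(done, failed)
```

Processes are the default here because compilation is CPU-bound; each worker holds its own copy of a model in memory, which is why lowering the worker count helps with large models.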

Converting .o to .so

If you have .o files and want to convert them to .so files later:

# Single file
gcc -shared input.o -o output.so

# All .o files in a folder
for f in *.o; do gcc -shared "$f" -o "${f%.o}.so"; done

# Or use clang
clang -shared input.o -o output.so

# Or use cc
cc -shared input.o -o output.so
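
The same folder-wide conversion can be driven from Python, which is convenient when the link step belongs to a larger script. A minimal sketch, assuming gcc is on the PATH; link_commands and link_all are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def link_commands(folder, linker="gcc"):
    """Build one '<linker> -shared x.o -o x.so' command per .o file in folder."""
    commands = []
    for obj in sorted(Path(folder).glob("*.o")):
        commands.append(
            [linker, "-shared", str(obj), "-o", str(obj.with_suffix(".so"))]
        )
    return commands


def link_all(folder, linker="gcc"):
    """Run every link command; raises CalledProcessError if a link fails."""
    for command in link_commands(folder, linker):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for command in link_commands("."):
        print(" ".join(command))
```

Unlike the shell loop, this does nothing when the folder contains no .o files, because the glob returns an empty list.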

Verification

Check that a .so file is valid

# Check file type
file output.so

# Expected output for Linux:
# output.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)

# Expected output for macOS:
# output.so: Mach-O 64-bit dynamically linked shared library arm64

# Read ELF header (Linux only)
readelf -h output.so

# Check architecture
objdump -f output.so

Quick validation

# Check all .so files
file *.so

# Verify ELF format (Linux)
file output.so | grep -q "ELF 64-bit" && echo "Valid ELF" || echo "Not ELF"
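
Where the file utility is not available, a similar check can be done by reading the first four bytes of the output. This sketch only recognises the ELF magic number and the little-endian 64-bit Mach-O magic number; binary_format is a hypothetical helper name, and a matching header does not prove the rest of the file is intact.

```python
# Leading magic bytes as they appear on disk.
MAGIC = {
    b"\x7fELF": "ELF",                     # Linux shared objects and object files
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit",  # macOS, little-endian 64-bit
}


def binary_format(path):
    """Return the detected container format, or None if it is not recognised."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    return MAGIC.get(head)


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, binary_format(name) or "unknown format")
```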

Integration with Production

Loading in Go

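/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h> // required for C.free
*/
import "C"
import "unsafe"

// loadModel opens the compiled model library with dlopen.
func loadModel(path string) {
    // C.CString allocates with malloc, so the copy must be freed explicitly.
    cPath := C.CString(path)
    defer C.free(unsafe.Pointer(cPath))
    lib := C.dlopen(cPath, C.RTLD_LAZY)
    if lib == nil {
        panic("Failed to load model")
    }
    // Use the model...
}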

Loading in Python

import ctypes

# Load the compiled model
lib = ctypes.CDLL("/path/to/model.so")

# Call functions from the library
# (depends on lleaves API)
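
As a rough illustration of what calling into the library might look like, the sketch below assumes the model was compiled with the default lleaves entry symbol, forest_root, and that it takes a pointer to row-major double features, a pointer to an output buffer, and a start and end row index. That signature is an assumption about lleaves internals and may differ between versions, so check it against the installed release before relying on it. It also assumes one output value per row; predict is a hypothetical helper name.

```python
import ctypes


def predict(lib, rows, entry="forest_root"):
    """Call the assumed entry point on a list of equally sized feature rows."""
    n_rows = len(rows)
    flat = [float(value) for row in rows for value in row]
    data = (ctypes.c_double * len(flat))(*flat)  # row-major feature buffer
    out = (ctypes.c_double * n_rows)()           # one prediction per row (assumed)

    func = getattr(lib, entry)
    # Assumed signature: void f(double *data, double *out, int start, int end)
    func.argtypes = [
        ctypes.POINTER(ctypes.c_double),
        ctypes.POINTER(ctypes.c_double),
        ctypes.c_int,
        ctypes.c_int,
    ]
    func.restype = None
    func(data, out, 0, n_rows)
    return list(out)


if __name__ == "__main__":
    library = ctypes.CDLL("/path/to/model.so")
    print(predict(library, [[0.0, 1.0, 2.0]]))
```

The feature order and count must match what the model was trained on; passing too few values per row would make the compiled code read past the end of the buffer.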

How It Works

Compilation happens in two steps:

Step 1: Compile to Object File

lleaves compiles the LightGBM model to an LLVM object file (.o):

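import lleaves

# Load the LightGBM text model, then compile it; the object file is written to the cache path
llvm_model = lleaves.Model(model_file="model.txt")
llvm_model.compile(cache="model.o")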

Step 2: Link to Shared Library

A linker (gcc/clang/cc) creates the shared library:

gcc -shared model.o -o model.so

The script automatically:

Finds an available linker (gcc → clang → cc)
Compiles with -fPIC flag for position-independent code
Shows detailed timing and file sizes
Verifies output files were created
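
The linker lookup in the first item can be sketched with shutil.which, trying each candidate in the stated order. This is an illustration of the fallback idea rather than the script's own code; find_linker and its which parameter are hypothetical names, with which exposed so the lookup can be replaced when testing.

```python
import shutil


def find_linker(candidates=("gcc", "clang", "cc"), which=shutil.which):
    """Return the path of the first candidate found on PATH, in order."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    raise RuntimeError("No linker found (gcc, clang, or cc required)")


if __name__ == "__main__":
    print(find_linker())
```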

File Structure

lleaves-optimiser/
├── main.py                 # Single model compiler
├── batch_compile.py        # Batch parallel compiler
├── batch-compile.sh        # Shell wrapper for batch_compile.py
├── requirements.txt        # Python dependencies
├── pyproject.toml         # Project configuration
└── README.md              # This file

Troubleshooting

"No linker found (gcc, clang, or cc required)"

Install a C compiler:

# macOS
xcode-select --install

# Ubuntu/Debian
sudo apt-get install gcc

# RHEL/CentOS
sudo yum install gcc

# Fedora
sudo dnf install gcc

"ModuleNotFoundError: No module named 'lleaves'"

Install dependencies:

uv pip install -r requirements.txt
# or
pip install -r requirements.txt

Compilation takes too long

Large models (500+ MB) can take 30-60 minutes to compile. Consider:

Compile to .o only first (much faster):

python batch_compile.py ./models ./output .o --workers 4

Link .o to .so later (fast):

for f in output/*.o; do gcc -shared "$f" -o "${f%.o}.so"; done

Batch compilation hangs or doesn't exit

This has been fixed in the latest version. If you still see issues:

Make sure you're using the updated batch_compile.py
Try reducing workers: --workers 2
Monitor memory usage - large models use significant RAM

Out of memory errors

Reduce parallel workers:

python batch_compile.py ./models ./output .so --workers 1

Or compile to .o files first (uses less memory):

python batch_compile.py ./models ./output .o --workers 2

Performance Tips

Fastest compilation workflow:

# 1. Compile all to .o in parallel (fastest)
python batch_compile.py ./models ./output .o --workers 8

# 2. Link all .o to .so (very fast)
cd output
for f in *.o; do gcc -shared "$f" -o "${f%.o}.so"; done

# 3. Verify
file *.so

Memory optimization:

Small models (<100MB): Use all cores
Medium models (100-300MB): Use 4-6 workers
Large models (>300MB): Use 2-3 workers or compile sequentially
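
These guidelines can be turned into a small helper that picks a worker count from the model size. The thresholds come from the list above; choosing the lower end of each suggested range is a conservative assumption, and suggest_workers is a hypothetical name.

```python
import os


def suggest_workers(size_mb, cpu_count=None):
    """Pick a worker count from the model size, capped at the available cores."""
    cpus = cpu_count or os.cpu_count() or 1
    if size_mb < 100:
        return cpus          # small models: use all cores
    if size_mb <= 300:
        return min(cpus, 4)  # medium models: lower end of the 4-6 range
    return min(cpus, 2)      # large models: lower end of the 2-3 range


if __name__ == "__main__":
    for size in (50, 200, 500):
        print(size, "MB ->", suggest_workers(size), "workers")
```

When a folder mixes model sizes, sizing for the largest file is the safer choice, since several large models may be compiling at the same time.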

Requirements

Python 3.8+
lleaves >= 0.2.0
lightgbm >= 3.0.0
numpy >= 1.20.0
gcc, clang, or cc (C compiler)

License

This tool is provided as-is for compiling LightGBM models using lleaves.
