Script to compile LightGBM models to optimized shared libraries (.so, .dylib, or .o files) using lleaves.

ankushv-003/llvm-optimiser

LLeaves Model Compiler

Compile LightGBM models to optimized shared libraries (.so, .dylib, or .o files) using lleaves.

Features

  • Single model compilation - Compile one model at a time
  • Batch compilation - Compile multiple models in parallel
  • Flexible output formats - .so (Linux), .dylib (macOS), or .o (object files)
  • Auto-detect linker - Works with gcc, clang, or cc
  • Progress tracking - Real-time compilation status and timing
  • Error reporting - Detailed error messages for debugging

Installation

Using uv (Recommended)

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

Using pip

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Quick Start

Compile a Single Model

# Basic usage (creates .dylib on macOS, .so on Linux)
python main.py model.txt ./output

# Specify output format
python main.py model.txt ./output .so

# Keep intermediate .o files
python main.py model.txt ./output .so --keep-o

# Just create .o file (no linking)
python main.py model.txt ./output .o

Compile Multiple Models in Parallel

# Compile all models in a folder
python batch_compile.py ./models ./output

# Specify output format and workers
python batch_compile.py ./models ./output .so --workers 4

# Keep .o files
python batch_compile.py ./models ./output .so --keep-o

# Or use the shell script wrapper
./batch-compile.sh ./models ./output 4 .so

Usage

main.py - Single Model Compilation

Compile a single LightGBM model file to a shared library.

Syntax:

python main.py <input_file> <output_folder> [output_format] [--keep-o]

Arguments:

  • input_file - Path to .txt model file to compile
  • output_folder - Path to folder where compiled files will be saved
  • output_format - Output format: .dylib, .so, or .o (default: .dylib)
  • --keep-o - Keep .o files when creating .dylib/.so (optional)

Examples:

# Create .so file for Linux
python main.py rdt_1.txt ./output .so

# Create .dylib for macOS
python main.py rdt_1.txt ./output .dylib

# Just compile to .o (no linking)
python main.py rdt_1.txt ./output .o

# Create .so and keep .o file
python main.py rdt_1.txt ./output .so --keep-o
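The documented syntax maps naturally onto argparse. The following is only a sketch of how such a command line could be parsed; `build_parser` is a hypothetical name, and the actual parsing code in main.py may differ. The default of .dylib is taken from the argument list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented main.py syntax."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Compile a LightGBM model to a shared library with lleaves.",
    )
    parser.add_argument("input_file", help="Path to the .txt model file")
    parser.add_argument("output_folder", help="Folder for the compiled files")
    # Optional positional: falls back to the documented default when omitted.
    parser.add_argument(
        "output_format",
        nargs="?",
        default=".dylib",
        choices=[".dylib", ".so", ".o"],
        help="Output format (default: .dylib)",
    )
    # argparse stores "--keep-o" under the attribute name "keep_o".
    parser.add_argument(
        "--keep-o",
        action="store_true",
        help="Keep the intermediate .o file when linking",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["rdt_1.txt", "./output", ".so", "--keep-o"])
print(args.input_file, args.output_folder, args.output_format, args.keep_o)
```

Making the format an optional positional keeps the short form `python main.py model.txt ./output` valid while still rejecting unsupported extensions.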

batch_compile.py - Parallel Batch Compilation

Compile multiple models in parallel using multiple CPU cores.

Syntax:

python batch_compile.py <models_folder> <output_folder> [output_format] [--keep-o] [--workers N]

Arguments:

  • models_folder - Path to folder containing .txt model files
  • output_folder - Path to folder where compiled files will be saved
  • output_format - Output format: .dylib, .so, or .o (default: .so)
  • --keep-o - Keep .o files when creating .dylib/.so (optional)
  • --workers N - Number of parallel workers (default: CPU count)

Examples:

# Compile all models using all CPU cores
python batch_compile.py ./models ./output

# Use 4 workers
python batch_compile.py ./models ./output .so --workers 4

# Compile to .o files only (fast, no linking)
python batch_compile.py ./models ./output .o

# Keep intermediate .o files
python batch_compile.py ./models ./output .so --keep-o --workers 8
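A worker pool of this kind can be sketched with concurrent.futures. This is an illustration under assumptions, not the code in batch_compile.py: `compile_batch` and `compile_one` are hypothetical names, and a thread pool is used here only so the sketch runs anywhere. For CPU-bound compilation a process pool is the more likely choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def compile_batch(model_paths, compile_one, workers=None,
                  executor_cls=ThreadPoolExecutor):
    """Run compile_one(path) for every model and collect per-model results.

    compile_one is a hypothetical callable that compiles a single model and
    returns the output path; it stands in for the real compilation step.
    """
    workers = workers or os.cpu_count() or 1  # documented default: CPU count
    results = {}
    with executor_cls(max_workers=workers) as pool:
        futures = {pool.submit(compile_one, path): path for path in model_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = ("ok", future.result())
            except Exception as exc:  # one failed model must not stop the batch
                results[path] = ("failed", str(exc))
    return results
```

Passing `executor_cls=ProcessPoolExecutor` would spread the work over CPU cores; in that case `compile_one` has to be a top-level function so it can be pickled.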

Converting .o to .so

If you have .o files and want to convert them to .so files later:

# Single file
gcc -shared input.o -o output.so

# All .o files in a folder
for f in *.o; do gcc -shared "$f" -o "${f%.o}.so"; done

# Or use clang
clang -shared input.o -o output.so

# Or use cc
cc -shared input.o -o output.so
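The same linking step can be driven from Python, which is convenient when it has to run as part of a larger pipeline. This is a minimal sketch, assuming the linker is on PATH; `link_command` and `link_all` are hypothetical helpers and not part of the repository's scripts.

```python
import subprocess
from pathlib import Path


def link_command(obj_path, linker="gcc", suffix=".so"):
    """Build the argument list for `<linker> -shared input.o -o input<suffix>`."""
    obj = Path(obj_path)
    return [linker, "-shared", str(obj), "-o", str(obj.with_suffix(suffix))]


def link_all(folder, linker="gcc", suffix=".so"):
    """Link every .o file in folder and return the paths that were written."""
    outputs = []
    for obj in sorted(Path(folder).glob("*.o")):
        cmd = link_command(obj, linker, suffix)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        outputs.append(Path(cmd[-1]))
    return outputs
```

Building the command as a list avoids shell quoting problems with unusual file names.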

Verification

Check if .so file is valid

# Check file type
file output.so

# Expected output for Linux:
# output.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV)

# Expected output for macOS:
# output.so: Mach-O 64-bit dynamically linked shared library arm64

# Read ELF header (Linux only)
readelf -h output.so

# Check architecture
objdump -f output.so

Quick validation

# Check all .so files
file *.so

# Verify ELF format (Linux)
file output.so | grep -q "ELF 64-bit" && echo "Valid ELF" || echo "Not ELF"
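Where the file utility is not available, a similar check can be done by reading the header bytes directly. The sketch below is an assumption-laden illustration: `classify_header` and `classify_file` are hypothetical helpers, and they only cover ELF files and little-endian 64-bit Mach-O files.

```python
import struct

# e_type values from the ELF header; ET_DYN is also used by PIE executables.
ELF_TYPES = {1: "relocatable object", 2: "executable", 3: "shared object"}
# filetype values from the Mach-O header (MH_OBJECT, MH_DYLIB).
MACHO_TYPES = {1: "relocatable object", 6: "dynamic library"}


def classify_header(header: bytes) -> str:
    """Describe a binary from its first bytes, roughly like `file` does."""
    if header[:4] == b"\x7fELF" and len(header) >= 18:
        bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown-class")
        endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
        (e_type,) = struct.unpack_from(endian + "H", header, 16)
        return f"ELF {bits} {ELF_TYPES.get(e_type, 'other')}"
    if header[:4] == b"\xcf\xfa\xed\xfe" and len(header) >= 16:
        # MH_MAGIC_64 (0xfeedfacf) as stored in a little-endian file.
        (filetype,) = struct.unpack_from("<I", header, 12)
        return f"Mach-O 64-bit {MACHO_TYPES.get(filetype, 'other')}"
    return "unknown"


def classify_file(path) -> str:
    """Read the start of a file and classify it."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(64))
```

Calling `classify_file("output.so")` on a Linux build should report an ELF shared object, while a .o file produced without linking should report a relocatable object.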

Integration with Production

Loading in Go

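/*
#cgo LDFLAGS: -ldl
#include <dlfcn.h>
#include <stdlib.h>
*/
import "C"
import "unsafe"

// loadModel opens the compiled model library with dlopen.
func loadModel(path string) {
    cPath := C.CString(path)
    defer C.free(unsafe.Pointer(cPath)) // C.free is declared in <stdlib.h>
    lib := C.dlopen(cPath, C.RTLD_LAZY)
    if lib == nil {
        // dlerror describes why the library could not be loaded
        panic("Failed to load model: " + C.GoString(C.dlerror()))
    }
    // Use the model...
}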

Loading in Python

import ctypes

# Load the compiled model
lib = ctypes.CDLL("/path/to/model.so")

# Call functions from the library
# (the exported symbol names and signatures depend on the lleaves API)

How It Works

Compilation happens in two steps:

Step 1: Compile to Object File

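lleaves compiles the LightGBM model through LLVM to a native object file (.o):

import lleaves

# Load the LightGBM model and write the compiled object file to model.o
llvm_model = lleaves.Model(model_file="model.txt")
llvm_model.compile(cache="model.o")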

Step 2: Link to Shared Library

A linker (gcc/clang/cc) creates the shared library:

gcc -shared model.o -o model.so

The script automatically:

  • Finds available linker (gcc → clang → cc)
  • Compiles with -fPIC flag for position-independent code
  • Shows detailed timing and file sizes
  • Verifies output files were created
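The linker lookup in the first bullet can be sketched with shutil.which. This is a minimal illustration, not the script's actual code: `find_linker` is a hypothetical name, and the error text is borrowed from the message listed under Troubleshooting.

```python
import shutil


def find_linker(candidates=("gcc", "clang", "cc"), which=shutil.which):
    """Return the path of the first available linker, trying gcc, clang, then cc.

    The `which` parameter exists only so the lookup can be replaced in tests.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    raise RuntimeError("No linker found (gcc, clang, or cc required)")
```

Checking the candidates in a fixed order keeps the choice predictable on machines where several compilers are installed.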

File Structure

lleaves-optimiser/
├── main.py                 # Single model compiler
├── batch_compile.py        # Batch parallel compiler
├── batch-compile.sh        # Shell wrapper for batch_compile.py
├── requirements.txt        # Python dependencies
├── pyproject.toml         # Project configuration
└── README.md              # This file

Troubleshooting

"No linker found (gcc, clang, or cc required)"

Install a C compiler:

# macOS
xcode-select --install

# Ubuntu/Debian
sudo apt-get install gcc

# RHEL/CentOS
sudo yum install gcc

# Fedora
sudo dnf install gcc

"ModuleNotFoundError: No module named 'lleaves'"

Install dependencies:

uv pip install -r requirements.txt
# or
pip install -r requirements.txt

Compilation takes too long

Large models (500+ MB) can take 30-60 minutes to compile. Consider:

  1. Compile to .o only first (much faster):

    python batch_compile.py ./models ./output .o --workers 4
  2. Link .o to .so later (fast):

    for f in output/*.o; do gcc -shared "$f" -o "${f%.o}.so"; done

Batch compilation hangs or doesn't exit

This has been fixed in the latest version. If you still see issues:

  1. Make sure you're using the updated batch_compile.py
  2. Try reducing workers: --workers 2
  3. Monitor memory usage - large models use significant RAM

Out of memory errors

Reduce parallel workers:

python batch_compile.py ./models ./output .so --workers 1

Or compile to .o files first (uses less memory):

python batch_compile.py ./models ./output .o --workers 2

Performance Tips

Fastest compilation workflow:

# 1. Compile all to .o in parallel (fastest)
python batch_compile.py ./models ./output .o --workers 8

# 2. Link all .o to .so (very fast)
cd output
for f in *.o; do gcc -shared "$f" -o "${f%.o}.so"; done

# 3. Verify
file *.so

Memory optimization:

  • Small models (<100MB): Use all cores
  • Medium models (100-300MB): Use 4-6 workers
  • Large models (>300MB): Use 2-3 workers or compile sequentially
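These guidelines can be turned into a small helper for choosing a --workers value. The sketch below is one possible reading of the thresholds: `suggest_workers` is a hypothetical function, and picking 4 for medium models and 2 for large ones is an assumption that takes the lower end of each suggested range.

```python
import os


def suggest_workers(largest_model_mb, cpu_count=None):
    """Suggest a worker count from the size of the largest model in the batch."""
    cpus = cpu_count or os.cpu_count() or 1
    if largest_model_mb < 100:
        return cpus              # small models: use all cores
    if largest_model_mb <= 300:
        return min(cpus, 4)      # medium models: lower end of 4-6 workers
    return min(cpus, 2)          # large models: lower end of 2-3 workers


print(suggest_workers(250, cpu_count=16))
```

Sizing by the largest model is the conservative choice, since peak memory is driven by the heaviest compilations running at the same time.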

Requirements

  • Python 3.8+
  • lleaves >= 0.2.0
  • lightgbm >= 3.0.0
  • numpy >= 1.20.0
  • gcc, clang, or cc (C compiler)

License

This tool is provided as-is for compiling LightGBM models using lleaves.
