usls


Examples · Documentation


⭐️ Star if helpful! ⭐️

usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.

  • SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
  • Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
  • Easy Data Handling: Read images, video streams, and folders, with iterator support.
  • Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
  • Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's imshow().

🧩 Supported Models

Full list of supported models (backend support for CoreML, CUDA FP32/FP16, and TensorRT FP32/FP16 is listed per model in the repository):

  • BEiT: Image Classification (demo)
  • ConvNeXt: Image Classification (demo)
  • FastViT: Image Classification (demo)
  • MobileOne: Image Classification (demo)
  • DeiT: Image Classification (demo)
  • DINOv2: Vision Embedding (demo)
  • YOLOv5: Image Classification, Object Detection, Instance Segmentation (demo)
  • YOLOv6: Object Detection (demo)
  • YOLOv7: Object Detection (demo)
  • YOLOv8 / YOLO11: Object Detection, Instance Segmentation, Image Classification, Oriented Object Detection, Keypoint Detection (demo)
  • YOLOv9: Object Detection (demo)
  • YOLOv10: Object Detection (demo)
  • YOLOv12: Object Detection (demo)
  • RT-DETR: Object Detection (demo)
  • RF-DETR: Object Detection (demo)
  • PP-PicoDet: Object Detection (demo)
  • DocLayout-YOLO: Object Detection (demo)
  • D-FINE: Object Detection (demo)
  • DEIM: Object Detection (demo)
  • RTMO: Keypoint Detection (demo)
  • SAM: Segment Anything (demo)
  • SAM2: Segment Anything (demo)
  • MobileSAM: Segment Anything (demo)
  • EdgeSAM: Segment Anything (demo)
  • SAM-HQ: Segment Anything (demo)
  • FastSAM: Instance Segmentation (demo)
  • YOLO-World: Open-Set Detection With Language (demo)
  • GroundingDINO: Open-Set Detection With Language (demo)
  • CLIP: Vision-Language Embedding (demo)
  • jina-clip-v1: Vision-Language Embedding (demo)
  • BLIP: Image Captioning (demo)
  • DB (PaddleOCR-Det): Text Detection (demo)
  • FAST: Text Detection (demo)
  • LinkNet: Text Detection (demo)
  • SVTR (PaddleOCR-Rec): Text Recognition (demo)
  • SLANet: Table Recognition (demo)
  • TrOCR: Text Recognition (demo)
  • YOLOPv2: Panoptic Driving Perception (demo)
  • DepthAnything v1 / v2: Monocular Depth Estimation (demo)
  • DepthPro: Monocular Depth Estimation (demo)
  • MODNet: Image Matting (demo)
  • Sapiens: Foundation for Human Vision Models (demo)
  • Florence2: A Variety of Vision Tasks (demo)
  • Moondream2: Open-Set Object Detection, Open-Set Keypoint Detection, Image Captioning, Visual Question Answering (demo)
  • OWLv2: Open-Set Object Detection (demo)
  • SmolVLM (256M, 500M): Visual Question Answering (demo)

🛠️ Installation

To get started, you'll need:

1. Protocol Buffers Compiler (protoc)

Required for building the project. See the official installation guide.

# Linux (apt)
sudo apt install -y protobuf-compiler

# macOS (Homebrew)
brew install protobuf

# Windows (Winget)
winget install protobuf

# Verify installation
protoc --version  # Should be 3.x or higher

2. Rust Toolchain

# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

3. Add usls to Your Project

Add the following to your Cargo.toml:

[dependencies]
# Recommended: Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls" }

# Alternative: Use crates.io version
usls = "latest-version"

Note: The GitHub version is recommended as it contains the latest updates.
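
If you need reproducible builds, the git dependency can also be pinned using Cargo's standard tag or rev fields. A minimal sketch (the tag below is a placeholder, not an actual release):

[dependencies]
# Hypothetical tag; replace with a real release tag or commit hash
usls = { git = "https://github.com/jamjamjon/usls", tag = "vX.Y.Z" }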

⚡ Cargo Features

  • ONNXRuntime-related features (enabled by default) provide model inference and model zoo support:

    • ort-download-binaries (default): Automatically downloads prebuilt ONNXRuntime binaries for supported platforms. Provides core model loading and inference capabilities using the CPU execution provider.

    • ort-load-dynamic: Dynamic linking. You'll need to compile ONNXRuntime from source or download a precompiled package, then link it manually. See the guide here.

    • cuda: Enables the NVIDIA CUDA provider. Requires CUDA toolkit and cuDNN installed.

    • trt: Enables the NVIDIA TensorRT provider. Requires TensorRT libraries installed.

    • mps: Enables the Apple CoreML provider for macOS.

  • If you only need basic features (such as image/video reading, result visualization, etc.), you can disable the default features to minimize dependencies:

    usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
    • video: Enables video stream reading and video writing. (Note: powered by video-rs and minifb; check their repositories for potential issues.)
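
As a sketch, a Cargo.toml entry that combines the optional features above (assuming you want CUDA acceleration plus video I/O on top of the defaults) might look like this:

[dependencies]
# Sketch: CUDA execution provider + video reading/writing
usls = { git = "https://github.com/jamjamjon/usls", features = ["cuda", "video"] }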

✨ Example

  • Model Inference

    cargo run -r --example yolo   # CPU
    cargo run -r -F cuda --example yolo -- --device cuda:0  # GPU
  • Reading Images

    // Read a single image
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    
    // Read multiple images
    let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;
    
    // Read all images in a folder
    let images = DataLoader::try_read_folder("./assets")?;
    
    // Read images matching a pattern (glob)
    let images = DataLoader::try_read_pattern("./assets/*.Jpg")?;
    
    // Load images and iterate
    let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
    for images in dl.iter() {
        // Code here
    }
  • Reading Video

    let dl = DataLoader::new("http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4")?
        .with_batch(1)
        .with_nf_skip(2)
        .with_progress_bar(true)
        .build()?;
    for images in dl.iter() {
        // Code here
    }
  • Annotate

    let annotator = Annotator::default();
    let image = DataLoader::try_read_one("./assets/bus.jpg")?;
    // hbb
    let hbb = Hbb::default()
            .with_xyxy(669.5233, 395.4491, 809.0367, 878.81226)
            .with_id(0)
            .with_name("person")
            .with_confidence(0.87094545);
    let _ = annotator.annotate(&image, &hbb)?;
    
    // keypoints
    let keypoints: Vec<Keypoint> = vec![
        Keypoint::default()
            .with_xy(139.35767, 443.43655)
            .with_id(0)
            .with_name("nose")
            .with_confidence(0.9739332),
        Keypoint::default()
            .with_xy(147.38545, 434.34055)
            .with_id(1)
            .with_name("left_eye")
            .with_confidence(0.9098319),
        Keypoint::default()
            .with_xy(128.5701, 434.07516)
            .with_id(2)
            .with_name("right_eye")
            .with_confidence(0.9320564),
    ];
    let _ = annotator.annotate(&image, &keypoints)?;
  • Visualizing Inference Results and Exporting Video

    let dl = DataLoader::new(args.source.as_str())?.build()?;
    let mut viewer = Viewer::default().with_window_scale(0.5);
    
    for images in &dl {
        // Check if the window exists and is open
        if viewer.is_window_exist() && !viewer.is_window_open() {
            break;
        }
    
        // Show image in window
        viewer.imshow(&images[0])?;
    
        // Handle key events and delay
        if let Some(key) = viewer.wait_key(1) {
            if key == usls::Key::Escape {
                break;
            }
        }
    
        // Your custom code here
    
        // Write video frame (requires video feature)
        // if args.save_video {
        //     viewer.write_video_frame(&images[0])?;
        // }
    }

All examples are located in the examples directory.

❓ FAQ

See issues or open a new discussion.

🤝 Contributing

Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.

📜 License

This project is licensed under the terms described in LICENSE.