⭐️ Star if helpful! ⭐️
usls is an evolving Rust library focused on inference for advanced vision and vision-language models, along with practical vision utilities.
- SOTA Model Inference: Supports a wide range of state-of-the-art vision and multi-modal models (typically with fewer than 1B parameters).
- Multi-backend Acceleration: Supports CPU, CUDA, TensorRT, and CoreML.
- Easy Data Handling: Easily read images, video streams, and folders with iterator support.
- Rich Result Types: Built-in containers for common vision outputs like bounding boxes (Hbb, Obb), polygons, masks, etc.
- Annotation & Visualization: Draw and display inference results directly, similar to OpenCV's `imshow()`.
- YOLO Models: YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLOv12
- SAM Models: SAM, SAM2, MobileSAM, EdgeSAM, SAM-HQ, FastSAM
- Vision Models: RT-DETR, RTMO, Depth-Anything, DINOv2, MODNet, Sapiens, DepthPro, FastViT, BEiT, MobileOne
- Vision-Language Models: CLIP, jina-clip-v1, BLIP, GroundingDINO, YOLO-World, Florence2, Moondream2
- OCR-Related Models: FAST, DB(PaddleOCR-Det), SVTR(PaddleOCR-Rec), SLANet, TrOCR, DocLayout-YOLO
Full list of supported models:
| Model | Task / Description | Example | CoreML | CUDA FP32 | CUDA FP16 | TensorRT FP32 | TensorRT FP16 |
|---|---|---|---|---|---|---|---|
| BEiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| ConvNeXt | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| FastViT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| MobileOne | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DeiT | Image Classification | demo | ✅ | ✅ | ✅ |  |  |
| DINOv2 | Vision Embedding | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv5 | Image Classification<br />Object Detection<br />Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv6 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv7 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv8<br />YOLO11 | Object Detection<br />Instance Segmentation<br />Image Classification<br />Oriented Object Detection<br />Keypoint Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv9 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv10 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLOv12 | Object Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| RT-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RF-DETR | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| PP-PicoDet | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DocLayout-YOLO | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| D-FINE | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| DEIM | Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| RTMO | Keypoint Detection | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| SAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM2 | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| MobileSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| EdgeSAM | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| SAM-HQ | Segment Anything | demo | ✅ | ✅ | ✅ |  |  |
| FastSAM | Instance Segmentation | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| YOLO-World | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| GroundingDINO | Open-Set Detection With Language | demo | ✅ | ✅ | ✅ |  |  |
| CLIP | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| jina-clip-v1 | Vision-Language Embedding | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| BLIP | Image Captioning | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DB(PaddleOCR-Det) | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| FAST | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| LinkNet | Text Detection | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SVTR(PaddleOCR-Rec) | Text Recognition | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| SLANet | Table Recognition | demo | ✅ | ✅ | ✅ |  |  |
| TrOCR | Text Recognition | demo | ✅ | ✅ | ✅ |  |  |
| YOLOPv2 | Panoptic Driving Perception | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| DepthAnything v1<br />DepthAnything v2 | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ | ❌ | ❌ |
| DepthPro | Monocular Depth Estimation | demo | ✅ | ✅ | ✅ |  |  |
| MODNet | Image Matting | demo | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sapiens | Foundation for Human Vision Models | demo | ✅ | ✅ | ✅ |  |  |
| Florence2 | A Variety of Vision Tasks | demo | ✅ | ✅ | ✅ |  |  |
| Moondream2 | Open-Set Object Detection<br />Open-Set Keypoints Detection<br />Image Caption<br />Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
| OWLv2 | Open-Set Object Detection | demo | ✅ | ✅ | ✅ |  |  |
| SmolVLM(256M, 500M) | Visual Question Answering | demo | ✅ | ✅ | ✅ |  |  |
To get started, you'll need:
- protoc: required for building the project. See the official installation guide.
```shell
# Linux (apt)
sudo apt install -y protobuf-compiler

# macOS (Homebrew)
brew install protobuf

# Windows (Winget)
winget install protobuf

# Verify installation
protoc --version   # Should be 3.x or higher
```
```shell
# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
Add the following to your `Cargo.toml`:
```toml
[dependencies]
# Recommended: Use GitHub version
usls = { git = "https://github.com/jamjamjon/usls" }

# Alternative: Use crates.io version
usls = "latest-version"
```
Note: The GitHub version is recommended as it contains the latest updates.
- ONNXRuntime-related features (enabled by default) provide model inference and model zoo support. A `Cargo.toml` sketch for enabling one of them follows this list:
  - `ort-download-binaries` (default): automatically downloads prebuilt `ONNXRuntime` binaries for supported platforms. Provides core model loading and inference using the `CPU` execution provider.
  - `ort-load-dynamic`: dynamic linking. You'll need to compile `ONNXRuntime` from source or download a precompiled package, and then link it manually. See the guide here.
  - `cuda`: enables the NVIDIA `CUDA` provider. Requires the CUDA toolkit and cuDNN to be installed.
  - `trt`: enables the NVIDIA `TensorRT` provider. Requires TensorRT libraries to be installed.
  - `mps`: enables the Apple `CoreML` provider for macOS.
If you only need basic features (such as image/video reading, result visualization, etc.), you can disable the default features to minimize dependencies:
```toml
usls = { git = "https://github.com/jamjamjon/usls", default-features = false }
```
- Model Inference

```shell
cargo run -r --example yolo                              # CPU
cargo run -r -F cuda --example yolo -- --device cuda:0   # GPU
```
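The other execution providers should follow the same pattern with their feature flags; a hedged sketch (the exact `--device` strings below are assumptions, not taken from the example's CLI):

```shell
cargo run -r -F trt --example yolo -- --device trt:0      # TensorRT (device string assumed)
cargo run -r -F mps --example yolo -- --device coreml     # CoreML on macOS (device string assumed)
```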
- Reading Images

```rust
// Read a single image
let image = DataLoader::try_read_one("./assets/bus.jpg")?;

// Read multiple images
let images = DataLoader::try_read_n(&["./assets/bus.jpg", "./assets/cat.png"])?;

// Read all images in a folder
let images = DataLoader::try_read_folder("./assets")?;

// Read images matching a pattern (glob)
let images = DataLoader::try_read_pattern("./assets/*.Jpg")?;

// Load images and iterate
let dl = DataLoader::new("./assets")?.with_batch(2).build()?;
for images in dl.iter() {
    // Code here
}
```
- Reading Video

```rust
let dl = DataLoader::new(
    "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4",
)?
.with_batch(1)
.with_nf_skip(2)
.with_progress_bar(true)
.build()?;

for images in dl.iter() {
    // Code here
}
```
- Annotate

```rust
let annotator = Annotator::default();
let image = DataLoader::try_read_one("./assets/bus.jpg")?;

// hbb
let hbb = Hbb::default()
    .with_xyxy(669.5233, 395.4491, 809.0367, 878.81226)
    .with_id(0)
    .with_name("person")
    .with_confidence(0.87094545);
let _ = annotator.annotate(&image, &hbb)?;

// keypoints
let keypoints: Vec<Keypoint> = vec![
    Keypoint::default()
        .with_xy(139.35767, 443.43655)
        .with_id(0)
        .with_name("nose")
        .with_confidence(0.9739332),
    Keypoint::default()
        .with_xy(147.38545, 434.34055)
        .with_id(1)
        .with_name("left_eye")
        .with_confidence(0.9098319),
    Keypoint::default()
        .with_xy(128.5701, 434.07516)
        .with_id(2)
        .with_name("right_eye")
        .with_confidence(0.9320564),
];
let _ = annotator.annotate(&image, &keypoints)?;
```
- Visualizing Inference Results and Exporting Video

```rust
let dl = DataLoader::new(args.source.as_str())?.build()?;
let mut viewer = Viewer::default().with_window_scale(0.5);

for images in &dl {
    // Check if the window exists and is open
    if viewer.is_window_exist() && !viewer.is_window_open() {
        break;
    }

    // Show image in window
    viewer.imshow(&images[0])?;

    // Handle key events and delay
    if let Some(key) = viewer.wait_key(1) {
        if key == usls::Key::Escape {
            break;
        }
    }

    // Your custom code here

    // Write video frame (requires video feature)
    // if args.save_video {
    //     viewer.write_video_frame(&images[0])?;
    // }
}
```
All examples are located in the examples directory.
See issues or open a new discussion.
Contributions are welcome! If you have suggestions, bug reports, or want to add new features or models, feel free to open an issue or submit a pull request.
This project is licensed under the terms of the LICENSE file.