Commit a921cda

Add documentation
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
1 parent 1480f57 commit a921cda

2 files changed: +218 -0 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -195,6 +195,7 @@ For more information, see [💻 Getting started](https://localai.io/basics/getti
 
 ## 📰 Latest project news
 
+- July/August 2025: 🔍 [Object Detection](https://localai.io/features/object-detection/) added to the API featuring [rf-detr](https://github.com/roboflow/rf-detr)
 - July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. [Read the release notes](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
 - June 2025: [Backend management](https://github.com/mudler/LocalAI/pull/5607) has been added. Attention: extras images are going to be deprecated from the next release! Read [the backend management PR](https://github.com/mudler/LocalAI/pull/5607).
 - May 2025: [Audio input](https://github.com/mudler/LocalAI/pull/5466) and [Reranking](https://github.com/mudler/LocalAI/pull/5396) in llama.cpp backend, [Realtime API](https://github.com/mudler/LocalAI/pull/5392), Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).
@@ -228,6 +229,7 @@ Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3A
 - ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
 - 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
 - 🥽 [Vision API](https://localai.io/features/gpt-vision/)
+- 🔍 [Object Detection](https://localai.io/features/object-detection/)
 - 📈 [Reranker API](https://localai.io/features/reranker/)
 - 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
 - [Agentic capabilities](https://github.com/mudler/LocalAGI)
Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
+++
disableToc = false
title = "🔍 Object detection"
weight = 13
url = "/features/object-detection/"
+++

LocalAI supports object detection through dedicated backends, which identify objects within an image and locate each one with a bounding box. Currently, [RF-DETR](https://github.com/roboflow/rf-detr) is the only available implementation.

## Overview

Object detection in LocalAI is implemented through dedicated backends. Each backend provides different capabilities and model architectures.

**Key Features:**
- Real-time object detection
- High-accuracy detection with bounding boxes
- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU)
- Structured detection results with confidence scores
- Easy integration through the `/v1/detection` endpoint

## Usage

### Detection Endpoint

LocalAI provides a dedicated `/v1/detection` endpoint for object detection tasks. It returns structured detection results: a bounding box, a confidence score, and a class name for each detected object.

### API Reference

To perform object detection, send a POST request to the `/v1/detection` endpoint:

```bash
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://media.roboflow.com/dog.jpeg"
  }'
```

### Request Format

The request body should contain:

- `model`: The name of the object detection model (e.g., "rfdetr-base")
- `image`: The image to analyze, which can be:
  - A URL to an image
  - A base64-encoded image
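
The two accepted `image` forms can be produced by one small helper. The following is a minimal Python sketch, not part of LocalAI: `build_detection_payload` and its `mime` parameter are hypothetical names introduced here, and the `data:<mime>;base64,` prefix is assumed from the base64 curl example in this page rather than from a formal specification of the endpoint.

```python
import base64
import json
from typing import Union


def build_detection_payload(model: str, image: Union[str, bytes],
                            mime: str = "image/jpeg") -> str:
    """Return a JSON request body for the /v1/detection endpoint.

    `image` is either a URL (str), passed through unchanged, or the raw
    bytes of an image file, which are base64-encoded into a data URI.
    The data URI form is an assumption taken from the curl example.
    """
    if isinstance(image, bytes):
        encoded = base64.b64encode(image).decode("ascii")
        image = f"data:{mime};base64,{encoded}"
    return json.dumps({"model": model, "image": image})
```

For a local file, read it in binary mode and pass the bytes, for example `build_detection_payload("rfdetr-base", open("image.jpg", "rb").read())`.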

### Response Format

The API returns a JSON response with detected objects:

```json
{
  "detections": [
    {
      "x": 100.5,
      "y": 150.2,
      "width": 200.0,
      "height": 300.0,
      "confidence": 0.95,
      "class_name": "dog"
    },
    {
      "x": 400.0,
      "y": 200.0,
      "width": 150.0,
      "height": 250.0,
      "confidence": 0.87,
      "class_name": "person"
    }
  ]
}
```

Each detection includes:
- `x`, `y`: Coordinates of the bounding box top-left corner
- `width`, `height`: Dimensions of the bounding box
- `confidence`: Detection confidence score (0.0 to 1.0)
- `class_name`: The detected object class
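
The fields listed above map onto a small record type. The following is a minimal Python sketch for consuming a response, not part of LocalAI: `Detection`, `parse_detections`, and the `min_confidence` parameter are hypothetical names introduced here. It assumes only the response shape documented above and shows two common follow-up steps: dropping low-confidence results and converting a box to corner coordinates.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    x: float            # left edge of the bounding box
    y: float            # top edge of the bounding box
    width: float
    height: float
    confidence: float   # 0.0 to 1.0
    class_name: str

    def corners(self) -> Tuple[float, float, float, float]:
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def parse_detections(response_text: str, min_confidence: float = 0.0) -> List[Detection]:
    """Parse a /v1/detection response and keep detections at or above min_confidence."""
    data = json.loads(response_text)
    results = []
    for item in data.get("detections") or []:
        detection = Detection(
            x=float(item["x"]),
            y=float(item["y"]),
            width=float(item["width"]),
            height=float(item["height"]),
            confidence=float(item["confidence"]),
            class_name=str(item["class_name"]),
        )
        if detection.confidence >= min_confidence:
            results.append(detection)
    return results
```

Applied to the sample response above with `min_confidence=0.9`, only the `dog` detection is kept, since the `person` detection has a confidence of 0.87.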

## Backends

### RF-DETR Backend

The RF-DETR backend is implemented as a Python-based gRPC service. It provides object detection using the RF-DETR model architecture and supports multiple hardware configurations:

- **CPU**: Optimized for CPU inference
- **NVIDIA GPU**: CUDA acceleration for NVIDIA GPUs
- **Intel GPU**: Intel oneAPI optimization
- **AMD GPU**: ROCm acceleration for AMD GPUs
- **NVIDIA Jetson**: Optimized for ARM64 NVIDIA Jetson devices

#### Setup

1. **Using the Model Gallery (Recommended)**

   The easiest way to get started is the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery:

   ```bash
   # Install and run the rfdetr-base model
   local-ai run rfdetr-base
   ```

   You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base".

2. **Manual Configuration**

   Create a model configuration file in your `models` directory. The `name` field is the identifier that API requests pass as `model`, so the configuration below is addressed as `rfdetr`:

   ```yaml
   name: rfdetr
   backend: rfdetr
   parameters:
     model: rfdetr-base
   ```

#### Available Models

Currently, the following model is available in the [Model Gallery]({{%relref "docs/features/model-gallery" %}}):

- **rfdetr-base**: Base model with balanced performance and accuracy

You can browse and install this model through the LocalAI web interface or using the command line.

## Examples

### Basic Object Detection

```bash
# Detect objects in an image from URL
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "https://example.com/image.jpg"
  }'
```

### Base64 Image Detection

```bash
# Convert image to base64 and send it as a data URI
# (-w 0 is the GNU coreutils flag that disables line wrapping)
base64_image=$(base64 -w 0 image.jpg)
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"rfdetr-base\",
    \"image\": \"data:image/jpeg;base64,$base64_image\"
  }"
```
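
The same requests can be issued from Python with only the standard library. This is a minimal sketch, not an official client: `make_detection_request` and `detect` are hypothetical helper names, and the base URL `http://localhost:8080` is taken from the curl examples. Building the request is kept separate from sending it so that the request can be inspected without a running server.

```python
import json
import urllib.request


def make_detection_request(base_url: str, model: str, image: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/detection endpoint.

    `image` is a URL or a base64 data URI, as in the curl examples.
    """
    body = json.dumps({"model": model, "image": image}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/detection",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(base_url: str, model: str, image: str, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = make_detection_request(base_url, model, image)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires a running LocalAI instance with the model installed):
# result = detect("http://localhost:8080", "rfdetr-base", "https://example.com/image.jpg")
```

`urlopen` raises `urllib.error.HTTPError` on non-2xx responses, so callers that need the error body should catch it explicitly.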
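
### Local File Detection

The path is sent to the server as a plain string; curl does not read the file. It must therefore refer to a file that the LocalAI server itself can read, not a file on the client machine. To send a file from the client, use the base64 form shown above.

```bash
# Detect objects in an image file local to the LocalAI server
curl -X POST http://localhost:8080/v1/detection \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rfdetr-base",
    "image": "/path/to/local/image.jpg"
  }'
```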

## Use Cases

Object detection with RF-DETR is suitable for various applications:

- **Security and Surveillance**: Monitor security cameras for specific objects
- **Retail Analytics**: Track products and customer behavior
- **Autonomous Vehicles**: Detect pedestrians, vehicles, and traffic signs
- **Industrial Quality Control**: Inspect products for defects
- **Medical Imaging**: Identify anatomical structures or medical devices
- **Agricultural Monitoring**: Detect crops, pests, or livestock

## Troubleshooting

### Common Issues

1. **Model Loading Errors**
   - Ensure the model file is properly downloaded
   - Check available disk space
   - Verify model compatibility with your backend version

2. **Low Detection Accuracy**
   - Ensure good image quality and lighting
   - Check if objects are clearly visible
   - Consider using a larger model, if one is available, for better accuracy

3. **Slow Performance**
   - Enable GPU acceleration if available
   - Use a smaller model, if one is available, for faster inference
   - Optimize image resolution
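
If you reduce image resolution on the client before sending, the returned boxes need to be mapped back to the original image. The sketch below covers only the arithmetic and rests on an assumption this page does not confirm: that `x`, `y`, `width`, and `height` are expressed in pixels of the image that was sent. `fit_within` and `rescale_box` are hypothetical helper names, and the resizing itself is left to whatever image library you use.

```python
def fit_within(width: int, height: int, max_side: int):
    """Return (new_width, new_height, scale) so the longer side is at most max_side.

    Images that already fit are returned unchanged with scale 1.0.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0
    scale = max_side / longest
    # Rounding to whole pixels means each axis may differ from `scale` slightly.
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def rescale_box(box: dict, scale: float) -> dict:
    """Map a detection from the resized image back to original-image coordinates.

    Assumes box coordinates are pixels of the resized image (not confirmed
    by the endpoint documentation). Non-geometry fields are copied as-is.
    """
    mapped = dict(box)
    for key in ("x", "y", "width", "height"):
        mapped[key] = box[key] / scale
    return mapped
```

For example, a 4000x3000 image limited to `max_side=1000` is sent at 1000x750 with a scale of 0.25, and each returned box is divided by 0.25 to recover original coordinates.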

### Debug Mode

Enable debug logging for troubleshooting:

```bash
local-ai run --debug rfdetr-base
```

## Object Detection Category

LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes:

- **RF-DETR**: Real-time transformer-based object detection

Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models.

## Related Features

- [🎨 Image generation]({{%relref "docs/features/image-generation" %}}): Generate images with AI
- [📖 Text generation]({{%relref "docs/features/text-generation" %}}): Generate text with language models
- [🔍 GPT Vision]({{%relref "docs/features/gpt-vision" %}}): Analyze images with language models
- [🚀 GPU acceleration]({{%relref "docs/features/GPU-acceleration" %}}): Optimize performance with GPU acceleration
