| 1 | ++++ |
| 2 | +disableToc = false |
| 3 | +title = "🔍 Object detection" |
| 4 | +weight = 13 |
| 5 | +url = "/features/object-detection/" |
| 6 | ++++ |
| 7 | + |
| 8 | +LocalAI supports object detection through dedicated backends. This feature identifies and locates objects within images, with an emphasis on accuracy and real-time performance. Currently, [RF-DETR](https://github.com/roboflow/rf-detr) is the only available implementation. |
| 9 | + |
| 10 | +## Overview |
| 11 | + |
| 12 | +Object detection in LocalAI is implemented through dedicated backends that can identify and locate objects within images. Each backend provides different capabilities and model architectures. |
| 13 | + |
| 14 | +**Key Features:** |
| 15 | +- Real-time object detection |
| 16 | +- High accuracy detection with bounding boxes |
| 17 | +- Support for multiple hardware accelerators (CPU, NVIDIA GPU, Intel GPU, AMD GPU) |
| 18 | +- Structured detection results with confidence scores |
| 19 | +- Easy integration through the `/v1/detection` endpoint |
| 20 | + |
| 21 | +## Usage |
| 22 | + |
| 23 | +### Detection Endpoint |
| 24 | + |
| 25 | +LocalAI provides a dedicated `/v1/detection` endpoint for object detection. It returns structured detection results with bounding boxes and confidence scores. |
| 26 | + |
| 27 | +### API Reference |
| 28 | + |
| 29 | +To perform object detection, send a POST request to the `/v1/detection` endpoint: |
| 30 | + |
| 31 | +```bash |
| 32 | +curl -X POST http://localhost:8080/v1/detection \ |
| 33 | + -H "Content-Type: application/json" \ |
| 34 | + -d '{ |
| 35 | + "model": "rfdetr-base", |
| 36 | + "image": "https://media.roboflow.com/dog.jpeg" |
| 37 | + }' |
| 38 | +``` |
| 39 | + |
| 40 | +### Request Format |
| 41 | + |
| 42 | +The request body should contain: |
| 43 | + |
| 44 | +- `model`: The name of the object detection model (e.g., "rfdetr-base") |
| 45 | +- `image`: The image to analyze, which can be: |
| 46 | + - A URL to an image |
| 47 | + - A base64-encoded image |
| 48 | + |
| 49 | +### Response Format |
| 50 | + |
| 51 | +The API returns a JSON response with detected objects: |
| 52 | + |
| 53 | +```json |
| 54 | +{ |
| 55 | + "detections": [ |
| 56 | + { |
| 57 | + "x": 100.5, |
| 58 | + "y": 150.2, |
| 59 | + "width": 200.0, |
| 60 | + "height": 300.0, |
| 61 | + "confidence": 0.95, |
| 62 | + "class_name": "dog" |
| 63 | + }, |
| 64 | + { |
| 65 | + "x": 400.0, |
| 66 | + "y": 200.0, |
| 67 | + "width": 150.0, |
| 68 | + "height": 250.0, |
| 69 | + "confidence": 0.87, |
| 70 | + "class_name": "person" |
| 71 | + } |
| 72 | + ] |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +Each detection includes: |
| 77 | +- `x`, `y`: Coordinates of the bounding box top-left corner |
| 78 | +- `width`, `height`: Dimensions of the bounding box |
| 79 | +- `confidence`: Detection confidence score (0.0 to 1.0) |
| 80 | +- `class_name`: The detected object class |
| 81 | + |
| 82 | +## Backends |
| 83 | + |
| 84 | +### RF-DETR Backend |
| 85 | + |
| 86 | +The RF-DETR backend is a Python-based gRPC service that runs alongside LocalAI. It provides object detection using the RF-DETR model architecture and supports the following hardware configurations: |
| 87 | + |
| 88 | +- **CPU**: Optimized for CPU inference |
| 89 | +- **NVIDIA GPU**: CUDA acceleration for NVIDIA GPUs |
| 90 | +- **Intel GPU**: Intel oneAPI optimization |
| 91 | +- **AMD GPU**: ROCm acceleration for AMD GPUs |
| 92 | +- **NVIDIA Jetson**: Optimized for ARM64 NVIDIA Jetson devices |
| 93 | + |
| 94 | +#### Setup |
| 95 | + |
| 96 | +1. **Using the Model Gallery (Recommended)** |
| 97 | + |
| 98 | + The easiest way to get started is using the model gallery. The `rfdetr-base` model is available in the official LocalAI gallery: |
| 99 | + |
| 100 | + ```bash |
| 101 | + # Install and run the rfdetr-base model |
| 102 | + local-ai run rfdetr-base |
| 103 | + ``` |
| 104 | + |
| 105 | + You can also install it through the web interface by navigating to the Models section and searching for "rfdetr-base". |
| 106 | + |
| 107 | +2. **Manual Configuration** |
| 108 | + |
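| 109 | + Create a model configuration file in your `models` directory. The `name` value is the identifier to pass as `model` in API requests (here `rfdetr`, not `rfdetr-base`): |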
| 110 | + |
| 111 | + ```yaml |
| 112 | + name: rfdetr |
| 113 | + backend: rfdetr |
| 114 | + parameters: |
| 115 | + model: rfdetr-base |
| 116 | + ``` |
| 117 | + |
| 118 | +#### Available Models |
| 119 | + |
| 120 | +Currently, the following model is available in the [Model Gallery]({{%relref "docs/features/model-gallery" %}}): |
| 121 | + |
| 122 | +- **rfdetr-base**: Base model with balanced performance and accuracy |
| 123 | + |
| 124 | +You can browse and install this model through the LocalAI web interface or using the command line. |
| 125 | + |
| 126 | +## Examples |
| 127 | + |
| 128 | +### Basic Object Detection |
| 129 | + |
| 130 | +```bash |
| 131 | +# Detect objects in an image from URL |
| 132 | +curl -X POST http://localhost:8080/v1/detection \ |
| 133 | + -H "Content-Type: application/json" \ |
| 134 | + -d '{ |
| 135 | + "model": "rfdetr-base", |
| 136 | + "image": "https://example.com/image.jpg" |
| 137 | + }' |
| 138 | +``` |
| 139 | + |
| 140 | +### Base64 Image Detection |
| 141 | + |
| 142 | +```bash |
| 143 | +# Encode the image as single-line base64 (-w 0 disables wrapping in GNU base64) and send it |
| 144 | +base64_image=$(base64 -w 0 image.jpg) |
| 145 | +curl -X POST http://localhost:8080/v1/detection \ |
| 146 | + -H "Content-Type: application/json" \ |
| 147 | + -d "{ |
| 148 | + \"model\": \"rfdetr-base\", |
| 149 | + \"image\": \"data:image/jpeg;base64,$base64_image\" |
| 150 | + }" |
| 151 | +``` |
| 152 | + |
| 153 | +### Local File Detection |
| 154 | + |
| 155 | +```bash |
| 156 | +# Detect objects in a local image file |
| 157 | +curl -X POST http://localhost:8080/v1/detection \ |
| 158 | + -H "Content-Type: application/json" \ |
| 159 | + -d '{ |
| 160 | + "model": "rfdetr-base", |
| 161 | + "image": "/path/to/local/image.jpg" |
| 162 | + }' |
| 163 | +``` |
| 164 | + |
| 165 | +## Use Cases |
| 166 | + |
| 167 | +Object detection with RF-DETR is suitable for various applications: |
| 168 | + |
| 169 | +- **Security and Surveillance**: Monitor security cameras for specific objects |
| 170 | +- **Retail Analytics**: Track products and customer behavior |
| 171 | +- **Autonomous Vehicles**: Detect pedestrians, vehicles, and traffic signs |
| 172 | +- **Industrial Quality Control**: Inspect products for defects |
| 173 | +- **Medical Imaging**: Identify anatomical structures or medical devices |
| 174 | +- **Agricultural Monitoring**: Detect crops, pests, or livestock |
| 175 | + |
| 176 | +## Troubleshooting |
| 177 | + |
| 178 | +### Common Issues |
| 179 | + |
| 180 | +1. **Model Loading Errors** |
| 181 | + - Ensure the model file is properly downloaded |
| 182 | + - Check available disk space |
| 183 | + - Verify model compatibility with your backend version |
| 184 | + |
| 185 | +2. **Low Detection Accuracy** |
| 186 | + - Ensure good image quality and lighting |
| 187 | + - Check if objects are clearly visible |
| 188 | + - Consider using a larger model, if one is available, for better accuracy |
| 189 | + |
| 190 | +3. **Slow Performance** |
| 191 | + - Enable GPU acceleration if available |
| 192 | + - Use a smaller model, if one is available, for faster inference |
| 193 | + - Optimize image resolution |
| 194 | + |
| 195 | +### Debug Mode |
| 196 | + |
| 197 | +Enable debug logging for troubleshooting: |
| 198 | + |
| 199 | +```bash |
| 200 | +local-ai run --debug rfdetr-base |
| 201 | +``` |
| 202 | + |
| 203 | +## Object Detection Category |
| 204 | + |
| 205 | +LocalAI includes a dedicated **object-detection** category for models and backends that specialize in identifying and locating objects within images. This category currently includes: |
| 206 | + |
| 207 | +- **RF-DETR**: Real-time transformer-based object detection |
| 208 | + |
| 209 | +Additional object detection models and backends will be added to this category in the future. You can filter models by the `object-detection` tag in the model gallery to find all available object detection models. |
| 210 | + |
| 211 | +## Related Features |
| 212 | + |
| 213 | +- [🎨 Image generation]({{%relref "docs/features/image-generation" %}}): Generate images with AI |
| 214 | +- [📖 Text generation]({{%relref "docs/features/text-generation" %}}): Generate text with language models |
| 215 | +- [🔍 GPT Vision]({{%relref "docs/features/gpt-vision" %}}): Analyze images with language models |
| 216 | +- [🚀 GPU acceleration]({{%relref "docs/features/GPU-acceleration" %}}): Optimize performance with GPU acceleration |
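
As a rough illustration of how a client might work with the `/v1/detection` endpoint documented in this page, the Python sketch below builds a request body from raw image bytes and post-processes a response. It assumes the request and response shapes shown in the Request Format and Response Format sections. The names `Detection`, `build_request`, `parse_detections`, and the `min_confidence` threshold are hypothetical helpers introduced for this example and are not part of LocalAI. The sketch makes no HTTP call; sending the body is left to whichever HTTP client you prefer.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Detection:
    """One entry of the `detections` array, using the documented field names."""
    x: float
    y: float
    width: float
    height: float
    confidence: float
    class_name: str

    def corners(self):
        """Return (x_min, y_min, x_max, y_max), treating (x, y) as the top-left corner."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def build_request(model, image_bytes, mime="image/jpeg"):
    """Build a JSON body for POST /v1/detection with the image as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"model": model, "image": f"data:{mime};base64,{encoded}"})


def parse_detections(response_text, min_confidence=0.0):
    """Parse a response body and keep detections at or above min_confidence.

    The threshold is applied client-side (an assumption of this sketch);
    results are sorted from most to least confident.
    """
    payload = json.loads(response_text)
    detections = [
        Detection(
            x=float(item["x"]),
            y=float(item["y"]),
            width=float(item["width"]),
            height=float(item["height"]),
            confidence=float(item["confidence"]),
            class_name=str(item["class_name"]),
        )
        for item in payload.get("detections", [])
    ]
    kept = [d for d in detections if d.confidence >= min_confidence]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


if __name__ == "__main__":
    # Sample response copied from the Response Format section.
    sample = json.dumps({"detections": [
        {"x": 100.5, "y": 150.2, "width": 200.0, "height": 300.0,
         "confidence": 0.95, "class_name": "dog"},
        {"x": 400.0, "y": 200.0, "width": 150.0, "height": 250.0,
         "confidence": 0.87, "class_name": "person"},
    ]})
    for det in parse_detections(sample, min_confidence=0.9):
        print(det.class_name, det.confidence, det.corners())
```

Converting each box to corner coordinates is a convenience for drawing or cropping libraries that expect `(x_min, y_min, x_max, y_max)`. Filtering by confidence on the client keeps the example independent of any server-side threshold option, which this page does not document.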