Note: This is a fork of the MMDetection library, customized for Visia. The original project: open-mmlab/mmdetection
Simplified installation • No CUDA compilation • Pure Python/PyTorch
A dedicated, hosted MLflow instance (more or less) is available here: https://visdet-mlflow-server-a7eq2wihnq-uc.a.run.app/
Our motivation is simple: to be the most usable research platform.
Simplified Installation & Dependencies
- Integrated Dependencies: MMCV and MMEngine are bundled directly into the package as visdet.cv and visdet.engine, eliminating complex multi-package dependency management
- No Custom CUDA Required: All custom CUDA operations have been removed, making installation straightforward with just uv pip install visdet
- Python-Only Implementation: Pure Python/PyTorch implementation means faster installation and better compatibility across different environments
- Unified Namespace: All functionality accessible through a single coherent API (visdet.cv for computer vision ops, visdet.engine for training infrastructure)
This makes visdet significantly easier to install and deploy compared to the original MMDetection, which required careful coordination of multiple packages and custom CUDA compilation.
What happened to MMCV and MMEngine? They've been integrated into visdet under the visdet.cv and visdet.engine namespaces, respectively. Instead of managing separate mmcv, mmengine, and mmdet packages, everything you need is now in one place.
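For code written against the old packages, migrating is mostly a matter of renaming imports. The helper below is a hypothetical sketch (migrate_import and NAMESPACE_MAP are illustrative names, not part of visdet). The mmcv to visdet.cv and mmengine to visdet.engine mappings come from the text above; mmdet to visdet is an assumption, as is the premise that submodule paths are otherwise unchanged.

```python
import re

# Old package -> new namespace. The first two entries come from the README;
# the mmdet -> visdet entry is an ASSUMPTION for illustration.
NAMESPACE_MAP = {
    "mmcv": "visdet.cv",
    "mmengine": "visdet.engine",
    "mmdet": "visdet",
}

# Match an old top-level package name right after "from" or "import".
# The trailing \b stops matches on names like "mmdet3d" or "mmcv_full".
_IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)(mmcv|mmengine|mmdet)\b")


def _replace(match):
    # Keep the "from "/"import " prefix, swap the package name.
    return match.group(1) + NAMESPACE_MAP[match.group(2)]


def migrate_import(line):
    """Rewrite one import line from the old packages to visdet namespaces."""
    return _IMPORT_RE.sub(_replace, line)


if __name__ == "__main__":
    for old in [
        "from mmengine.config import Config",
        "import mmcv",
        "from mmdet.apis import init_detector",
    ]:
        print(f"{old!r:45} -> {migrate_import(old)!r}")
```

Note that a bare import mmcv becomes import visdet.cv, which binds the name visdet rather than mmcv, so existing call sites may still need import visdet.cv as mmcv.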
Without access to full training logs (loss plots, etc.), it can be impossible to tell whether your own implementation is wrong. Eventually, we aim to integrate the docs and the experiment results into a single living document. We run the hyperparameter searches; you get the new best hyperparameters.
Goals of the repo:
- Even more open than a typical open-source project: logs and roadmap are publicly available.
- Emphasis on all of developer experience (DevEx), onboarding new users, production deployment, and research.2
visdet draws inspiration from the pioneering work of fast.ai, which demonstrated that common-sense training techniques could dramatically improve both accessibility and performance in deep learning. The IceVision project attempted to bring these ideas to object detection, but it has since been abandoned and is no longer maintained.
We're continuing that mission by porting battle-tested techniques from image classification and LLM training:
- Progressive Image Resizing: Start training with smaller images and gradually increase the resolution, for faster convergence and better performance
- Learning Rate Finders: Automatically discover optimal learning rates instead of manual tuning
- Discriminative Learning Rates: Apply different learning rates to different network layers
- 1cycle Learning Rate Schedules: Achieve better generalization with cyclical learning rates
- Modern Fine-tuning Techniques: Bringing approaches from LLM training (LoRA-style adaptations) to object detection
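Two of the techniques above combine naturally: a 1cycle schedule where each layer group gets its own peak learning rate. The sketch below is plain Python and is not visdet's implementation; the group names, the decay factor of 10, and the function names are assumptions for illustration. The schedule uses cosine warmup and annealing with the same default pct_start, div_factor, and final_div_factor as PyTorch's torch.optim.lr_scheduler.OneCycleLR, though its phase boundaries are simplified.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` under a cosine 1cycle schedule.

    Warm up from max_lr / div_factor to max_lr over the first pct_start
    of training, then anneal to (max_lr / div_factor) / final_div_factor.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = max(1, int(pct_start * total_steps))
    if step < warmup_steps:
        start, end, pct = initial_lr, max_lr, step / warmup_steps
    else:
        start, end = max_lr, min_lr
        pct = (step - warmup_steps) / max(1, total_steps - 1 - warmup_steps)
    # Cosine interpolation: returns `start` at pct=0 and `end` at pct=1.
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def discriminative_max_lrs(groups, head_lr, decay=10.0):
    """Peak LR per layer group, ordered input side first.

    The last group (the head) gets head_lr; each earlier group gets
    `decay` times less than the group after it.
    """
    n = len(groups)
    return {name: head_lr / decay ** (n - 1 - i) for i, name in enumerate(groups)}


if __name__ == "__main__":
    # Hypothetical layer groups for a detector.
    peaks = discriminative_max_lrs(["backbone", "neck", "head"], head_lr=1e-3)
    total = 1000
    for step in (0, 300, 999):
        lrs = {g: one_cycle_lr(step, total, lr) for g, lr in peaks.items()}
        print(step, {g: f"{lr:.2e}" for g, lr in lrs.items()})
```

In plain PyTorch the same effect comes from creating one optimizer parameter group per layer group and passing a list of per-group max_lr values to OneCycleLR.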
These techniques are proven in image classification and LLM fine-tuning but have been largely absent from object detection frameworks. visdet aims to make them accessible and practical for detection tasks, with sensible defaults and clear documentation.
Philosophy: If a technique works reliably for ImageNet classification or LLM fine-tuning, it should work for object detection too. We're bringing the best ideas from across deep learning to vision tasks.
See fast.ai's ImageNet training guide for an example of how these techniques work in practice.
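Progressive resizing boils down to a schedule that maps the epoch to a training image size. Here is a minimal sketch, assuming a linear ramp over the first 75% of training and sizes rounded down to a multiple of 32 (a common stride for FPN-style detectors). The function name and all defaults are illustrative, not visdet's API.

```python
def progressive_size(epoch, total_epochs, min_size=128, max_size=512,
                     ramp_fraction=0.75, multiple=32):
    """Training image size for `epoch` under a linear progressive-resizing ramp.

    The size grows linearly from min_size to max_size over the first
    ramp_fraction of training, then stays at max_size. Sizes are rounded
    down to a multiple of `multiple` but never drop below min_size.
    """
    ramp_epochs = max(1, int(ramp_fraction * total_epochs))
    t = min(1.0, epoch / ramp_epochs)
    size = min_size + t * (max_size - min_size)
    return max(min_size, int(size) // multiple * multiple)


if __name__ == "__main__":
    # Sizes for a 12-epoch schedule.
    print([progressive_size(e, 12) for e in range(12)])
```

A real pipeline would also rescale box and mask annotations with the images and may need to re-tune the batch size as the resolution grows.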
visdet is committed to integrating cutting-edge tools that improve performance, developer experience, and training efficiency:
Kornia: Differentiable computer vision library for PyTorch with geometric transformations, filtering, and augmentation pipelines. Integration is planned for data augmentation with full gradient support.
Triton: OpenAI's Python-like GPU programming language for writing high-performance kernels without CUDA expertise. Could enable custom operators with performance comparable to expert-written CUDA code.
SPDL: Meta's Scalable and Performant Data Loading library, with built-in performance observability. Under evaluation as a way to remove current data-loading bottlenecks.
DALI: NVIDIA's GPU-accelerated data loading library, which offloads preprocessing to the GPU. Being considered for systems with high GPU-to-CPU ratios, where CPU preprocessing becomes the bottleneck.
Serverless GPU compute platform for Python that makes cloud training and inference effortless. Zero infrastructure setup with elastic GPU scaling and 100x faster cold starts than Docker. Could enable seamless cloud-based training workflows.
Tutel: Microsoft's highly optimized Mixture of Experts (MoE) implementation for PyTorch. Enables efficient sparse model training with dynamic expert routing and load balancing. Could enable scaling to much larger models while keeping compute manageable through conditional computation.
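Dynamic expert routing usually means top-k gating: each token is sent only to the k experts with the highest gate scores, and per-expert load is tracked so it can be balanced. The sketch below is a generic plain-Python illustration of that idea, not the API of Microsoft's MoE library (which does this with fused GPU kernels). The function names and the renormalisation over the chosen experts are assumptions.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token_logits, k=2):
    """Route each token to its k highest-scoring experts.

    Returns, per token, a list of (expert_index, weight) pairs whose weights
    are the gate probabilities renormalised over the chosen experts, plus a
    per-expert load count that a load-balancing loss or capacity limit
    would act on.
    """
    num_experts = len(token_logits[0])
    load = [0] * num_experts
    routes = []
    for logits in token_logits:
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / norm) for e in top])
        for e in top:
            load[e] += 1
    return routes, load


if __name__ == "__main__":
    # Two tokens, four experts: gate logits per token.
    routes, load = top_k_route([[2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 3.0, 1.0]])
    print(routes)
    print(load)
```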
DeepSpeed: Microsoft's deep learning optimization library featuring ZeRO (Zero Redundancy Optimizer) for training massive models with limited GPU memory. Includes model compression techniques, efficient training optimizations, and inference acceleration. Could enable training larger detection models, and faster inference through quantization and compression.
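The memory savings from ZeRO can be estimated with simple arithmetic. Under the ZeRO paper's accounting for mixed-precision Adam, each parameter costs 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + K = 12 bytes (fp32 master weights, momentum, variance), and each ZeRO stage partitions one more of these across the N data-parallel GPUs. The sketch below covers model states only; activations, buffers, and fragmentation are ignored, and the function name is illustrative.

```python
def zero_memory_gb(num_params, num_gpus, stage, optimizer_bytes=12):
    """Approximate per-GPU memory (GB) for model states under ZeRO stages 0-3.

    psi = parameter count, n = data-parallel GPUs,
    k = bytes per parameter of optimizer state (12 for mixed-precision Adam).
    """
    psi, n, k = num_params, num_gpus, optimizer_bytes
    if stage == 0:    # plain data parallelism: everything replicated
        total = (2 + 2 + k) * psi
    elif stage == 1:  # partition optimizer states
        total = (2 + 2) * psi + k * psi / n
    elif stage == 2:  # also partition gradients
        total = 2 * psi + (2 + k) * psi / n
    elif stage == 3:  # also partition parameters
        total = (2 + 2 + k) * psi / n
    else:
        raise ValueError("stage must be 0, 1, 2 or 3")
    return total / 1e9


if __name__ == "__main__":
    # The 7.5B-parameter, 64-GPU setting used as an example in the ZeRO paper.
    for stage in range(4):
        print(stage, round(zero_memory_gb(7.5e9, 64, stage), 3))
```

For 7.5B parameters on 64 GPUs this gives 120, 31.41, 16.64, and 1.875 GB per GPU for stages 0 to 3.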
This framework is being designed for much better usability than your average research repo: it will max out your batch size, then find your (probably) optimal learning rate and scheduler.
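Since these steps are described as planned, here is only a sketch of one common way to do them, not visdet's implementation. First, double the batch size until a training step runs out of memory. Then run an LR range test: sweep the learning rate exponentially and pick the point where the loss falls fastest, which is one common heuristic. The probes fits and loss_at are hypothetical caller-supplied functions. A real fits would wrap one training step and catch the out-of-memory error (torch.cuda.OutOfMemoryError in recent PyTorch versions), and a real finder would also smooth the losses and stop early once the loss diverges.

```python
import math


def find_max_batch_size(fits, start=2, limit=4096):
    """Double the batch size until `fits(batch_size)` returns False.

    `fits` is a hypothetical probe that runs one training step and reports
    whether it succeeded without running out of memory.
    """
    best = None
    bs = start
    while bs <= limit and fits(bs):
        best = bs
        bs *= 2
    if best is None:
        raise RuntimeError("even the starting batch size does not fit")
    return best


def lr_range_test(loss_at, min_lr=1e-7, max_lr=10.0, num_steps=100):
    """Sweep LR exponentially; return the LR where loss drops fastest.

    `loss_at(lr)` is a hypothetical probe that takes one training step at
    `lr` and returns the loss. The slope is measured against log(LR).
    """
    ratio = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    lrs = [min_lr * ratio ** i for i in range(num_steps)]
    losses = [loss_at(lr) for lr in lrs]
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(num_steps - 1)
    ]
    # Most negative slope = steepest descent.
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]


if __name__ == "__main__":
    # Toy stand-ins: memory runs out above 48 images; the loss falls
    # around lr = 1e-3 and blows up above lr = 0.1.
    def toy_loss(lr):
        x = math.log10(lr)
        return 1.0 / (1.0 + math.exp(3.0 * (x + 3.0))) + 10.0 * max(0.0, x + 1.0) ** 2

    print(find_max_batch_size(lambda bs: bs <= 48))
    print(f"{lr_range_test(toy_loss):.2e}")
```

Doubling only finds a power of two; a binary search between the last fitting size and the first failing one could refine it.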
# Using uv (recommended)
uv pip install visdet
# Or using pip (don't do this though, you masochist)
pip install visdet
For detailed installation instructions, see the Installation Guide.
from visdet import SimpleRunner
# Simple, string-based API - just like Hugging Face or Ultralytics YOLO
runner = SimpleRunner(
model='mask_rcnn_swin_s',
dataset='coco_instance_segmentation',
optimizer='adamw_8bit',
scheduler='1cycle'
)
runner.train()
Discover available presets:
SimpleRunner.list_models() # ['mask_rcnn_swin_s', ...]
SimpleRunner.list_datasets() # ['coco_instance_segmentation', ...]
SimpleRunner.show_preset('mask_rcnn_swin_s') # View full config
Customize via inheritance:
runner = SimpleRunner(
model={
'_base_': 'mask_rcnn_swin_s',
'backbone': {'embed_dims': 128} # Override specific params
},
dataset='coco_instance_segmentation'
)
For more examples and tutorials, visit the Documentation.
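In MMEngine-style configs, a dict that names a _base_ is merged recursively into that base, so overriding backbone.embed_dims keeps the rest of the backbone settings intact. Assuming SimpleRunner resolves _base_ the same way (an assumption, not confirmed above), the behaviour can be sketched as follows. The PRESETS registry and its contents, resolve, and deep_merge are hypothetical names for illustration only.

```python
import copy

# Hypothetical preset registry; keys and values are illustrative only,
# not visdet's real presets.
PRESETS = {
    "mask_rcnn_swin_s": {
        "type": "MaskRCNN",
        "backbone": {"type": "SwinTransformer", "embed_dims": 96,
                     "depths": [2, 2, 18, 2]},
        "neck": {"type": "FPN", "out_channels": 256},
    },
}


def deep_merge(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve(cfg):
    """Expand a preset name, or a dict with a '_base_' key, into a full config."""
    if isinstance(cfg, str):
        return copy.deepcopy(PRESETS[cfg])
    cfg = dict(cfg)
    base_name = cfg.pop("_base_", None)
    base = resolve(base_name) if base_name is not None else {}
    return deep_merge(base, cfg)


if __name__ == "__main__":
    model = resolve({"_base_": "mask_rcnn_swin_s", "backbone": {"embed_dims": 128}})
    print(model["backbone"])
```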
Comprehensive guides and tutorials are available at binitai.github.io/visdet
Available Tutorials
Pre-trained models and benchmarks available in the Model Zoo.
We welcome contributions! Please see the Contributing Guide for details.
- Report Issues
- Request Features
- Submit Pull Requests
This project is released under the Apache 2.0 License.
visdet is part of a rich ecosystem of object detection frameworks. Here's how visdet compares to other notable projects:
Note: For more hot takes on the ecosystem, read this Reddit discussion (surprise, surprise: people are not fans of Ultralytics).
The original framework that visdet is based on. A comprehensive object detection toolbox with modular design, supporting 40+ architectures including detection, instance segmentation, and panoptic segmentation. Part of the OpenMMLab project with extensive model zoo and state-of-the-art implementations.
Choose MMDetection if: Don't. It's great, but visdet has all of its benefits with less of the pain.
Facebook AI Research's production-grade detection library. Supports object detection, instance segmentation, panoptic segmentation, DensePose, and more. Known for excellent performance and deployment flexibility with TorchScript/Caffe2 export. The foundation for many research projects.
Choose Detectron2 if: Don't do it.
A specialized research platform built on top of Detectron2, focused specifically on Transformer-based detection algorithms (DETR variants). Provides unified modular design for 20+ Transformer models including DETR, Deformable-DETR, DINO, and MaskDINO. Uses LazyConfig for flexible configuration.
Choose detrex if: You're doing cutting-edge Transformer-based detection research or want to experiment with DETR variants. That said, I'm aiming to integrate into visdet every model whose operations are supported by ONNX.
Ever since they added instance segmentation support, they've been a solid alternative. We like Roboflow.
A streamlined fork of MMDetection with integrated dependencies, no CUDA compilation requirements, and modern training techniques from fast.ai and LLM fine-tuning.
Choose visdet if: You want simplified installation (no CUDA compilation), pure Python/PyTorch implementation, and modern training techniques like progressive resizing, 1cycle schedules, and learning rate finders. Maybe even auto augmentations? We'll see.
visdet is built on top of the excellent MMDetection framework from OpenMMLab. We are grateful to all contributors of the original project.