GitHub - BinItAI/visdet: MMDetection fork with new vision

📘 Documentation • 🛠️ Installation • 👀 Model Zoo • 📊 Experiment Logs • 🆕 Changelog

Note: This is a fork of the MMDetection library, customized for Visia. The original project: open-mmlab/mmdetection

✨ Why visdet?

Simplified installation • No CUDA compilation • Pure Python/PyTorch

A dedicated hosted MLflow instance (more or less) is available at https://visdet-mlflow-server-a7eq2wihnq-uc.a.run.app/

The Most Useable Research Platform

Our motivation is simple: to be the most useable research platform.

Simplified Installation & Dependencies

Integrated Dependencies: MMCV and MMEngine are bundled directly into the package as visdet.cv and visdet.engine, eliminating complex multi-package dependency management
No Custom CUDA Required: All custom CUDA operations have been removed, making installation straightforward with just uv pip install visdet
Python-Only Implementation: Pure Python/PyTorch implementation means faster installation and better compatibility across different environments
Unified Namespace: All functionality accessible through a single coherent API (visdet.cv for computer vision ops, visdet.engine for training infrastructure)

This makes visdet significantly easier to install and deploy compared to the original MMDetection, which required careful coordination of multiple packages and custom CUDA compilation.

What happened to MMCV and MMEngine? They've been integrated into visdet under the visdet.cv and visdet.engine namespaces respectively. Instead of managing separate mmcv, mmengine, and mmdet packages, everything you need is now in one place.
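For code being ported from the original packages, the namespace change mostly amounts to rewriting import prefixes. The sketch below is a hypothetical helper, not part of visdet: the `mmcv` and `mmengine` mappings follow the description above, while the `mmdet` to `visdet` mapping and the example submodule names are assumptions made for illustration.

```python
# Hypothetical helper for porting imports; not part of the visdet API.
# mmcv -> visdet.cv and mmengine -> visdet.engine come from the README text;
# mmdet -> visdet is an assumption made for this sketch.
NAMESPACE_MAP = {
    "mmcv": "visdet.cv",
    "mmengine": "visdet.engine",
    "mmdet": "visdet",
}


def to_visdet_path(module_path: str) -> str:
    """Rewrite a dotted module path so a legacy top-level package points at visdet."""
    head, _, tail = module_path.partition(".")
    new_head = NAMESPACE_MAP.get(head)
    if new_head is None:
        return module_path  # not a legacy OpenMMLab package; leave untouched
    return f"{new_head}.{tail}" if tail else new_head


# Submodule names below are illustrative only.
for old in ("mmcv.transforms", "mmengine.runner", "mmdet.models", "torch.nn"):
    print(f"{old} -> {to_visdet_path(old)}")
```

Whether a given submodule actually exists under the new namespace has to be checked against the installed package.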

Without access to full training logs (loss plots, etc.), it can be impossible to know whether your own implementation is wrong. Ideally, we will eventually integrate the docs and the experiment results into a single living documentation. We run the hyperparameter search; you get the new best hyperparameters.

Goals of the repo:

Even more open than your typical open-source project: logs and roadmap are publicly available.
Emphasis on all of DevEx, educating new users, production deployments, and research.

🧠 Modern Training Philosophy

visdet draws inspiration from the pioneering work of fast.ai, which demonstrated that common-sense training techniques could dramatically improve both accessibility and performance in deep learning. The icevision project attempted to bring these ideas to object detection but is no longer maintained.

We're continuing that mission by porting battle-tested techniques from image classification and LLM training:

Progressive Image Resizing: Start training with smaller images, gradually increase resolution for faster convergence and better performance
Learning Rate Finders: Automatically discover optimal learning rates instead of manual tuning
Discriminative Learning Rates: Apply different learning rates to different network layers
1cycle Learning Rate Schedules: Achieve better generalization with cyclical learning rates
Modern Fine-tuning Techniques: Bringing approaches from LLM training (LoRA-style adaptations) to object detection
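To make the 1cycle entry in the list above concrete, here is a minimal pure-Python sketch of the schedule shape: the learning rate warms up from a small value to `max_lr`, then anneals down to a much smaller value. The function name is hypothetical and this is not visdet's implementation; the parameter names and defaults (`pct_start`, `div_factor`, `final_div_factor`) mirror those of PyTorch's `OneCycleLR`, and cosine annealing is assumed for both phases.

```python
import math


def one_cycle_lr(step: int, total_steps: int, max_lr: float,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """Learning rate at `step` for a two-phase, cosine-annealed 1cycle schedule."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    initial_lr = max_lr / div_factor        # starting learning rate
    min_lr = initial_lr / final_div_factor  # learning rate at the final step
    last = total_steps - 1
    # Index of the step at which the peak learning rate is reached.
    warmup_steps = max(1, round(pct_start * last))

    def cos_anneal(start: float, end: float, pct: float) -> float:
        # Moves smoothly from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_steps:
        return cos_anneal(initial_lr, max_lr, step / warmup_steps)
    return cos_anneal(max_lr, min_lr, (step - warmup_steps) / (last - warmup_steps))


schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.01) for s in range(100)]
print(f"start={schedule[0]:.6f} peak={max(schedule):.6f} end={schedule[-1]:.2e}")
```

In a real training loop the value would be written into the optimizer's parameter groups once per iteration rather than precomputed as a list.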

These techniques are proven in image classification and LLM fine-tuning but have been largely absent from object detection frameworks. visdet aims to make them accessible and practical for detection tasks, with sensible defaults and clear documentation.
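Progressive image resizing, for instance, reduces to choosing an input size per epoch. The sketch below is an assumption about how such a schedule might look, not visdet's API: the function name is hypothetical, growth is linear, and sizes are rounded down to a multiple of 32 on the assumption that the detector needs inputs divisible by its largest stride.

```python
def progressive_sizes(start: int, end: int, epochs: int, multiple: int = 32) -> list[int]:
    """Per-epoch square image side, growing linearly from `start` to `end`.

    Each size is rounded down to a multiple of `multiple`.
    """
    if epochs < 1 or start > end or multiple < 1:
        raise ValueError("need epochs >= 1, start <= end and multiple >= 1")
    sizes = []
    for epoch in range(epochs):
        # Fraction of the way through training (1.0 when there is a single epoch).
        frac = epoch / (epochs - 1) if epochs > 1 else 1.0
        raw = start + frac * (end - start)
        sizes.append(int(raw) // multiple * multiple)
    return sizes


print(progressive_sizes(256, 512, 5))  # [256, 320, 384, 448, 512]
```

A data pipeline would then resize each batch to the size for the current epoch, so early epochs run faster on small images and later epochs refine on full resolution.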

Philosophy: If a technique works reliably for ImageNet classification or LLM fine-tuning, it should work for object detection too. We're bringing the best ideas from across deep learning to vision tasks.

See fast.ai's ImageNet training guide for an example of how these techniques work in practice.

🔮 Future Integrations

visdet is committed to integrating cutting-edge tools that improve performance, developer experience, and training efficiency:

Kornia

Differentiable computer vision library for PyTorch with geometric transformations, filtering, and augmentation pipelines. Planned integration for enhanced data augmentation capabilities with full gradient support.

Triton

OpenAI's Python-like GPU programming language for writing high-performance kernels without CUDA expertise. Could enable custom operators achieving performance comparable to expert-level CUDA code.

SPDL

Meta's Scalable and Performant Data Loading library with built-in performance observability. Under evaluation for replacing current data loading bottlenecks.

DALI

NVIDIA's GPU-accelerated data loading library that offloads preprocessing to the GPU. Being considered for systems with high GPU-to-CPU ratios where CPU preprocessing becomes a bottleneck.

Modal

Serverless GPU compute platform for Python that makes cloud training and inference effortless. Zero infrastructure setup with elastic GPU scaling and 100x faster cold starts than Docker. Could enable seamless cloud-based training workflows.

Tutel

Microsoft's highly optimized Mixture of Experts (MoE) implementation for PyTorch. Enables efficient sparse model training with dynamic expert routing and load balancing. Could enable scaling to much larger models while maintaining computational efficiency through conditional computation.

DeepSpeed

Microsoft's deep learning optimization library featuring ZeRO (Zero Redundancy Optimizer) for training massive models with limited GPU memory. Includes model compression techniques, efficient training optimizations, and inference acceleration. Could enable training larger detection models and faster inference through quantization and compression.

This framework is being designed for much better usability than your average research repo: it will max out your batch size, then find your (probably) optimal learning rate and scheduler.
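As a rough idea of what "max out your batch size, then find your learning rate" could involve, the sketch below shows two small search routines. Both function names are hypothetical and neither is visdet's implementation. `max_batch_size` assumes a caller-supplied `fits` callback (in practice, something that tries one training step and reports whether it ran out of memory) and assumes that if a size fits, every smaller size fits too. `suggest_lr` is a simple heuristic over a learning-rate sweep with log-spaced rates: it picks the rate at which the loss drops fastest.

```python
from typing import Callable, Sequence


def max_batch_size(fits: Callable[[int], bool], limit: int = 1024) -> int:
    """Largest batch size in [1, limit] for which `fits` returns True."""
    if not fits(1):
        raise RuntimeError("even a batch size of 1 does not fit")
    lo, hi = 1, 2  # lo: largest size known to fit; hi: next candidate
    # Phase 1: double until a size fails or the limit is passed.
    while hi <= limit and fits(hi):
        lo, hi = hi, hi * 2
    hi = min(hi, limit + 1)  # smallest size treated as not fitting
    # Phase 2: binary search between the last success and the first failure.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


def suggest_lr(lrs: Sequence[float], losses: Sequence[float]) -> float:
    """Learning rate at the start of the steepest loss drop in a sweep."""
    if len(lrs) != len(losses) or len(lrs) < 2:
        raise ValueError("need two or more matching lrs and losses")
    drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    best = max(range(len(drops)), key=drops.__getitem__)
    return lrs[best]


# Simulated memory budget: pretend anything above 37 samples runs out of memory.
print(max_batch_size(lambda batch: batch <= 37))
# Simulated sweep: the loss falls fastest after 1e-4 and diverges at 1e-1.
print(suggest_lr([1e-5, 1e-4, 1e-3, 1e-2, 1e-1], [2.30, 2.25, 1.90, 1.70, 4.00]))
```

The doubling phase keeps the number of trial steps logarithmic in the final batch size, which matters when each trial is a real forward and backward pass.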

🚀 Quick Start

Installation

# Using uv (recommended)
uv pip install visdet

# Or using pip (don't do this though, you masochist)
pip install visdet

For detailed installation instructions, see the Installation Guide.

Training a Model

from visdet import SimpleRunner

# Simple, string-based API - just like Hugging Face or Ultralytics YOLO
runner = SimpleRunner(
    model='mask_rcnn_swin_s',
    dataset='coco_instance_segmentation',
    optimizer='adamw_8bit',
    scheduler='1cycle'
)

runner.train()

Discover available presets:

SimpleRunner.list_models()       # ['mask_rcnn_swin_s', ...]
SimpleRunner.list_datasets()     # ['coco_instance_segmentation', ...]
SimpleRunner.show_preset('mask_rcnn_swin_s')  # View full config

Customize via inheritance:

runner = SimpleRunner(
    model={
        '_base_': 'mask_rcnn_swin_s',
        'backbone': {'embed_dims': 128}  # Override specific params
    },
    dataset='coco_instance_segmentation'
)
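One plausible way to read the `_base_` override above is as a recursive dictionary merge: the preset is loaded first, then the remaining keys are merged on top, so only `embed_dims` changes while the rest of the backbone config is kept. The sketch below illustrates that reading only. `PRESETS`, `deep_merge`, and `resolve` are hypothetical names, the preset contents are placeholder values, and visdet's actual resolution logic may differ.

```python
from copy import deepcopy

# Hypothetical preset registry with placeholder values; real visdet presets may differ.
PRESETS = {
    "mask_rcnn_swin_s": {
        "type": "MaskRCNN",
        "backbone": {"type": "SwinTransformer", "embed_dims": 96, "depths": [2, 2, 18, 2]},
    }
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(config):
    """Expand a preset name, or a dict that may inherit from one via '_base_'."""
    if isinstance(config, str):
        return deepcopy(PRESETS[config])
    overrides = {k: v for k, v in config.items() if k != "_base_"}
    base = resolve(config["_base_"]) if "_base_" in config else {}
    return deep_merge(base, overrides)


model = resolve({"_base_": "mask_rcnn_swin_s", "backbone": {"embed_dims": 128}})
print(model["backbone"])
```

Copying the preset before merging keeps the registry itself unchanged, so several runners can derive different variants from the same base.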

For more examples and tutorials, visit the Documentation.

📚 Documentation

Comprehensive guides and tutorials available at binitai.github.io/visdet

📖 Available Tutorials

🎯 Model Zoo

Pre-trained models and benchmarks available in the Model Zoo.

🤝 Contributing

We welcome contributions! Please see the Contributing Guide for details.

📄 License

This project is released under the Apache 2.0 License.

🔗 Related Projects

visdet is part of a rich ecosystem of object detection frameworks. Here's how visdet compares to other notable projects:

Note: For more hot takes on the ecosystem, read this Reddit discussion (surprise, surprise: people are not fans of Ultralytics).

MMDetection

The original framework that visdet is based on. A comprehensive object detection toolbox with modular design, supporting 40+ architectures including detection, instance segmentation, and panoptic segmentation. Part of the OpenMMLab project with extensive model zoo and state-of-the-art implementations.

Choose MMDetection if: Don't. It's great, but visdet has all of the benefits with less of the pain.

Detectron2

Facebook AI Research's production-grade detection library. Supports object detection, instance segmentation, panoptic segmentation, DensePose, and more. Known for excellent performance and deployment flexibility with TorchScript/Caffe2 export. The foundation for many research projects.

Choose Detectron2 if: Don't do it.

detrex

A specialized research platform built on top of Detectron2, focused specifically on Transformer-based detection algorithms (DETR variants). Provides unified modular design for 20+ Transformer models including DETR, Deformable-DETR, DINO, and MaskDINO. Uses LazyConfig for flexible configuration.

Choose detrex if: You're doing cutting-edge Transformer-based detection research or want to experiment with DETR variants. That said, we're aiming to integrate all of the models whose operations are supported by ONNX.

rf-detr

Ever since they added instance segmentation, rf-detr has been a valid and good alternative. We like Roboflow.

visdet (this project)

A streamlined fork of MMDetection with integrated dependencies, no CUDA compilation requirements, and modern training techniques from fast.ai and LLM fine-tuning.

Choose visdet if: You want simplified installation (no CUDA compilation), pure Python/PyTorch implementation, and modern training techniques like progressive resizing, 1cycle schedules, and learning rate finders. Maybe even auto augmentations? We'll see.

🙏 Acknowledgements

visdet is built on top of the excellent MMDetection framework from OpenMMLab. We are grateful to all contributors of the original project.

Built with ❤️ by the Visia ML Engineering team

