This application uses advanced AI/ML models to scan images of my storage furniture, identify different storage types, and segment individual compartments within each storage unit. It creates a structured list of all available storage spaces and displays the image with visual annotations showing all identified segments.
The purpose of creating this app was to get a sense of the accuracy of different segmentation models and to test pretrained weights for each model.
The Storage Segmentation App interface showing multiple detection models, ensemble methods, and visualization options that help me customize how storage units are identified and displayed.
- Scan images of my storage furniture
- Multiple detection models to choose from:
  - YOLO for fast detection
  - YOLO-NAS with 10-17% higher mAP than YOLOv8
  - RT-DETR (Real-Time Detection Transformer) combining transformer accuracy with YOLO speed
  - SAM 2.1 for precise segmentation
  - FastSAM for optimized segmentation
  - Grounding DINO for vision-language capabilities and zero-shot detection
  - Mask R-CNN models from Detectron2
  - DeepLabV3+ models
- Advanced detection pipelines:
  - Hybrid Pipeline combining YOLO-NAS, Grounding DINO, and SAM 2.1
  - Ensemble methods (uncertainty-aware, average, max, vote)
- Segment individual compartments within each storage unit
- Create a structured list of all my available storage spaces
- Display my image with visual annotations showing all identified segments
- Hierarchical classification with parent-child relationships
- Dynamic confidence thresholds based on furniture type
- Enhanced visualization with contour smoothing and corner rounding
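The dynamic confidence thresholds can be illustrated with a small sketch. This is a hedged example, not the app's actual implementation: the furniture type names, threshold values, and function names below are assumptions; the real values live in `config.py`.

```python
# Illustrative sketch of dynamic confidence thresholds keyed by furniture
# type. Values and type names are assumptions, not the app's real config.
DEFAULT_THRESHOLD = 0.5

# Large, well-defined units (wardrobes) can afford a higher threshold
# than small, ambiguous compartments (drawers).
CONFIDENCE_THRESHOLDS = {
    "wardrobe": 0.60,
    "shelf": 0.50,
    "drawer": 0.35,
}

def threshold_for(furniture_type: str) -> float:
    """Return the confidence threshold for a detected furniture type."""
    return CONFIDENCE_THRESHOLDS.get(furniture_type.lower(), DEFAULT_THRESHOLD)

def keep_detection(furniture_type: str, confidence: float) -> bool:
    """Keep a detection only if it clears its type-specific threshold."""
    return confidence >= threshold_for(furniture_type)
```

Unknown furniture types fall back to the default threshold, so adding a new type is a one-line dictionary change.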
- Python 3.8+
- Dependencies:
  - ultralytics (YOLO, SAM 2.1, FastSAM)
  - torch and torchvision
  - groundingdino-py
  - huggingface-hub
  - supervision
  - timm
  - transformers
  - streamlit
  - OpenCV
  - NumPy
  - pandas
  - Pillow
  - scikit-learn
  - matplotlib
For installation and usage instructions, please refer to the Installation and Usage Guide.
- app.py: Main Streamlit application
- config.py: Configuration settings for the application
- utils.py: Visualization and helper functions
- models.py: Data structures for storage units and compartments
- requirements.txt: List of required dependencies
- detectors/: Directory containing all detector implementations:
  - base_detector.py: Base class for all detectors
  - yolo_detector.py: YOLOv8 detector implementation
  - yolo_nas_detector.py: YOLO-NAS detector implementation
  - rt_detr_detector.py: RT-DETR detector implementation
  - sam_detector.py: SAM 2.1 detector implementation
  - fastsam_detector.py: FastSAM detector implementation
  - grounding_dino_detector.py: Grounding DINO detector implementation
  - detectron2_detector.py: Detectron2 Mask R-CNN detector implementation
  - deeplabv3_detector.py: DeepLabV3+ detector implementation
  - hybrid_detector.py: Hybrid pipeline implementation
  - ensemble_detector.py: Ensemble methods implementation
  - factory.py: Factory for creating detector instances
- data/models/: Directory for storing models
- data/samples/: Sample images for testing
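The parent-child data structures in models.py presumably look something like the sketch below. This is a hedged illustration: the class and field names are assumptions, not the app's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative sketch of hierarchical storage data structures; field names
# are assumptions, not the app's real models.py.

@dataclass
class Compartment:
    label: str                  # e.g. "drawer", "shelf"
    confidence: float           # detector confidence score
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

@dataclass
class StorageUnit:
    label: str                  # e.g. "wardrobe", "cabinet"
    confidence: float
    bbox: Tuple[int, int, int, int]
    compartments: List[Compartment] = field(default_factory=list)

    def add_compartment(self, compartment: Compartment) -> None:
        """Attach a detected compartment to its parent storage unit."""
        self.compartments.append(compartment)
```

A structure like this makes the "structured list of all available storage spaces" a simple traversal: iterate units, then each unit's compartments.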
The application now uses a multi-model approach with several detection pipelines:
- Single Model Pipeline:
  - Uses a single model (YOLO, YOLO-NAS, RT-DETR, SAM 2.1, etc.) for both detection and segmentation
  - Configurable for different model sizes and confidence thresholds
- Hybrid Pipeline:
  - YOLO-NAS for initial furniture unit detection
  - Grounding DINO for component classification using text prompts
  - SAM 2.1 for precise segmentation of the detected units
- Ensemble Pipeline:
  - Combines predictions from multiple models using different strategies:
    - Uncertainty-aware ensemble: weights predictions by uncertainty estimates
    - Average ensemble: averages predictions from all models
    - Max ensemble: takes the maximum-confidence prediction
    - Vote ensemble: uses majority voting for final predictions
  - Supports test-time augmentation for improved accuracy
All pipelines support hierarchical classification, with storage units containing compartments, and dynamic confidence thresholds based on furniture type.
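The average, max, and vote strategies can be sketched on per-model (label, confidence) predictions for a single detection. This is a simplified illustration, not the real ensemble_detector.py: the actual implementation also has to match boxes across models (e.g. by IoU) before combining, which is omitted here.

```python
from collections import Counter
from statistics import mean

# Each prediction is a (label, confidence) pair from one model for the
# same (already matched) detection.

def average_ensemble(preds):
    """Pick the most common label, average the confidences that agree."""
    top = Counter(label for label, _ in preds).most_common(1)[0][0]
    return top, mean(conf for label, conf in preds if label == top)

def max_ensemble(preds):
    """Take the single highest-confidence prediction across models."""
    return max(preds, key=lambda p: p[1])

def vote_ensemble(preds):
    """Majority vote on labels; confidence = fraction of agreeing models."""
    counts = Counter(label for label, _ in preds)
    top, votes = counts.most_common(1)[0]
    return top, votes / len(preds)
```

The uncertainty-aware variant would replace the plain mean with a weighted mean, where each model's weight is derived from its uncertainty estimate.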
This project relies on several powerful open-source libraries:
- Ultralytics YOLO: State-of-the-art object detection and segmentation framework that powers the core detection capabilities. Documentation
- YOLO-NAS: Neural Architecture Search version of YOLO with 10-17% higher mAP. Documentation
- RT-DETR: Real-Time Detection Transformer combining transformer accuracy with YOLO speed. Documentation
- SAM 2.1: Segment Anything Model for precise segmentation. Documentation
- Grounding DINO: Vision-language model for zero-shot detection. GitHub
- Detectron2: Facebook AI Research's detection and segmentation framework. Documentation
- Streamlit: Interactive web application framework for creating data apps with minimal code. Documentation
- OpenCV: Computer vision library used for image processing and manipulation. Documentation
- NumPy: Fundamental package for scientific computing with Python, used for array operations and numerical processing. Documentation
- pandas: Data analysis and manipulation library, used for structured data handling. Documentation
- Pillow: Python Imaging Library fork, used for image opening, manipulation, and saving. Documentation
- PyTorch: Deep learning framework that powers all the models. Documentation
- Hugging Face: Platform for sharing and using machine learning models. Documentation
- Supervision: Computer vision annotation toolkit. GitHub