A comprehensive computer vision system for measuring 3D distances from single images using state-of-the-art deep learning models.
This system combines two cutting-edge technologies:
- Depth Pro: Apple's monocular depth estimation model for metric depth prediction
- GeoCalib: Single-image camera calibration for improved accuracy
The system provides both single-point and multi-point measurement capabilities with interactive and non-interactive modes.
- Depth Pro Pipeline: Monocular depth estimation in meters
- GeoCalib Pipeline: Camera intrinsics estimation (focal length, principal point)
- 3D Back-projection: Converts 2D image points to 3D world coordinates
- Distance Calculation: Euclidean distance between 3D points
- Visualization: Depth maps and measurement overlays
```text
Input Image → Depth Pro → Depth Map → Point Selection → 3D Back-projection → Distance Calculation
     ↓
GeoCalib → Camera Intrinsics → Improved Accuracy
```
```text
VisionScale/
├── measure_3d_distance.py          # Single-point measurement pipeline
├── measure_3d_distance_multi.py    # Multi-point measurement pipeline (up to 20 points)
├── setup.py                        # Automated setup script
├── requirements.txt                # Python dependencies
├── src/geocalib/                   # GeoCalib source code
├── ml-depth-pro/                   # Depth Pro source code
├── checkpoints/                    # Model checkpoints
└── 3d_multi_image/                 # Output folder for multi-point results
```
```bash
# Clone the repository
git clone <repository-url>
cd VisionScale

# Run automated setup
python3 setup.py
```

The setup script will:
- Install Python dependencies
- Download and install Depth Pro
- Download and install GeoCalib
- Download pretrained models
- Verify GPU availability
```bash
# Interactive mode (click two points)
python3 measure_3d_distance.py --image your_image.jpg --mode fused

# Non-interactive mode (predefined points)
python3 measure_3d_distance.py --image your_image.jpg --point1 100 200 --point2 300 400
```

```bash
# Interactive mode (click multiple points)
python3 measure_3d_distance_multi.py --image your_image.jpg --mode fused

# Non-interactive mode (up to 20 points)
python3 measure_3d_distance_multi.py --image your_image.jpg \
  --point1 100 200 --point2 300 400 --point3 500 600 \
  --point4 700 800 --point5 900 1000
```

Purpose (measure_3d_distance.py): Measure the distance between exactly two points in an image.
Workflow:
- Image Loading: Load and validate input image
- Depth Estimation: Run Depth Pro to get depth map in meters
- Camera Calibration: Run GeoCalib to estimate camera intrinsics
- Point Selection: Interactive clicking or command-line input
- Depth Sampling: Robust depth sampling with bilinear interpolation
- 3D Back-projection: Convert 2D points to 3D using camera intrinsics
- Distance Calculation: Euclidean distance between 3D points
- Output Generation: JSON data, visualizations, and analysis
Modes:
- depthpro: Uses only Depth Pro intrinsics
- geocalib: Uses only GeoCalib intrinsics
- fused: Combines both methods (geometric mean)
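To make the `fused` mode concrete, here is a minimal sketch, assuming that "geometric mean" refers to combining the Depth Pro focal length with GeoCalib's `fx`/`fy` (the field names follow the `data.json` example below). The function name `fuse_focal_lengths` is hypothetical and not taken from the scripts.

```python
import math
from typing import Tuple


def fuse_focal_lengths(
    depth_pro_focal: float, geocalib_fx: float, geocalib_fy: float
) -> Tuple[float, float]:
    """Hypothetical fused-mode helper: geometric mean of two focal estimates (pixels).

    Assumption: Depth Pro provides one focal length and GeoCalib provides fx and fy,
    as in the data.json example; each axis is fused independently.
    """
    for f in (depth_pro_focal, geocalib_fx, geocalib_fy):
        if not (math.isfinite(f) and f > 0):
            raise ValueError(f"focal length must be finite and positive, got {f}")
    fused_fx = math.sqrt(depth_pro_focal * geocalib_fx)
    fused_fy = math.sqrt(depth_pro_focal * geocalib_fy)
    return fused_fx, fused_fy


if __name__ == "__main__":
    print(fuse_focal_lengths(500.0, 2000.0, 2000.0))  # (1000.0, 1000.0)
```

The geometric mean treats over- and under-estimates symmetrically in ratio terms: estimates of 500 px and 2000 px fuse to 1000 px, whereas an arithmetic mean would give 1250 px.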
Output:
3d_[filename]/ folder containing:
- data.json: Complete measurement data
- depth_map.png: Depth visualization
- measurement_visualization.png: Measurement overlay
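Purpose (measure_3d_distance_multi.py): Measure distances between multiple points (up to 20 via the command line, unlimited in interactive mode) and calculate the distance for every pair of points.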
Workflow:
- Image Loading: Load and validate input image
- Depth Estimation: Run Depth Pro to get depth map in meters
- Camera Calibration: Run GeoCalib to estimate camera intrinsics
- Point Selection: Interactive clicking (unlimited) or command-line (up to 20)
- Depth Sampling: Sample depths for all points
- 3D Back-projection: Convert all 2D points to 3D
- Combination Calculation: Calculate distances between all point pairs
- Output Generation: Comprehensive results and visualizations
Features:
- Unlimited Points: Interactive mode supports unlimited points
- Command-line Support: Up to 20 points via command-line arguments
- All Combinations: Calculates distances between every point pair
- Duplicate Prevention: Only calculates each unique pair once (1-2, not 2-1)
- Statistical Analysis: Min, max, mean, standard deviation of distances
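The pair enumeration and statistics listed above can be sketched with the standard library alone. This is an illustration under assumptions: points are numbered from 1 as in the visualizations, `std_distance` is taken to be the population standard deviation, and the helper names are hypothetical.

```python
import itertools
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def pairwise_distances(points: Sequence[Point3D]) -> List[Tuple[int, int, float]]:
    """Euclidean distance for every unique pair (i < j); points numbered from 1."""
    numbered = list(enumerate(points, start=1))
    # combinations() yields each unordered pair exactly once: (1, 2) but never (2, 1).
    return [
        (i, j, math.dist(p, q))
        for (i, p), (j, q) in itertools.combinations(numbered, 2)
    ]


def summarize(distances: Sequence[float]) -> Dict[str, float]:
    """Summary statistics using the key names from data.json.

    Assumption: std_distance is the population standard deviation.
    """
    if not distances:
        raise ValueError("need at least one distance")
    return {
        "min_distance": min(distances),
        "max_distance": max(distances),
        "mean_distance": statistics.fmean(distances),
        "std_distance": statistics.pstdev(distances),
    }


if __name__ == "__main__":
    # Corners of a 3 m x 4 m rectangle, 5 m in front of the camera.
    corners = [(0.0, 0.0, 5.0), (3.0, 0.0, 5.0), (3.0, 4.0, 5.0), (0.0, 4.0, 5.0)]
    pairs = pairwise_distances(corners)
    for i, j, d in pairs:
        print(f"{i}-{j}: {d:.3f} m")
    print(summarize([d for _, _, d in pairs]))
```

For n points there are n(n-1)/2 unique pairs, so 4 points yield 6 distances and the 20-point command-line maximum yields 190.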
Output:
3d_multi_image/ folder containing:
- input_with_points.png: Original image with numbered points
- depth_map.png: Depth visualization
- measurement_visualization.png: All measurements with color coding
- data.json: Complete data with all point combinations
Single-point visualization:
- Points numbered 1 and 2
- Distance line between points
- Depth Pro and GeoCalib predictions
- Ground truth comparison (if provided)
- Error analysis
Multi-point visualization:
- All points numbered sequentially
- All connections between points
- Color-coded measurements:
  - Blue: Depth Pro predictions
  - Yellow: GeoCalib predictions
- Distance labels on each connection
- Summary statistics overlay
```json
{
  "image_path": "path/to/image.jpg",
  "image_size": {"width": 1920, "height": 1080},
  "mode": "fused",
  "camera_intrinsics": {
    "depth_pro_focal": 1234.56,
    "geocalib_focal": {"fx": 1234.56, "fy": 1234.56},
    "used_principal_point": {"cx": 960, "cy": 540}
  },
  "measurement_points": {
    "total_points": 4,
    "points_2d": [...],
    "points_3d_dp": [...],
    "points_3d_gc": [...]
  },
  "depth_pro_predictions": {
    "total_distances": 6,
    "distances": [...],
    "summary": {
      "min_distance": 1.234,
      "max_distance": 5.678,
      "mean_distance": 3.456,
      "std_distance": 1.789
    }
  },
  "geocalib_predictions": {...},
  "depth_map_info": {...},
  "output_files": {...}
}
```

```bash
python3 measure_3d_distance.py \
  --image photo.jpg \
  --mode fused \
  --show_depth \
  --point1 100 200 \
  --point2 300 400 \
  --ground_truth 0.5
```

```bash
python3 measure_3d_distance_multi.py \
  --image photo.jpg \
  --mode fused \
  --point1 100 200 \
  --point2 300 400 \
  --point3 500 600 \
  --point4 700 800 \
  --point5 900 1000
```

Options:
- `--image`: Input image path (required)
- `--mode`: Measurement mode (`depthpro`, `geocalib`, `fused`)
- `--show_depth`: Display depth visualization
- `--point1` to `--point20`: Predefined point coordinates (u, v) in pixels
- `--ground_truth`: Known distance in meters for comparison (single-point only)
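The `--point1` to `--point20` flags can be declared in a loop with `argparse`. The sketch below mirrors the documented options but is not the scripts' actual parser; the default `--mode` of `fused`, the float coordinate type, and the helper names `build_parser`/`collect_points` are assumptions.

```python
import argparse
from typing import List, Tuple

MAX_POINTS = 20  # the documented range is --point1 ... --point20


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the scripts' own code)."""
    parser = argparse.ArgumentParser(description="Measure 3D distances from a single image")
    parser.add_argument("--image", required=True, help="input image path")
    # Assumption: "fused" as the default; the README does not state the default.
    parser.add_argument("--mode", choices=["depthpro", "geocalib", "fused"], default="fused")
    parser.add_argument("--show_depth", action="store_true", help="display depth visualization")
    parser.add_argument("--ground_truth", type=float, default=None,
                        help="known distance for comparison")
    for k in range(1, MAX_POINTS + 1):
        # Each --pointK takes two numbers: pixel column u and pixel row v.
        parser.add_argument(f"--point{k}", type=float, nargs=2, metavar=("U", "V"))
    return parser


def collect_points(args: argparse.Namespace) -> List[Tuple[float, float]]:
    """Gather --point1, --point2, ... in order, stopping at the first one not given."""
    points = []
    for k in range(1, MAX_POINTS + 1):
        value = getattr(args, f"point{k}")
        if value is None:
            break
        points.append((value[0], value[1]))
    return points


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--image", "photo.jpg", "--point1", "100", "200", "--point2", "300", "400"]
    )
    print(args.mode, collect_points(args))  # fused [(100.0, 200.0), (300.0, 400.0)]
```

Stopping at the first missing index means that passing `--point1`, `--point2`, and `--point4` collects only the first two points; rejecting such gaps with an error is an equally reasonable design.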
- Model: Apple's Depth Pro for monocular depth estimation
- Output: Metric depth map in meters
- Focal Length: Estimated from image EXIF or model prediction
- Accuracy: State-of-the-art depth estimation
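The EXIF route usually relies on the `FocalLengthIn35mmFilm` tag. A common conversion to pixels scales that value by the ratio of the image diagonal to the 36 mm × 24 mm full-frame diagonal. The sketch below shows that conversion, assuming the tag is present; treat it as an illustration rather than Depth Pro's exact code path, and `focal_px_from_35mm` as a hypothetical name.

```python
import math


def focal_px_from_35mm(width_px: int, height_px: int, f_35mm: float) -> float:
    """Convert a 35 mm-equivalent focal length (mm) to pixels.

    35 mm-equivalent values refer to a 36 mm x 24 mm full-frame sensor; scaling by
    image diagonal / sensor diagonal maps millimetres on that sensor to pixels.
    """
    if width_px <= 0 or height_px <= 0 or f_35mm <= 0:
        raise ValueError("image size and focal length must be positive")
    sensor_diagonal_mm = math.hypot(36.0, 24.0)
    image_diagonal_px = math.hypot(width_px, height_px)
    return f_35mm * image_diagonal_px / sensor_diagonal_mm


if __name__ == "__main__":
    # A 3600 x 2400 image has the same 3:2 aspect as full frame: 100 px per mm.
    print(focal_px_from_35mm(3600, 2400, 50.0))  # about 5000
```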
- Model: GeoCalib, a single-image camera calibration model
- Output: Camera intrinsics (fx, fy, cx, cy)
- Features: Gravity direction estimation
- Accuracy: Improved distance measurement accuracy
```python
# Camera frame: X right, Y down, Z forward (meters)
Z = depth_at_point
X = (u - cx) / fx * Z
Y = (v - cy) / fy * Z
```

- Bilinear Interpolation: Subpixel depth sampling
- Median Fallback: 3x3 median filter for invalid depths
- Validation: Checks for finite, positive depth values
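The sampling strategy and back-projection formulas above can be combined into a small NumPy sketch. Assumptions: the depth map is a 2-D array in meters indexed as `depth[v, u]`, the median fallback takes the median of the valid pixels in the 3×3 window around the nearest pixel (rather than filtering the whole map), and the names `sample_depth` and `back_project` are hypothetical.

```python
import numpy as np


def _valid(values: np.ndarray) -> np.ndarray:
    """Mask of finite, strictly positive depths."""
    with np.errstate(invalid="ignore"):
        return np.isfinite(values) & (values > 0)


def sample_depth(depth: np.ndarray, u: float, v: float) -> float:
    """Depth (m) at subpixel (u, v): bilinear if all 4 neighbors are valid,
    otherwise the median of valid depths in the 3x3 window around the nearest pixel."""
    h, w = depth.shape
    if not (0.0 <= u <= w - 1 and 0.0 <= v <= h - 1):
        raise ValueError(f"point ({u}, {v}) lies outside the {w}x{h} depth map")
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    # Rows are v (y), columns are u (x).
    corners = depth[[y0, y0, y1, y1], [x0, x1, x0, x1]]
    weights = np.array([(1 - ax) * (1 - ay), ax * (1 - ay), (1 - ax) * ay, ax * ay])
    if _valid(corners).all():
        return float(weights @ corners)
    r, c = int(round(v)), int(round(u))
    window = depth[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
    valid = window[_valid(window)]
    if valid.size == 0:
        raise ValueError(f"no valid depth near ({u}, {v})")
    return float(np.median(valid))


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Pinhole back-projection to the camera frame (X right, Y down, Z forward)."""
    return np.array([(u - cx) / fx * z, (v - cy) / fy * z, z])


if __name__ == "__main__":
    depth = np.full((480, 640), 2.0)   # flat scene 2 m away
    depth[240, 220] = np.nan           # an invalid pixel triggers the median fallback
    fx = fy = 500.0
    cx, cy = 320.0, 240.0
    p1 = back_project(220.0, 240.0, sample_depth(depth, 220.0, 240.0), fx, fy, cx, cy)
    p2 = back_project(420.0, 240.0, sample_depth(depth, 420.0, 240.0), fx, fy, cx, cy)
    print(np.linalg.norm(p1 - p2))     # about 0.8 (200 px / 500 px * 2 m)
```

The Euclidean distance is then simply `np.linalg.norm(p1 - p2)`; because X scales with Z, an error in the sampled depth propagates proportionally into the measured distance.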
- GPU: NVIDIA GPU with CUDA support (recommended)
- RAM: 8GB minimum, 16GB recommended
- Storage: 5GB for models and dependencies
- Python: 3.8 or higher
- CUDA: 11.0 or higher (for GPU acceleration)
- OS: Linux, macOS, or Windows
- PyTorch >= 1.9.0
- OpenCV >= 4.5.0
- NumPy >= 1.21.0
- Pillow >= 8.3.0
- Matplotlib >= 3.4.0
- SciPy >= 1.7.0
- Import Errors
  ```bash
  # Reinstall dependencies
  python3 setup.py
  ```
- GPU Issues
  - The system automatically falls back to CPU
  - Check the CUDA installation with `nvidia-smi`
- Model Download Failures
  - Check internet connection
  - Verify disk space
  - Retry setup script
- Depth Sampling Issues
  - Avoid featureless areas (sky, glass, shadows)
  - Try different points
  - Use `--show_depth` to visualize the depth map
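The CPU fallback mentioned under GPU Issues typically comes down to a single PyTorch check. A minimal sketch, assuming PyTorch is the backend (it is listed in the dependencies); `select_device` is a hypothetical name, and the `ImportError` branch exists only so the snippet runs without PyTorch installed.

```python
def select_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise fall back to "cpu"."""
    try:
        import torch  # listed in the project's dependencies
    except ImportError:
        return "cpu"  # only so this sketch runs without PyTorch installed
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {select_device()}")
```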
- GPU Usage: Ensure CUDA is properly installed
- Memory: Close other applications during processing
- Batch Processing: Use non-interactive mode for multiple images
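For batch processing in non-interactive mode, a small driver can call the single-point script once per image. This sketch assumes each image comes with its own pair of pixel coordinates and invokes the script through `subprocess`; `build_command`, `run_batch`, and the example filenames are hypothetical. It uses `measure_3d_distance.py` because its results go to a per-image `3d_[filename]/` folder, whereas the multi-point script writes to the shared `3d_multi_image/` folder.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def build_command(image: Path, p1: Pixel, p2: Pixel, mode: str = "fused") -> List[str]:
    """Non-interactive measure_3d_distance.py call for one image (hypothetical helper)."""
    return [
        sys.executable, "measure_3d_distance.py",
        "--image", str(image),
        "--mode", mode,
        "--point1", str(p1[0]), str(p1[1]),
        "--point2", str(p2[0]), str(p2[1]),
    ]


def run_batch(jobs: Dict[Path, Tuple[Pixel, Pixel]], mode: str = "fused",
              dry_run: bool = True) -> List[List[str]]:
    """Run images one after another; with dry_run=True only print the commands."""
    commands = []
    for image, (p1, p2) in jobs.items():
        cmd = build_command(image, p1, p2, mode)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing image
    return commands


if __name__ == "__main__":
    run_batch({
        Path("kitchen.jpg"): ((100, 200), (300, 400)),
        Path("hallway.jpg"): ((50, 60), (700, 420)),
    })
```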
- Depth Pro: State-of-the-art monocular depth estimation
- GeoCalib: Improved camera calibration accuracy
- Fused Mode: Best overall performance
- GPU: ~10-15 seconds per image
- CPU: ~20-30 seconds per image
- Multi-point: Scales with number of point combinations
- Batch Processing: Process multiple images simultaneously
- Real-time Processing: Video stream support
- Additional Models: Support for other depth estimation models
- Web Interface: Browser-based point selection
- API Integration: REST API for remote processing
- Depth Pro: [Apple's Depth Pro Paper]
- GeoCalib: [ECCV 2024 Paper]
- Depth Pro: https://github.com/apple/ml-depth-pro
- GeoCalib: https://github.com/cvg/GeoCalib
Note: This system combines cutting-edge research from multiple institutions to provide accurate 3D distance measurements from single images. The integration of Depth Pro and GeoCalib represents a novel approach to monocular 3D measurement.