A visionOS app that transforms 2D photos into immersive 3D spatial experiences using AI-powered depth estimation.
- AI Depth Estimation: Uses Apple's Depth Anything V2 CoreML model to infer depth from any 2D photo
- 3D Displacement Mesh: Converts depth maps into real 3D geometry with smooth normal calculation
- Immersive Display: View your spatial photos in a mixed reality immersive space on Vision Pro
- File Picker: Load photos from any folder including Downloads
- Sample Photo: Built-in test image to try the feature instantly
- Simulator Support: CPU-only inference mode for visionOS Simulator development
- Xcode 17+
- visionOS 26+ SDK
- Apple Vision Pro or visionOS Simulator
- Open `spatial-photo.xcodeproj` in Xcode
- Select the visionOS Simulator (Apple Vision Pro)
- Build and run (⌘R)
- Tap "Try Sample Photo" or "Select Photo" to load an image
- Wait for depth processing (~1 second on device, longer on simulator)
- Tap "View in Space" to see your 3D spatial photo
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Select    │────▶│    Load     │────▶│    Depth    │────▶│  Generate   │
│    Photo    │     │    Image    │     │  Inference  │     │    Mesh     │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                                                    │
                                                                    ▼
                                                            ┌─────────────┐
                                                            │ Display in  │
                                                            │  Immersive  │
                                                            │    Space    │
                                                            └─────────────┘
```
- Image Loading: The `ImageLoader` service loads images with security-scoped resource access for file picker selections (see the sketch after this list)
- Depth Inference: The `DepthProcessor` runs the Depth Anything V2 Small model via CoreML's Vision framework
- Mesh Generation: The `SpatialPhotoMeshGenerator` creates a displacement mesh with bilinear depth sampling
- Immersive Display: RealityKit renders the textured mesh in a mixed reality immersive space
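The security-scoped access step matters because URLs returned by the file picker point outside the app sandbox (for example, Downloads). The sketch below shows the general pattern, assuming the loader produces a `CGImage`; the `loadPickedImage(from:)` function name is illustrative, not the project's actual API.

```swift
import CoreGraphics
import Foundation
import ImageIO

// Illustrative sketch only: load a CGImage from a file-picker URL.
// Security-scoped access is required for files outside the app sandbox
// (e.g. Downloads); the function name is hypothetical.
func loadPickedImage(from url: URL) throws -> CGImage {
    let needsScope = url.startAccessingSecurityScopedResource()
    defer {
        if needsScope { url.stopAccessingSecurityScopedResource() }
    }

    guard let source = CGImageSourceCreateWithURL(url as CFURL, nil),
          let image = CGImageSourceCreateImageAtIndex(source, 0, nil) else {
        throw CocoaError(.fileReadCorruptFile)
    }
    return image
}
```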
This app uses Depth Anything V2 Small from Apple's CoreML model collection:
- Architecture: DPT-based with DINOv2 encoder
- Input Size: 518 × 392 pixels
- Model Size: ~50MB (Float16)
- Inference: Runs on Neural Engine for optimal performance (CPU on simulator)
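As a rough sketch of how a model like this can be wired into Vision, the snippet below loads the CoreML package with a compute-unit configuration and wraps it in a `VNCoreMLModel`. The `DepthAnythingV2SmallF16` class name is an assumption about the generated model class, not a confirmed identifier from the project.

```swift
import CoreML
import Vision

// Sketch only: load the depth model with simulator-aware compute units
// and wrap it for use with VNCoreMLRequest.
// "DepthAnythingV2SmallF16" is a hypothetical generated class name.
func makeDepthModel() throws -> VNCoreMLModel {
    let config = MLModelConfiguration()
    #if targetEnvironment(simulator)
    config.computeUnits = .cpuOnly   // Neural Engine is unavailable in the simulator
    #else
    config.computeUnits = .all       // Prefer Neural Engine / GPU on device
    #endif

    let mlModel = try DepthAnythingV2SmallF16(configuration: config).model
    return try VNCoreMLModel(for: mlModel)
}
```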
The depth processor handles multiple Vision framework observation types:
- VNPixelBufferObservation: Common for depth models outputting image-to-image results
- VNCoreMLFeatureValueObservation: For models outputting MLMultiArray data
Supported pixel buffer formats:
- `OneComponent32Float` - 32-bit floating point depth
- `OneComponent16Half` - 16-bit half precision
- `DepthFloat32` - Dedicated depth format
- `OneComponent8` - 8-bit grayscale
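A minimal sketch of what handling the pixel-buffer case can look like, assuming a `OneComponent32Float` buffer; the other formats follow the same pattern with different element types, and the function name is illustrative.

```swift
import CoreVideo
import Vision

// Sketch: pull raw depth values out of a Vision observation.
// Only the OneComponent32Float path is shown; half-float and 8-bit
// formats would need their own element types.
func depthValues(from observation: VNObservation) -> [Float]? {
    guard let pixelObservation = observation as? VNPixelBufferObservation else {
        return nil   // e.g. VNCoreMLFeatureValueObservation → read the MLMultiArray instead
    }
    let buffer = pixelObservation.pixelBuffer
    guard CVPixelBufferGetPixelFormatType(buffer) == kCVPixelFormatType_OneComponent32Float else {
        return nil
    }

    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }

    let width = CVPixelBufferGetWidth(buffer)
    let height = CVPixelBufferGetHeight(buffer)
    let rowBytes = CVPixelBufferGetBytesPerRow(buffer)
    guard let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }

    var values = [Float]()
    values.reserveCapacity(width * height)
    for y in 0..<height {
        let row = (base + y * rowBytes).assumingMemoryBound(to: Float.self)
        for x in 0..<width { values.append(row[x]) }
    }
    return values
}
```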
```
SpatialPhotoViewModel (@Observable, @MainActor)
│
├── ImageLoader (actor) ──────────► CGImage
│
├── DepthProcessor (actor) ───────► Depth map via CoreML
│
└── SpatialPhotoMeshGenerator ────► MeshResource with displacement
```
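In code, that layering corresponds roughly to the skeleton below. The method names and placeholder bodies are illustrative only, not the project's API; the point is that the service actors keep loading and inference off the main thread while the `@Observable` view model publishes state from the MainActor.

```swift
import CoreGraphics
import Foundation
import Observation

// Skeleton of the layering above, with placeholder bodies.
// Method names here are assumptions for illustration, not the project's API.
actor ImageLoaderSketch {
    func load(_ url: URL) async throws -> CGImage { fatalError("placeholder") }
}

actor DepthProcessorSketch {
    func estimateDepth(of image: CGImage) async throws -> [Float] { fatalError("placeholder") }
}

@MainActor
@Observable
final class SpatialPhotoViewModelSketch {
    private let imageLoader = ImageLoaderSketch()
    private let depthProcessor = DepthProcessorSketch()

    private(set) var depthMap: [Float] = []

    func process(url: URL) async throws {
        // Work hops to the service actors; published state stays on the MainActor.
        let image = try await imageLoader.load(url)
        depthMap = try await depthProcessor.estimateDepth(of: image)
    }
}
```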
| Component | Description |
|---|---|
| `SpatialPhotoViewModel` | Observable state management for the processing pipeline |
| `ImageLoader` | Thread-safe image loading with security-scoped resource access |
| `DepthProcessor` | CoreML depth inference with Vision framework integration |
| `SpatialPhotoMeshGenerator` | Creates displacement meshes from depth data |
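To make the mesh generator's role concrete, here is a simplified sketch of building a displaced grid from a row-major depth map with RealityKit's `MeshDescriptor`. The grid layout, depth scale, and function name are illustrative; normals are omitted, whereas the real generator also does bilinear depth sampling and smooth normal calculation.

```swift
import RealityKit
import simd

// Simplified sketch: build a displaced grid mesh from a row-major depth map.
// Grid layout, depth scale, and function name are illustrative.
func makeDisplacementMesh(depth: [Float], width: Int, height: Int,
                          depthScale: Float = 0.3) throws -> MeshResource {
    var positions: [SIMD3<Float>] = []
    var uvs: [SIMD2<Float>] = []
    var indices: [UInt32] = []

    // One vertex per depth sample, displaced along +Z by the estimated depth.
    for y in 0..<height {
        for x in 0..<width {
            let u = Float(x) / Float(width - 1)
            let v = Float(y) / Float(height - 1)
            let z = depth[y * width + x] * depthScale
            positions.append(SIMD3(u - 0.5, 0.5 - v, z))
            uvs.append(SIMD2(u, 1 - v))
        }
    }

    // Two counter-clockwise triangles per grid cell.
    for y in 0..<(height - 1) {
        for x in 0..<(width - 1) {
            let i = UInt32(y * width + x)
            let w = UInt32(width)
            indices += [i, i + w, i + 1,
                        i + 1, i + w, i + w + 1]
        }
    }

    var descriptor = MeshDescriptor(name: "spatialPhoto")
    descriptor.positions = MeshBuffers.Positions(positions)
    descriptor.textureCoordinates = MeshBuffers.TextureCoordinates(uvs)
    descriptor.primitives = .triangles(indices)
    return try MeshResource.generate(from: [descriptor])
}
```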
```
spatial-photo/
├── spatial-photo/
│   ├── Models/              # Data models + CoreML model
│   ├── ViewModels/          # Observable state management
│   ├── Views/               # SwiftUI views
│   ├── Services/            # ImageLoader, DepthProcessor
│   ├── Rendering/           # Mesh generation
│   └── Extensions/          # CGImage, MLMultiArray helpers
├── spatial-photoTests/      # Unit tests
└── Packages/
    └── RealityKitContent/   # RealityKit assets
```
Run the test suite:
```bash
xcodebuild test -project spatial-photo.xcodeproj \
  -scheme spatial-photo \
  -destination 'platform=visionOS Simulator,name=Apple Vision Pro'
```

| Test Suite | Tests | Description |
|---|---|---|
| `SpatialPhotoDataTests` | 2 | Processing state validation |
| `MLMultiArrayExtensionTests` | 4 | Depth data conversion from MLMultiArray |
| `SpatialPhotoMeshGeneratorTests` | 2 | Mesh generator initialization |
| `CGImageExtensionTests` | 2 | Image to CVPixelBuffer conversion |
| `ImageLoaderTests` | 3 | Image loading and format support |
| `DepthProcessorTests` | 3 | Model loading and depth inference |
| `SpatialPhotoViewModelTests` | 3 | ViewModel state management |
Note: Tests that require RealityKit mesh generation have been removed as they are not supported in the visionOS Simulator.
| Feature | Simulator | Device |
|---|---|---|
| Compute Units | CPU only | Neural Engine + GPU |
| Inference Speed | Slower | ~1 second |
| Mesh Rendering | Limited | Full support |
| Immersive Space | Basic | Full mixed reality |
The app automatically detects the environment and configures CoreML appropriately:
```swift
#if targetEnvironment(simulator)
config.computeUnits = .cpuOnly
#else
config.computeUnits = .all
#endif
```

- Depth Anything V2 - CoreML model by Apple
- Depth Anything - Original research by Lihe Yang et al.
MIT License
- v1.0 - Initial release with depth estimation and immersive display