Spatial Photo

A visionOS app that transforms 2D photos into immersive 3D spatial experiences using AI-powered depth estimation.

Features

  • AI Depth Estimation: Uses the Depth Anything V2 CoreML model from Apple's model collection to infer depth from any 2D photo
  • 3D Displacement Mesh: Converts depth maps into real 3D geometry with smooth normal calculation
  • Immersive Display: View your spatial photos in a mixed reality immersive space on Vision Pro
  • File Picker: Load photos from any folder, including Downloads
  • Sample Photo: Built-in test image to try the feature instantly
  • Simulator Support: CPU-only inference mode for visionOS Simulator development

Requirements

  • Xcode 26+
  • visionOS 26+ SDK
  • Apple Vision Pro or visionOS Simulator

Getting Started

  1. Open spatial-photo.xcodeproj in Xcode
  2. Select the visionOS Simulator (Apple Vision Pro)
  3. Build and run (⌘R)
  4. Tap "Try Sample Photo" or "Select Photo" to load an image
  5. Wait for depth processing (~1 second on device, longer on simulator)
  6. Tap "View in Space" to see your 3D spatial photo

How It Works

Processing Pipeline

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Select    │────▶│    Load     │────▶│   Depth     │────▶│  Generate   │
│   Photo     │     │   Image     │     │  Inference  │     │    Mesh     │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                                                   │
                                                                   ▼
                                                            ┌─────────────┐
                                                            │  Display in │
                                                            │  Immersive  │
                                                            │    Space    │
                                                            └─────────────┘
  1. Image Loading: The ImageLoader service loads images, using security-scoped resource access for file picker selections
  2. Depth Inference: The DepthProcessor runs the Depth Anything V2 Small model through the Vision framework's CoreML integration
  3. Mesh Generation: The SpatialPhotoMeshGenerator creates a displacement mesh using bilinear depth sampling
  4. Immersive Display: RealityKit renders the textured mesh in a mixed reality immersive space (the sketch below shows how these stages chain together)
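
A minimal sketch of how these stages might chain, assuming the service types named above; the method signatures are illustrative guesses rather than the repository's actual API:

import CoreGraphics
import Foundation
import RealityKit

// Hedged sketch of the pipeline driver; the three service types are named in
// this README, but the method signatures are assumptions for illustration.
@MainActor
func process(photoAt url: URL,
             imageLoader: ImageLoader,
             depthProcessor: DepthProcessor,
             meshGenerator: SpatialPhotoMeshGenerator) async throws -> MeshResource {
    let image: CGImage = try await imageLoader.loadImage(from: url)         // 1. load
    let depth: [Float] = try await depthProcessor.estimateDepth(of: image)  // 2. infer
    // 3. mesh generation; the caller then attaches the result to a ModelEntity
    //    shown in the immersive space (step 4)
    return try meshGenerator.generateMesh(image: image, depthMap: depth)
}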

Depth Model

This app uses Depth Anything V2 Small from Apple's CoreML model collection:

  • Architecture: DPT-based with DINOv2 encoder
  • Input Size: 518 × 392 pixels
  • Model Size: ~50MB (Float16)
  • Inference: Runs on the Neural Engine for optimal performance (CPU-only on the simulator); a loading sketch follows below
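
A hedged sketch of loading the model and preparing a Vision request; DepthAnythingV2SmallF16 stands in for the Xcode-generated model class, whose actual name may differ in this project:

import CoreML
import Vision

// Hedged sketch: run depth inference on one image. The generated class name
// DepthAnythingV2SmallF16 is an assumption.
func runDepthInference(on image: CGImage) throws -> [VNObservation] {
    let config = MLModelConfiguration()
    config.computeUnits = .all              // Neural Engine on device
    let coreMLModel = try DepthAnythingV2SmallF16(configuration: config).model
    let visionModel = try VNCoreMLModel(for: coreMLModel)

    let request = VNCoreMLRequest(model: visionModel)
    request.imageCropAndScaleOption = .scaleFill   // model expects 518 × 392 input

    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])
    return request.results ?? []
}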

Output Handling

The depth processor handles both observation types that the Vision framework can return for CoreML models (a dispatch sketch follows the lists below):

  • VNPixelBufferObservation: Common for depth models that produce image-to-image results
  • VNCoreMLFeatureValueObservation: For models that produce MLMultiArray output

Supported pixel buffer formats:

  • OneComponent32Float - 32-bit floating-point depth
  • OneComponent16Half - 16-bit half-precision depth
  • DepthFloat32 - Dedicated depth format
  • OneComponent8 - 8-bit grayscale
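
A minimal dispatch sketch over both observation types; the two reader functions are hypothetical placeholders for the repository's actual depth handling:

import CoreML
import CoreVideo
import Vision

// Hypothetical helpers standing in for the repository's depth readers.
func readFloat32Depth(from buffer: CVPixelBuffer) { /* convert to [Float] */ }
func readDepth(from array: MLMultiArray) { /* convert to [Float] */ }

// Hedged sketch: route Vision results by observation type and pixel format.
func handleResults(_ results: [VNObservation]) {
    for observation in results {
        switch observation {
        case let pixelObservation as VNPixelBufferObservation:
            // Image-to-image output: inspect the buffer format before reading
            let format = CVPixelBufferGetPixelFormatType(pixelObservation.pixelBuffer)
            if format == kCVPixelFormatType_OneComponent32Float ||
               format == kCVPixelFormatType_DepthFloat32 {
                readFloat32Depth(from: pixelObservation.pixelBuffer)
            }
        case let featureObservation as VNCoreMLFeatureValueObservation:
            // MLMultiArray output
            if let array = featureObservation.featureValue.multiArrayValue {
                readDepth(from: array)
            }
        default:
            break
        }
    }
}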

Architecture

SpatialPhotoViewModel (@Observable, @MainActor)
       │
       ├── ImageLoader (actor) ──────────► CGImage
       │
       ├── DepthProcessor (actor) ───────► Depth map via CoreML
       │
       └── SpatialPhotoMeshGenerator ────► MeshResource with displacement
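
The concurrency shape this diagram implies can be sketched as follows; the stored properties, state enum, and stub bodies are assumptions rather than the repository's actual declarations:

import CoreGraphics
import Foundation
import Observation

// Hedged sketch of the concurrency layout above; member names and the state
// enum are assumptions, and bodies are stubs.
enum ProcessingState { case idle, processing, ready }

actor ImageLoader {
    func loadImage(from url: URL) async throws -> CGImage { fatalError("stub") }
}

actor DepthProcessor {
    func estimateDepth(of image: CGImage) async throws -> [Float] { fatalError("stub") }
}

@MainActor @Observable
final class SpatialPhotoViewModel {
    var state: ProcessingState = .idle
    private let imageLoader = ImageLoader()
    private let depthProcessor = DepthProcessor()
}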

Key Components

Component                   Description
SpatialPhotoViewModel       Observable state management for the processing pipeline
ImageLoader                 Thread-safe image loading with security-scoped resource access
DepthProcessor              CoreML depth inference with Vision framework integration
SpatialPhotoMeshGenerator   Creates displacement meshes from depth data (sampling sketch below)
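
The bilinear depth sampling used by SpatialPhotoMeshGenerator interpolates between the four nearest depth texels when placing each vertex. A minimal sketch, assuming a row-major [Float] depth map (the function name and layout are assumptions):

// Hedged sketch of bilinear depth sampling over a row-major depth map.
func sampleDepth(_ depth: [Float], width: Int, height: Int,
                 u: Float, v: Float) -> Float {
    // Map normalized (u, v) in [0, 1] to continuous texel coordinates
    let x = u * Float(width - 1)
    let y = v * Float(height - 1)
    let x0 = Int(x), y0 = Int(y)
    let x1 = min(x0 + 1, width - 1)
    let y1 = min(y0 + 1, height - 1)
    let fx = x - Float(x0)
    let fy = y - Float(y0)

    // Blend horizontally on both rows, then vertically between them
    let top    = depth[y0 * width + x0] * (1 - fx) + depth[y0 * width + x1] * fx
    let bottom = depth[y1 * width + x0] * (1 - fx) + depth[y1 * width + x1] * fx
    return top * (1 - fy) + bottom * fy
}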

Project Structure

spatial-photo/
├── spatial-photo/
│   ├── Models/              # Data models + CoreML model
│   ├── ViewModels/          # Observable state management
│   ├── Views/               # SwiftUI views
│   ├── Services/            # ImageLoader, DepthProcessor
│   ├── Rendering/           # Mesh generation
│   └── Extensions/          # CGImage, MLMultiArray helpers
├── spatial-photoTests/      # Unit tests
└── Packages/
    └── RealityKitContent/   # RealityKit assets

Testing

Run the test suite:

xcodebuild test -project spatial-photo.xcodeproj \
  -scheme spatial-photo \
  -destination 'platform=visionOS Simulator,name=Apple Vision Pro'

Test Coverage

Test Suite                      Tests   Description
SpatialPhotoDataTests           2       Processing state validation
MLMultiArrayExtensionTests      4       Depth data conversion from MLMultiArray
SpatialPhotoMeshGeneratorTests  2       Mesh generator initialization
CGImageExtensionTests           2       Image to CVPixelBuffer conversion
ImageLoaderTests                3       Image loading and format support
DepthProcessorTests             3       Model loading and depth inference
SpatialPhotoViewModelTests      3       ViewModel state management

Note: Tests that require RealityKit mesh generation have been removed as they are not supported in the visionOS Simulator.
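
For flavor, a single test case in the style of these suites, written with the Swift Testing framework (the project's actual tests may use XCTest) and exercising the bilinear sampling helper sketched earlier:

import Testing

// Hedged sketch of a unit test; not one of the repository's actual tests.
struct DepthSamplingTests {
    @Test func bilinearSamplingInterpolatesMidpoint() {
        // 2 × 1 depth map: sampling halfway between 0.0 and 1.0 yields 0.5
        let depth: [Float] = [0.0, 1.0]
        let mid = sampleDepth(depth, width: 2, height: 1, u: 0.5, v: 0.0)
        #expect(abs(mid - 0.5) < 1e-6)
    }
}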

Simulator vs Device

Feature           Simulator   Device
Compute Units     CPU only    Neural Engine + GPU
Inference Speed   Slower      ~1 second
Mesh Rendering    Limited     Full support
Immersive Space   Basic       Full mixed reality

The app automatically detects the environment and configures CoreML appropriately:

import CoreML

let config = MLModelConfiguration()
#if targetEnvironment(simulator)
// The simulator cannot reach the Neural Engine, so force CPU-only inference
config.computeUnits = .cpuOnly
#else
config.computeUnits = .all
#endif

Acknowledgments

  • Depth Anything V2 Small model from Apple's CoreML model collection

License

MIT License

Version History

  • v1.0 - Initial release with depth estimation and immersive display
