SAM3: Segment Anything Model 3 is an experimental computer vision application that provides a single interface for text-conditioned and click-interactive segmentation. It is built around Meta's facebook/sam3 checkpoint and exposes three separate pipelines: text-to-image segmentation, multi-frame video propagation, and cumulative point-based image tracking. Each pipeline uses its own model variant (Sam3Model, Sam3VideoModel, and Sam3TrackerModel) to produce masks for custom objects, bounding boxes, and objects that move across video frames. The app is GPU-accelerated and wrapped in a web workspace with a stylized Citrus theme, and it is intended as a sandbox for researchers and developers exploring segmentation workflows.
- Text-Conditional Image Segmentation: Isolates target objects in static images by matching text queries (e.g., "cat", "face", "car wheel") against SAM3 instance masks, with a configurable mask threshold.
- Temporal Video Object Propagation: Tracks and segments the prompted objects across successive video frames. The video pipeline runs in bfloat16 to reduce VRAM use.
- Interactive Point Tracking: Lets users click directly on the input image. Each click is treated as a positive foreground point, and the colored mask overlay is updated after every click.
- Bespoke Gradio Workspace: A three-tab layout built on the Gradio theme engine, with file-drop inputs, real-time tracking logs, and preloaded example sets.
- Supervision and Matplotlib Styling: Draws contours and randomized colormap overlays on the generated masks so results are easy to inspect on complex textures.
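The thresholding and colored-overlay steps mentioned in the list above can be illustrated with a small NumPy sketch. This is not the code from app.py: the function name `overlay_mask`, its parameters, and the toy 4x4 input are all hypothetical, and the per-pixel score map simply stands in for whatever the model returns.

```python
import numpy as np

def overlay_mask(image, scores, threshold=0.5, color=(255, 128, 0), alpha=0.5):
    """Blend `color` into `image` wherever `scores` exceeds `threshold`.

    image:  (H, W, 3) uint8 array
    scores: (H, W) float array of per-pixel mask scores (assumed layout)
    """
    mask = scores > threshold                      # boolean (H, W) mask
    out = image.astype(np.float32)                 # astype returns a copy
    # Alpha-blend only the masked pixels; the color broadcasts over them.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * np.asarray(color, dtype=np.float32)
    return np.rint(out).astype(np.uint8), mask

# Toy input: a black image with a 2x2 high-score square in the middle.
image = np.zeros((4, 4, 3), dtype=np.uint8)
scores = np.zeros((4, 4), dtype=np.float32)
scores[1:3, 1:3] = 0.9

# A randomized overlay color, one per object, drawn from a seeded generator.
rng = np.random.default_rng(0)
color = tuple(int(c) for c in rng.integers(0, 256, size=3))
blended, mask = overlay_mask(image, scores, threshold=0.5, color=color)
```

Raising `threshold` shrinks the mask to the pixels the model is most confident about; lowering it includes more uncertain pixels.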
├── examples/
│ ├── goldencat.webp
│ ├── player.jpg
│ ├── sample_video.mp4
│ ├── sample_video2.mp4
│ └── taxi.jpg
├── app.py
├── LICENSE
├── pre-requirements.txt
├── pyproject.toml
├── README.md
├── requirements.txt
└── uv.lock
To run SAM3 locally, configure a Python environment with the following dependencies. A compatible CUDA-enabled GPU is strongly recommended to handle real-time segmentation and video processing loops.
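Before installing, it can help to check what hardware PyTorch will see. The sketch below is an assumption about how a device and precision could be chosen, not the logic used in app.py; the helper name `pick_device_and_dtype` is hypothetical, and it falls back to CPU when torch is not installed yet.

```python
def pick_device_and_dtype():
    """Return (device, dtype_name) as strings, preferring CUDA with bfloat16."""
    try:
        import torch
    except ImportError:
        # torch is not installed yet; report the CPU fallback.
        return "cpu", "float32"

    if torch.cuda.is_available():
        # bfloat16 needs hardware support; otherwise fall back to float16.
        dtype = "bfloat16" if torch.cuda.is_bf16_supported() else "float16"
        return "cuda", dtype
    return "cpu", "float32"

device, dtype = pick_device_and_dtype()
print(f"device={device}, dtype={dtype}")
```

If this prints `device=cpu`, the app may still start, but video propagation in particular is likely to be slow.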
Standard PIP Installation
- Update pip:
pip install --upgrade "pip>=26.1.1"
- Install dependencies:
pip install -r requirements.txt
Installation with uv (alternative): uv is a fast Python package and project manager written in Rust. Combined with the uv.lock file in this repository, it gives a quick and reproducible environment setup.
Step 1 — Install uv
- macOS / Linux:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Windows:
  powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Or, with pip:
  pip install uv
Step 2 — Clone the repository
git clone https://github.com/PRITHIVSAKTHIUR/SAM3-Demo.git
cd SAM3-Demo
Step 3 — Initialize the project and install dependencies
uv sync
Step 4 — Run the script
uv run app.py
The application depends on the following primary packages (defined in requirements.txt):
transformers==5.9.0
sentencepiece
opencv-python
imageio[pyav]
torchvision
matplotlib
accelerate
kernels
pillow
gradio==6.6.0
spaces
numpy
torch==2.11.0
peft
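Because three of these packages are pinned to exact versions, a quick check of the active environment can catch mismatches before launching the app. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins taken from the dependency list above.
PINS = {"transformers": "5.9.0", "gradio": "6.6.0", "torch": "2.11.0"}

def check_pins(pins):
    """Map each package to (wanted, found, matches); found is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build suffixes such as "+cu128" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report

for name, (wanted, found, ok) in check_pins(PINS).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found} [{status}]")
```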
Once the application is running, open your browser to the local address provided in your terminal (typically [http://127.0.0.1:7860/](http://127.0.0.1:7860/)).
- Image Segmentation Tab: Drop an image, enter a target prompt (e.g., "player in white"), and click Segment Image to see the annotated result.
- Video Segmentation Tab: Drop an MP4 clip, set the maximum frame count and timeout, write a tracking prompt, and click Segment Video to run mask propagation.
- Image Click Segmentation Tab: Upload an image and click any element on the preview canvas. Each click is added as a positive foreground point, and the segmentation overlay is regenerated from all points clicked so far.
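The cumulative click behavior of the third tab can be pictured as a small piece of state that grows with every click. The sketch below is illustrative only: the class `ClickState` and its methods are hypothetical, and the exact nesting that the tracker's processor expects for points and labels is not shown in this README, so the lists are left flat.

```python
from dataclasses import dataclass, field

@dataclass
class ClickState:
    """Accumulates clicks as positive foreground points (hypothetical helper)."""
    points: list = field(default_factory=list)  # [[x, y], ...] in pixel coordinates
    labels: list = field(default_factory=list)  # 1 = positive foreground point

    def add_click(self, x, y):
        # Every click is stored as a positive point; none are treated as negative.
        self.points.append([int(x), int(y)])
        self.labels.append(1)
        return self

    def as_prompt(self):
        # Return copies so callers cannot mutate the accumulated state.
        return [list(p) for p in self.points], list(self.labels)

state = ClickState()
state.add_click(120, 80).add_click(200.6, 150)
points, labels = state.as_prompt()
print(points, labels)
```

Re-running the model on the full point list after each click, rather than on the latest click alone, is what lets later clicks refine the mask produced by earlier ones.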
- License: SAM License
- GitHub Repository: https://github.com/PRITHIVSAKTHIUR/SAM3-Demo.git


