EmbodiedSplat 🛋️
Online Feed-Forward Semantic 3DGS
for Open-Vocabulary 3D Scene Understanding

Seungjun Lee · Zihan Wang · Yunsong Wang · Gim Hee Lee
National University of Singapore

CVPR 2026

Code | Paper | Project Page

Build and understand at once! Taking over 300 streaming images as input, EmbodiedSplat reconstructs a whole-scene open-vocabulary 3DGS online, processing each frame at up to 5-6 FPS. The reconstructed scene supports diverse perception tasks such as open-vocabulary 3D semantic segmentation, rendered 2D semantic segmentation, and novel-view color synthesis with depth rendering.

Table of Contents

TODO
Installation
Data Preparation
Evaluation
Acknowledgement
Citation

News:

[2026/02/21] EmbodiedSplat is accepted to CVPR 2026 🔥. The code will be released before June.
[2026/05/19] The code and pretrained weights are released! 👊🏻

TODO

Release the code of EmbodiedSplat and the pretrained weights
If time permits, we plan to make some updates (not to publish another paper, just for fun ☺️):
- Replacing the reconstruction backbone, FreeSplat++, with a more recent pose-free, online, feed-forward 3DGS model.
- Replacing the CLIP (OpenSeg, MaskAdapter) + SAM pipeline with stronger 2D VLMs such as SAM3.
- Attaching an LLM to EmbodiedSplat, following the spirit of SplatTalk.
- Deploying EmbodiedSplat on a real robot and releasing the code.

Installation

Dependencies 📝

The main dependencies of the project are the following:

python: 3.10
cuda: 11.8

You can set up a conda environment as follows:

conda create -n embodiedsplat python=3.10
conda activate embodiedsplat
conda install -c conda-forge libopenblas=0.3.31 openblas-devel=0.3.31

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install "setuptools<81"
pip install -r requirements.txt --no-build-isolation

cd src/third_party/MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas
cd ../../..

pip install --no-build-isolation src/model/encoder/submodules/simple-knn
pip install --no-build-isolation src/ops
pip install --no-build-isolation src/third_party/localagg
pip install --no-build-isolation git+https://github.com/JonathonLuiten/diff-gaussian-rasterization-w-depth
pip install --no-build-isolation src/third_party/langsplat-rasterization
pip install git+https://github.com/openai/CLIP.git

# If the evaluation code fails with a MinkowskiEngine-related error, run:
cd $CONDA_PREFIX/lib
ln -sf libopenblasp-r0.3.31.so libopenblas.so.0
ln -sf libopenblasp-r0.3.31.so libopenblas.so
cd {YOUR_PATH}
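After installation, a quick sanity check can confirm that the interpreter and PyTorch build match the versions listed under Dependencies and that MinkowskiEngine imports cleanly. The helper below is hypothetical (it is not part of the repository); it only inspects the environment and does not raise if a package is missing. Treating an import failure of MinkowskiEngine as the OpenBLAS issue addressed by the symlinks above is an assumption.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 10), expected_cuda="11.8"):
    """Hypothetical helper: report whether the environment matches the
    versions listed under Dependencies. Missing packages yield None/False."""
    report = {
        "python_ok": sys.version_info[:2] == expected_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_ok": None,         # unknown unless torch is installed
        "minkowski_importable": None,  # unknown unless MinkowskiEngine is installed
    }
    if report["torch_installed"]:
        import torch
        # torch.version.cuda is None for CPU-only builds
        report["torch_cuda_ok"] = torch.version.cuda == expected_cuda
    if importlib.util.find_spec("MinkowskiEngine") is not None:
        try:
            import MinkowskiEngine  # noqa: F401
            report["minkowski_importable"] = True
        except (ImportError, OSError):
            # assumption: a shared-library load failure here is the OpenBLAS
            # issue handled by the libopenblas symlinks above
            report["minkowski_importable"] = False
    return report
```

Calling `check_environment()` returns a dict; any `False` entry points at the installation step to revisit.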

Data Preparation

The test scenes from ScanNet and ScanNet++ and the pretrained weights are available here. You can download all the preprocessed data by running:

python download_data.py

Running the command above should produce two folders:

pretrained: all pretrained weights of EmbodiedSplat and the auxiliary 2D models.
dataset: all test scenes and ground-truth annotations from ScanNet and ScanNet++.
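To confirm that the download finished, a small check can look for both folders. This sketch assumes download_data.py is run from the repository root, so that pretrained/ and dataset/ appear there; missing_download_folders is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("pretrained", "dataset")


def missing_download_folders(root="."):
    """Hypothetical helper: return the expected folders absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
```

An empty list means both folders are present; otherwise, rerun the download script.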

Evaluation

NOTE 📌 : We made a minor update to the inference strategy. As mentioned at the end of Sec. 7.2, we follow FreeSplat++ and apply floater removal as a post-refinement step. In the original paper, Gaussians identified as floaters were also excluded from semantic prediction on point clouds (Eq. 11). However, we empirically found that this exclusion degrades semantic performance, even though floater removal clearly improves rendered RGB quality. Hence, in the released code, floater Gaussians are excluded only from RGB rendering and are still used for semantic prediction. As a result, the evaluation results may be higher than the numbers reported in the paper.
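The difference between the released behavior and the paper's can be sketched as a choice of which Gaussian indices feed each branch. This is an illustrative sketch only: the per-Gaussian floater mask is assumed to come from the FreeSplat++-style floater removal step (not reproduced here), and GaussianSubsets and split_by_floater_mask are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianSubsets:
    rgb_rendering: List[int]        # Gaussian indices used for RGB rendering
    semantic_prediction: List[int]  # Gaussian indices used for point-cloud semantics


def split_by_floater_mask(is_floater: List[bool],
                          exclude_from_semantics: bool = False) -> GaussianSubsets:
    """Hypothetical sketch. exclude_from_semantics=False mirrors the released
    code (floaters dropped only from RGB rendering); True mirrors the paper,
    where floaters were also dropped from semantic prediction."""
    kept = [i for i, floater in enumerate(is_floater) if not floater]
    all_indices = list(range(len(is_floater)))
    return GaussianSubsets(
        rgb_rendering=kept,
        semantic_prediction=kept if exclude_from_semantics else all_indices,
    )
```

For a mask `[False, True, False]`, RGB rendering uses Gaussians 0 and 2, while semantic prediction uses all three.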

NOTE 📌 : We support two inference strategies:

incremental: Among all past frames, select the N=30 frames whose poses differ least from the current frame's pose, and use them as reference frames.
online: Use the N=30 most recent past frames, i.e., [t−30, t−1], as reference frames for timestep t.

The default setting is incremental; you can switch to online by setting model.encoder.recon_mode=online in the config files under config/experiment. Both settings yield similar performance.
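The two strategies can be sketched as follows. This is a minimal sketch, not the repository's implementation: the README does not define the pose-difference measure, so pose_distance (translation distance plus a weighted relative rotation angle) and its rot_weight parameter are assumptions, as are the function names and the (R, t) pose layout. The mode strings mirror the values of model.encoder.recon_mode.

```python
import math
from typing import List, Sequence, Tuple

# Assumed pose layout: (R, t) with R a 3x3 rotation (nested lists) and t a 3-vector.
Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]


def pose_distance(a: Pose, b: Pose, rot_weight: float = 1.0) -> float:
    """Hypothetical pose difference: translation distance plus a weighted
    relative rotation angle in radians. The actual metric may differ."""
    (Ra, ta), (Rb, tb) = a, b
    trans = math.dist(ta, tb)
    # trace(Ra^T Rb) equals the sum of elementwise products of Ra and Rb
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return trans + rot_weight * math.acos(cos_angle)


def select_reference_frames(poses: List[Pose], t: int, n: int = 30,
                            mode: str = "incremental") -> List[int]:
    """Return indices of reference frames (all < t) for timestep t."""
    if mode == "online":
        # the n most recent past frames, i.e. [t-n, t-1], clipped at frame 0
        return list(range(max(0, t - n), t))
    if mode == "incremental":
        # the n past frames whose poses are closest to frame t's pose
        ranked = sorted(range(t), key=lambda i: pose_distance(poses[i], poses[t]))
        return sorted(ranked[:n])  # report in chronological order
    raise ValueError(f"unknown mode: {mode}")
```

With N=30 at t=100, online always returns frames 70 to 99, whereas incremental may return much older frames when the camera revisits a previously observed region.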

We provide evaluation scripts for diverse settings on ScanNet and ScanNet++, with the option to enable or disable GT depth. All experiments were conducted on a single NVIDIA RTX 6000 Ada GPU (48GB).

Dataset	EmbodiedSplat	EmbodiedSplat-fast
ScanNet	Here	Here
ScanNet, GT Depth	Here	Here
ScanNet++	Here	Here
ScanNet++, GT Depth	Here	Here

The generated semantic Gaussians are stored in the outputs_semantic folder and are subsequently used for evaluation on point clouds.
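As a rough picture of how semantic Gaussians can be turned into open-vocabulary labels on a point cloud, the sketch below gives each point the feature of its nearest Gaussian center and picks the class whose text embedding has the highest cosine similarity. This is an assumption for illustration, not the rule defined by Eq. 11 in the paper; label_points and cosine are hypothetical names, and the brute-force nearest-neighbor search is chosen for brevity.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def label_points(points, gaussian_centers, gaussian_feats, text_embeds) -> List[int]:
    """Hypothetical sketch: label each point with the class whose text
    embedding best matches the semantic feature of its nearest Gaussian."""
    labels = []
    for p in points:
        # brute-force nearest Gaussian center (a KD-tree would scale better)
        nearest = min(range(len(gaussian_centers)),
                      key=lambda g: math.dist(p, gaussian_centers[g]))
        feat = gaussian_feats[nearest]
        scores = [cosine(feat, e) for e in text_embeds]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

In practice, text_embeds would presumably be text embeddings of the class names from a CLIP-style model (the openai CLIP package is installed above), and the features would live on the GPU rather than in Python lists.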

Acknowledgement

Our work is heavily inspired by the following works. We sincerely appreciate their great contributions!

Citation

If you find our code or paper useful, please cite:

@article{lee2026embodiedsplat,
  title={EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding},
  author={Lee, Seungjun and Wang, Zihan and Wang, Yunsong and Lee, Gim Hee},
  journal={arXiv preprint arXiv:2603.04254},
  year={2026}
}
