The official implementation of the AAAI 2025 paper "BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation".

Xiaolu Hou*, Mingcheng Li*, Dingkang Yang, Jiawei Chen, Ziyun Qian, Xiao Zhao, Yue Jiang, Jinjie Wei, Qingyao Xu, Lihua Zhang
Accepted by AAAI 2025
With the widespread use of virtual reality applications, 3D scene generation has become a challenging new research frontier. 3D scenes have highly complex structures and need to ensure that the output is dense, coherent, and contains all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. However, the generated scenes occupy large amounts of storage space and often lack effective regularization methods, leading to geometric distortions. To this end, we propose BloomScene, a lightweight structured 3D Gaussian splatting framework for crossmodal scene generation, which creates diverse and high-quality 3D scenes from text or image inputs. Specifically, a crossmodal progressive scene generation framework is proposed to generate coherent scenes utilizing incremental point cloud reconstruction and 3D Gaussian splatting. Additionally, we propose a hierarchical depth prior-based regularization mechanism that utilizes multi-level constraints on depth accuracy and smoothness to enhance the realism and continuity of the generated scenes. Ultimately, we propose a structured context-guided compression mechanism that exploits structured hash grids to model the context of unorganized anchor attributes, which significantly reduces structural redundancy and storage overhead. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.
We provide a pretrained image inpainting model. The download URLs are as follows:

- Baidu Disk URL for the image inpainting model (Runway)
- Google Drive URL for the image inpainting model (Runway)

Please download the model file and put it under `./BloomScene/models--runwayml--stable-diffusion-inpainting`.
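For orientation, a standard Hugging Face Diffusers checkpoint of `runwayml/stable-diffusion-inpainting` unpacks into per-component subfolders like the sketch below. This layout is an assumption based on the usual Diffusers repository structure, not a guarantee of what our archive contains; match whatever the downloaded file actually holds.

```
./BloomScene/models--runwayml--stable-diffusion-inpainting
├── model_index.json
├── scheduler/
├── text_encoder/
├── tokenizer/
├── unet/
└── vae/
```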
We tested our code on a server with Ubuntu 18.04, CUDA 11.4, and gcc 9.4.0.

```shell
conda env create --file environment.yml
conda activate bloomscene
```

```shell
# torch-scatter: download the prebuilt wheel from
# https://data.pyg.org/whl/torch-2.0.0%2Bcu117/torch_scatter-2.1.2%2Bpt20cu117-cp39-cp39-linux_x86_64.whl
pip install <path_to_the_whl_file>
```

```shell
cd submodules/depth-diff-gaussian-rasterization
python setup.py install
cd ../simple-knn
python setup.py install
cd ../gridencoder
python setup.py install
cd ../..
```

# Default Example
```shell
python run.py --image <path_to_image> --text <path_to_text_file> [Other options]
```

Replace `<path_to_image>` and `<path_to_text_file>` with the paths to your image and text files.
Other options

- `--image`: Input image for scene generation.
- `--text`: Text prompt for scene generation.
- `--neg_text`: Optional. Negative text prompt for scene generation.
- `--lambdae`: Optional. Try variable bitrate.
- `--seed`: Manual seed for reproducibility.
- `--dep_value`: Pixel-level depth regularization.
- `--dep_value_lbd`: Lambda for pixel-level depth regularization.
- `--dep_domin`: Distribution-level depth regularization.
- `--dep_domin_lbd`: Lambda for distribution-level depth regularization.
- `--dep_smooth`: Depth smoothness regularization.
- `--dep_smooth_lbd`: Lambda for depth smoothness regularization.
- `--diff_steps`: Optional. Number of inference steps for running Stable Diffusion Inpainting.
- `--save_dir`: Optional. Directory to save the generated scenes and videos. Specify to organize outputs.
- `--campath_gen`: Camera path for scene generation (options: `rotate360`).
- `--campath_render`: Camera path for video rendering (options: `rotate360`).
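Putting the options together, a full invocation might look like the following. The file paths and flag values here are placeholders for illustration, not recommended defaults:

```shell
python run.py \
    --image examples/room.png \
    --text examples/room.txt \
    --seed 1 \
    --diff_steps 50 \
    --campath_gen rotate360 \
    --campath_render rotate360 \
    --save_dir outputs/room
```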
Many thanks to LucidDreamer, ZoeDepth, 3DGS, Scaffold-GS, HAC and Runway for their excellent codebases.