Adding scene alignment & normalization across datasets #10

Open
wants to merge 25 commits into
base: main
25 commits
05a390a
adding support for arkitscenes
gauravpradeep Feb 19, 2025
022cf89
removing image rotations
gauravpradeep Mar 28, 2025
61f7081
readme fix
gauravpradeep Apr 3, 2025
b79ddd2
updated installation instructions
gauravpradeep Apr 6, 2025
64b6547
Small config changes
sayands Apr 7, 2025
61c64fd
adding support for multiscan
gauravpradeep Feb 19, 2025
6fb8d4d
config related changes for MultiScan
gauravpradeep Mar 12, 2025
45a65de
prepare data readme fix
gauravpradeep Mar 12, 2025
35007c9
arkit open3d convention bug fix
gauravpradeep Mar 21, 2025
fb110ec
Typo change
sayands Apr 4, 2025
0fbe09f
Commit issue fix + path change
sayands Apr 18, 2025
6a17c53
1d preprocessing changes
gauravpradeep Apr 22, 2025
7e51099
2d preprocessing changes
gauravpradeep Apr 22, 2025
97bf362
3d preprocessing changes
gauravpradeep Apr 22, 2025
083a8d7
multimodal dumping changes
gauravpradeep Apr 22, 2025
2b4590c
dataset util changes for alignment
gauravpradeep Apr 22, 2025
39fc410
scanbase changes to work with npz
gauravpradeep Apr 22, 2025
752e30a
scanbase change to read npz isntead of pt
gauravpradeep Apr 29, 2025
2234160
Added normalization changes for Scannet, MultiScan & 3RScan
sayands May 27, 2025
03b4ca4
Fix npz load issue
sayands Jun 5, 2025
bd8fb5d
Redundant code remove + minor fix
sayands Jun 6, 2025
662383d
reverting to original preprocessing(untested)
gauravpradeep Jun 19, 2025
de3e6ea
reverting scannet scene centering
gauravpradeep Jun 19, 2025
a9bbfb5
config fix + redundant code remove
sayands Jun 20, 2025
8e39053
Update paths
sayands Jun 24, 2025
69 changes: 68 additions & 1 deletion DATA.md
@@ -10,6 +10,8 @@ We list the available data used in the current version of CrossOver in the table
| ------------ | ----------------------------- | ----------------------------------- | -------------------------- | -------------------------- |
| ScanNet | `[point, rgb, cad, referral]` | `[point, rgb, floorplan, referral]` | ❌ | ✅ |
| 3RScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ✅ | ✅ |
| ARKitScenes | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |
| MultiScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |


We detail data download and release instructions for preprocessing with scripts for ScanNet + 3RScan.
@@ -110,4 +112,69 @@ Scan3R/
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
```
### MultiScan

#### Running preprocessing scripts
Adjust the path parameters of `MultiScan` in the config files under `configs/preprocess`. Run the following (after changing the `--config-path` in the bash file):

```bash
$ bash scripts/preprocess/process_multiscan.sh
```

Our script for the MultiScan dataset performs the following additional processing:

- 3D-to-2D projection for 2D segmentation, stored as `gt-projection-seg.pt` for each scan.

After preprocessing, the data structure should look like the following:

```
MultiScan/
├── objects_chunked/ (object data chunked into hdf5 format for instance baseline training)
| ├── train_objects.h5
| └── val_objects.h5
├── scans/
| ├── scene_00000_00/
| │ ├── gt-projection-seg.pt -> 3D-to-2D projected data consisting of framewise 2D instance segmentation
| │ ├── data1D.pt -> all 1D data + encoded (object referrals + BLIP features)
| │ ├── data2D.pt -> all 2D data + encoded (RGB + floorplan + DinoV2 features)
| │ ├── data2D_all_images.pt (RGB features of every image of every scan)
| │ ├── data3D.pt -> all 3D data + encoded (Point Cloud + I2PMAE features - object only)
| │ ├── object_id_to_label_id_map.pt -> Instance ID to NYU40 Label mapped
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
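
The 3D-to-2D projection step listed above can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function name `project_instances`, the pinhole camera model, and the nearest-point-per-pixel rule are assumptions made for illustration, and the sketch ignores occlusion by mesh surfaces.

```python
import numpy as np

def project_instances(points, instance_ids, K, world_to_cam, height, width):
    """Return an (H, W) map of instance ids; 0 marks pixels no point projects to."""
    # Transform world-space points into the camera frame.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (world_to_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    front = cam[:, 2] > 1e-6
    cam, ids = cam[front], instance_ids[front]
    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(np.int64)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth, ids = u[inside], v[inside], cam[inside, 2], ids[inside]
    # For each pixel keep the nearest point: sort by pixel, then by depth,
    # and take the first entry of every pixel group.
    flat = v * width + u
    order = np.lexsort((depth, flat))
    flat_sorted = flat[order]
    first = np.ones(len(flat_sorted), dtype=bool)
    first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    nearest = order[first]
    seg = np.zeros(height * width, dtype=np.int64)
    seg[flat[nearest]] = ids[nearest]
    return seg.reshape(height, width)
```

Calling it once per selected frame with that frame's intrinsics and pose would give a framewise 2D instance map similar in spirit to what `gt-projection-seg.pt` is described as holding.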

### ARKitScenes

#### Running preprocessing scripts
Adjust the path parameters of `ARKitScenes` in the config files under `configs/preprocess`. Run the following (after changing the `--config-path` in the bash file):

```bash
$ bash scripts/preprocess/process_arkit.sh
```

Our script for the ARKitScenes dataset performs the following additional processing:

- 3D-to-2D projection for 2D segmentation, stored as `gt-projection-seg.pt` for each scan.

After preprocessing, the data structure should look like the following:

```
ARKitScenes/
├── objects_chunked/ (object data chunked into hdf5 format for instance baseline training)
| ├── train_objects.h5
| └── val_objects.h5
├── scans/
| ├── 40753679/
| │ ├── gt-projection-seg.pt -> 3D-to-2D projected data consisting of framewise 2D instance segmentation
| │ ├── data1D.pt -> all 1D data + encoded (object referrals + BLIP features)
| │ ├── data2D.pt -> all 2D data + encoded (RGB + floorplan + DinoV2 features)
| │ ├── data2D_all_images.pt (RGB features of every image of every scan)
| │ ├── data3D.pt -> all 3D data + encoded (Point Cloud + I2PMAE features - object only)
| │ ├── object_id_to_label_id_map.pt -> Instance ID to NYU40 Label mapped
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
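
A quick way to confirm that preprocessing produced the layout shown above is to scan the output directory for missing files. The sketch below is a hypothetical helper, not part of the repository; the file names are copied from the tree above, and since commit messages in this PR mention reading `npz` instead of `pt`, the extensions may need adjusting.

```python
from pathlib import Path

# File names taken from the directory tree in DATA.md.
EXPECTED_FILES = [
    "gt-projection-seg.pt",
    "data1D.pt",
    "data2D.pt",
    "data2D_all_images.pt",
    "data3D.pt",
    "object_id_to_label_id_map.pt",
    "objectsDataMultimodal.pt",
    "sel_cams_on_mesh.png",
]

def find_incomplete_scans(process_dir):
    """Map each scan id under <process_dir>/scans to the expected files it lacks."""
    missing = {}
    scans_dir = Path(process_dir) / "scans"
    for scan_dir in sorted(p for p in scans_dir.iterdir() if p.is_dir()):
        absent = [name for name in EXPECTED_FILES if not (scan_dir / name).exists()]
        if absent:
            missing[scan_dir.name] = absent
    return missing
```

An empty result means every scan directory contains all listed files; otherwise the dictionary names the scans to re-run.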
5 changes: 4 additions & 1 deletion README.md
@@ -120,6 +120,9 @@ See [DATA.MD](DATA.md) for detailed instructions on data download, preparation a
| ------------ | ----------------------------- | ----------------------------------- | -------------------------- | -------------------------- |
| Scannet | `[point, rgb, cad, referral]` | `[point, rgb, floorplan, referral]` | ❌ | ✅ |
| 3RScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ✅ | ✅ |
| ARKitScenes | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |
| MultiScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |


> To run our demo, you only need to download generated embedding data; no need for any data preprocessing.

@@ -136,7 +139,7 @@ Various configurable parameters:
- `--database_path`: Path to the precomputed embeddings of the database scenes downloaded before (eg: `./release_data/embed_scannet.pt`).
- `--query_modality`: Modality of the query scene, Options: `point`, `rgb`, `floorplan`, `referral`
- `--database_modality`: Modality used for retrieval. Same options as above.
- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`).
- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`.
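
To make the parameter list concrete, here is a hypothetical `argparse` sketch covering only the flags visible in this hunk. The real demo script may define additional arguments, defaults, or validation; `build_parser` and `MODALITIES` are illustrative names, not the project's own.

```python
import argparse

# Modality options as listed for --query_modality in the README.
MODALITIES = ["point", "rgb", "floorplan", "referral"]

def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical retrieval demo arguments")
    parser.add_argument("--database_path", required=True,
                        help="Path to precomputed database embeddings")
    parser.add_argument("--query_modality", choices=MODALITIES, required=True,
                        help="Modality of the query scene")
    parser.add_argument("--database_modality", choices=MODALITIES, required=True,
                        help="Modality used for retrieval")
    parser.add_argument("--ckpt", required=True,
                        help="Path to the pre-trained scene crossover checkpoint")
    return parser
```

Restricting the modality flags with `choices` rejects unsupported values such as `cad` at parse time rather than later in the pipeline.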

For embedding and pre-trained model download, refer to [generated embedding data](DATA.md#generated-embedding-data) and [checkpoints](#checkpoints) sections.

2 changes: 1 addition & 1 deletion TRAIN.md
@@ -21,7 +21,7 @@ $ bash scripts/train/train_instance_crossover.sh
```

#### Train Scene Retrieval Pipeline
Adjust path/configuration parameters in `configs/train/train_scene_crossover.yaml`. You can also add your customised dataset or choose to train on Scannet & 3RScan or either. Run the following:
Adjust path/configuration parameters in `configs/train/train_scene_crossover.yaml`. You can also add your customised dataset or train on any combination of Scannet, 3RScan, MultiScan, and ARKitScenes. Run the following:

```bash
$ bash scripts/train/train_scene_crossover.sh
14 changes: 14 additions & 0 deletions common/load_utils.py
@@ -50,6 +50,20 @@ def write_json(data_dict: Any, filename: str) -> None:
    with open(filename, "w") as outfile:
        outfile.write(json_obj)

def load_npz_as_dict(filename: str) -> dict:
    with np.load(filename, allow_pickle=True) as npz:
        if isinstance(npz, np.lib.npyio.NpzFile):
            out = {}
            for k in npz.files:
                val = npz[k]
                if (isinstance(val, np.ndarray) and
                    val.dtype == object and
                    val.shape == ()):
                    out[k] = val.item()
                else:
                    out[k] = val
            return out

def get_print_format(value: Any) -> str:
    """Determines the appropriate format string for a given value."""
    if isinstance(value, int):
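
The round trip below illustrates why `load_npz_as_dict` unwraps zero-dimensional object arrays: a Python dict stored through `np.savez` is loaded back as a 0-d array of dtype `object`, and `.item()` recovers the original dict. The function body is copied from this diff with the indentation assumed; the sample record and file name are made up for illustration.

```python
import os
import tempfile

import numpy as np

def load_npz_as_dict(filename: str) -> dict:
    # Copy of the helper added in common/load_utils.py (indentation assumed).
    with np.load(filename, allow_pickle=True) as npz:
        if isinstance(npz, np.lib.npyio.NpzFile):
            out = {}
            for k in npz.files:
                val = npz[k]
                if isinstance(val, np.ndarray) and val.dtype == object and val.shape == ():
                    out[k] = val.item()  # unwrap a pickled Python object
                else:
                    out[k] = val
            return out

# Hypothetical record: one numeric array plus one nested dict.
record = {"scan_id": "scene_00000_00", "labels": {1: "chair", 2: "table"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    np.savez(path,
             points=np.zeros((4, 3), dtype=np.float32),
             meta=np.array(record, dtype=object))
    loaded = load_npz_as_dict(path)

print(type(loaded["meta"]), loaded["points"].shape)
```

Note that `allow_pickle=True` executes pickled content on load, so this pattern is only appropriate for files from a trusted source.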
25 changes: 22 additions & 3 deletions configs/evaluation/eval_instance.yaml
@@ -33,7 +33,7 @@ data :
voxel_size : 0.02

Scan3R:
base_dir : /drive/datasets/Scan3R/
base_dir : /media/sayan/internal/datasets/Scan3R/
process_dir : ${data.process_dir}/Scan3R/
processor3D : Scan3R3DProcessor
processor2D : Scan3R2DProcessor
@@ -43,14 +43,33 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : InferenceObjectRetrieval
InferenceObjectRetrieval:
val : [Scannet]
modalities : ['rgb', 'point', 'cad', 'referral']
scene_modalities : ['rgb', 'point', 'referral', 'floorplan']
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r.pth

ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r+multiscan.pth

inference_module: ObjectRetrieval
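
The `${data.process_dir}` references in these configs look like OmegaConf/Hydra-style variable interpolation; that is an assumption, since the config loader is not shown in this diff. The standard-library sketch below only illustrates how such a reference resolves against the root of the config. It is not the library's implementation, and `/path/to/processed` is a placeholder value.

```python
import re

_PATTERN = re.compile(r"\$\{([^}]+)\}")

def lookup(cfg, dotted_key):
    """Follow a dotted key such as 'data.process_dir' through nested dicts."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node

def resolve(value, root):
    """Recursively replace ${a.b} references with the value found at root['a']['b']."""
    if isinstance(value, dict):
        return {k: resolve(v, root) for k, v in value.items()}
    if isinstance(value, str):
        return _PATTERN.sub(
            lambda m: str(resolve(lookup(root, m.group(1)), root)), value)
    return value

# Placeholder config mirroring the shape of the ARKitScenes entry.
cfg = {
    "data": {
        "process_dir": "/path/to/processed",
        "ARKitScenes": {"process_dir": "${data.process_dir}/ARKitScenes/"},
    }
}
resolved = resolve(cfg, cfg)
print(resolved["data"]["ARKitScenes"]["process_dir"])
```

Under this reading, only the top-level `data.process_dir` needs to change when the processed data moves; each dataset's `process_dir` follows automatically.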

23 changes: 21 additions & 2 deletions configs/evaluation/eval_scene.yaml
@@ -33,7 +33,7 @@ data :
voxel_size : 0.02

Scan3R:
base_dir : /drive/datasets/Scan3R/
base_dir : /media/sayan/internal/datasets/Scan3R/
process_dir : ${data.process_dir}/Scan3R/
processor3D : Scan3R3DProcessor
processor2D : Scan3R2DProcessor
@@ -43,13 +43,32 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
max_object_len : 150
voxel_size : 0.02
avail_modalities : ['point', 'cad', 'rgb', 'referral']
MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : InferenceSceneRetrieval
InferenceSceneRetrieval:
val : [Scannet]
modalities : ['rgb', 'point', 'cad', 'referral']
scene_modalities : ['rgb', 'point', 'referral', 'floorplan'] #, 'point']
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/scene_crossover_scannet+scan3r.pth
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/scene_crossover_scannet+scan3r+multiscan.pth

inference_module: SceneRetrieval
model:
17 changes: 16 additions & 1 deletion configs/preprocess/process_1d.yaml
@@ -17,14 +17,29 @@ data:
aggre_subfix : _vh_clean.aggregation.json

Scan3R:
base_dir : /drive/datasets/Scan3R/
base_dir : /media/sayan/internal/datasets/Scan3R/
process_dir : ${data.process_dir}/Scan3R/
processor3D : Scan3R3DProcessor
processor2D : Scan3R2DProcessor
processor1D : Scan3R1DProcessor
label_filename : labels.instances.align.annotated.v2.ply
skip_frames : 1

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
skip_frames : 1
MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

Shapenet:
base_dir : /drive/datasets/Shapenet/ShapeNetCore.v2/

20 changes: 18 additions & 2 deletions configs/preprocess/process_2d.yaml
@@ -19,14 +19,30 @@ data:
skip_frames : 5

Scan3R:
base_dir : /drive/datasets/Scan3R/
base_dir : /media/sayan/internal/datasets/Scan3R/
process_dir : ${data.process_dir}/Scan3R/
processor3D : Scan3R3DProcessor
processor2D : Scan3R2DProcessor
processor1D : Scan3R1DProcessor
label_filename : labels.instances.align.annotated.v2.ply
skip_frames : 1

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
skip_frames : 1

MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

modality_info:
1D :
feature_extractor:
@@ -60,4 +76,4 @@ task:
name : Preprocess
Preprocess :
modality : '2D'
splits : ['val']
splits : ['train', 'val']
16 changes: 15 additions & 1 deletion configs/preprocess/process_3d.yaml
@@ -17,13 +17,27 @@ data:
aggre_subfix : _vh_clean.aggregation.json

Scan3R:
base_dir : /drive/datasets/Scan3R/
base_dir : /media/sayan/internal/datasets/Scan3R/
process_dir : ${data.process_dir}/Scan3R/
processor3D : Scan3R3DProcessor
processor2D : Scan3R2DProcessor
processor1D : Scan3R1DProcessor
label_filename : labels.instances.align.annotated.v2.ply

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

modality_info:
1D :
feature_extractor:
20 changes: 19 additions & 1 deletion configs/preprocess/process_multimodal.yaml
@@ -18,7 +18,7 @@ data:
avail_modalities : ['point', 'cad', 'rgb', 'referral']

Scan3R:
base_dir : /drive/datasets/Scan3R
base_dir : /media/sayan/internal/datasets/Scan3R
process_dir : ${data.process_dir}/Scan3R
chunked_dir : ${data.process_dir}/Scan3R/objects_chunked/
processor3D : Scan3R3DProcessor
@@ -28,6 +28,24 @@ data:
skip_frames : 1
avail_modalities : ['point', 'rgb', 'referral']

ARKitScenes:
base_dir : /media/sayan/internal/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'rgb', 'referral']

MultiScan:
base_dir : /media/sayan/internal/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'rgb', 'referral']

modality_info:
1D :
feature_extractor:
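
Because each dataset block repeats the same keys across several config files, a small consistency check can catch copy-paste slips. The sketch below is a hypothetical helper that works on an already-loaded config dict; the `<Dataset><dim>Processor` naming rule is inferred from the entries visible in this diff and may not hold for every dataset in the repository.

```python
REQUIRED_KEYS = ("base_dir", "process_dir", "processor1D", "processor2D", "processor3D")

def check_dataset_entry(name, entry):
    """Return a list of human-readable problems found in one dataset block."""
    problems = []
    # Every dataset block is expected to define these keys.
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append(f"{name}: missing '{key}'")
    # Processor names are assumed to follow '<Dataset><dim>Processor'.
    for dim in ("1D", "2D", "3D"):
        key = f"processor{dim}"
        expected = f"{name}{dim}Processor"
        if key in entry and entry[key] != expected:
            problems.append(f"{name}: {key} is '{entry[key]}', expected '{expected}'")
    return problems
```

Running it over every block under `data` in each config file gives a short report to review before launching preprocessing.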
21 changes: 21 additions & 0 deletions configs/train/train_instance_baseline.yaml
@@ -44,6 +44,27 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : ObjectLevelGrounding
ObjectLevelGrounding :
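
The `voxel_size : 0.02` setting is repeated for every dataset. Assuming it denotes the edge length of a voxel grid used to downsample point clouds (the diff does not show how the value is consumed, and the repository may rely on a sparse-convolution library's quantization instead), a minimal NumPy sketch of voxel-grid averaging looks like this:

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.02):
    """Return one averaged point per occupied voxel of edge length `voxel_size`."""
    # Integer voxel coordinates for each point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that share a voxel.
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    # Sum the points in each voxel, then divide by the count to get the mean.
    sums = np.zeros((len(counts), points.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

With coordinates in metres, a value of 0.02 would correspond to 2 cm voxels under this interpretation.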