Trouble using `.pkl` S3DIS data to train PVCNN and semantically segment ZED 2i `.ply` point clouds (#254)

I'm trying to train the PVCNN model using `.pkl` files (preprocessed from S3DIS data) and later perform semantic segmentation on `.ply` point clouds captured by a ZED 2i camera. However, the original `prepare_data.py` script fails, and the training pipeline expects `.npy` or `.txt` formats rather than `.pkl`.

### What I Did:

#### Preprocessed `.pkl` Available:

I confirmed the following structure in my data:

```python
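import pickle

with open('path/to/data.pkl', 'rb') as f:
    data = pickle.load(f)

# Print the shape of every array stored in the dict
for key, value in data.items():
    print(f"{key}: {value.shape}")

# Sample output:
# coord: (N, 3)
# color: (N, 3)
# semantic_gt: (N, 1)
# instance_gt: (N, 1)
# normal: (N, 3)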
```

#### Running `prepare_data.py` gave:

```text
FileNotFoundError: .../Area_1/office_7/xyzrgb.npy
```

The script expects `.npy` files, but my data is already preprocessed and stored as `.pkl`.

#### Accidentally shadowed `torch` module:

I had a local file named `torch.py`, which led to:

```text
ModuleNotFoundError: No module named 'torch.utils'; 'torch' is not a package
```

Fixed by renaming the file and clearing `__pycache__`.
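For anyone hitting the same error, a small check like the one below can flag the problem before training starts. This is only a sketch: `find_shadowing_files` and its `module_names` argument are hypothetical names I made up, not part of PVCNN or PyTorch.

```python
from pathlib import Path


def find_shadowing_files(directory, module_names=("torch",)):
    """Return local paths that would shadow the given top-level modules.

    `module_names` is a hypothetical, user-supplied tuple; extend it with
    whatever packages the project imports.
    """
    directory = Path(directory)
    hits = []
    for name in module_names:
        # A module is shadowed by `name.py` or by a package `name/__init__.py`
        for candidate in (directory / f"{name}.py",
                          directory / name / "__init__.py"):
            if candidate.is_file():
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Check the current working directory, which usually comes first on sys.path
    for path in find_shadowing_files("."):
        print(f"Warning: {path} shadows an installed package; rename it.")
```

Running it from the project root before `import torch` is enough; stale `__pycache__` entries still need to be cleared by hand after renaming.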

#### Next:

I tried importing an `s3dis_pkl_dataset` module (which does not exist in the repo) and got:

```text
ModuleNotFoundError: No module named 's3dis_pkl_dataset'
```

### Solution I Built:

Created a custom dataset class:

```python
import pickle

from torch.utils.data import Dataset


class S3DISPKLDataset(Dataset):
    ...
    def __getitem__(self, idx):
        # Each sample is one pickled dict on disk
        with open(self.data_paths[idx], 'rb') as f:
            data = pickle.load(f)
        ...
        return {
            'coord': coord,
            'color': color,
            'semantic_gt': semantic_gt
        }
```

This allowed me to correctly load `.pkl` files and test `DataLoader` successfully.
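For reference, here is a minimal self-contained sketch of the same idea. It is not the exact class above: the name `PKLSceneDataset`, the `root` and `keys` parameters, and the one-`.pkl`-per-sample directory layout are all assumptions for illustration. The `try/except` only lets the sketch run where PyTorch is not installed.

```python
import pickle
from pathlib import Path

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch
    Dataset = object


class PKLSceneDataset(Dataset):
    """Hypothetical map-style dataset: one pickled dict per sample."""

    def __init__(self, root, keys=("coord", "color", "semantic_gt")):
        # Assumption: every .pkl file under `root` is one sample
        self.data_paths = sorted(Path(root).rglob("*.pkl"))
        self.keys = keys

    def __len__(self):
        return len(self.data_paths)

    def __getitem__(self, idx):
        with open(self.data_paths[idx], "rb") as f:
            data = pickle.load(f)
        # Return only the fields needed for semantic segmentation
        return {key: data[key] for key in self.keys}
```

One caveat I ran into conceptually: each sample has a different number of points `N`, so batching with `batch_size > 1` will likely need either fixed-size point sampling inside `__getitem__` or a custom `collate_fn`.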

### Request for Maintainers:

1. Provide built-in support or documentation for using `.pkl` files as input.
2. Consider refactoring `prepare_data.py` so users can choose between `.txt`, `.npy`, and `.pkl`.
3. Add a warning to the README against naming local files `torch.py` (a common mistake for new users).
4. Confirm best practices for plugging in external `.ply` files (e.g. ZED 2i output) for inference.
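To make request 4 concrete, this is roughly what I would expect to need on the input side. It is a dependency-free sketch that reads an ASCII `.ply` into `coord` / `color` lists matching the `.pkl` layout above. Assumptions, none of which I have confirmed against the ZED SDK: the file is ASCII (binary PLY is rejected), the vertex element comes first in the body, and the properties are named `x`, `y`, `z`, `red`, `green`, `blue` with integer colors. `read_ascii_ply` is a hypothetical helper name.

```python
def read_ascii_ply(path):
    """Return (coords, colors) as lists from an ASCII PLY file."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        props, n_vertices, in_vertex = [], 0, False
        # Parse the header line by line until `end_header`
        while True:
            line = f.readline()
            if not line:
                raise ValueError("unexpected end of file inside PLY header")
            tokens = line.decode("ascii").split()
            if not tokens:
                continue
            if tokens[0] == "format" and tokens[1] != "ascii":
                raise ValueError("only ASCII PLY is handled in this sketch")
            if tokens[0] == "element":
                in_vertex = tokens[1] == "vertex"
                if in_vertex:
                    n_vertices = int(tokens[2])
            elif tokens[0] == "property" and in_vertex:
                props.append(tokens[-1])  # last token is the property name
            elif tokens[0] == "end_header":
                break
        column = {name: i for i, name in enumerate(props)}
        has_color = all(c in column for c in ("red", "green", "blue"))
        coords, colors = [], []
        # Assumption: vertex rows come first in the body
        for _ in range(n_vertices):
            values = f.readline().decode("ascii").split()
            coords.append([float(values[column[a]]) for a in ("x", "y", "z")])
            if has_color:
                colors.append(
                    [int(values[column[c]]) for c in ("red", "green", "blue")]
                )
    return coords, colors
```

What I cannot tell from the repo is how the coordinates and colors should then be scaled or normalized to match the S3DIS training data, which is the part I would most like guidance on.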

### Environment

* OS: Ubuntu 22.04
* Python: 3.13 (Miniconda)
* Torch: 2.7.1 (CUDA 12.6)
* PVCNN repo: latest from GitHub
