
Fix Windows crash: panoptic_fpn.py passing int32 instead of long #5455


Open. Wants to merge 2 commits into main.
Conversation


dzenanz commented Apr 2, 2025

Stack trace:

[04/02 13:07:05 d2.data.build]: Distribution of instances among all 4 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|     G      | 25           |     SG     | 8            |     T      | 746          |
|     A      | 58           |            |              |            |              |
|   total    | 837          |            |              |            |              |
[04/02 13:07:05 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[04/02 13:07:05 d2.data.common]: Serializing 40 elements to byte tensors and concatenating them all ...
[04/02 13:07:05 d2.data.common]: Serialized dataset takes 8.95 MiB
check and see
[04/02 13:07:05 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from M:/Histo/Work/model_0214999.pth ...
[04/02 13:07:05 d2.engine.train_loop]: Starting training from iteration 0
ERROR [04/02 13:07:42 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "M:\Histo\.venv38\lib\site-packages\detectron2\engine\train_loop.py", line 155, in train
    self.run_step()
  File "M:\Histo\.venv38\lib\site-packages\detectron2\engine\defaults.py", line 530, in run_step
    self._trainer.run_step()
  File "M:\Histo\.venv38\lib\site-packages\detectron2\engine\train_loop.py", line 310, in run_step
    loss_dict = self.model(data)
  File "M:\Histo\.venv38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "M:\Histo\.venv38\lib\site-packages\detectron2\modeling\meta_arch\panoptic_fpn.py", line 127, in forward
    sem_seg_results, sem_seg_losses = self.sem_seg_head(features, gt_sem_seg)
  File "M:\Histo\.venv38\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "M:\Histo\.venv38\lib\site-packages\detectron2\modeling\meta_arch\semantic_seg.py", line 239, in forward
    return None, self.losses(x, targets)
  File "M:\Histo\.venv38\lib\site-packages\detectron2\modeling\meta_arch\semantic_seg.py", line 263, in losses
    loss = F.cross_entropy(
  File "M:\Histo\.venv38\lib\site-packages\torch\nn\functional.py", line 2846, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: expected scalar type Long but found Int
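
For context, a minimal sketch of the failure mode and the kind of cast the fix applies; the tensor names and shapes below are illustrative, not the exact code in panoptic_fpn.py:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: (N, C, H, W) semantic-segmentation logits
# and (N, H, W) per-pixel class indices.
logits = torch.randn(2, 5, 8, 8)
targets = torch.zeros(2, 8, 8, dtype=torch.int32)  # int32 targets, as seen on Windows

# F.cross_entropy requires class-index targets of dtype long (int64), so this raises
# "RuntimeError: expected scalar type Long but found Int":
# loss = F.cross_entropy(logits, targets)

# Casting the ground-truth tensor to long avoids the crash:
loss = F.cross_entropy(logits, targets.long())
```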

dev/linter.sh only makes unrelated changes.

dzenanz added 2 commits April 2, 2025 13:10
Stack trace: same as above.
Addressing:

(.venv38) M:\Histo\detectron2>bash
dzenan@Ryzenator:/mnt/m/Histo/detectron2$ dev/linter.sh
dev/linter.sh: line 3: $'\r': command not found
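
The `$'\r': command not found` error only means dev/linter.sh was checked out with CRLF line endings; it is unrelated to the dtype fix. A one-off sketch for normalizing the line endings, assuming that is all the script needs to run under bash:

```python
# One-off helper to strip carriage returns so bash can run dev/linter.sh.
from pathlib import Path

script = Path("dev/linter.sh")
script.write_bytes(script.read_bytes().replace(b"\r\n", b"\n"))
```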
facebook-github-bot added the CLA Signed label on Apr 2, 2025.

dzenanz commented Apr 2, 2025

This affects both version 0.6 and the current main branch. I tried it with torch 1.10.1 and CUDA 11.3, as well as with torch 2.6.0 and CUDA 12.4.
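
Since the crash is version-independent, a plausible source of the int32 targets is NumPy's platform-default integer, which is 32-bit on Windows for NumPy releases before 2.0; this is an assumption about the root cause, not something verified against the detectron2 data pipeline:

```python
import numpy as np
import torch

# On Windows with NumPy < 2.0, the default integer type is 32-bit, so a mask built
# with dtype=int converts to a torch.int32 tensor; on Linux it converts to int64.
mask = np.zeros((8, 8), dtype=int)
print(torch.from_numpy(mask).dtype)  # torch.int32 on Windows with NumPy < 2.0, torch.int64 elsewhere
```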

dzenanz added a commit to dzenanz/Multi-Compartment-Segmentation that referenced this pull request Apr 2, 2025
They were invalid. The problem was in detectron2, fixed by facebookresearch/detectron2#5455.

With this fix, both the old and new versions of the libraries work.