RAM usage continues to grow and the training process stopped without error !!! #103

Closed
@JinYAnGHe

Description

During training, RAM usage keeps growing. Eventually, the training process stops without any error message. Is this a bug?

2021-07-23 14:46:56 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2000/3905, mem: 1730Mb, iter_time: 0.165s, data_time: 0.001s, total_loss: 3.3, iou_loss: 1.7, l1_loss: 0.0, conf_loss: 1.2, cls_loss: 0.5, lr: 9.955e-03, size: 320, ETA: 16:27:19
2021-07-23 14:47:04 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2050/3905, mem: 1730Mb, iter_time: 0.166s, data_time: 0.001s, total_loss: 3.0, iou_loss: 1.8, l1_loss: 0.0, conf_loss: 0.7, cls_loss: 0.5, lr: 9.955e-03, size: 320, ETA: 16:27:11
2021-07-23 14:47:13 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2100/3905, mem: 1730Mb, iter_time: 0.165s, data_time: 0.001s, total_loss: 2.9, iou_loss: 1.8, l1_loss: 0.0, conf_loss: 0.6, cls_loss: 0.5, lr: 9.954e-03, size: 320, ETA: 16:27:02
2021-07-23 14:47:21 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2150/3905, mem: 1730Mb, iter_time: 0.166s, data_time: 0.001s, total_loss: 3.3, iou_loss: 1.9, l1_loss: 0.0, conf_loss: 1.0, cls_loss: 0.5, lr: 9.954e-03, size: 320, ETA: 16:26:54
2021-07-23 14:47:30 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2200/3905, mem: 1730Mb, iter_time: 0.177s, data_time: 0.002s, total_loss: 2.3, iou_loss: 1.3, l1_loss: 0.0, conf_loss: 0.6, cls_loss: 0.4, lr: 9.954e-03, size: 320, ETA: 16:26:51
2021-07-23 14:47:38 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2250/3905, mem: 1730Mb, iter_time: 0.166s, data_time: 0.001s, total_loss: 2.9, iou_loss: 1.7, l1_loss: 0.0, conf_loss: 0.7, cls_loss: 0.5, lr: 9.953e-03, size: 320, ETA: 16:26:43
2021-07-23 14:47:47 | INFO | yolox.core.trainer:237 - epoch: 9/100, iter: 2300/3905, mem: 1730Mb, iter_time: 0.168s, data_time: 0.001s, total_loss: 2.4, iou_loss: 1.5, l1_loss: 0.0, conf_loss: 0.5, cls_loss: 0.4, lr: 9.953e-03, size: 320, ETA: 16:26:36
------------------------stopped here--------------------

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:3D:00.0 Off |                  N/A |
| 28%   50C    P2   109W / 250W |   2050MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:3E:00.0 Off |                  N/A |
| 27%   49C    P2   103W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:41:00.0 Off |                  N/A |
| 25%   48C    P2   117W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:42:00.0 Off |                  N/A |
| 28%   50C    P2   113W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce RTX 208...  On   | 00000000:44:00.0 Off |                  N/A |
| 16%   27C    P8    21W / 250W |     11MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce RTX 208...  On   | 00000000:45:00.0 Off |                  N/A |
| 28%   50C    P2   110W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce RTX 208...  On   | 00000000:46:00.0 Off |                  N/A |
| 24%   47C    P2    95W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce RTX 208...  On   | 00000000:47:00.0 Off |                  N/A |
| 26%   49C    P2    99W / 250W |   2086MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

Does RAM keep growing until it overflows, causing the program to stop?
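
The `mem` value in the log above stays flat at 1730Mb, so one way to check whether host RAM is what grows would be to sample the training process's resident set size at intervals. The following is a minimal sketch under stated assumptions, not part of YOLOX: `current_rss_mb`, `rss_growth_mb`, and `looks_like_leak` are hypothetical helper names, the 100 MiB threshold is arbitrary, and reading `/proc/self/statm` works on Linux only.

```python
import os


def current_rss_mb():
    """Resident set size of this process in MiB (Linux only, reads /proc)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def rss_growth_mb(samples):
    """Net growth between the first and last sample, in MiB."""
    if len(samples) < 2:
        return 0.0
    return samples[-1] - samples[0]


def looks_like_leak(samples, threshold_mb=100.0):
    """Heuristic (assumed, not authoritative): samples never decrease
    and the net growth exceeds threshold_mb."""
    non_decreasing = all(b >= a for a, b in zip(samples, samples[1:]))
    return non_decreasing and rss_growth_mb(samples) > threshold_mb
```

Calling `current_rss_mb()` every few hundred iterations and appending the result to a list would give the samples. This figure covers only the process that calls it, so memory held by separate data-loading worker processes would have to be sampled inside those workers.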
