
No response when running models in benchmark/fluid using multiple GPUs #11360

Closed
@sneaxiy

When running models in benchmark/fluid with multiple GPUs, the program hangs without producing any output, and the job is eventually killed after a long time.

The full logs are as follows (this example uses the mnist model, but the other models behave the same way when run on multiple GPUs):

$ python fluid_benchmark.py --model mnist --device GPU --gpus 2
----------- Configuration Arguments -----------
batch_size: 32
cpus: 1
data_format: NCHW
data_path: 
data_set: flowers
device: GPU
gpus: 2
infer_only: False
iterations: 80
learning_rate: 0.001
memory_optimize: False
model: mnist
no_test: False
pass_num: 100
profile: False
skip_batch_num: 5
update_method: local
use_cprof: False
use_fake_data: False
use_nvprof: False
use_reader_op: False
------------------------------------------------
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/home/docker/runtime/overlay2/l/E37ZPPONYOSMCAEWBUTECLE7XH:/home/docker/runtime/overlay2/l/J44XLEFYM66NOFIC5IPPYM3K4B:/home/docker/runtime/overlay2/l/TD5AIZOAV4HDVDBHIYLYA5MRUL:/home/docker/runtime/overlay2/l/UYA3MLQG6SXOENF2VLWCELNMDP:/home/docker/runtime/overlay2/l/KLJNLEIE7ROJMKKQ47RAMGYCSN:/home/docker/runtime/overlay2/l/IZWN5DWNX4XJFYXEWLIXIFIKRZ:/home/docker/runtime/overlay2/l/26FH2HFFZ3E4KCBZ3LVABHDWMJ:/home/docker/runtime/overlay2/l/2MYKEYWTMFTEVD3VQGTHHGBQFX:'
Unexpected end of /proc/mounts line `/home/docker/runtime/overlay2/l/B3HS2GRKDXV2S54B77Y6OSRQQT:/home/docker/runtime/overlay2/l/RY7PSMDPDYS3Z2E6WGZXPT3PDA:/home/docker/runtime/overlay2/l/52PISTXM4OEKVDASJATIGRYKM6:/home/docker/runtime/overlay2/l/NVN7MSVHOTD46R6UB25AEAQYTH:/home/docker/runtime/overlay2/l/OEBXDOGRX6SV7AM5C6X6O3KZFA:/home/docker/runtime/overlay2/l/4RX22CUHDFVPR5BSJBMBCCXUPA:/home/docker/runtime/overlay2/l/UMY2SDMX3YOD4QCKGP7YV6M3XY:/home/docker/runtime/overlay2/l/LPAI2GCE2P6RKBPM6EMIOVQJQP:/home/docker/runtime/overlay2/l/T2DEZFB'
Unexpected end of /proc/mounts line `EAEYYE42XHYJPDEWUY2:/home/docker/runtime/overlay2/l/QUPTGODCA3UK265SVJDLOMHEA6:/home/docker/runtime/overlay2/l/A4PCMPPJRVCTFSKBRTQFFISCWN:/home/docker/runtime/overlay2/l/4UYJNH3ECSDCBKLBBLQPSGZES7:/home/docker/runtime/overlay2/l/FBHGT3GWMQ662T7M4GVHVGX6WC:/home/docker/runtime/overlay2/l/E3774UASMYNWEP56UJBTWIOQU3:/home/docker/runtime/overlay2/l/NKKTOWHYC5Q33FMISWOG2MXL76:/home/docker/runtime/overlay2/l/UPENBO6KPQAN36JVVJFJK26F5D:/home/docker/runtime/overlay2/l/JVOKLXJMTKGL3XFQAQ72QNFCFX:/home/docker/runtim'
Unexpected end of /proc/mounts line `e/overlay2/l/GGT2RDYNJYE2O44ZK4UXAUML4D:/home/docker/runtime/overlay2/l/ILDUOQZ4IBTPDC4GSE4XM52WAJ:/home/docker/runtime/overlay2/l/PANZPZDC65B7QHH4DLJVCJCXRF:/home/docker/runtime/overlay2/l/PEA7W6TUXBKYBTBBRWRUMA5SLL:/home/docker/runtime/overlay2/l/WVM37NIKDKQSYRICDKVWF24XRC:/home/docker/runtime/overlay2/l/SXQLH7XIGNOV4B4GZDU2TEXY6Q:/home/docker/runtime/overlay2/l/3PP46YBKQS2WYGYKDQJ66CIJ3J:/home/docker/runtime/overlay2/l/6VG4GBX4DQKY43QNUESKGZNETD:/home/docker/runtime/overlay2/l/I5M2XMBTVKBZZIVQVDQ2AANHLU'

After a long time, the job is killed automatically.

However, the models run correctly on CPU or on a single GPU. The tests were run inside a Docker container.
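One way to see where a multi-GPU run stalls before it gets killed is to have Python dump the stack of every thread after a timeout. The sketch below uses the standard-library `faulthandler` module. The helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical, and where to call them inside fluid_benchmark.py (for example, just before the training loop) is an assumption, not something taken from the benchmark itself.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s, repeat=True, file=None):
    """Hypothetical helper: if the process is still running after
    `timeout_s` seconds, write the Python stack of every thread to `file`
    (stderr by default).

    With repeat=True the dump is repeated every `timeout_s` seconds, so
    comparing consecutive dumps shows whether the job is slow (stack
    changes) or stuck (stack stays the same).
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=target)


def disarm_hang_watchdog():
    """Cancel a pending dump, e.g. once training finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog(600)` before training and `disarm_hang_watchdog()` afterwards would print stack traces every 10 minutes while the job is stuck. Because `faulthandler` dumps from its own watchdog thread, it should still report the Python frames even when the main thread is blocked inside native code, although it shows nothing about the C++/CUDA frames below them.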