Closed
Description
When running the models in benchmark/fluid on multiple GPUs, the job hangs with no further output and is eventually killed after a long time.
The full log is below. This example uses the mnist model, but the other models behave the same way on multiple GPUs:
$ python fluid_benchmark.py --model mnist --device GPU --gpus 2
----------- Configuration Arguments -----------
batch_size: 32
cpus: 1
data_format: NCHW
data_path:
data_set: flowers
device: GPU
gpus: 2
infer_only: False
iterations: 80
learning_rate: 0.001
memory_optimize: False
model: mnist
no_test: False
pass_num: 100
profile: False
skip_batch_num: 5
update_method: local
use_cprof: False
use_fake_data: False
use_nvprof: False
use_reader_op: False
------------------------------------------------
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/home/docker/runtime/overlay2/l/E37ZPPONYOSMCAEWBUTECLE7XH:/home/docker/runtime/overlay2/l/J44XLEFYM66NOFIC5IPPYM3K4B:/home/docker/runtime/overlay2/l/TD5AIZOAV4HDVDBHIYLYA5MRUL:/home/docker/runtime/overlay2/l/UYA3MLQG6SXOENF2VLWCELNMDP:/home/docker/runtime/overlay2/l/KLJNLEIE7ROJMKKQ47RAMGYCSN:/home/docker/runtime/overlay2/l/IZWN5DWNX4XJFYXEWLIXIFIKRZ:/home/docker/runtime/overlay2/l/26FH2HFFZ3E4KCBZ3LVABHDWMJ:/home/docker/runtime/overlay2/l/2MYKEYWTMFTEVD3VQGTHHGBQFX:'
Unexpected end of /proc/mounts line `/home/docker/runtime/overlay2/l/B3HS2GRKDXV2S54B77Y6OSRQQT:/home/docker/runtime/overlay2/l/RY7PSMDPDYS3Z2E6WGZXPT3PDA:/home/docker/runtime/overlay2/l/52PISTXM4OEKVDASJATIGRYKM6:/home/docker/runtime/overlay2/l/NVN7MSVHOTD46R6UB25AEAQYTH:/home/docker/runtime/overlay2/l/OEBXDOGRX6SV7AM5C6X6O3KZFA:/home/docker/runtime/overlay2/l/4RX22CUHDFVPR5BSJBMBCCXUPA:/home/docker/runtime/overlay2/l/UMY2SDMX3YOD4QCKGP7YV6M3XY:/home/docker/runtime/overlay2/l/LPAI2GCE2P6RKBPM6EMIOVQJQP:/home/docker/runtime/overlay2/l/T2DEZFB'
Unexpected end of /proc/mounts line `EAEYYE42XHYJPDEWUY2:/home/docker/runtime/overlay2/l/QUPTGODCA3UK265SVJDLOMHEA6:/home/docker/runtime/overlay2/l/A4PCMPPJRVCTFSKBRTQFFISCWN:/home/docker/runtime/overlay2/l/4UYJNH3ECSDCBKLBBLQPSGZES7:/home/docker/runtime/overlay2/l/FBHGT3GWMQ662T7M4GVHVGX6WC:/home/docker/runtime/overlay2/l/E3774UASMYNWEP56UJBTWIOQU3:/home/docker/runtime/overlay2/l/NKKTOWHYC5Q33FMISWOG2MXL76:/home/docker/runtime/overlay2/l/UPENBO6KPQAN36JVVJFJK26F5D:/home/docker/runtime/overlay2/l/JVOKLXJMTKGL3XFQAQ72QNFCFX:/home/docker/runtim'
Unexpected end of /proc/mounts line `e/overlay2/l/GGT2RDYNJYE2O44ZK4UXAUML4D:/home/docker/runtime/overlay2/l/ILDUOQZ4IBTPDC4GSE4XM52WAJ:/home/docker/runtime/overlay2/l/PANZPZDC65B7QHH4DLJVCJCXRF:/home/docker/runtime/overlay2/l/PEA7W6TUXBKYBTBBRWRUMA5SLL:/home/docker/runtime/overlay2/l/WVM37NIKDKQSYRICDKVWF24XRC:/home/docker/runtime/overlay2/l/SXQLH7XIGNOV4B4GZDU2TEXY6Q:/home/docker/runtime/overlay2/l/3PP46YBKQS2WYGYKDQJ66CIJ3J:/home/docker/runtime/overlay2/l/6VG4GBX4DQKY43QNUESKGZNETD:/home/docker/runtime/overlay2/l/I5M2XMBTVKBZZIVQVDQ2AANHLU'
After a long time with no further output, the job is killed automatically.
However, the same models run correctly on CPU or on a single GPU. The tests run inside a Docker container.
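The `Unexpected end of /proc/mounts line` warnings above appear to split one long overlay mount entry across several messages, which suggests that whatever prints them reads each line in fixed-size chunks. This is only a guess, and nothing in the log confirms that the warnings are related to the hang. As a diagnostic, a minimal sketch like the following could be run inside the container to see how long the mount entries actually are. The function name `mount_line_lengths` is hypothetical and is not part of the benchmark code.

```python
def mount_line_lengths(path="/proc/mounts"):
    """Return (line_length, mount_point) pairs, longest line first.

    Hypothetical helper for diagnosis only; it just measures the lines
    and does not depend on any PaddlePaddle or CUDA API.
    """
    results = []
    with open(path, "r", errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            fields = line.split()
            # Field 2 of a mounts entry is the mount point.
            mount_point = fields[1] if len(fields) > 1 else "?"
            results.append((len(line), mount_point))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Print the five longest entries.
    for length, mount_point in mount_line_lengths()[:5]:
        print(f"{length:6d}  {mount_point}")
```

If the overlay entry for `/` turns out to be several thousand characters long, that would be consistent with the truncated lines in the log, but it would not by itself explain why only the multi-GPU run hangs.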