@@ -214,7 +214,7 @@ After the Docker container is launched, the training with the default hyperparam
 
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-20m
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-20m
 ```
 
 This will result in a checkpoint file being written to `/data/checkpoints/model.pth`.
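The practical effect of the `--use_env` flag added in this hunk: the launcher exports the local rank as the `LOCAL_RANK` environment variable instead of appending a `--local_rank` command-line argument to the script. A minimal POSIX-shell sketch of how a launched process picks it up (the default of `0` for single-process runs is an assumption for illustration):

```shell
# With --use_env, torch.distributed.launch exports LOCAL_RANK instead of
# passing --local_rank; a launched script reads it from the environment,
# falling back to 0 when run outside the launcher.
local_rank="${LOCAL_RANK:-0}"
echo "local rank: ${local_rank}"
```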
@@ -225,7 +225,7 @@ This will result in a checkpoint file being written to `/data/checkpoints/model.
 The trained model can be evaluated by passing the `--mode test` flag to the `run.sh` script:
 
 ```bash
-python -m torch.distributed.launch --nproc_per_node=1 ncf.py --data /data/cache/ml-20m --mode test --load_checkpoint_path /data/checkpoints/model.pth
+python -m torch.distributed.launch --nproc_per_node=1 --use_env ncf.py --data /data/cache/ml-20m --mode test --load_checkpoint_path /data/checkpoints/model.pth
 ```
 
 
@@ -330,13 +330,13 @@ For a smaller dataset you might experience slower performance.
 To download, preprocess and train on the ML-1m dataset run:
 ```bash
 ./prepare_dataset.sh ml-1m
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-1m
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-1m
 ```
 
 ### Training process
 The name of the training script is `ncf.py`. Because of the multi-GPU support, it should always be run with the torch distributed launcher like this:
 ```bash
-python -m torch.distributed.launch --nproc_per_node=<number_of_gpus> ncf.py --data <path_to_dataset> [other_parameters]
+python -m torch.distributed.launch --nproc_per_node=<number_of_gpus> --use_env ncf.py --data <path_to_dataset> [other_parameters]
 ```
 
 The main results of the training are checkpoints stored by default in `/data/checkpoints/`. This location can be controlled
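Since `--nproc_per_node` should match the number of visible GPUs, one way to fill in `<number_of_gpus>` automatically is sketched below. This helper is an assumption for illustration, not part of the repository's scripts; it relies only on `nvidia-smi -L` printing one line per GPU:

```shell
# Hypothetical helper (not in the repository): derive the GPU count for
# --nproc_per_node, falling back to 1 when nvidia-smi is unavailable.
if command -v nvidia-smi >/dev/null 2>&1; then
  num_gpus=$(nvidia-smi -L | wc -l)
else
  num_gpus=1
fi
echo "detected ${num_gpus} GPU(s)"
```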
@@ -351,7 +351,7 @@ The HR@10 metric is the number of hits in the entire test set divided by the num
 
 Inference can be launched with the same script used for training by passing the `--mode test` flag:
 ```bash
-python -m torch.distributed.launch --nproc_per_node=<number_of_gpus> ncf.py --data <path_to_dataset> --mode test [other_parameters]
+python -m torch.distributed.launch --nproc_per_node=<number_of_gpus> --use_env ncf.py --data <path_to_dataset> --mode test [other_parameters]
 ```
 
 The script will then:
@@ -368,7 +368,7 @@ The script will then:
 NCF training on NVIDIA DGX systems is very fast; therefore, in order to measure train and validation throughput, you can simply run the full training job with:
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-20m --epochs 5
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-20m --epochs 5
 ```
 
 At the end of the script, a line reporting the best train throughput is printed.
@@ -379,7 +379,7 @@ At the end of the script, a line reporting the best train throughput is printed.
 Validation throughput can be measured by running the full training job with:
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-20m --epochs 5
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-20m --epochs 5
 ```
 
 The best validation throughput is reported to the standard output.
@@ -405,7 +405,7 @@ The training time was measured excluding data downloading, preprocessing, valida
 To reproduce this result, start the NCF Docker container interactively and run:
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-20m
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-20m
 ```
 
 ##### NVIDIA DGX-1 (8x V100 32G)
@@ -428,7 +428,7 @@ Here's an example validation accuracy curve for mixed precision vs single precis
 To reproduce this result, start the NCF Docker container interactively and run:
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=8 ncf.py --data /data/cache/ml-20m
+python -m torch.distributed.launch --nproc_per_node=8 --use_env ncf.py --data /data/cache/ml-20m
 ```
 
 ##### NVIDIA DGX-2 (16x V100 32G)
@@ -449,7 +449,7 @@ The training time was measured excluding data downloading, preprocessing, valida
 To reproduce this result, start the NCF Docker container interactively and run:
 ```bash
 ./prepare_dataset.sh
-python -m torch.distributed.launch --nproc_per_node=16 ncf.py --data /data/cache/ml-20m
+python -m torch.distributed.launch --nproc_per_node=16 --use_env ncf.py --data /data/cache/ml-20m
 ```
 
 
@@ -555,7 +555,8 @@ The following table shows the best inference throughput:
 4. September, 2019
    * Adjusting for API changes in PyTorch and APEX
    * Checkpoints loading fix
-
+5. January, 2020
+   * DLLogger support added
 
 ### Known issues
 