Commit f3c6bdf

Merge pull request NVIDIA#764 from NVIDIA/gh/release
[UNet medical/TF2] Fix
2 parents: d17b10e + 94a8f28

20 files changed: 139 additions, 114 deletions

TensorFlow2/Segmentation/UNet_Medical/README.md

Lines changed: 44 additions & 38 deletions
@@ -231,20 +231,20 @@ For the specifics concerning training and inference, see the [Advanced](#advance
 
 This script will launch a training on a single fold and store the model’s checkpoint in the <path/to/checkpoint> directory.
 
-The script can be run directly by modifying flags if necessary, especially the number of GPUs, which is defined after the `-np` flag. Since the test volume does not have labels, 20% of the training data is used for validation in 5-fold cross-validation manner. The number of fold can be changed using `--crossvalidation_idx` with an integer in range 0-4. For example, to run with 4 GPUs using fold 1 use:
+The script can be run directly by modifying flags if necessary, especially the number of GPUs, which is defined after the `-np` flag. Since the test volume does not have labels, 20% of the training data is used for validation in a 5-fold cross-validation manner. The fold number can be changed using `--fold` with an integer in the range 0-4. For example, to run with 4 GPUs on fold 1, use:
 
 ```bash
-horovodrun -np 4 python main.py --data_dir /data --model_dir /results --batch_size 1 --exec_mode train --crossvalidation_idx 1 --xla --amp
+horovodrun -np 4 python main.py --data_dir /data --model_dir /results --batch_size 1 --exec_mode train --fold 1 --xla --amp
 ```
 
 Training will result in a checkpoint file being written to `./results` on the host machine.
 
 6. Start validation/evaluation.
 
-The trained model can be evaluated by passing the `--exec_mode evaluate` flag. Since evaluation is carried out on a validation dataset, the `--crossvalidation_idx` parameter should be filled. For example:
+The trained model can be evaluated by passing the `--exec_mode evaluate` flag. Since evaluation is carried out on the validation dataset, the `--fold` parameter must be set. For example:
 
 ```bash
-python main.py --data_dir /data --model_dir /results --batch_size 1 --exec_mode evaluate --crossvalidation_idx 0 --xla --amp
+python main.py --data_dir /data --model_dir /results --batch_size 1 --exec_mode evaluate --fold 0 --xla --amp
 ```
 
 Evaluation can also be triggered jointly after training by passing the `--exec_mode train_and_evaluate` flag.
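A combined run under the renamed flag would then look like the following sketch (mirroring the `evaluate` example above; the paths are placeholders):

```bash
horovodrun -np 4 python main.py --data_dir /data --model_dir /results --batch_size 1 --exec_mode train_and_evaluate --fold 0 --xla --amp
```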
@@ -291,19 +291,20 @@ Other folders included in the root directory are:
 The complete list of the available parameters for the `main.py` script contains:
 * `--exec_mode`: Select the execution mode to run the model (default: `train`). Modes available:
   * `train` - trains model from scratch.
-  * `evaluate` - loads checkpoint (if available) and performs evaluation on validation subset (requires `--crossvalidation_idx` other than `None`).
-  * `train_and_evaluate` - trains model from scratch and performs validation at the end (requires `--crossvalidation_idx` other than `None`).
+  * `evaluate` - loads checkpoint (if available) and performs evaluation on the validation subset (requires `--fold` other than `None`).
+  * `train_and_evaluate` - trains model from scratch and performs validation at the end (requires `--fold` other than `None`).
   * `predict` - loads checkpoint (if available) and runs inference on the test set. Stores the results in `--model_dir` directory.
   * `train_and_predict` - trains model from scratch and performs inference.
 * `--model_dir`: Set the output directory for information related to the model (default: `/results`).
 * `--log_dir`: Set the output directory for logs (default: None).
 * `--data_dir`: Set the input directory containing the dataset (default: `None`).
 * `--batch_size`: Size of each minibatch per GPU (default: `1`).
-* `--crossvalidation_idx`: Selected fold for cross-validation (default: `None`).
+* `--fold`: Selected fold for cross-validation (default: `None`).
 * `--max_steps`: Maximum number of steps (batches) for training (default: `1000`).
 * `--seed`: Set random seed for reproducibility (default: `0`).
 * `--weight_decay`: Weight decay coefficient (default: `0.0005`).
 * `--log_every`: Log performance every n steps (default: `100`).
+* `--evaluate_every`: Evaluate every n steps (default: `0` - evaluate once at the end).
 * `--learning_rate`: Model’s learning rate (default: `0.0001`).
 * `--augment`: Enable data augmentation (default: `False`).
 * `--benchmark`: Enable performance benchmarking (default: `False`). If the flag is set, the script runs in a benchmark mode - each iteration is timed and the performance result (in images per second) is printed at the end. Works for both `train` and `predict` execution modes.
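The newly added `--evaluate_every` pairs naturally with `train_and_evaluate`, validating periodically instead of only once at the end. A hypothetical invocation (the step counts are illustrative, not taken from this commit):

```bash
python main.py --data_dir /data --model_dir /results --batch_size 8 --exec_mode train_and_evaluate --fold 0 --max_steps 6400 --evaluate_every 1600 --xla --amp
```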
@@ -324,43 +325,48 @@ usage: main.py [-h]
                [--exec_mode {train,train_and_predict,predict,evaluate,train_and_evaluate}]
                [--model_dir MODEL_DIR] --data_dir DATA_DIR [--log_dir LOG_DIR]
                [--batch_size BATCH_SIZE] [--learning_rate LEARNING_RATE]
-               [--crossvalidation_idx CROSSVALIDATION_IDX]
-               [--max_steps MAX_STEPS] [--weight_decay WEIGHT_DECAY]
+               [--fold FOLD] [--max_steps MAX_STEPS]
+               [--evaluate_every EVALUATE_EVERY] [--weight_decay WEIGHT_DECAY]
                [--log_every LOG_EVERY] [--warmup_steps WARMUP_STEPS]
                [--seed SEED] [--augment] [--benchmark]
                [--amp] [--xla]
 
 UNet-medical
 
 optional arguments:
-  -h, --help            show this help message and exit
-  --exec_mode {train,train_and_predict,predict,evaluate,train_and_evaluate}
-                        Execution mode of running the model
-  --model_dir MODEL_DIR
-                        Output directory for information related to the model
-  --data_dir DATA_DIR   Input directory containing the dataset for training
-                        the model
-  --log_dir LOG_DIR     Output directory for training logs
-  --batch_size BATCH_SIZE
-                        Size of each minibatch per GPU
-  --learning_rate LEARNING_RATE
-                        Learning rate coefficient for AdamOptimizer
-  --crossvalidation_idx CROSSVALIDATION_IDX
-                        Chosen fold for cross-validation. Use None to disable
-                        cross-validation
-  --max_steps MAX_STEPS
-                        Maximum number of steps (batches) used for training
-  --weight_decay WEIGHT_DECAY
-                        Weight decay coefficient
-  --log_every LOG_EVERY
-                        Log performance every n steps
-  --warmup_steps WARMUP_STEPS
-                        Number of warmup steps
-  --seed SEED           Random seed
-  --augment             Perform data augmentation during training
-  --benchmark           Collect performance metrics during training
-  --amp                 Train using TF-AMP
-  --xla                 Train using XLA
+  -h, --help            show this help message and exit
+  --exec_mode {train,train_and_predict,predict,evaluate,train_and_evaluate}
+                        Execution mode of running the model
+  --model_dir MODEL_DIR
+                        Output directory for information related to the model
+  --data_dir DATA_DIR   Input directory containing the dataset for training
+                        the model
+  --log_dir LOG_DIR     Output directory for training logs
+  --batch_size BATCH_SIZE
+                        Size of each minibatch per GPU
+  --learning_rate LEARNING_RATE
+                        Learning rate coefficient for AdamOptimizer
+  --fold FOLD           Chosen fold for cross-validation. Use None to disable
+                        cross-validation
+  --max_steps MAX_STEPS
+                        Maximum number of steps (batches) used for training
+  --weight_decay WEIGHT_DECAY
+                        Weight decay coefficient
+  --log_every LOG_EVERY
+                        Log performance every n steps
+  --evaluate_every EVALUATE_EVERY
+                        Evaluate every n steps
+  --warmup_steps WARMUP_STEPS
+                        Number of warmup steps
+  --seed SEED           Random seed
+  --augment             Perform data augmentation during training
+  --no-augment
+  --benchmark           Collect performance metrics during training
+  --no-benchmark
+  --use_amp, --amp      Train using TF-AMP
+  --use_xla, --xla      Train using XLA
+  --use_trt             Use TF-TRT
+  --resume_training     Resume training from a checkpoint
 ```
 
@@ -420,7 +426,7 @@ horovodrun -np <number/of/gpus> python main.py --data_dir /data [other parameter
 The main result of the training are checkpoints stored by default in `./results/` on the host machine, and in the `/results` in the container. This location can be controlled
 by the `--model_dir` command-line argument, if a different location was mounted while starting the container. In the case when the training is run in `train_and_predict` mode, the inference will take place after the training is finished, and inference results will be stored to the `/results` directory.
 
-If the `--exec_mode train_and_evaluate` parameter was used, and if `--crossvalidation_idx` parameter is set to an integer value of {0, 1, 2, 3, 4}, the evaluation of the validation set takes place after the training is completed. The results of the evaluation will be printed to the console.
+If the `--exec_mode train_and_evaluate` parameter was used, and if the `--fold` parameter is set to an integer value in {0, 1, 2, 3, 4}, the evaluation of the validation set takes place after the training is completed. The results of the evaluation will be printed to the console.
 
 ### Inference process
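Beyond the fold rename, the updated help text above documents `--use_amp`/`--use_xla` as the new canonical spellings and adds `--use_trt` and `--resume_training`. A resumed training run might look like the following sketch (assuming, per the flag descriptions above, that the checkpoint is picked up from `--model_dir`):

```bash
python main.py --data_dir /data --model_dir /results --batch_size 8 --exec_mode train --fold 0 --max_steps 6400 --resume_training --use_xla --use_amp
```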

TensorFlow2/Segmentation/UNet_Medical/examples/unet_1GPU.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP32 on 1 GPU and trains for 6400 iterations with batch_size 8. Usage:
 # bash unet_FP32_1GPU.sh <path to dataset> <path to results directory>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --crossvalidation_idx 0 --augment --xla --log_dir $2/log.json
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --fold 0 --augment --xla --log_dir $2/log.json
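The same one-token rename recurs in every example script below; any local wrapper scripts still pinned to the old flag can be migrated mechanically with a one-liner like this (a hypothetical helper; the file name is a placeholder):

```bash
# Hypothetical migration helper: rename the removed flag in a local run script.
sed -i 's/--crossvalidation_idx/--fold/g' my_unet_runs.sh
```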

TensorFlow2/Segmentation/UNet_Medical/examples/unet_8GPU.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP32 on 8 GPUs and trains for 6400 iterations with batch_size 8. Usage:
 # bash unet_FP32_8GPU.sh <path to dataset> <path to results directory>
 
-horovodrun -np 8 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --crossvalidation_idx 0 --augment --xla --log_dir $2/log.json
+horovodrun -np 8 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --fold 0 --augment --xla --log_dir $2/log.json

TensorFlow2/Segmentation/UNet_Medical/examples/unet_INFER.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP32 on 1 GPU for inference batch_size 1. Usage:
 # bash unet_INFER_FP32.sh <path to this repository> <path to dataset> <path to results directory>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size 1 --exec_mode predict --xla
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size 1 --exec_mode predict --xla --fold 0

TensorFlow2/Segmentation/UNet_Medical/examples/unet_INFER_BENCHMARK.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP32 on 1 GPU for inference benchmarking. Usage:
 # bash unet_INFER_BENCHMARK_FP32.sh <path to dataset> <path to results directory> <batch size>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size $3 --exec_mode predict --benchmark --warmup_steps 200 --max_steps 600 --xla
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size $3 --exec_mode predict --benchmark --warmup_steps 200 --max_steps 600 --xla --fold 0

TensorFlow2/Segmentation/UNet_Medical/examples/unet_INFER_BENCHMARK_TF-AMP.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP16 on 1 GPU for inference benchmarking. Usage:
 # bash unet_INFER_BENCHMARK_TF-AMP.sh <path to dataset> <path to results directory> <batch size>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size $3 --exec_mode predict --benchmark --warmup_steps 200 --max_steps 600 --xla --amp
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size $3 --exec_mode predict --benchmark --warmup_steps 200 --max_steps 600 --xla --amp --fold 0

TensorFlow2/Segmentation/UNet_Medical/examples/unet_INFER_TF-AMP.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP16 on 1 GPU for inference batch_size 1. Usage:
 # bash unet_INFER_TF-AMP.sh <path to dataset> <path to results directory>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size 1 --exec_mode predict --xla --amp
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --batch_size 1 --exec_mode predict --xla --amp --fold 0

TensorFlow2/Segmentation/UNet_Medical/examples/unet_TF-AMP_1GPU.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP16 on 1 GPU and trains for 6400 iterations batch_size 8. Usage:
 # bash unet_TF-AMP_1GPU.sh <path to dataset> <path to results directory>
 
-horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --crossvalidation_idx 0 --augment --xla --amp --log_dir $2/log.json
+horovodrun -np 1 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --fold 0 --augment --xla --amp --log_dir $2/log.json

TensorFlow2/Segmentation/UNet_Medical/examples/unet_TF-AMP_8GPU.sh

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@
 # This script launches U-Net run in FP16 on 8 GPUs and trains for 6400 iterations batch_size 8. Usage:
 # bash unet_TF-AMP_8GPU.sh <path to dataset> <path to results directory>
 
-horovodrun -np 8 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --crossvalidation_idx 0 --augment --xla --amp --log_dir $2/log.json
+horovodrun -np 8 python main.py --data_dir $1 --model_dir $2 --log_every 100 --max_steps 6400 --batch_size 8 --exec_mode train_and_evaluate --fold 0 --augment --xla --amp --log_dir $2/log.json
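After applying the commit, a quick repository-wide check confirms the rename left no stale references (a sketch, run from the repository root):

```bash
# List any remaining references to the removed flag; no output means the rename is complete.
grep -rn "crossvalidation_idx" TensorFlow2/Segmentation/UNet_Medical/
```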
