Commit

Ensure delayed evaluation of SLURM_PROCID
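The effect of the added backslash can be sketched in isolation. Inside a double-quoted assignment, `$SLURM_PROCID` is expanded at the moment the string is built, whereas `\$SLURM_PROCID` stays literal and is only expanded when the string is later evaluated by another shell. The sketch below assumes the `LAUNCHER` string is eventually evaluated once per task (for example via something like `srun ... bash -c "$LAUNCHER"`, which is not shown in this diff); the `EAGER` and `DELAYED` names and the values 0 and 3 are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the value visible while the script builds its strings.
export SLURM_PROCID=0

EAGER="--machine_rank $SLURM_PROCID"      # expanded now: the string holds "0"
DELAYED="--machine_rank \$SLURM_PROCID"   # stays literal: the string holds "$SLURM_PROCID"

# Simulate a later per-task shell in which SLURM_PROCID has a different value (3).
eager_out=$(SLURM_PROCID=3 bash -c "echo $EAGER")
delayed_out=$(SLURM_PROCID=3 bash -c "echo $DELAYED")

echo "eager:   $eager_out"
echo "delayed: $delayed_out"
```

With the eager form every task would receive the rank that was baked in when the string was assigned; with the escaped form each task resolves its own value when the command is evaluated.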
minsik-ai committed Nov 12, 2024
1 parent e59fd97 commit 4580b8f
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion scripts/training/train_embonly.sh
@@ -39,7 +39,7 @@ LAUNCHER="accelerate launch \
--main_process_ip "$MASTER_ADDR" \
--main_process_port $MASTER_PORT \
--num_processes $WORLD_SIZE \
- --machine_rank $SLURM_PROCID \
+ --machine_rank \$SLURM_PROCID \
--role $SLURMD_NODENAME: \
--rdzv_conf rdzv_backend=c10d \
--max_restarts 0 \
2 changes: 1 addition & 1 deletion scripts/training/train_genonly.sh
@@ -39,7 +39,7 @@ LAUNCHER="accelerate launch \
--main_process_ip "$MASTER_ADDR" \
--main_process_port $MASTER_PORT \
--num_processes $WORLD_SIZE \
- --machine_rank $SLURM_PROCID \
+ --machine_rank \$SLURM_PROCID \
--role $SLURMD_NODENAME: \
--rdzv_conf rdzv_backend=c10d \
--max_restarts 0 \
2 changes: 1 addition & 1 deletion scripts/training/train_gritlm_7b.sh
@@ -41,7 +41,7 @@ LAUNCHER="accelerate launch \
--main_process_ip "$MASTER_ADDR" \
--main_process_port $MASTER_PORT \
--num_processes $WORLD_SIZE \
- --machine_rank $SLURM_PROCID \
+ --machine_rank \$SLURM_PROCID \
--role $SLURMD_NODENAME: \
--rdzv_conf rdzv_backend=c10d \
--max_restarts 0 \
2 changes: 1 addition & 1 deletion scripts/training/train_gritlm_8x7b.sh
@@ -41,7 +41,7 @@ LAUNCHER="accelerate launch \
--main_process_ip "$MASTER_ADDR" \
--main_process_port $MASTER_PORT \
--num_processes $WORLD_SIZE \
- --machine_rank $SLURM_PROCID \
+ --machine_rank \$SLURM_PROCID \
--role $SLURMD_NODENAME: \
--rdzv_conf rdzv_backend=c10d \
--max_restarts 0 \
2 changes: 1 addition & 1 deletion scripts/training/train_test.sh
@@ -41,7 +41,7 @@ LAUNCHER="accelerate launch \
--main_process_ip "$MASTER_ADDR" \
--main_process_port $MASTER_PORT \
--num_processes $WORLD_SIZE \
- --machine_rank $SLURM_PROCID \
+ --machine_rank \$SLURM_PROCID \
--role $SLURMD_NODENAME: \
--rdzv_conf rdzv_backend=c10d \
--max_restarts 0 \