Commit 4fe1b3e

Add Workaround for Megascale Num Slices Issue on PW (#375)
* Support custom Pathways args. (#364)
* Remove Pathways specific args from workload create flow. (#365)
* Remove Pathways specific optional args from workload create.
* Deprecate --use-pathways from workload create, set it in create-pathways.
* Remove --enable-pathways from cluster create flow.
* Adding --use-pathways and --enable-pathways args - they are used to determine Pathways flows.
* Add Workaround for Megascale Num Slices Issue on PW

---------

Co-authored-by: Roshani Narasimhan <roshanin@google.com>
1 parent 81004f7 commit 4fe1b3e

1 file changed: 29 additions, 0 deletions

src/xpk/commands/workload.py

Lines changed: 29 additions & 0 deletions
@@ -298,6 +298,28 @@
               - mountPath: /tmp
                 name: shared-tmp
               {storage_volume_mounts}
+              env:
+                # Workaround for v6e
+                - name: MEGASCALE_GRPC_ENABLE_XOR_TRACER
+                  value: "false"
+                - name: MEGASCALE_NUM_SLICES
+                  valueFrom:
+                    fieldRef:
+                      fieldPath: "metadata.labels['jobset.sigs.k8s.io/replicatedjob-replicas']"
+                - name: JOBSET_NAME
+                  valueFrom:
+                    fieldRef:
+                      fieldPath: metadata.annotations['jobset.sigs.k8s.io/jobset-name']
+                - name: REPLICATED_JOB_NAME
+                  valueFrom:
+                    fieldRef:
+                      fieldPath: metadata.annotations['jobset.sigs.k8s.io/replicatedjob-name']
+                - name: MEGASCALE_SLICE_ID
+                  valueFrom:
+                    fieldRef:
+                      fieldPath: "metadata.labels['jobset.sigs.k8s.io/job-index']"
+                - name: MEGASCALE_COORDINATOR_ADDRESS
+                  value: "$(JOBSET_NAME)-$(REPLICATED_JOB_NAME)-$(MEGASCALE_SLICE_ID)-0.$(JOBSET_NAME)"
             {pathways_sidecar_container}
             nodeSelector:
               {accelerator_label}
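The last added variable relies on Kubernetes dependent environment variables: a `$(VAR)` reference inside a `value` is replaced with the value of a variable defined earlier in the same `env` list, which is why `JOBSET_NAME`, `REPLICATED_JOB_NAME`, and `MEGASCALE_SLICE_ID` are declared before `MEGASCALE_COORDINATOR_ADDRESS`. The sketch below only illustrates that expansion; it is not xpk or Kubernetes code. The helper `resolve_env` and the sample values `my-workload` and `worker` are hypothetical, the `fieldRef` lookups are replaced by literal sample values, and the `$$(VAR)` escape syntax is ignored.

```python
import re

# Matches $(NAME) references as they appear in the container env values.
_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def resolve_env(entries):
    """Resolve $(VAR) references in order, using only earlier entries.

    `entries` is a list of (name, value) pairs in declaration order.
    References to names that are not yet defined are left unchanged.
    """
    resolved = {}
    for name, value in entries:
        resolved[name] = _REF.sub(
            lambda m: resolved.get(m.group(1), m.group(0)), value
        )
    return resolved


# Sample values standing in for what the fieldRef entries would supply
# at pod start; the names "my-workload" and "worker" are hypothetical.
entries = [
    ("JOBSET_NAME", "my-workload"),
    ("REPLICATED_JOB_NAME", "worker"),
    ("MEGASCALE_SLICE_ID", "0"),
    (
        "MEGASCALE_COORDINATOR_ADDRESS",
        "$(JOBSET_NAME)-$(REPLICATED_JOB_NAME)-$(MEGASCALE_SLICE_ID)-0.$(JOBSET_NAME)",
    ),
]

env = resolve_env(entries)
print(env["MEGASCALE_COORDINATOR_ADDRESS"])
# → my-workload-worker-0-0.my-workload
```

With these sample values the address has the shape `<jobset>-<replicated-job>-<job-index>-0.<jobset>`, which appears to be the DNS name of pod 0 of the slice's job; treat that reading as an assumption rather than something the commit states.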
@@ -397,6 +419,13 @@ def workload_create_pathways(args) -> None:
     0 if successful and 1 otherwise.
   """
   args.use_pathways = True
+  if args.headless:
+    xpk_print(
+        'Please use kubectl port forwarding to connect to the Pathways proxy.'
+        ' kubectl get pods kubectl port-forward <proxy-pod-name> 29000:29000'
+        ' JAX_PLATFORMS=proxy JAX_BACKEND_TARGET=grpc://127.0.0.1:29000 python'
+        " -c 'import pathwaysutils; import jax; print(jax.devices())'"
+    )
   workload_create(args)
