[train][docs] update Jax doc to include GPU and multislice TPU support #60593
base: master
Conversation
Signed-off-by: Lehui Liu <lehui@anyscale.com>
Code Review
The pull request successfully updates the documentation and JaxTrainer implementation to include GPU and multislice TPU support. The changes are consistent across the documentation files and the Python code, providing clearer explanations and examples for users. The removal of outdated JAX environment variables and the correction of dataset shard access in the examples are positive improvements.
| For GPU training, `ScalingConfig` is similar to other frameworks. Key fields include:
|
| * :class:`num_workers <ray.train.ScalingConfig>`: The number of distributed training worker processes.
| * :class:`use_gpu <ray.train.ScalingConfig>`: Whether each worker should use a GPU (or CPU).
The phrase "(or CPU)" is redundant here. If use_gpu is True, it means GPU. If False, it implies CPU. Removing it will make the description more concise.
Suggested change:
- * :class:`use_gpu <ray.train.ScalingConfig>`: Whether each worker should use a GPU (or CPU).
+ * :class:`use_gpu <ray.train.ScalingConfig>`: Whether each worker should use a GPU.
| Together, these configurations provide a declarative API for defining your entire distributed JAX
| training environment, allowing Ray Train to handle the complex task of launching and coordinating
| workers across a TPU slice.
| For GPU training, `ScalingConfig` is similar to other frameworks. Key fields include:
I would edit this to not assume that the user knows how to set up other frameworks.
|
| * `use_tpu`: This is a new field added in Ray 2.49.0 to the V2 `ScalingConfig`. This boolean flag explicitly tells Ray Train to initialize the JAX backend for TPU execution.
| * `topology`: This is a new field added in Ray 2.49.0 to the V2 `ScalingConfig`. Topology is a string defining the physical arrangement of the TPU chips (e.g., "4x4"). This is required for multi-host training and ensures Ray places workers correctly across the slice. For a list of supported TPU topologies by generation,
| * :class:`use_tpu <ray.train.ScalingConfig>`: It's a new field added in Ray 2.49.0 to the V2 `ScalingConfig`. This boolean flag tells Ray Train to initialize the JAX backend for TPU execution.
I prefer the previous wording ("This is...")
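To make this thread concrete, here is a minimal sketch of a TPU config using the two V2 fields quoted above, assuming the V2 `ScalingConfig` accepts `use_tpu` and `topology` as described in this hunk. The worker count, the commented-out `JaxTrainer` import path, and `train_fn` are illustrative assumptions, not a verified API reference:

```python
from ray.train import ScalingConfig

# Assumes the Ray Train V2 ScalingConfig accepts the `use_tpu` and `topology`
# fields described in the hunk above (added in Ray 2.49.0).
tpu_scaling_config = ScalingConfig(
    num_workers=4,    # e.g. one Ray Train worker per TPU host in the slice
    use_tpu=True,     # initialize the JAX backend for TPU execution
    topology="4x4",   # physical chip layout of the slice
)

# Hypothetical trainer construction; the import path is an assumption.
# from ray.train.v2.jax import JaxTrainer
# trainer = JaxTrainer(train_fn, scaling_config=tpu_scaling_config)
# result = trainer.fit()
```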
| Together, these configurations provide a declarative API for defining your entire distributed JAX
| training environment, allowing Ray Train to handle the complex task of launching and coordinating
| workers across a TPU slice.
| For GPU training, `ScalingConfig` is similar to other frameworks. Key fields include:
A big thing that's missing here is what the relationship is between workers and resources, i.e. should one worker map to one GPU, node, or something else?
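For what it's worth, the usual Ray Train mapping is one worker process per GPU; a sketch along those lines (the worker count of 4 is just an example):

```python
from ray.train import ScalingConfig

# One Ray Train worker process per GPU: 4 workers x 1 GPU each = a 4-GPU job,
# which Ray may place on one node or spread across several depending on availability.
gpu_scaling_config = ScalingConfig(
    num_workers=4,
    use_gpu=True,
    resources_per_worker={"GPU": 1},  # explicit here; 1 GPU per worker is also the default when use_gpu=True
)
```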
| # If you want to use GPUs, specify the GPU scaling config like below.
| # gpu_scaling_config = ScalingConfig(
| #     use_gpu=True,
| #     num_workers=4,
| #     resources_per_worker={"GPU": 1},
| # )
nit: Right now `gpu_scaling_config` and `tpu_scaling_config` have different names, so simply uncommenting this wouldn't enable GPU training; you'd also have to update `scaling_config=...`.
Either you can:
- Name these back to just `scaling_config`, or:
- Uncomment this and add logic/a comment on switching between these two to enable GPU training.
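A sketch of the second option (keeping both configs and toggling between them). The `USE_GPU` flag is hypothetical, and the `use_tpu`/`topology` fields are assumed to exist as described in the hunks above:

```python
from ray.train import ScalingConfig

# Illustrative toggle between the GPU and TPU configs discussed above,
# so only `scaling_config` ever gets passed to the trainer.
USE_GPU = False  # flip to True to run the GPU path instead of the TPU path

scaling_config = (
    ScalingConfig(num_workers=4, use_gpu=True, resources_per_worker={"GPU": 1})
    if USE_GPU
    else ScalingConfig(num_workers=4, use_tpu=True, topology="4x4")
)
# trainer = JaxTrainer(train_fn, scaling_config=scaling_config)
```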
| # gpu_scaling_config = ScalingConfig(
| #     num_workers=4,
| #     use_gpu=True,
| #     resources_per_worker={"GPU": 1},
Do we want to show resources_per_worker for this since it matches the default?
Description
We added GPU (#58322) and multislice TPU (#58629) support for JaxTrainer; this PR updates the corresponding docs.
Additional information
make develop && make local