[train/docs] Mini reference fixes (ray-project#39463)
Fixing some references to API docs or other documents in the PyTorch getting started guides.

Signed-off-by: Kai Fricke <kai@anyscale.com>
krfricke authored Sep 22, 2023
1 parent f311495 commit eeaf0ae
Showing 4 changed files with 18 additions and 15 deletions.
doc/source/train/getting-started-pytorch-lightning.rst (7 changes: 4 additions & 3 deletions)
@@ -29,9 +29,9 @@ For reference, the final code is as follows:
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
-1. Your `train_func` is the Python code that each distributed training :ref:`worker <train-overview-worker>` executes.
-2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
-3. Your `TorchTrainer` launches the distributed training job.
+1. `train_func` is the Python code that executes on each distributed training worker.
+2. :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
+3. :class:`~ray.train.torch.TorchTrainer` launches the distributed training job.
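For context, a minimal sketch of how these three pieces fit together in a Lightning script follows. ``MyLightningModule`` and ``train_loader`` are hypothetical placeholders for your own module and dataloader, and the Ray-specific strategy, environment, and callback names follow the Ray 2.7-era `ray.train.lightning` API:

.. code-block:: python

    import pytorch_lightning as pl
    import ray.train.lightning
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func():
        # Each worker builds its own Lightning trainer with Ray's
        # distributed strategy, environment plugin, and reporting callback.
        model = MyLightningModule()  # hypothetical placeholder
        trainer = pl.Trainer(
            max_epochs=1,
            devices="auto",
            accelerator="auto",
            strategy=ray.train.lightning.RayDDPStrategy(),
            plugins=[ray.train.lightning.RayLightningEnvironment()],
            callbacks=[ray.train.lightning.RayTrainReportCallback()],
        )
        trainer = ray.train.lightning.prepare_trainer(trainer)
        trainer.fit(model, train_dataloaders=train_loader)  # hypothetical loader

    scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
    trainer = TorchTrainer(train_func, scaling_config=scaling_config)
    result = trainer.fit()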

Compare a PyTorch Lightning training script with and without Ray Train.

@@ -84,6 +84,7 @@ Compare a PyTorch Lightning training script with and without Ray Train.
.. group-tab:: PyTorch Lightning + Ray Train

.. code-block:: python
+   :emphasize-lines: 8-10, 34, 43, 48-50, 52, 53, 55-60
import torch
from torchvision.models import resnet18
doc/source/train/getting-started-pytorch.rst (11 changes: 6 additions & 5 deletions)
@@ -31,8 +31,8 @@ For reference, the final code is as follows:
result = trainer.fit()
1. `train_func` is the Python code that executes on each distributed training worker.
-2. `ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
-3. `TorchTrainer` launches the distributed training job.
+2. :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
+3. :class:`~ray.train.torch.TorchTrainer` launches the distributed training job.
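As a rough sketch of what goes inside ``train_func`` for plain PyTorch, Ray Train's utilities wrap the model and dataloader for distributed execution. The toy linear model and random data here are purely illustrative:

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import ray.train.torch

    def train_func():
        # Toy data and model, purely illustrative.
        dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
        loader = DataLoader(dataset, batch_size=8, shuffle=True)
        # Adds a DistributedSampler and moves batches to the worker's device.
        loader = ray.train.torch.prepare_data_loader(loader)

        model = torch.nn.Linear(4, 1)
        # Wraps the model in DistributedDataParallel and moves it to the device.
        model = ray.train.torch.prepare_model(model)

        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = torch.nn.MSELoss()
        for epoch in range(2):
            for X, y in loader:
                loss = loss_fn(model(X), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # Stream metrics back to the driver.
            ray.train.report({"loss": loss.item(), "epoch": epoch})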

Compare a PyTorch training script with and without Ray Train.

@@ -80,7 +80,8 @@ Compare a PyTorch training script with and without Ray Train.
.. group-tab:: PyTorch + Ray Train

.. code-block:: python
+   :emphasize-lines: 9, 10, 12, 17, 18, 26, 27, 41, 42, 44-49
import tempfile
import torch
from torchvision.models import resnet18
@@ -244,8 +245,8 @@ Configure scale and GPUs

Outside of your training function, create a :class:`~ray.train.ScalingConfig` object to configure:

-1. `num_workers` - The number of distributed training worker processes.
-2. `use_gpu` - Whether each worker should use a GPU (or CPU).
+1. :class:`num_workers <ray.train.ScalingConfig>` - The number of distributed training worker processes.
+2. :class:`use_gpu <ray.train.ScalingConfig>` - Whether each worker should use a GPU (or CPU).
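For example (the values here are arbitrary):

.. code-block:: python

    from ray.train import ScalingConfig

    # Four training workers, each reserving one GPU.
    scaling_config = ScalingConfig(num_workers=4, use_gpu=True)

    # CPU-only alternative.
    scaling_config = ScalingConfig(num_workers=8, use_gpu=False)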

.. code-block:: python
doc/source/train/getting-started-transformers.rst (5 changes: 3 additions & 2 deletions)
@@ -28,8 +28,8 @@ For reference, the final code follows:
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
-1. `train_func` is the Python code that executes on each distributed training :ref:`worker <train-overview-worker>`.
-2. :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and computing resources (e.g. GPUs).
+1. `train_func` is the Python code that executes on each distributed training worker.
+2. :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
3. :class:`~ray.train.torch.TorchTrainer` launches the distributed training job.
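A minimal sketch of the Transformers integration follows, assuming ``model``, ``train_ds``, and ``eval_ds`` are placeholders for your own Transformers model and tokenized datasets; the callback and ``prepare_trainer`` names follow the Ray 2.7-era `ray.train.huggingface.transformers` module:

.. code-block:: python

    from transformers import Trainer, TrainingArguments
    import ray.train.huggingface.transformers
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func():
        # `model`, `train_ds`, and `eval_ds` are placeholders for your own
        # Transformers model and tokenized datasets.
        args = TrainingArguments(output_dir="out", max_steps=10)
        trainer = Trainer(
            model=model, args=args,
            train_dataset=train_ds, eval_dataset=eval_ds,
        )
        # Report metrics and checkpoints back to Ray Train.
        trainer.add_callback(
            ray.train.huggingface.transformers.RayTrainReportCallback()
        )
        # Prepare the HF Trainer for distributed execution under Ray.
        trainer = ray.train.huggingface.transformers.prepare_trainer(trainer)
        trainer.train()

    ray_trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
    )
    result = ray_trainer.fit()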

Compare a Hugging Face Transformers training script with and without Ray Train.
@@ -96,6 +96,7 @@ Compare a Hugging Face Transformers training script with and without Ray Train.
.. group-tab:: Hugging Face Transformers + Ray Train

.. code-block:: python
+   :emphasize-lines: 11-13, 15-18, 55-72
import numpy as np
import evaluate
doc/source/train/overview.rst (10 changes: 5 additions & 5 deletions)
@@ -47,7 +47,7 @@ Worker

Ray Train distributes model training compute to individual worker processes across the cluster.
Each worker is a process that executes the `train_func`.
-The number of workers determines the parallelism of the training job and is configured in the `ScalingConfig`.
+The number of workers determines the parallelism of the training job and is configured in the :class:`~ray.train.ScalingConfig`.

.. _train-overview-scaling-config:

@@ -57,8 +57,8 @@ Scaling configuration
The :class:`~ray.train.ScalingConfig` is the mechanism for defining the scale of the training job.
Specify two basic parameters for worker parallelism and compute resources:

-* `num_workers`: The number of workers to launch for a distributed training job.
-* `use_gpu`: Whether each worker should use a GPU or CPU.
+* :class:`num_workers <ray.train.ScalingConfig>`: The number of workers to launch for a distributed training job.
+* :class:`use_gpu <ray.train.ScalingConfig>`: Whether each worker should use a GPU or CPU.

.. code-block:: python
@@ -80,9 +80,9 @@ Trainer

The Trainer ties the previous three concepts together to launch distributed training jobs.
Ray Train provides :ref:`Trainer classes <train-api>` for different frameworks.
-Calling the `fit()` method executes the training job by:
+Calling the :meth:`fit() <ray.train.trainer.BaseTrainer.fit>` method executes the training job by:

-#. Launching workers as defined by the `scaling_config`.
+#. Launching workers as defined by the :ref:`scaling_config <train-overview-scaling-config>`.
#. Setting up the framework's distributed environment on all workers.
#. Running the `train_func` on all workers.
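
Putting these steps together, a minimal end-to-end run might look like this sketch. The no-op ``train_func`` is illustrative; ``Result.metrics`` and ``Result.checkpoint`` hold the last reported values:

.. code-block:: python

    import ray.train
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func():
        # Runs on every worker; report at least one metric.
        ray.train.report({"loss": 0.0})

    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    )
    result = trainer.fit()
    print(result.metrics)     # last reported metrics
    print(result.checkpoint)  # last reported checkpoint, if any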

