
Commit 90aa201

Merge branch 'master' of https://github.com/ray-project/ray into depr
2 parents 26f2b1d + cc0b719

File tree: 17 files changed (+445, -220 lines)

.buildkite/pipeline.ml.yml

Lines changed: 2 additions & 2 deletions
@@ -32,7 +32,7 @@
  - label: ":steam_locomotive: Train tests and examples"
    conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_TRAIN_AFFECTED"]
    instance_size: large
-   parallelism: 4
+   parallelism: 3
    commands:
      - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
      # Todo (krfricke): Move mosaicml to train-test-requirements.txt
@@ -343,7 +343,7 @@
  - label: ":steam_locomotive: :floppy_disk: New persistence mode: Train tests and examples"
    conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_TRAIN_AFFECTED"]
    instance_size: large
-   parallelism: 4
+   parallelism: 3
    commands:
      - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
      # Todo (krfricke): Move mosaicml to train-test-requirements.txt

LICENSE

Lines changed: 2 additions & 2 deletions
@@ -186,7 +186,7 @@
  same "printed page" as the copyright notice for easier
  identification within third-party archives.

- Copyright {yyyy} {name of copyright owner}
+ Copyright 2023 Ray Authors

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
@@ -447,4 +447,4 @@ Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
- limitations under the License.
+ limitations under the License.

doc/source/serve/api/index.md

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,8 @@ This is fixed by added custom filename mappings in `source/conf.py` (look for "a

     serve.get_replica_context
     serve.get_multiplexed_model_id
+    serve.get_app_handle
+    serve.get_deployment_handle
  ```

  ### Running Applications
@@ -66,6 +68,7 @@ This is fixed by added custom filename mappings in `source/conf.py` (look for "a
     serve.delete
     serve.start
     serve.shutdown
+    serve.status
  ```

  (serve-cli)=
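
The entries added here document three public Serve APIs. For context, a rough usage sketch (not part of this commit; the application name "default" and the deployment name "MyDeployment" are illustrative):

    from ray import serve

    # Inspect the status of running Serve applications and proxies.
    status = serve.status()
    print(status.applications["default"].status)  # e.g. RUNNING

    # Get a handle to a running application and call it.
    app_handle = serve.get_app_handle("default")
    print(app_handle.remote("hello").result())

    # Get a handle to a specific deployment within an application.
    dep_handle = serve.get_deployment_handle("MyDeployment", app_name="default")
    print(dep_handle.remote("hello").result())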

doc/source/train/examples/lightning/vicuna_13b_lightning_deepspeed_finetune.ipynb

Lines changed: 102 additions & 86 deletions
@@ -7,13 +7,13 @@
  "source": [
  "(vicuna_lightning_deepspeed_finetuning)=\n",
  "\n",
- "# Fine-tune `vicuna-13b` with Ray LightningTrainer and DeepSpeed\n",
+ "# Fine-tune `vicuna-13b` with Lightning and DeepSpeed\n",
  "\n",
- "In this example, we will demonstrate how to perform full fine-tuning for a [`vicuna-13b-v1.3`](https://huggingface.co/lmsys/vicuna-13b-v1.3) model using LightningTrainer with the DeepSpeed ZeRO-3 strategy.\n",
+ "In this example, we will demonstrate how to perform full fine-tuning for a [`vicuna-13b-v1.3`](https://huggingface.co/lmsys/vicuna-13b-v1.3) model using Ray Train PyTorch Lightning integrations with the DeepSpeed ZeRO-3 strategy.\n",
  "\n",
  "- [DeepSpeed](<https://github.com/microsoft/DeepSpeed>) is an open-source deep learning optimization library for PyTorch. It's designed to reduce computing power and memory usage, and to train large distributed models by leveraging state-of-the-art innovations like ZeRO, 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity. \n",
  "- PyTorch Lightning offers a [DeepSpeed integration](https://lightning.ai/docs/pytorch/stable/api/pytorch_lightning.strategies.DeepSpeedStrategy.html), which provides a simple interface to configure the knobs for DeepSpeed and automatically trigger your training process with the DeepSpeed Engine.\n",
- "- {class}`Ray LightningTrainer <ray.train.lightning.LightningTrainer>` allows you to easily scale your PyTorch Lightning job across multiple nodes in a Ray cluster, without worrying about the underlying cluster management, autoscaling, and distributed process group settings.\n",
+ "- {class}`Ray TorchTrainer <ray.train.torch.TorchTrainer>` allows you to easily scale your PyTorch Lightning job across multiple nodes in a Ray cluster, without worrying about the underlying cluster management, autoscaling, and distributed process group settings.\n",
  "\n",
  "Our demo aims to illustrate how these three tools can be combined effectively to finetune the Vicuna-13B model, leveraging the strengths of each to create an efficient and high-performance deep learning solution.\n"
  ]
@@ -24,11 +24,11 @@
  "metadata": {},
  "source": [
  "```{note}\n",
- "This is an advanced example of Large Language Model fine-tuning with Ray Train. If you're a beginner or new to the concepts of Ray Train and LightningTrainer, it would be beneficial to first explore the introductory documentation below to build a foundational understanding. \n",
+ "This is an advanced example of Large Language Model fine-tuning with Ray Train. If you're a beginner or new to the concepts of Ray Train and our Lightning integrations, it would be beneficial to first explore the introductory documentation below to build a foundational understanding. \n",
  "- [Ray Train Key Concepts](train-key-concepts) \n",
  "- [Ray Data Key Concepts](data_key_concepts)\n",
- "- {ref}`[Basic] Image Classification with LightningTrainer <lightning_mnist_example>`\n",
- "- {ref}`[Intermediate] Using LightningTrainer with Ray Data <lightning_advanced_example>`\n",
+ "- {ref}`[Basic] Image Classification with PyTorch Lightning and Ray Train <lightning_mnist_example>`\n",
+ "- {ref}`[Intermediate] Fine-tuning Lightning Modules with with Ray Data <lightning_advanced_example>`\n",
  "```\n"
  ]
  },
@@ -81,6 +81,21 @@
  "```"
  ]
  },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": [
+ "remove-cell"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# TODO(@justinvyu): Remove it\n",
+ "import os\n",
+ "os.environ[\"RAY_AIR_NEW_PERSISTENCE_MODE\"] = \"1\""
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": null,
@@ -102,7 +117,8 @@
  " \"accelerate==0.20.3\",\n",
  " \"transformers==4.30.2\",\n",
  " \"pytorch_lightning==2.0.3\",\n",
- " ]\n",
+ " ],\n",
+ " \"env_vars\": {\"RAY_AIR_NEW_PERSISTENCE_MODE\": \"1\"} # TODO(@justinvyu): Remove it\n",
  " }\n",
  ")"
  ]
@@ -219,12 +235,26 @@
  "processed_ds = ray_ds.map_batches(fill_prompt, batch_format=\"pandas\").map_batches(tokenize, batch_format=\"pandas\")"
  ]
  },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": [
+ "remove-cell"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# To accelerate release tests\n",
+ "processed_ds = processed_ds.limit(16 * 8 * 16) # each worker has 16 batches"
+ ]
+ },
  {
  "attachments": {},
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Define your model\n",
+ "## Define a Lightning Module\n",
  "\n",
  "Here we load the pre-trained model weights from HuggingFace Model Hub, and wrap them into `pl.LightningModule`. We adopted the efficient model initialization techniques introduced in [Lightning-transformers](https://github.com/Lightning-Universe/lightning-transformers) to avoid unnecessary full weights loading."
  ]
@@ -306,7 +336,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Training Configurations\n",
+ "## DeepSpeed Configurations\n",
  "\n",
  "Before training, let's calculate the memory usage of finetuning a `vicuna-13b` model. Assume we are using FP16 mixed-precision training, and the optimizer is Adam with FP32 states.\n",
  "\n",
@@ -324,7 +354,6 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "from ray.train.lightning import LightningTrainer, LightningConfigBuilder\n",
  "from transformers import AutoConfig\n",
  "\n",
  "config = AutoConfig.from_pretrained(MODEL_NAME)\n",
@@ -342,63 +371,24 @@
  " \"stage3_prefetch_bucket_size\": 0.9 * HIDDEN_SIZE * HIDDEN_SIZE,\n",
  " \"stage3_param_persistence_threshold\": 10 * HIDDEN_SIZE,\n",
  " },\n",
- "}\n",
- "\n",
- "lightning_config = (\n",
- " LightningConfigBuilder()\n",
- " .module(cls=Vicuna13BModel)\n",
- " .trainer(\n",
- " max_epochs=1,\n",
- " accelerator=\"gpu\",\n",
- " precision=\"bf16-mixed\",\n",
- " accumulate_grad_batches=2,\n",
- " )\n",
- " .strategy(name=\"deepspeed\", config=deepspeed_configs)\n",
- " .checkpointing(save_top_k=0, save_weights_only=True, save_last=True)\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "tags": [
- "remove-cell"
- ]
- },
- "outputs": [],
- "source": [
- "from pytorch_lightning.callbacks import TQDMProgressBar\n",
- "\n",
- "# Create a customized progress bar for LightningTrainer\n",
- "class VicunaProgressBar(TQDMProgressBar):\n",
- " def __init__(self, num_iters_per_epoch, *args, **kwargs):\n",
- " super().__init__(*args, **kwargs)\n",
- " self.num_iters_per_epoch = num_iters_per_epoch\n",
- "\n",
- " def on_train_epoch_start(self, trainer, *_):\n",
- " super().on_train_epoch_start(trainer, *_)\n",
- " self.train_progress_bar.reset(self.num_iters_per_epoch)\n",
- "\n",
- "\n",
- "total_batches = processed_ds.count()\n",
- "num_iters_per_epoch = total_batches // (NUM_WORKERS * BATCH_SIZE_PER_WORKER)\n",
- "progress_bar = VicunaProgressBar(num_iters_per_epoch)\n",
- "\n",
- "\n",
- "lightning_config.trainer(\n",
- " callbacks=[progress_bar],\n",
- " # Take a subset to accelerate release tests\n",
- " limit_train_batches=20,\n",
- ")"
+ "}"
  ]
  },
  {
  "attachments": {},
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Finally, combine all the configurations with {class}`LightningConfigBuilder <ray.train.lightning.LightningConfigBuilder>` and instantiate a LightningTrainer. "
+ "## Define your training function\n",
+ "\n",
+ "Finally, define the training function that will be launched on multiple workers. The training function is generally the same as the pure pytorch Lightning training code, with additional Ray Train utilities:\n",
+ "\n",
+ "- {class}`~ray.train.lightning.RayDeepSpeedStrategy`: Same argument list as Lightning DeepSpeedStrategy but integrated with Ray Train.\n",
+ "- {class}`~ray.train.lightning.RayLightningEnvironment`: Lightning environments for Ray cluster.\n",
+ "- {class}`~ray.train.lightning.RayTrainReportCallback`: On each epoch end, it reports the checkpoint from each worker to the ray train (distributed checkpointing).\n",
+ "- {meth}`~ray.train.lightning.prepare_trainer`: Validate your lightning Trainer configurations.\n",
+ "\n",
+ "For Ray Data ingestion, we fetched the preprocessed and sharded dataset with {meth}`~ray.train.get_dataset_shard`, and created a dataloader with {meth}`~ray.data.Dataset.iter_torch_batches`. It returns a custom iterator that replaces the Torch DataLoader.\n"
  ]
  },
  {
@@ -407,50 +397,76 @@
  "metadata": {},
  "outputs": [],
  "source": [
+ "import ray.train\n",
  "from ray.train import CheckpointConfig, RunConfig, ScalingConfig\n",
+ "from ray.train.torch import TorchTrainer\n",
+ "from ray.train.lightning import (\n",
+ " prepare_trainer,\n",
+ " RayDeepSpeedStrategy, \n",
+ " RayLightningEnvironment, \n",
+ " RayTrainReportCallback\n",
+ ")\n",
  "\n",
- "trainer = LightningTrainer(\n",
- " lightning_config=lightning_config.build(),\n",
+ "\n",
+ "def train_func(config):\n",
+ " \"\"\"Training function for each worker.\"\"\"\n",
+ "\n",
+ " # Unpack the `train_loop_config`\n",
+ " max_epochs = config[\"max_epochs\"]\n",
+ " batch_size = config[\"batch_size\"]\n",
+ " accumulate_grad_batches = config[\"accumulate_grad_batches\"]\n",
+ "\n",
+ " model = Vicuna13BModel()\n",
+ " \n",
+ " # Prepare Ray Data Ingestion\n",
+ " train_ds = ray.train.get_dataset_shard(\"train\")\n",
+ " train_dataloader = train_ds.iter_torch_batches(batch_size=batch_size)\n",
+ " \n",
+ " pl_trainer = pl.Trainer(\n",
+ " devices=\"auto\",\n",
+ " accelerator=\"auto\",\n",
+ " strategy=RayDeepSpeedStrategy(config=deepspeed_configs),\n",
+ " plugins=[RayLightningEnvironment()],\n",
+ " callbacks=[RayTrainReportCallback()],\n",
+ " enable_checkpointing=False, # RayTrainReportCallback will save the checkpoints\n",
+ " max_epochs=max_epochs,\n",
+ " precision=\"bf16-mixed\",\n",
+ " accumulate_grad_batches=accumulate_grad_batches,\n",
+ " )\n",
+ " pl_trainer = prepare_trainer(pl_trainer)\n",
+ "\n",
+ " pl_trainer.fit(model, train_dataloaders=train_dataloader)\n",
+ " \n",
+ "\n",
+ "trainer = TorchTrainer(\n",
+ " train_loop_per_worker=train_func,\n",
+ " train_loop_config={\n",
+ " \"max_epochs\": 1,\n",
+ " \"batch_size\": BATCH_SIZE_PER_WORKER,\n",
+ " \"accumulate_grad_batches\": 2\n",
+ " },\n",
  " run_config=RunConfig(\n",
  " name=\"vicuna-13b-finetune\",\n",
  " storage_path=\"s3://anyscale-staging-data-cld-kvedzwag2qa8i5bjxuevf5i7/air-release-tests\",\n",
- " checkpoint_config=CheckpointConfig(\n",
- " num_to_keep=1,\n",
- " # Enable distributed checkpointing\n",
- " _checkpoint_keep_all_ranks=True,\n",
- " _checkpoint_upload_from_workers=True,\n",
- " ),\n",
+ " checkpoint_config=CheckpointConfig(num_to_keep=1),\n",
  " ),\n",
  " scaling_config=ScalingConfig(\n",
  " num_workers=NUM_WORKERS,\n",
  " use_gpu=True,\n",
  " resources_per_worker={\"CPU\": 15, \"GPU\": 1},\n",
  " ),\n",
  " datasets={\"train\": processed_ds},\n",
- " datasets_iter_config={\"batch_size\": BATCH_SIZE_PER_WORKER},\n",
  ")"
  ]
  },
- {
- "attachments": {},
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "```{tip}\n",
- "\n",
- "Here, we highly recommend saving checkpoints with cloud storage and enabling distributed checkpointing by setting `_checkpoint_keep_all_ranks` and `_checkpoint_upload_from_workers` to True when training huge models. Otherwise, all checkpoint shards will be synced to the head node, which may introduce enormous syncing overhead and even cause out-of-memory.\n",
- "\n",
- "```"
- ]
- },
  {
  "attachments": {},
  "cell_type": "markdown",
  "metadata": {},
  "source": [
  "## Model Fine-tuning\n",
  "\n",
- "Once everything is configured in LightningTrainer, training becomes easy. Simply call `trainer.fit()`, and your workload will be scaled to the Ray cluster, initiating ZeRO-3 parallel training."
+ "Once everything is configured in TorchTrainer, training becomes easy. Simply call `trainer.fit()`, and your workload will be scaled to the Ray cluster, initiating ZeRO-3 parallel training."
  ]
  },
  {
@@ -1022,7 +1038,7 @@
  "- Training takes: 36:06 = 2166s\n",
  "- Training + initialization + checkpointing takes 2473s\n",
  "\n",
- "Therefore, the model initialization and checkpoint syncing takes 307s. It will be amortized when you have larger datasets and spend more time on training."
+ "Model initialization and checkpoint synchronization took 307 seconds. It will be amortized as you have larger datasets and take more time to train."
  ]
  },
  {
@@ -1091,7 +1107,7 @@
  "source": [
  "import os\n",
  "\n",
- "os.system(f\"awsv2 s3 sync {result.checkpoint.uri} /mnt/local_storage/checkpoint\")"
+ "os.system(f\"awsv2 s3 sync s3://{result.checkpoint.path} /mnt/local_storage\")"
  ]
  },
  {
@@ -1136,8 +1152,8 @@
  " torch.save(vicuna_state_dict, os.path.join(zero_ckpt_dir, \"full_model.pt\"))\n",
  "\n",
  "\n",
- "full_model_ckpt_path = \"/mnt/local_storage/checkpoint/model/full_model.pt\"\n",
- "extract_fp32_ckpt_from_zero(\"/mnt/local_storage/checkpoint/model\")"
+ "full_model_ckpt_path = \"/mnt/local_storage/checkpoint.ckpt/full_model.pt\"\n",
+ "extract_fp32_ckpt_from_zero(\"/mnt/local_storage/checkpoint.ckpt\")"
  ]
  },
  {
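
To make the migration easier to follow outside the notebook's JSON escaping, here is a condensed sketch of the TorchTrainer-based setup this diff introduces (a sketch only: `Vicuna13BModel`, `deepspeed_configs`, `processed_ds`, `NUM_WORKERS`, and `BATCH_SIZE_PER_WORKER` are defined in earlier notebook cells and assumed here):

    import pytorch_lightning as pl
    import ray.train
    from ray.train import CheckpointConfig, RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer
    from ray.train.lightning import (
        prepare_trainer,
        RayDeepSpeedStrategy,
        RayLightningEnvironment,
        RayTrainReportCallback,
    )


    def train_func(config):
        """Per-worker training function."""
        model = Vicuna13BModel()  # LightningModule defined earlier in the notebook

        # Each worker iterates over its own shard of the Ray Dataset.
        train_ds = ray.train.get_dataset_shard("train")
        train_dataloader = train_ds.iter_torch_batches(batch_size=config["batch_size"])

        pl_trainer = pl.Trainer(
            devices="auto",
            accelerator="auto",
            strategy=RayDeepSpeedStrategy(config=deepspeed_configs),  # from an earlier cell
            plugins=[RayLightningEnvironment()],
            callbacks=[RayTrainReportCallback()],
            enable_checkpointing=False,  # RayTrainReportCallback saves the checkpoints
            max_epochs=config["max_epochs"],
            precision="bf16-mixed",
            accumulate_grad_batches=config["accumulate_grad_batches"],
        )
        pl_trainer = prepare_trainer(pl_trainer)
        pl_trainer.fit(model, train_dataloaders=train_dataloader)


    trainer = TorchTrainer(
        train_loop_per_worker=train_func,
        train_loop_config={
            "max_epochs": 1,
            "batch_size": BATCH_SIZE_PER_WORKER,
            "accumulate_grad_batches": 2,
        },
        run_config=RunConfig(
            name="vicuna-13b-finetune",
            checkpoint_config=CheckpointConfig(num_to_keep=1),
        ),
        scaling_config=ScalingConfig(num_workers=NUM_WORKERS, use_gpu=True),
        datasets={"train": processed_ds},
    )
    result = trainer.fit()

The key difference from the removed LightningConfigBuilder approach is that the Lightning Trainer is now constructed directly inside the per-worker training function, with the Ray-specific strategy, environment plugin, and report callback plugged in.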

python/ray/serve/BUILD

Lines changed: 8 additions & 0 deletions
@@ -177,6 +177,14 @@ py_test(
      deps = [":serve_lib"],
  )
 
+ py_test(
+     name = "test_telemetry_2",
+     size = "large",
+     srcs = serve_tests_srcs,
+     tags = ["exclusive", "team:serve"],
+     deps = [":serve_lib"],
+ )
+
  py_test(
      name = "test_batching",
      size = "small",

python/ray/serve/_private/usage.py

Lines changed: 3 additions & 0 deletions
@@ -29,6 +29,9 @@ class ServeUsageTag(Enum):
      MULTIPLEXED_API_USED = TagKey.SERVE_MULTIPLEXED_API_USED
      HTTP_PROXY_USED = TagKey.SERVE_HTTP_PROXY_USED
      GRPC_PROXY_USED = TagKey.SERVE_GRPC_PROXY_USED
+     SERVE_STATUS_API_USED = TagKey.SERVE_STATUS_API_USED
+     SERVE_GET_APP_HANDLE_API_USED = TagKey.SERVE_GET_APP_HANDLE_API_USED
+     SERVE_GET_DEPLOYMENT_HANDLE_API_USED = TagKey.SERVE_GET_DEPLOYMENT_HANDLE_API_USED
 
      def record(self, value: str):
          """Record telemetry value."""
