|
7 | 7 | "source": [
|
8 | 8 | "(vicuna_lightning_deepspeed_finetuning)=\n",
|
9 | 9 | "\n",
|
10 |
| - "# Fine-tune `vicuna-13b` with Ray LightningTrainer and DeepSpeed\n", |
| 10 | + "# Fine-tune `vicuna-13b` with Lightning and DeepSpeed\n", |
11 | 11 | "\n",
|
12 |
| - "In this example, we will demonstrate how to perform full fine-tuning for a [`vicuna-13b-v1.3`](https://huggingface.co/lmsys/vicuna-13b-v1.3) model using LightningTrainer with the DeepSpeed ZeRO-3 strategy.\n", |
| 12 | + "In this example, we will demonstrate how to perform full fine-tuning for a [`vicuna-13b-v1.3`](https://huggingface.co/lmsys/vicuna-13b-v1.3) model using Ray Train PyTorch Lightning integrations with the DeepSpeed ZeRO-3 strategy.\n", |
13 | 13 | "\n",
|
14 | 14 | "- [DeepSpeed](<https://github.com/microsoft/DeepSpeed>) is an open-source deep learning optimization library for PyTorch. It's designed to reduce computing power and memory usage, and to train large distributed models by leveraging state-of-the-art innovations like ZeRO, 3D-Parallelism, DeepSpeed-MoE, and ZeRO-Infinity. \n",
|
15 | 15 | "- PyTorch Lightning offers a [DeepSpeed integration](https://lightning.ai/docs/pytorch/stable/api/pytorch_lightning.strategies.DeepSpeedStrategy.html), which provides a simple interface to configure the knobs for DeepSpeed and automatically trigger your training process with the DeepSpeed Engine.\n",
|
16 |
| - "- {class}`Ray LightningTrainer <ray.train.lightning.LightningTrainer>` allows you to easily scale your PyTorch Lightning job across multiple nodes in a Ray cluster, without worrying about the underlying cluster management, autoscaling, and distributed process group settings.\n", |
| 16 | + "- {class}`Ray TorchTrainer <ray.train.torch.TorchTrainer>` allows you to easily scale your PyTorch Lightning job across multiple nodes in a Ray cluster, without worrying about the underlying cluster management, autoscaling, and distributed process group settings.\n", |
17 | 17 | "\n",
|
18 | 18 | "Our demo aims to illustrate how these three tools can be combined effectively to finetune the Vicuna-13B model, leveraging the strengths of each to create an efficient and high-performance deep learning solution.\n"
|
19 | 19 | ]
|
|
24 | 24 | "metadata": {},
|
25 | 25 | "source": [
|
26 | 26 | "```{note}\n",
|
27 |
| - "This is an advanced example of Large Language Model fine-tuning with Ray Train. If you're a beginner or new to the concepts of Ray Train and LightningTrainer, it would be beneficial to first explore the introductory documentation below to build a foundational understanding. \n", |
| 27 | + "This is an advanced example of Large Language Model fine-tuning with Ray Train. If you're a beginner or new to the concepts of Ray Train and our Lightning integrations, it would be beneficial to first explore the introductory documentation below to build a foundational understanding. \n", |
28 | 28 | "- [Ray Train Key Concepts](train-key-concepts) \n",
|
29 | 29 | "- [Ray Data Key Concepts](data_key_concepts)\n",
|
30 |
| - "- {ref}`[Basic] Image Classification with LightningTrainer <lightning_mnist_example>`\n", |
31 |
| - "- {ref}`[Intermediate] Using LightningTrainer with Ray Data <lightning_advanced_example>`\n", |
| 30 | + "- {ref}`[Basic] Image Classification with PyTorch Lightning and Ray Train <lightning_mnist_example>`\n", |
| 31 | + "- {ref}`[Intermediate] Fine-tuning Lightning Modules with with Ray Data <lightning_advanced_example>`\n", |
32 | 32 | "```\n"
|
33 | 33 | ]
|
34 | 34 | },
|
|
81 | 81 | "```"
|
82 | 82 | ]
|
83 | 83 | },
|
| 84 | + { |
| 85 | + "cell_type": "code", |
| 86 | + "execution_count": null, |
| 87 | + "metadata": { |
| 88 | + "tags": [ |
| 89 | + "remove-cell" |
| 90 | + ] |
| 91 | + }, |
| 92 | + "outputs": [], |
| 93 | + "source": [ |
| 94 | + "# TODO(@justinvyu): Remove it\n", |
| 95 | + "import os\n", |
| 96 | + "os.environ[\"RAY_AIR_NEW_PERSISTENCE_MODE\"] = \"1\"" |
| 97 | + ] |
| 98 | + }, |
84 | 99 | {
|
85 | 100 | "cell_type": "code",
|
86 | 101 | "execution_count": null,
|
|
102 | 117 | " \"accelerate==0.20.3\",\n",
|
103 | 118 | " \"transformers==4.30.2\",\n",
|
104 | 119 | " \"pytorch_lightning==2.0.3\",\n",
|
105 |
| - " ]\n", |
| 120 | + " ],\n", |
| 121 | + " \"env_vars\": {\"RAY_AIR_NEW_PERSISTENCE_MODE\": \"1\"} # TODO(@justinvyu): Remove it\n", |
106 | 122 | " }\n",
|
107 | 123 | ")"
|
108 | 124 | ]
|
|
219 | 235 | "processed_ds = ray_ds.map_batches(fill_prompt, batch_format=\"pandas\").map_batches(tokenize, batch_format=\"pandas\")"
|
220 | 236 | ]
|
221 | 237 | },
|
| 238 | + { |
| 239 | + "cell_type": "code", |
| 240 | + "execution_count": null, |
| 241 | + "metadata": { |
| 242 | + "tags": [ |
| 243 | + "remove-cell" |
| 244 | + ] |
| 245 | + }, |
| 246 | + "outputs": [], |
| 247 | + "source": [ |
| 248 | + "# To accelerate release tests\n", |
| 249 | + "processed_ds = processed_ds.limit(16 * 8 * 16) # each worker has 16 batches" |
| 250 | + ] |
| 251 | + }, |
222 | 252 | {
|
223 | 253 | "attachments": {},
|
224 | 254 | "cell_type": "markdown",
|
225 | 255 | "metadata": {},
|
226 | 256 | "source": [
|
227 |
| - "## Define your model\n", |
| 257 | + "## Define a Lightning Module\n", |
228 | 258 | "\n",
|
229 | 259 | "Here we load the pre-trained model weights from HuggingFace Model Hub, and wrap them into `pl.LightningModule`. We adopted the efficient model initialization techniques introduced in [Lightning-transformers](https://github.com/Lightning-Universe/lightning-transformers) to avoid unnecessary full weights loading."
|
230 | 260 | ]
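The full `Vicuna13BModel` definition appears in the code cell that follows. For orientation only, here is a minimal, hypothetical sketch of wrapping a Hugging Face causal LM in a `pl.LightningModule`; it omits the efficient-initialization tricks the notebook actually uses, and the batch keys (`input_ids`, `attention_mask`, `labels`) and learning rate are assumptions:

```python
import pytorch_lightning as pl
import torch
from transformers import AutoModelForCausalLM


class CausalLMModule(pl.LightningModule):
    """Minimal sketch (not the notebook's Vicuna13BModel) of a causal-LM LightningModule."""

    def __init__(self, model_name: str):
        super().__init__()
        # The real example avoids an eager full-weights load; this sketch loads directly.
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # Assumes the tokenized batch provides these keys.
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss, prog_bar=True)
        return outputs.loss

    def configure_optimizers(self):
        # Learning rate chosen arbitrarily for the sketch.
        return torch.optim.AdamW(self.model.parameters(), lr=2e-5)
```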
|
|
306 | 336 | "cell_type": "markdown",
|
307 | 337 | "metadata": {},
|
308 | 338 | "source": [
|
309 |
| - "## Training Configurations\n", |
| 339 | + "## DeepSpeed Configurations\n", |
310 | 340 | "\n",
|
311 | 341 | "Before training, let's calculate the memory usage of finetuning a `vicuna-13b` model. Assume we are using FP16 mixed-precision training, and the optimizer is Adam with FP32 states.\n",
|
312 | 342 | "\n",
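As a rough, hedged sketch of this estimate: each parameter costs about 2 bytes for FP16/BF16 weights, 2 bytes for gradients, and 12 bytes for Adam's FP32 master weights, momentum, and variance, all of which ZeRO-3 shards across workers. Assuming 13B parameters and 16 GPU workers:

```python
# Back-of-the-envelope memory estimate (a sketch; assumes 13B params and 16 GPUs).
NUM_PARAMS = 13e9
GiB = 1024 ** 3

fp16_weights = 2 * NUM_PARAMS          # FP16/BF16 model weights
fp16_grads = 2 * NUM_PARAMS            # FP16/BF16 gradients
fp32_optim = (4 + 4 + 4) * NUM_PARAMS  # Adam: FP32 master weights + momentum + variance

total = fp16_weights + fp16_grads + fp32_optim
print(f"Total model states: ~{total / GiB:.0f} GiB")                     # ~194 GiB
print(f"Per GPU with ZeRO-3 (16 workers): ~{total / 16 / GiB:.0f} GiB")  # ~12 GiB
```

This counts only the model states; activations and temporary buffers add to the per-GPU footprint.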
|
|
324 | 354 | "metadata": {},
|
325 | 355 | "outputs": [],
|
326 | 356 | "source": [
|
327 |
| - "from ray.train.lightning import LightningTrainer, LightningConfigBuilder\n", |
328 | 357 | "from transformers import AutoConfig\n",
|
329 | 358 | "\n",
|
330 | 359 | "config = AutoConfig.from_pretrained(MODEL_NAME)\n",
|
|
342 | 371 | " \"stage3_prefetch_bucket_size\": 0.9 * HIDDEN_SIZE * HIDDEN_SIZE,\n",
|
343 | 372 | " \"stage3_param_persistence_threshold\": 10 * HIDDEN_SIZE,\n",
|
344 | 373 | " },\n",
|
345 |
| - "}\n", |
346 |
| - "\n", |
347 |
| - "lightning_config = (\n", |
348 |
| - " LightningConfigBuilder()\n", |
349 |
| - " .module(cls=Vicuna13BModel)\n", |
350 |
| - " .trainer(\n", |
351 |
| - " max_epochs=1,\n", |
352 |
| - " accelerator=\"gpu\",\n", |
353 |
| - " precision=\"bf16-mixed\",\n", |
354 |
| - " accumulate_grad_batches=2,\n", |
355 |
| - " )\n", |
356 |
| - " .strategy(name=\"deepspeed\", config=deepspeed_configs)\n", |
357 |
| - " .checkpointing(save_top_k=0, save_weights_only=True, save_last=True)\n", |
358 |
| - ")" |
359 |
| - ] |
360 |
| - }, |
361 |
| - { |
362 |
| - "cell_type": "code", |
363 |
| - "execution_count": null, |
364 |
| - "metadata": { |
365 |
| - "tags": [ |
366 |
| - "remove-cell" |
367 |
| - ] |
368 |
| - }, |
369 |
| - "outputs": [], |
370 |
| - "source": [ |
371 |
| - "from pytorch_lightning.callbacks import TQDMProgressBar\n", |
372 |
| - "\n", |
373 |
| - "# Create a customized progress bar for LightningTrainer\n", |
374 |
| - "class VicunaProgressBar(TQDMProgressBar):\n", |
375 |
| - " def __init__(self, num_iters_per_epoch, *args, **kwargs):\n", |
376 |
| - " super().__init__(*args, **kwargs)\n", |
377 |
| - " self.num_iters_per_epoch = num_iters_per_epoch\n", |
378 |
| - "\n", |
379 |
| - " def on_train_epoch_start(self, trainer, *_):\n", |
380 |
| - " super().on_train_epoch_start(trainer, *_)\n", |
381 |
| - " self.train_progress_bar.reset(self.num_iters_per_epoch)\n", |
382 |
| - "\n", |
383 |
| - "\n", |
384 |
| - "total_batches = processed_ds.count()\n", |
385 |
| - "num_iters_per_epoch = total_batches // (NUM_WORKERS * BATCH_SIZE_PER_WORKER)\n", |
386 |
| - "progress_bar = VicunaProgressBar(num_iters_per_epoch)\n", |
387 |
| - "\n", |
388 |
| - "\n", |
389 |
| - "lightning_config.trainer(\n", |
390 |
| - " callbacks=[progress_bar],\n", |
391 |
| - " # Take a subset to accelerate release tests\n", |
392 |
| - " limit_train_batches=20,\n", |
393 |
| - ")" |
| 374 | + "}" |
394 | 375 | ]
|
395 | 376 | },
|
396 | 377 | {
|
397 | 378 | "attachments": {},
|
398 | 379 | "cell_type": "markdown",
|
399 | 380 | "metadata": {},
|
400 | 381 | "source": [
|
401 |
| - "Finally, combine all the configurations with {class}`LightningConfigBuilder <ray.train.lightning.LightningConfigBuilder>` and instantiate a LightningTrainer. " |
| 382 | + "## Define your training function\n", |
| 383 | + "\n", |
| 384 | + "Finally, define the training function that will be launched on multiple workers. The training function is generally the same as the pure pytorch Lightning training code, with additional Ray Train utilities:\n", |
| 385 | + "\n", |
| 386 | + "- {class}`~ray.train.lightning.RayDeepSpeedStrategy`: Same argument list as Lightning DeepSpeedStrategy but integrated with Ray Train.\n", |
| 387 | + "- {class}`~ray.train.lightning.RayLightningEnvironment`: Lightning environments for Ray cluster.\n", |
| 388 | + "- {class}`~ray.train.lightning.RayTrainReportCallback`: On each epoch end, it reports the checkpoint from each worker to the ray train (distributed checkpointing).\n", |
| 389 | + "- {meth}`~ray.train.lightning.prepare_trainer`: Validate your lightning Trainer configurations.\n", |
| 390 | + "\n", |
| 391 | + "For Ray Data ingestion, we fetched the preprocessed and sharded dataset with {meth}`~ray.train.get_dataset_shard`, and created a dataloader with {meth}`~ray.data.Dataset.iter_torch_batches`. It returns a custom iterator that replaces the Torch DataLoader.\n" |
402 | 392 | ]
|
403 | 393 | },
|
404 | 394 | {
|
|
407 | 397 | "metadata": {},
|
408 | 398 | "outputs": [],
|
409 | 399 | "source": [
|
| 400 | + "import ray.train\n", |
410 | 401 | "from ray.train import CheckpointConfig, RunConfig, ScalingConfig\n",
|
| 402 | + "from ray.train.torch import TorchTrainer\n", |
| 403 | + "from ray.train.lightning import (\n", |
| 404 | + " prepare_trainer,\n", |
| 405 | + " RayDeepSpeedStrategy, \n", |
| 406 | + " RayLightningEnvironment, \n", |
| 407 | + " RayTrainReportCallback\n", |
| 408 | + ")\n", |
411 | 409 | "\n",
|
412 |
| - "trainer = LightningTrainer(\n", |
413 |
| - " lightning_config=lightning_config.build(),\n", |
| 410 | + "\n", |
| 411 | + "def train_func(config):\n", |
| 412 | + " \"\"\"Training function for each worker.\"\"\"\n", |
| 413 | + "\n", |
| 414 | + " # Unpack the `train_loop_config`\n", |
| 415 | + " max_epochs = config[\"max_epochs\"]\n", |
| 416 | + " batch_size = config[\"batch_size\"]\n", |
| 417 | + " accumulate_grad_batches = config[\"accumulate_grad_batches\"]\n", |
| 418 | + "\n", |
| 419 | + " model = Vicuna13BModel()\n", |
| 420 | + " \n", |
| 421 | + " # Prepare Ray Data Ingestion\n", |
| 422 | + " train_ds = ray.train.get_dataset_shard(\"train\")\n", |
| 423 | + " train_dataloader = train_ds.iter_torch_batches(batch_size=batch_size)\n", |
| 424 | + " \n", |
| 425 | + " pl_trainer = pl.Trainer(\n", |
| 426 | + " devices=\"auto\",\n", |
| 427 | + " accelerator=\"auto\",\n", |
| 428 | + " strategy=RayDeepSpeedStrategy(config=deepspeed_configs),\n", |
| 429 | + " plugins=[RayLightningEnvironment()],\n", |
| 430 | + " callbacks=[RayTrainReportCallback()],\n", |
| 431 | + " enable_checkpointing=False, # RayTrainReportCallback will save the checkpoints\n", |
| 432 | + " max_epochs=max_epochs,\n", |
| 433 | + " precision=\"bf16-mixed\",\n", |
| 434 | + " accumulate_grad_batches=accumulate_grad_batches,\n", |
| 435 | + " )\n", |
| 436 | + " pl_trainer = prepare_trainer(pl_trainer)\n", |
| 437 | + "\n", |
| 438 | + " pl_trainer.fit(model, train_dataloaders=train_dataloader)\n", |
| 439 | + " \n", |
| 440 | + "\n", |
| 441 | + "trainer = TorchTrainer(\n", |
| 442 | + " train_loop_per_worker=train_func,\n", |
| 443 | + " train_loop_config={\n", |
| 444 | + " \"max_epochs\": 1,\n", |
| 445 | + " \"batch_size\": BATCH_SIZE_PER_WORKER,\n", |
| 446 | + " \"accumulate_grad_batches\": 2\n", |
| 447 | + " },\n", |
414 | 448 | " run_config=RunConfig(\n",
|
415 | 449 | " name=\"vicuna-13b-finetune\",\n",
|
416 | 450 | " storage_path=\"s3://anyscale-staging-data-cld-kvedzwag2qa8i5bjxuevf5i7/air-release-tests\",\n",
|
417 |
| - " checkpoint_config=CheckpointConfig(\n", |
418 |
| - " num_to_keep=1,\n", |
419 |
| - " # Enable distributed checkpointing\n", |
420 |
| - " _checkpoint_keep_all_ranks=True,\n", |
421 |
| - " _checkpoint_upload_from_workers=True,\n", |
422 |
| - " ),\n", |
| 451 | + " checkpoint_config=CheckpointConfig(num_to_keep=1),\n", |
423 | 452 | " ),\n",
|
424 | 453 | " scaling_config=ScalingConfig(\n",
|
425 | 454 | " num_workers=NUM_WORKERS,\n",
|
426 | 455 | " use_gpu=True,\n",
|
427 | 456 | " resources_per_worker={\"CPU\": 15, \"GPU\": 1},\n",
|
428 | 457 | " ),\n",
|
429 | 458 | " datasets={\"train\": processed_ds},\n",
|
430 |
| - " datasets_iter_config={\"batch_size\": BATCH_SIZE_PER_WORKER},\n", |
431 | 459 | ")"
|
432 | 460 | ]
|
433 | 461 | },
|
434 |
| - { |
435 |
| - "attachments": {}, |
436 |
| - "cell_type": "markdown", |
437 |
| - "metadata": {}, |
438 |
| - "source": [ |
439 |
| - "```{tip}\n", |
440 |
| - "\n", |
441 |
| - "Here, we highly recommend saving checkpoints with cloud storage and enabling distributed checkpointing by setting `_checkpoint_keep_all_ranks` and `_checkpoint_upload_from_workers` to True when training huge models. Otherwise, all checkpoint shards will be synced to the head node, which may introduce enormous syncing overhead and even cause out-of-memory.\n", |
442 |
| - "\n", |
443 |
| - "```" |
444 |
| - ] |
445 |
| - }, |
446 | 462 | {
|
447 | 463 | "attachments": {},
|
448 | 464 | "cell_type": "markdown",
|
449 | 465 | "metadata": {},
|
450 | 466 | "source": [
|
451 | 467 | "## Model Fine-tuning\n",
|
452 | 468 | "\n",
|
453 |
| - "Once everything is configured in LightningTrainer, training becomes easy. Simply call `trainer.fit()`, and your workload will be scaled to the Ray cluster, initiating ZeRO-3 parallel training." |
| 469 | + "Once everything is configured in TorchTrainer, training becomes easy. Simply call `trainer.fit()`, and your workload will be scaled to the Ray cluster, initiating ZeRO-3 parallel training." |
454 | 470 | ]
|
455 | 471 | },
|
456 | 472 | {
|
|
1022 | 1038 | "- Training takes: 36:06 = 2166s\n",
|
1023 | 1039 | "- Training + initialization + checkpointing takes 2473s\n",
|
1024 | 1040 | "\n",
|
1025 |
| - "Therefore, the model initialization and checkpoint syncing takes 307s. It will be amortized when you have larger datasets and spend more time on training." |
| 1041 | + "Model initialization and checkpoint synchronization took 307 seconds. It will be amortized as you have larger datasets and take more time to train." |
1026 | 1042 | ]
|
1027 | 1043 | },
|
1028 | 1044 | {
|
|
1091 | 1107 | "source": [
|
1092 | 1108 | "import os\n",
|
1093 | 1109 | "\n",
|
1094 |
| - "os.system(f\"awsv2 s3 sync {result.checkpoint.uri} /mnt/local_storage/checkpoint\")" |
| 1110 | + "os.system(f\"awsv2 s3 sync s3://{result.checkpoint.path} /mnt/local_storage\")" |
1095 | 1111 | ]
|
1096 | 1112 | },
|
1097 | 1113 | {
|
|
1136 | 1152 | " torch.save(vicuna_state_dict, os.path.join(zero_ckpt_dir, \"full_model.pt\"))\n",
|
1137 | 1153 | "\n",
|
1138 | 1154 | "\n",
|
1139 |
| - "full_model_ckpt_path = \"/mnt/local_storage/checkpoint/model/full_model.pt\"\n", |
1140 |
| - "extract_fp32_ckpt_from_zero(\"/mnt/local_storage/checkpoint/model\")" |
| 1155 | + "full_model_ckpt_path = \"/mnt/local_storage/checkpoint.ckpt/full_model.pt\"\n", |
| 1156 | + "extract_fp32_ckpt_from_zero(\"/mnt/local_storage/checkpoint.ckpt\")" |
1141 | 1157 | ]
|
1142 | 1158 | },
|
1143 | 1159 | {
|
|