
Commit

Merge remote-tracking branch 'upstream/master'
nkwangleiGIT committed Nov 16, 2024
2 parents c1726ae + 88813ce commit 0a72839
Showing 30 changed files with 970 additions and 444 deletions.
2 changes: 1 addition & 1 deletion docs/source/_static/custom.js
@@ -7,7 +7,7 @@ document.addEventListener('DOMContentLoaded', function () {
     script.setAttribute('data-project-logo', 'https://avatars.githubusercontent.com/u/109387420?s=100&v=4');
     script.setAttribute('data-modal-disclaimer', 'Results are automatically generated and may be inaccurate or contain inappropriate information. Do not include any sensitive information in your query.\n**To get further assistance, you can chat directly with the development team** by joining the [SkyPilot Slack](https://slack.skypilot.co/).');
     script.setAttribute('data-modal-title', 'SkyPilot Docs AI - Ask a Question.');
-    script.setAttribute('data-button-position-bottom', '85px');
+    script.setAttribute('data-button-position-bottom', '100px');
     script.async = true;
     document.head.appendChild(script);
   });
93 changes: 44 additions & 49 deletions docs/source/examples/managed-jobs.rst
@@ -78,49 +78,47 @@ We can launch it with the following:

.. code-block:: console

  $ git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
  $ sky jobs launch -n bert-qa bert_qa.yaml

.. code-block:: yaml

  # bert_qa.yaml
  name: bert-qa

  resources:
    accelerators: V100:1
    use_spot: true  # Use spot instances to save cost.

  envs:
    # Fill in your wandb key: copy from https://wandb.ai/authorize
    # Alternatively, you can use `--env WANDB_API_KEY=$WANDB_API_KEY`
    # to pass the key in the command line, during `sky jobs launch`.
    WANDB_API_KEY:

  # Assume your working directory is under `~/transformers`.
  # To make this example work, please run the following command:
  # git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
  workdir: ~/transformers

  setup: |
    pip install -e .
    cd examples/pytorch/question-answering/
    pip install -r requirements.txt torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
    pip install wandb

  run: |
    cd examples/pytorch/question-answering/
    python run_qa.py \
      --model_name_or_path bert-base-uncased \
      --dataset_name squad \
      --do_train \
      --do_eval \
      --per_device_train_batch_size 12 \
      --learning_rate 3e-5 \
      --num_train_epochs 50 \
      --max_seq_length 384 \
      --doc_stride 128 \
      --report_to wandb \
      --output_dir /tmp/bert_qa/

.. note::
@@ -162,55 +160,52 @@ An End-to-End Example
Below we show an `example <https://github.com/skypilot-org/skypilot/blob/master/examples/spot/bert_qa.yaml>`_ for fine-tuning a BERT model on a question-answering task with HuggingFace.

.. code-block:: yaml
  :emphasize-lines: 8-11,41-44

  # bert_qa.yaml
  name: bert-qa

  resources:
    accelerators: V100:1
    use_spot: true  # Use spot instances to save cost.

  file_mounts:
    /checkpoint:
      name: # NOTE: Fill in your bucket name
      mode: MOUNT

  envs:
    # Fill in your wandb key: copy from https://wandb.ai/authorize
    # Alternatively, you can use `--env WANDB_API_KEY=$WANDB_API_KEY`
    # to pass the key in the command line, during `sky jobs launch`.
    WANDB_API_KEY:
  # Assume your working directory is under `~/transformers`.
  # To make this example work, please run the following command:
  # git clone https://github.com/huggingface/transformers.git ~/transformers -b v4.30.1
  workdir: ~/transformers

  setup: |
    pip install -e .
    cd examples/pytorch/question-answering/
    pip install -r requirements.txt torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
    pip install wandb
  run: |
    cd examples/pytorch/question-answering/
    python run_qa.py \
      --model_name_or_path bert-base-uncased \
      --dataset_name squad \
      --do_train \
      --do_eval \
      --per_device_train_batch_size 12 \
      --learning_rate 3e-5 \
      --num_train_epochs 50 \
      --max_seq_length 384 \
      --doc_stride 128 \
      --report_to wandb \
      --output_dir /checkpoint/bert_qa/ \
      --run_name $SKYPILOT_TASK_ID \
      --save_total_limit 10 \
      --save_steps 1000
As HuggingFace has built-in support for periodic checkpointing, we only need to pass the highlighted arguments to set up
the output directory and frequency of checkpointing (see more
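The two checkpointing knobs work together: ``--save_steps`` controls how often a checkpoint is written, and ``--save_total_limit`` caps how many are retained (oldest deleted first). A minimal sketch of that rotation logic, for illustration only (not HuggingFace's actual implementation):

```python
def checkpoints_kept(total_steps, save_steps, save_total_limit):
    """Simulate periodic checkpointing with rotation: a checkpoint is
    written every `save_steps` steps, and only the newest
    `save_total_limit` checkpoints are retained."""
    kept = []
    for step in range(save_steps, total_steps + 1, save_steps):
        kept.append(step)
        if len(kept) > save_total_limit:
            kept.pop(0)  # drop the oldest checkpoint
    return kept

# With the YAML's settings, only the 10 most recent checkpoints survive.
print(checkpoints_kept(total_steps=25000, save_steps=1000, save_total_limit=10))
```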
35 changes: 23 additions & 12 deletions docs/source/running-jobs/environment-variables.rst
@@ -16,15 +16,26 @@ User-specified environment variables

User-specified environment variables are useful for passing secrets and any arguments or configurations needed for your tasks. They are made available in ``file_mounts``, ``setup``, and ``run``.

You can specify environment variables to be made available to a task in several ways:

- ``envs`` field (dict) in a :ref:`task YAML <yaml-spec>`:

  .. code-block:: yaml

    envs:
      MYVAR: val

- ``--env-file`` flag in ``sky launch/exec`` :ref:`CLI <cli>`, which is a path to a `dotenv` file (takes precedence over the above):

  .. code-block:: text

    # sky launch example.yaml --env-file my_app.env
    # cat my_app.env
    MYVAR=val
    WANDB_API_KEY=MY_WANDB_API_KEY
    HF_TOKEN=MY_HF_TOKEN

- ``--env`` flag in ``sky launch/exec`` :ref:`CLI <cli>` (takes precedence over the above)
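The precedence between these sources (YAML ``envs`` < ``--env-file`` < ``--env``) amounts to a plain dict merge. A minimal sketch, assuming simple ``KEY=VAL`` dotenv lines with no quoting or interpolation (the helper names are hypothetical, not SkyPilot's actual implementation):

```python
def parse_dotenv(text):
    """Parse simple KEY=VAL lines; '#' comments and blank lines are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, val = line.partition("=")
        env[key.strip()] = val.strip()
    return env

def merge_envs(yaml_envs, env_file_text, cli_envs):
    """Later sources override earlier ones: YAML envs < --env-file < --env."""
    merged = dict(yaml_envs)
    merged.update(parse_dotenv(env_file_text))
    merged.update(cli_envs)
    return merged

merged = merge_envs(
    {"MYVAR": "from_yaml"},
    "MYVAR=from_file\nHF_TOKEN=MY_HF_TOKEN",
    {"MYVAR": "from_cli"},
)
print(merged["MYVAR"])  # from_cli
```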

.. tip::
@@ -145,9 +156,9 @@ Environment variables for ``setup``
- 0
* - ``SKYPILOT_SETUP_NODE_IPS``
  - A string of IP addresses of the nodes in the cluster, in the same order as the node ranks, where each line contains one IP address.

    Note that this is not necessarily the same as the nodes in the ``run`` stage: the ``setup`` stage runs on all nodes of the cluster, while the ``run`` stage can run on a subset of nodes.
  -
    .. code-block:: text

      1.2.3.4
@@ -158,19 +169,19 @@
- 2
* - ``SKYPILOT_TASK_ID``
  - A unique ID assigned to each task.

    This environment variable is available only when the task is submitted
    with :code:`sky launch --detach-setup`, or run as a managed spot job.

    Refer to the description in the :ref:`environment variables for run <env-vars-for-run>`.
  - sky-2023-07-06-21-18-31-563597_myclus_1

    For managed spot jobs: sky-managed-2023-07-06-21-18-31-563597_my-job-name_1-0
* - ``SKYPILOT_CLUSTER_INFO``
  - A JSON string containing information about the cluster. To access the information, you can parse the JSON string, in bash with ``echo $SKYPILOT_CLUSTER_INFO | jq .cloud``, or in Python:

    .. code-block:: python

      import json
      import os

      json.loads(os.environ['SKYPILOT_CLUSTER_INFO'])
@@ -200,7 +211,7 @@ Environment variables for ``run``
- 0
* - ``SKYPILOT_NODE_IPS``
  - A string of IP addresses of the nodes reserved to execute the task, where each line contains one IP address. Read more :ref:`here <dist-jobs>`.
  -
    .. code-block:: text

      1.2.3.4
@@ -221,13 +232,13 @@ Environment variables for ``run``
    If a task is run as a :ref:`managed spot job <spot-jobs>`, then all
    recoveries of that job will have the same ID value. The ID is in the format ``sky-managed-<timestamp>_<job-name>(_<task-name>)_<job-id>-<task-id>``, where ``<task-name>`` appears only when a pipeline is used, i.e., when there is more than one task in a managed spot job. Read more :ref:`here <spot-jobs-end-to-end>`.
  - sky-2023-07-06-21-18-31-563597_myclus_1

    For managed spot jobs: sky-managed-2023-07-06-21-18-31-563597_my-job-name_1-0
* - ``SKYPILOT_CLUSTER_INFO``
  - A JSON string containing information about the cluster. To access the information, you can parse the JSON string, in bash with ``echo $SKYPILOT_CLUSTER_INFO | jq .cloud``, or in Python:

    .. code-block:: python

      import json
      import os

      json.loads(os.environ['SKYPILOT_CLUSTER_INFO'])
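Inside ``setup`` and ``run``, all of these variables arrive as plain strings. A short sketch of consuming them in Python; the sample values below are assumptions for illustration, in the shapes documented in the tables above:

```python
import json
import os

# Sample values for illustration, in the documented shapes.
os.environ["SKYPILOT_NODE_IPS"] = "1.2.3.4\n1.2.3.5"
os.environ["SKYPILOT_CLUSTER_INFO"] = '{"cloud": "GCP", "region": "us-central1"}'

# Each line of SKYPILOT_NODE_IPS is one IP address.
ips = os.environ["SKYPILOT_NODE_IPS"].splitlines()

# SKYPILOT_CLUSTER_INFO is a JSON string.
info = json.loads(os.environ["SKYPILOT_CLUSTER_INFO"])

print(ips[0], info["cloud"])  # 1.2.3.4 GCP
```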
11 changes: 11 additions & 0 deletions examples/oci/serve-http-cpu.yaml
@@ -0,0 +1,11 @@
service:
  readiness_probe: /
  replicas: 2

resources:
  cloud: oci
  region: us-sanjose-1
  ports: 8080
  cpus: 2+

run: python -m http.server 8080
25 changes: 25 additions & 0 deletions examples/oci/serve-qwen-7b.yaml
@@ -0,0 +1,25 @@
# service.yaml
service:
  readiness_probe: /v1/models
  replicas: 2

# Fields below describe each replica.
resources:
  cloud: oci
  region: us-sanjose-1
  ports: 8080
  accelerators: {A10:1}

setup: |
  conda create -n vllm python=3.12 -y
  conda activate vllm
  pip install vllm
  pip install vllm-flash-attn

run: |
  conda activate vllm
  python -u -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 --port 8080 \
    --model Qwen/Qwen2-7B-Instruct \
    --served-model-name Qwen2-7B-Instruct \
    --device=cuda --dtype auto --max-model-len=2048
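Once the service is up, each replica serves an OpenAI-compatible API on port 8080. A minimal client sketch; the endpoint URL is a placeholder, so substitute the address reported for your service:

```python
import json
import urllib.request

ENDPOINT = "http://<your-service-endpoint>:8080"  # placeholder: fill in your endpoint

def chat_request_body(prompt, model="Qwen2-7B-Instruct", max_tokens=256):
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt):
    """POST a chat completion request and return the reply text."""
    req = urllib.request.Request(
        f"{ENDPOINT}/v1/chat/completions",
        data=json.dumps(chat_request_body(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# print(ask("What is SkyPilot?"))  # requires a running endpoint
```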

0 comments on commit 0a72839
