PyPI config and instruction (#999)
* pyproject.toml and instructions for publishing

* older version of setuptools

* Fix move yapf config into pyproject

* Address comments

* Address comments

* update pyproject.toml for open sourced

* Remove authors

* Remove duplication of setup

* format

* Add wheel

* remove wheel

* Fix readme path

* format

* Fix readme path for wheel

* Add comments

* Address comments

* Fix

* fix quote

* Add version option

* format

* Fix image path and footnote for pypi

* format

* Add env for all tests

* fix pytest action

* Fix message

* add print for smoke test

* change version to rc1

* Fix wheel version

* fix wheel

* Add retry for azure

* Fix smoke test

* increase version number

* 1.0.0-dev0

* normalize version

* Fix logging for retry
Michaelvll authored Aug 8, 2022
1 parent bb284b8 commit bccc512
Showing 25 changed files with 141 additions and 48 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -48,7 +48,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install ".[all]"
-pip install pytest
+pip install pytest pytest-xdist pytest-env>=0.6
- name: Run tests with pytest
run: SKY_DISABLE_USAGE_COLLECTION=1 pytest ${{ matrix.test-path }}
3 changes: 2 additions & 1 deletion .github/workflows/yapf.yml
@@ -25,6 +25,7 @@ jobs:
run: |
python -m pip install --upgrade pip
pip install yapf==0.32.0
+pip install toml==0.10.2
- name: Running yapf
run: |
-yapf --diff --style .style.yapf --recursive ./ --exclude 'sky/skylet/ray_patches/**' --exclude 'sky/skylet/providers/**'
+yapf --diff --recursive ./ --exclude 'sky/skylet/ray_patches/**' --exclude 'sky/skylet/providers/**'
3 changes: 0 additions & 3 deletions .style.yapf

This file was deleted.

6 changes: 4 additions & 2 deletions README.md
@@ -1,13 +1,12 @@
<p align="center">
-<img src="docs/source/images/SkyPilot-logo-wide.png" alt="SkyPilot" width=55%/>
+<img src="https://github.com/skypilot-org/skypilot/raw/master/docs/source/images/SkyPilot-logo-wide.png" alt="SkyPilot" width=55%/>
</p>

![pytest](https://github.com/skypilot-org/skypilot/actions/workflows/pytest.yml/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/skypilot/badge/?version=latest)](https://skypilot.readthedocs.io/en/latest/?badge=latest)

SkyPilot is a framework for easily running machine learning[^1] workloads on any cloud through a unified interface. No knowledge of cloud offerings is required or expected – you simply define the workload and its resource requirements, and SkyPilot will automatically execute it on AWS, Google Cloud Platform or Microsoft Azure.

-[^1]: SkyPilot is primarily targeted at machine learning workloads, but it can also support many general workloads. We're excited to hear about your use case and would love to hear more about how we can better support your requirements - please join us in [this discussion](https://github.com/skypilot-org/skypilot/discussions/1016)!

### Key features
* **Run existing projects on the cloud** with zero code changes
@@ -75,3 +74,6 @@ We are excited to hear your feedback! SkyPilot has two channels for engaging wit

## Contributing
We welcome and value all contributions to the project! Please refer to the [contribution guide](CONTRIBUTING.md) for more on how to get involved.

+<!-- Footnote -->
+[^1]: SkyPilot is primarily targeted at machine learning workloads, but it can also support many general workloads. We're excited to hear about your use case and would love to hear more about how we can better support your requirements - please join us in [this discussion](https://github.com/skypilot-org/skypilot/discussions/1016)!
4 changes: 2 additions & 2 deletions examples/managed_spot_with_storage.yaml
@@ -15,13 +15,13 @@ file_mounts:
~/sky_workdir:
# Change this to your own globally unique bucket name.
name: sky-workdir-zhwu
-source: .
+source: ./examples
persistent: false
mode: COPY
/imagenet-image:
source: s3://sky-imagenet-data

run: |
set -ex
-ls ~/sky_workdir/sky
+ls ~/sky_workdir/managed_spot_with_storage.yaml
ls -l /imagenet-image/datasets
1 change: 0 additions & 1 deletion format.sh
@@ -38,7 +38,6 @@ tool_version_check "pylint" $PYLINT_VERSION "2.8.2"
tool_version_check "pylint-quotes" $PYLINT_QUOTES_VERSION "0.2.3"

YAPF_FLAGS=(
-'--style' "$ROOT/.style.yapf"
'--recursive'
'--parallel'
)
18 changes: 18 additions & 0 deletions pyproject.toml
@@ -0,0 +1,18 @@
+[build-system]
+requires = ["setuptools>=58.0"]
+build-backend = "setuptools.build_meta"
+
+
+[tool.yapf]
+based_on_style = "google"
+allow_split_before_dict_value = false
+
+[tool.pytest.ini_options]
+required_plugins = [
+"pytest-xdist",
+"pytest-env>=0.6"
+]
+env = [
+"SKYPILOT_DEBUG=1",
+"SKYPILOT_DISABLE_USAGE_COLLECTION=1"
+]
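
The `env` list in `[tool.pytest.ini_options]` is read by the pytest-env plugin, which exports each `KEY=VALUE` entry into the environment before the tests run. A minimal sketch of that effect, assuming plain `KEY=VALUE` entries; the helper `apply_env_entries` is hypothetical and is not part of pytest-env:

```python
import os


def apply_env_entries(entries, environ=None):
    """Set each 'KEY=VALUE' entry in `environ` (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    for entry in entries:
        # Split on the first '=' only, so values may contain '='.
        key, _, value = entry.partition('=')
        environ[key.strip()] = value.strip()
    return environ


# Use a plain dict instead of os.environ so the sketch has no side effects.
env = apply_env_entries(
    ['SKYPILOT_DEBUG=1', 'SKYPILOT_DISABLE_USAGE_COLLECTION=1'], environ={})
print(env)
```

Listing the plugin under `required_plugins` makes pytest fail fast when pytest-env is missing, rather than silently running the tests without these variables.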
2 changes: 2 additions & 0 deletions requirements-dev.txt
@@ -3,7 +3,9 @@ yapf==0.32.0
pylint==2.8.2
# https://github.com/edaniszewski/pylint-quotes
pylint-quotes==0.2.3
+toml==0.10.2

# testing
pytest
pytest-xdist
+pytest-env>=0.6
8 changes: 6 additions & 2 deletions sky/__init__.py
@@ -1,6 +1,11 @@
"""The SkyPilot package."""
import os

+# Replaced with the current commit when building the wheels.
+__commit__ = '{{SKYPILOT_COMMIT_SHA}}'
+__version__ = '1.0.0-dev0'
+__root_dir__ = os.path.dirname(os.path.abspath(__file__))
+
# Keep this order to avoid cyclic imports
from sky import backends
from sky import benchmark
@@ -18,8 +23,6 @@
tail_logs, download_logs, job_status, spot_status,
spot_cancel, storage_ls, storage_delete)

-__root_dir__ = os.path.dirname(os.path.abspath(__file__))
-
# Aliases.
AWS = clouds.AWS
Azure = clouds.Azure
@@ -28,6 +31,7 @@
optimize = Optimizer.optimize

__all__ = [
+'__version__',
'AWS',
'Azure',
'GCP',
8 changes: 6 additions & 2 deletions sky/backends/backend_utils.py
@@ -628,6 +628,7 @@ def write_cluster_config(to_provision: 'resources.Resources',
# Sky remote utils.
'sky_remote_path': SKY_REMOTE_PATH,
'sky_local_path': str(local_wheel_path),
+'sky_version': common_utils.normalize_version(sky.__version__),
# Local IP handling (optional).
'head_ip': None if ip_list is None else ip_list[0],
'worker_ips': None if ip_list is None else ip_list[1:],
@@ -983,8 +984,11 @@ def get_node_ips(cluster_yaml: str,
raise exceptions.FetchIPError(
exceptions.FetchIPError.Reason.WORKER) from e
# Retry if the ssh is not ready for the workers yet.
-logger.debug('Retrying to get worker ip.')
-time.sleep(backoff.current_backoff())
+backoff_time = backoff.current_backoff()
+logger.debug('Retrying to get worker ip '
+f'[{retry_cnt}/{worker_ip_max_attempts}] in '
+f'{backoff_time} seconds.')
+time.sleep(backoff_time)
worker_ips = re.findall(IP_ADDR_REGEX, out)
# Ray Autoscaler On-prem Bug: ray-get-worker-ips outputs nothing!
# Workaround: List of IPs are shown in Stderr
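
The retry change in `get_node_ips` reads the backoff delay once, logs it together with the attempt counter, and then sleeps for that same value. A minimal sketch of the pattern, under stated assumptions: the `Backoff` class below is a hypothetical stand-in (it is not SkyPilot's `common_utils.Backoff`) whose `current_backoff()` advances its state on every call, and `fetch_with_retry` is an illustrative helper.

```python
import logging
import time

logger = logging.getLogger(__name__)


class Backoff:
    """Hypothetical exponential backoff (not SkyPilot's common_utils.Backoff)."""

    def __init__(self, initial: float = 1.0, factor: float = 2.0):
        self._next = initial
        self._factor = factor

    def current_backoff(self) -> float:
        # Each call returns the current delay and advances the state.
        value = self._next
        self._next *= self._factor
        return value


def fetch_with_retry(fetch, max_attempts, backoff, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or attempts run out."""
    for retry_cnt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if retry_cnt == max_attempts:
            break
        # Read the delay once so the logged value matches the slept value.
        backoff_time = backoff.current_backoff()
        logger.debug('Retrying [%d/%d] in %s seconds.', retry_cnt,
                     max_attempts, backoff_time)
        sleep(backoff_time)
    raise RuntimeError(f'No result after {max_attempts} attempts.')


# Usage: two failed attempts, then success. Sleeps are recorded, not performed.
responses = iter([None, None, ['10.0.0.2']])
slept = []
ips = fetch_with_retry(lambda: next(responses), 3, Backoff(),
                       sleep=slept.append)
print(ips, slept)
```

If the delay were read separately for the log message and for the sleep, a stateful backoff like this one would report a different value from the one actually waited.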
26 changes: 16 additions & 10 deletions sky/backends/cloud_vm_ray_backend.py
@@ -1085,12 +1085,18 @@ def need_ray_up(
if returncode == 0:
return False

-if ('Head node fetch timed out. Failed to create head node.'
-in stderr and isinstance(to_provision_cloud, clouds.Azure)):
-logger.info(
-'Retrying head node provisioning due to head fetching '
-'timeout.')
-return True
+if isinstance(to_provision_cloud, clouds.Azure):
+if 'Failed to invoke the Azure CLI' in stderr:
+logger.info(
+'Retrying head node provisioning due to Azure CLI '
+'issues.')
+return True
+if ('Head node fetch timed out. Failed to create head node.'
+in stderr):
+logger.info(
+'Retrying head node provisioning due to head fetching '
+'timeout.')
+return True
if ('Processing file mounts' in stdout and
'Running setup commands' not in stdout and
'Failed to setup head node.' in stderr):
@@ -1527,11 +1533,11 @@ def _provision(self,
to_provision_config.num_nodes, to_provision_config.resources)
usage_lib.messages.usage.update_cluster_status(prev_cluster_status)

-# TODO(suquark): once we have sky on PYPI, we should directly
-# install sky from PYPI.
+# TODO(suquark): once we have sky on PyPI, we should directly
+# install sky from PyPI.
with timeline.Event('backend.provision.wheel_build'):
-# TODO(suquark): once we have sky on PYPI, we should directly
-# install sky from PYPI.
+# TODO(suquark): once we have sky on PyPI, we should directly
+# install sky from PyPI.
local_wheel_path = wheel_utils.build_sky_wheel()
backoff = common_utils.Backoff(_RETRY_UNTIL_UP_INIT_GAP_SECONDS)
attempt_cnt = 1
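
The Azure-specific branch added to `need_ray_up` in the first hunk of this file can be read as a small decision function over the captured stderr. A minimal sketch, assuming the cloud check is reduced to a boolean; `should_retry_head_provision` is a hypothetical name, and the two message strings are copied from the diff:

```python
def should_retry_head_provision(stderr: str, is_azure: bool) -> bool:
    """Return True when an Azure provisioning failure looks transient."""
    if not is_azure:
        # Both checks in the diff apply only to Azure.
        return False
    if 'Failed to invoke the Azure CLI' in stderr:
        return True
    if 'Head node fetch timed out. Failed to create head node.' in stderr:
        return True
    return False


print(should_retry_head_provision('Failed to invoke the Azure CLI', True))
print(should_retry_head_provision('Failed to invoke the Azure CLI', False))
```

Grouping both checks under a single cloud test keeps the Azure-only heuristics in one place, which is the restructuring the diff performs.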
5 changes: 4 additions & 1 deletion sky/backends/wheel_utils.py
@@ -11,6 +11,7 @@

import sky
from sky.backends import backend_utils
+from sky.utils import common_utils

# Local wheel path is same as the remote path.
WHEEL_DIR = pathlib.Path(os.path.expanduser(backend_utils.SKY_REMOTE_PATH))
@@ -37,7 +38,9 @@ def cleanup_wheels_dir(wheel_dir: pathlib.Path,

def _get_latest_built_wheel() -> pathlib.Path:
try:
-latest_wheel = max(WHEEL_DIR.glob(f'{_PACKAGE_WHEEL_NAME}-*.whl'),
+latest_wheel = max(WHEEL_DIR.glob(
+f'{_PACKAGE_WHEEL_NAME}-'
+f'{common_utils.normalize_version(sky.__version__)}-*.whl'),
key=os.path.getctime)
except ValueError:
raise FileNotFoundError('Could not find built Sky wheels.') from None
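
Pinning the glob to the normalized version keeps `_get_latest_built_wheel` from picking up a wheel left over from a different SkyPilot version. The body of `common_utils.normalize_version` is not shown in this diff, so the sketch below uses a hypothetical stand-in that only rewrites the `-dev` separator into the PEP 440 form used in wheel filenames; `find_wheels` and the file names are illustrative as well.

```python
import pathlib
import re
import tempfile


def normalize_version(version: str) -> str:
    """Hypothetical stand-in: '1.0.0-dev0' -> '1.0.0.dev0' (PEP 440 form)."""
    return re.sub(r'[-_.]?dev', '.dev', version)


def find_wheels(wheel_dir: pathlib.Path, package: str, version: str):
    """Return the names of wheels in wheel_dir built for exactly this version."""
    pattern = f'{package}-{normalize_version(version)}-*.whl'
    return sorted(path.name for path in wheel_dir.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    wheel_dir = pathlib.Path(tmp)
    # One stale wheel from an older version and one for the current version.
    for name in ('skypilot-0.1.0-py3-none-any.whl',
                 'skypilot-1.0.0.dev0-py3-none-any.whl'):
        (wheel_dir / name).touch()
    matches = find_wheels(wheel_dir, 'skypilot', '1.0.0-dev0')

print(matches)
```

The normalization step matters because the version string in `sky/__init__.py` (`1.0.0-dev0`) differs from the spelling setuptools writes into the wheel filename.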
5 changes: 4 additions & 1 deletion sky/cli.py
@@ -73,6 +73,8 @@

logger = sky_logging.init_logger(__name__)

+_CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help'])
+
_CLUSTER_FLAG_HELP = """\
A cluster name. If provided, either reuse an existing cluster with that name or
provision a new cluster with that name. Otherwise provision a new cluster with
@@ -801,7 +803,8 @@ def get_help(self, ctx):
return super().get_help(ctx)


-@click.group(cls=_NaturalOrderGroup)
+@click.group(cls=_NaturalOrderGroup, context_settings=_CONTEXT_SETTINGS)
+@click.version_option(sky.__version__, '--version', '-v', prog_name='skypilot')
def cli():
pass

2 changes: 1 addition & 1 deletion sky/clouds/aws.py
@@ -273,7 +273,7 @@ def check_credentials(self) -> Tuple[bool, Optional[str]]:
return False, (
'AWS CLI is not installed properly.'
' Run the following commands under sky folder:'
-# TODO(zhwu): after we publish sky to pypi,
+# TODO(zhwu): after we publish sky to PyPI,
# change this to `pip install sky[aws]`
'\n $ pip install .[aws]'
'\n Credentials may also need to be set.' + help_str)
2 changes: 1 addition & 1 deletion sky/clouds/azure.py
@@ -258,7 +258,7 @@ def check_credentials(self) -> Tuple[bool, Optional[str]]:
return False, (
'Azure CLI returned error. Run the following commands '
'under sky folder:'
-# TODO(zhwu): after we publish sky to pypi, change this to
+# TODO(zhwu): after we publish sky to PyPI, change this to
# `pip install sky[azure]`
'\n $ pip install .[azure]'
'\n Credentials may also need to be set.' + help_str)
4 changes: 2 additions & 2 deletions sky/core.py
@@ -472,7 +472,7 @@ def spot_status(refresh: bool) -> List[Dict[str, Any]]:
returncode, code, 'Failed to fetch managed job statuses',
job_table_json + stderr)
except exceptions.CommandError as e:
-raise RuntimeError(e.message) from e
+raise RuntimeError(e.error_msg) from e

jobs = spot.load_spot_job_queue(job_table_json)
return jobs
@@ -522,7 +522,7 @@ def spot_cancel(name: Optional[str] = None,
'Failed to cancel managed spot job',
stdout)
except exceptions.CommandError as e:
-raise RuntimeError(e.message) from e
+raise RuntimeError(e.error_msg) from e

logger.info(stdout)
if 'Multiple jobs found with name' in stdout:
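
Both hunks in `sky/core.py` replace `e.message` with `e.error_msg`. Python 3 exceptions have no built-in `message` attribute, so the handler has to read whatever attribute the exception class itself defines. A minimal sketch, assuming a constructor shaped like the one below; this `CommandError` is a hypothetical stand-in, not the class in `sky/exceptions.py`:

```python
class CommandError(Exception):
    """Hypothetical stand-in for exceptions.CommandError."""

    def __init__(self, returncode: int, command: str, error_msg: str):
        super().__init__(
            f'Command {command} failed with return code {returncode}.')
        self.returncode = returncode
        self.command = command
        # The attribute the handlers in the diff read.
        self.error_msg = error_msg


err = CommandError(1, 'sky spot status', 'Failed to fetch managed job statuses')
has_message = hasattr(err, 'message')

try:
    try:
        raise err
    except CommandError as e:
        # Mirrors the handler in the diff: re-raise with the stored message.
        raise RuntimeError(e.error_msg) from e
except RuntimeError as wrapped:
    wrapped_text = str(wrapped)

print(has_message, wrapped_text)
```

With `e.message`, the `except` block itself would raise `AttributeError` and hide the original failure.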
68 changes: 58 additions & 10 deletions sky/setup_files/setup.py
@@ -1,14 +1,20 @@
"""SkyPilot.
-SkyPilot is a tool to run any workload seamlessly across different cloud
-providers through a unified interface. No knowledge of cloud offerings is
-required or expected – you simply define the workload and its resource
-requirements, and SkyPilot will automatically execute it on AWS, Google Cloud
-Platform or Microsoft Azure.
+SkyPilot is a framework for easily running machine learning* workloads on any cloud
+through a unified interface. No knowledge of cloud offerings is required or expected –
+you simply define the workload and its resource requirements, and SkyPilot will
+automatically execute it on AWS, Google Cloud Platform or Microsoft Azure.
+*: SkyPilot is primarily targeted at machine learning workloads, but it can also
+support many general workloads. We're excited to hear about your use case and would
+love to hear more about how we can better support your requirements - please join us
+in [this discussion](https://github.com/skypilot-org/skypilot/discussions/1016)
"""

import io
import os
import platform
+import re
import warnings

import setuptools
@@ -23,8 +29,27 @@
mac_minor = int(mac_minor)
if mac_major < 10 or (mac_major == 10 and mac_minor >= 15):
warnings.warn(
-f"\'Detected MacOS version {mac_version}. MacOS version >=10.15 "
-"is required to install ray>=1.9\'")
+f'\'Detected MacOS version {mac_version}. MacOS version >=10.15 '
+'is required to install ray>=1.9\'')
+
+
+def find_version(*filepath):
+# Extract version information from filepath
+# Adapted from: https://github.com/ray-project/ray/blob/master/python/setup.py
+with open(os.path.join(ROOT_DIR, *filepath)) as fp:
+version_match = re.search(r'^__version__ = [\'"]([^\'"]*)[\'"]',
+fp.read(), re.M)
+if version_match:
+return version_match.group(1)
+raise RuntimeError('Unable to find version string.')
+
+
+def parse_footnote(readme: str) -> str:
+"""Parse the footnote from the README.md file."""
+readme = readme.replace('<!-- Footnote -->', '#')
+footnote_re = re.compile(r'\[\^([0-9]+)\]')
+return footnote_re.sub(r'<sup>[\1]</sup>', readme)
+

install_requires = [
'wheel',
@@ -68,14 +93,29 @@

extras_require['all'] = sum(extras_require.values(), [])

+long_description = ''
+readme_filepath = 'README.md'
+# When sky/backends/wheel_utils.py builds wheels, it will not contain the README.
+# Skip the description for that case.
+if os.path.exists(readme_filepath):
+long_description = io.open(readme_filepath, 'r', encoding='utf-8').read()
+long_description = parse_footnote(long_description)
+
setuptools.setup(
# NOTE: this affects the package.whl wheel name. When changing this (if
# ever), you must grep for '.whl' and change all corresponding wheel paths
# (templates/*.j2 and wheel_utils.py).
name='skypilot',
-version='0.1.0',
+version=find_version('sky', '__init__.py'),
packages=setuptools.find_packages(),
+author='SkyPilot Team',
+license='Apache 2.0',
+readme='README.md',
+description='SkyPilot: An intercloud broker for the clouds',
+long_description=long_description,
+long_description_content_type='text/markdown',
setup_requires=['wheel'],
+requires_python='>=3.6',
install_requires=install_requires,
extras_require=extras_require,
entry_points={
@@ -88,7 +128,15 @@
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
'Programming Language :: Python :: 3.10',
+'License :: OSI Approved :: Apache Software License',
+'Operating System :: OS Independent',
+'Topic :: Software Development :: Libraries :: Python Modules',
+'Topic :: System :: Distributed Computing',
],
-description='SkyPilot',
-long_description=__doc__.replace('\n', ' '),
+project_urls={
+'Homepage': 'https://github.com/skypilot-org/skypilot',
+'Issues': 'https://github.com/skypilot-org/skypilot/issues',
+'Discussion': 'https://github.com/skypilot-org/skypilot/discussions',
+'Documentation': 'https://skypilot.readthedocs.io/en/latest/',
+},
)
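
The two helpers added to `setup.py` can be exercised on in-memory strings to see what they produce. The sketch below reuses the regular expressions from the diff. `find_version_in_text` is a hypothetical variant of `find_version` that takes the file contents directly instead of a path, and the sample inputs are made up for illustration.

```python
import re


def find_version_in_text(source: str) -> str:
    """Same regex as find_version in the diff, applied to a string."""
    version_match = re.search(r'^__version__ = [\'"]([^\'"]*)[\'"]', source,
                              re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError('Unable to find version string.')


def parse_footnote(readme: str) -> str:
    """Turn the footnote marker comment into a heading and [^n] into <sup>[n]</sup>."""
    readme = readme.replace('<!-- Footnote -->', '#')
    footnote_re = re.compile(r'\[\^([0-9]+)\]')
    return footnote_re.sub(r'<sup>[\1]</sup>', readme)


init_source = ('"""The SkyPilot package."""\n'
               'import os\n'
               "__version__ = '1.0.0-dev0'\n")
version = find_version_in_text(init_source)

readme = 'machine learning[^1] workloads\n<!-- Footnote -->\n[^1]: Details.'
rendered = parse_footnote(readme)

print(version)
print(rendered)
```

Reading the version with a regex avoids importing the `sky` package at build time, and the footnote rewrite replaces the `[^1]` markers with plain `<sup>` tags for the PyPI project page (the "Fix image path and footnote for pypi" item in the commit message).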
2 changes: 1 addition & 1 deletion sky/templates/aws-ray.yml.j2
@@ -88,7 +88,7 @@ setup_commands:
- (type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
pip3 uninstall skypilot -y &> /dev/null;
-pip3 install "$(echo {{sky_remote_path}}/skypilot-*.whl)[aws]";
+pip3 install "$(echo {{sky_remote_path}}/skypilot-{{sky_version}}*.whl)[aws]";
python3 -c "from sky.skylet.ray_patches import patch; patch()";
sudo systemctl stop unattended-upgrades;
sudo kill -9 `sudo lsof /var/lib/dpkg/lock-frontend | awk '{print $2}' | tail -n 1` || true;
2 changes: 1 addition & 1 deletion sky/templates/azure-ray.yml.j2
@@ -94,7 +94,7 @@ setup_commands:
- (type -a pip | grep -q pip3) || echo 'alias pip=pip3' >> ~/.bashrc;
(pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app && touch ~/.sudo_as_admin_successful;
pip3 uninstall skypilot -y &> /dev/null;
-pip3 install "$(echo {{sky_remote_path}}/skypilot-*.whl)[azure]";
+pip3 install "$(echo {{sky_remote_path}}/skypilot-{{sky_version}}*.whl)[azure]";
python3 -c "from sky.skylet.ray_patches import patch; patch()";
sudo systemctl stop unattended-upgrades;
sudo kill -9 `sudo lsof /var/lib/dpkg/lock-frontend | awk '{print $2}' | tail -n 1` || true;
2 changes: 1 addition & 1 deletion sky/templates/gcp-ray.yml.j2
@@ -128,7 +128,7 @@ setup_commands:
# patch the buggy ray files and enable `-o allow_other` option for `goofys`
- (pip3 list | grep ray | grep {{ray_version}} 2>&1 > /dev/null || pip3 install -U ray[default]=={{ray_version}}) && mkdir -p ~/sky_workdir && mkdir -p ~/.sky/sky_app;
pip3 uninstall skypilot -y &> /dev/null;
-pip3 install "$(echo {{sky_remote_path}}/skypilot-*.whl)[gcp]";
+pip3 install "$(echo {{sky_remote_path}}/skypilot-{{sky_version}}*.whl)[gcp]";
python3 -c "from sky.skylet.ray_patches import patch; patch()";
[ -f /etc/fuse.conf ] && sudo sed -i 's/#user_allow_other/user_allow_other/g' /etc/fuse.conf || (sudo sh -c 'echo "user_allow_other" > /etc/fuse.conf'); # This is needed for `-o allow_other` option for `gcsfuse`;
# For TPU VM