
Commit

Install Daft on existing Kubernetes cluster and submit jobs using daft-launcher (#44)

* Instructions to install Ray and Daft on an existing Kubernetes cluster, and support BYOC k8s clusters in daft-launcher

Added docs for kuberay + daft installation, fixed minor linter issue

* Addressed PR comments, command groups, provisioned and byoc instead of aws and k8s

* switched to DaftProvider enum

* Addressed PR comments

* Commented out tests

---------

Co-authored-by: Jessie Young <jessie@eventualcomputing.com>
jessie-young authored Jan 23, 2025
1 parent 8fd7a17 commit db3b824
Showing 14 changed files with 1,508 additions and 675 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -15,6 +15,7 @@ toml = "0.8"
comfy-table = "7.1"
regex = "1.11"
open = "5.3"
semver = "1.0"

[dependencies.anyhow]
version = "1.0"
233 changes: 186 additions & 47 deletions README.md
@@ -10,45 +10,124 @@
[![Latest](https://img.shields.io/github/v/tag/Eventual-Inc/daft-launcher?label=latest&logo=GitHub)](https://github.com/Eventual-Inc/daft-launcher/tags)
[![License](https://img.shields.io/badge/daft_launcher-docs-red.svg)](https://eventual-inc.github.io/daft-launcher)

# Daft Launcher CLI Tool

`daft-launcher` is a simple launcher for spinning up and managing Ray clusters for [`daft`](https://github.com/Eventual-Inc/Daft).
It abstracts away all the complexities of dealing with Ray yourself, allowing you to focus on running `daft` in a distributed manner.

## Goal

Getting started with Daft in a local environment is easy.
However, getting started with Daft in a cloud environment is substantially more difficult.
So much more difficult, in fact, that users end up spending more time setting up their environment than actually playing with our query engine.

Daft Launcher aims to solve this problem by providing a simple CLI tool to remove all of this unnecessary heavy-lifting.

## Capabilities

What Daft Launcher is capable of:
1. Spinning up clusters (Provisioned mode only)
2. Listing all available clusters as well as their statuses (Provisioned mode only)
3. Submitting jobs to a cluster (Both Provisioned and BYOC modes)
4. Connecting to the cluster (Provisioned mode only)
5. Spinning down clusters (Provisioned mode only)
6. Creating configuration files (Both modes)
7. Running raw SQL statements (BYOC mode only)

## Operation Modes

Daft Launcher supports two modes of operation:
- **Provisioned**: Automatically provisions and manages Ray clusters in AWS
- **BYOC (Bring Your Own Cluster)**: Connects to existing Ray clusters in Kubernetes

### Command Groups and Support Matrix

| Command Group | Command | Provisioned | BYOC |
|---------------|---------|-------------|------|
| cluster       | up      | ✅          | ❌   |
|               | down    | ✅          | ❌   |
|               | kill    | ✅          | ❌   |
|               | list    | ✅          | ❌   |
|               | connect | ✅          | ❌   |
|               | ssh     | ✅          | ❌   |
| job           | submit  | ✅          | ✅   |
|               | sql     | ❌          | ✅   |
|               | status  | ✅          | ✅   |
|               | logs    | ✅          | ✅   |
| config        | init    | ✅          | ✅   |
|               | check   | ✅          | ✅   |
|               | export  | ✅          | ✅   |

## Usage

### Pre-requisites

You'll need a Python package manager installed.
We recommend using [`uv`](https://astral.sh/blog/uv) for all things Python.

#### For Provisioned Mode (AWS)
1. A valid AWS account with the necessary IAM role to spin up EC2 instances.
This IAM role can either be created by you (assuming you have the appropriate permissions)
or will need to be created by your administrator.
2. The [AWS CLI](https://aws.amazon.com/cli/) installed and configured on your machine.
3. Login using the AWS CLI.

#### For BYOC Mode (Kubernetes)
1. A Kubernetes cluster with Ray already deployed
- Can be local (minikube/kind), cloud-managed (EKS/GKE/AKS), or on-premise.
- See our [BYOC setup guides](./docs/byoc/README.md) for detailed instructions
2. Ray cluster running in your Kubernetes cluster
- Must be installed and configured using Helm
- See provider-specific guides for installation steps
3. Daft installed on the Ray cluster
4. `kubectl` installed and configured with the correct context
5. Appropriate permissions to access the namespace where Ray is deployed (a quick sanity check for both is sketched just below)
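
A quick way to check prerequisites 4 and 5 before running any `daft` commands (the `default` namespace below is just an example; use whatever namespace Ray was deployed into):

```bash
# Confirm kubectl is pointing at the intended cluster
kubectl config current-context

# Confirm the Ray head and worker pods are running
kubectl get pods -n default
```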

### SSH Key Setup for Provisioned Mode

To enable SSH access and port forwarding for provisioned clusters, you need to:

1. Create an SSH key pair (if you don't already have one):
```bash
# Generate a new key pair
ssh-keygen -t rsa -b 2048 -f ~/.ssh/daft-key

# This will create:
# ~/.ssh/daft-key (private key)
# ~/.ssh/daft-key.pub (public key)
```

2. Import the public key to AWS:
```bash
# Import the public key to AWS
aws ec2 import-key-pair \
--key-name "daft-key" \
--public-key-material fileb://~/.ssh/daft-key.pub
```

3. Set proper permissions on your private key:
```bash
chmod 600 ~/.ssh/daft-key
```

4. Update your daft configuration to use this key:
```toml
[setup.provisioned]
# ... other config ...
ssh-private-key = "~/.ssh/daft-key" # Path to your private key
ssh-user = "ubuntu" # User depends on the AMI (ubuntu for Ubuntu AMIs)
```

Notes:
- The key name in AWS must match the name of your key file (without the extension)
- The private key must be readable only by you (hence the chmod 600)
- Different AMIs use different default users:
- Ubuntu AMIs: use "ubuntu"
- Amazon Linux AMIs: use "ec2-user"
- Make sure this matches your `ssh-user` configuration
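
Once a cluster is up, you can also verify the key setup directly with plain SSH before relying on `daft provisioned ssh` or `daft provisioned connect` (the head node address below is a placeholder; take the real one from the AWS console):

```bash
# Manual connectivity check using the same key the launcher is configured with
ssh -i ~/.ssh/daft-key ubuntu@<head-node-public-ip>
```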

### Installation

Using `uv`:

```bash
# create project
@@ -64,32 +143,92 @@
source .venv/bin/activate
uv pip install daft-launcher
```
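
If you'd rather not use `uv`, a plain virtual environment plus `pip` should work as well, since `daft-launcher` is installed as a regular Python package either way (a sketch, not an officially documented path):

```bash
python -m venv .venv
source .venv/bin/activate
pip install daft-launcher
```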

### Example Usage

All interactions with Daft Launcher are primarily communicated via a configuration file.
By default, Daft Launcher will look inside your `$CWD` for a file named `.daft.toml`.
You can override this behaviour by specifying a custom configuration file.

#### Provisioned Mode (AWS)

```bash
# Initialize a new provisioned mode configuration
daft config init --provider provisioned
# or use the default provider (provisioned)
daft config init

# Cluster management
daft provisioned up
daft provisioned list
daft provisioned connect
daft provisioned ssh
daft provisioned down
daft provisioned kill

# Job management (works in both modes)
daft job submit example-job
daft job status example-job
daft job logs example-job

# Configuration management
daft config check
daft config export
```
`daft config init` creates a configuration file for you; feel free to modify its values before running the cluster and job commands above.
If you're unsure about a value, you can always run `daft config check` to validate the syntax and schema of your configuration file.

#### BYOC Mode (Kubernetes)

```bash
# Initialize a new BYOC mode configuration
daft config init --provider byoc
```

### Configuration Files

You can specify a custom configuration file path with the `-c` flag:
```bash
daft -c my-config.toml job submit example-job
```

Example Provisioned mode configuration:
```toml
[setup]
name = "my-daft-cluster"
version = "0.1.0"
provider = "provisioned"
dependencies = [] # Optional additional Python packages to install

[setup.provisioned]
region = "us-west-2"
number-of-workers = 4
ssh-user = "ubuntu"
ssh-private-key = "~/.ssh/daft-key"
instance-type = "i3.2xlarge"
image-id = "ami-04dd23e62ed049936"
iam-instance-profile-name = "YourInstanceProfileName" # Optional

[run]
pre-setup-commands = []
post-setup-commands = []

[[job]]
name = "example-job"
command = "python my_script.py"
working-dir = "~/my_project"
```

Example BYOC mode configuration:
```toml
[setup]
name = "my-daft-cluster"
version = "0.1.0"
provider = "byoc"
dependencies = [] # Optional additional Python packages to install

[setup.byoc]
namespace = "default" # Optional, defaults to "default"

[[job]]
name = "example-job"
command = "python my_script.py"
working-dir = "~/my_project"
```
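
With a BYOC configuration like the one above (and `kubectl` pointed at the right cluster), submitting work looks much like provisioned mode, with SQL also available; the exact `sql` invocation below is inferred from the command-group table earlier in this README, so treat it as a sketch:

```bash
# Submit the job defined in the [[job]] section above
daft job submit example-job

# Run a raw SQL query through Daft on the cluster
daft job sql "SELECT * FROM my_table WHERE column = 'value'"
```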
16 changes: 16 additions & 0 deletions docs/byoc/README.md
@@ -0,0 +1,16 @@
# BYOC (Bring Your Own Cluster) Mode Setup for Daft

This directory contains guides for setting up Ray and Daft on various Kubernetes environments for BYOC mode:

- [Local Development](./local.md) - Setting up a local Kubernetes cluster for development
- [Cloud Providers](./cloud.md) - Instructions for EKS, GKE, and AKS setups
- [On-Premises](./on-prem.md) - Guide for on-premises Kubernetes deployments

## Prerequisites

Before using `daft-launcher` in BYOC mode with Kubernetes, you must:
1. Have a running Kubernetes cluster (local, cloud-managed, or on-premise)
2. Install and configure Ray on your Kubernetes cluster
3. Install Daft on your cluster

Please follow the appropriate guide above for your environment.
50 changes: 50 additions & 0 deletions docs/byoc/cloud.md
@@ -0,0 +1,50 @@
# Cloud Provider Kubernetes Setup

This guide covers using Ray and Daft with managed Kubernetes services from major cloud providers.

## Prerequisites

### General Requirements
- `kubectl` installed and configured
- `helm` installed
- A running Kubernetes cluster in one of the following cloud providers:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Azure Kubernetes Service (AKS)

### Cloud-Specific Requirements

#### For AWS EKS
- AWS CLI installed and configured
- Access to an existing EKS cluster
- `kubectl` configured for your EKS cluster:
```bash
aws eks update-kubeconfig --name your-cluster-name --region your-region
```

#### For Google GKE
- Google Cloud SDK installed
- Access to an existing GKE cluster
- `kubectl` configured for your GKE cluster:
```bash
gcloud container clusters get-credentials your-cluster-name --zone your-zone
```

#### For Azure AKS
- Azure CLI installed
- Access to an existing AKS cluster
- `kubectl` configured for your AKS cluster:
```bash
az aks get-credentials --resource-group your-resource-group --name your-cluster-name
```

## Installing Ray and Daft

Once your cloud Kubernetes cluster is running and `kubectl` is configured, follow the [Ray Installation Guide](./ray-installation.md) to:
1. Install KubeRay Operator
2. Deploy Ray cluster
3. Install Daft
4. Set up port forwarding
5. Submit test jobs

> **Note**: For cloud providers, you'll typically use x86/AMD64 images unless you're specifically using ARM-based instances (like AWS Graviton).
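
For orientation, installing the KubeRay operator with Helm generally looks like the following; the linked Ray Installation Guide is the authoritative set of steps for this repo, so treat this as a sketch based on the upstream KubeRay chart:

```bash
# Add the KubeRay Helm repository and install the operator
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator
```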
