
Commit

Install Daft on existing Kubernetes cluster and submit jobs using daft-launcher (#44)

* Instructions to install Ray and Daft on an existing Kubernetes cluster, and support BYOC k8s clusters in daft-launcher

Added docs for kuberay + daft installation, fixed minor linter issue

* Addressed PR comments, command groups, provisioned and byoc instead of aws and k8s

* switched to DaftProvider enum

* Addressed PR comments

* Commented out tests

---------

Co-authored-by: Jessie Young <jessie@eventualcomputing.com>
jessie-young authored Jan 23, 2025
1 parent 8fd7a17 commit db3b824
Showing 14 changed files with 1,508 additions and 675 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -15,6 +15,7 @@ toml = "0.8"
comfy-table = "7.1"
regex = "1.11"
open = "5.3"
semver = "1.0"

[dependencies.anyhow]
version = "1.0"
233 changes: 186 additions & 47 deletions README.md
@@ -10,45 +10,124 @@
[![Latest](https://img.shields.io/github/v/tag/Eventual-Inc/daft-launcher?label=latest&logo=GitHub)](https://github.com/Eventual-Inc/daft-launcher/tags)
[![License](https://img.shields.io/badge/daft_launcher-docs-red.svg)](https://eventual-inc.github.io/daft-launcher)

# Daft Launcher CLI Tool

`daft-launcher` is a simple launcher for spinning up and managing Ray clusters for [`daft`](https://github.com/Eventual-Inc/Daft).
It abstracts away all the complexities of dealing with Ray yourself, allowing you to focus on running `daft` in a distributed manner.

## Goal

Getting started with Daft in a local environment is easy.
However, getting started with Daft in a cloud environment is substantially more difficult.
So much more difficult, in fact, that users end up spending more time setting up their environment than actually playing with our query engine.

Daft Launcher aims to solve this problem by providing a simple CLI tool to remove all of this unnecessary heavy-lifting.

## Capabilities

What Daft Launcher is capable of:
1. Spinning up clusters (Provisioned mode only)
2. Listing all available clusters as well as their statuses (Provisioned mode only)
3. Submitting jobs to a cluster (Both Provisioned and BYOC modes)
4. Connecting to the cluster (Provisioned mode only)
5. Spinning down clusters (Provisioned mode only)
6. Creating configuration files (Both modes)
7. Running raw SQL statements (BYOC mode only)

## Operation Modes

Daft Launcher supports two modes of operation:
- **Provisioned**: Automatically provisions and manages Ray clusters in AWS
- **BYOC (Bring Your Own Cluster)**: Connects to existing Ray clusters in Kubernetes

### Command Groups and Support Matrix

| Command Group | Command | Provisioned | BYOC |
|---------------|---------|-------------|------|
| cluster       | up      | ✅          | ❌   |
|               | down    | ✅          | ❌   |
|               | kill    | ✅          | ❌   |
|               | list    | ✅          | ❌   |
|               | connect | ✅          | ❌   |
|               | ssh     | ✅          | ❌   |
| job           | submit  | ✅          | ✅   |
|               | sql     | ❌          | ✅   |
|               | status  | ✅          | ✅   |
|               | logs    | ✅          | ✅   |
| config        | init    | ✅          | ✅   |
|               | check   | ✅          | ✅   |
|               | export  | ✅          | ✅   |

## Usage

### Pre-requisites

You'll need a Python package manager installed.
We recommend using [`uv`](https://astral.sh/blog/uv) for all things Python.

#### For Provisioned Mode (AWS)
1. A valid AWS account with the necessary IAM role to spin up EC2 instances.
This IAM role can either be created by you (assuming you have the appropriate permissions)
or will need to be created by your administrator.
2. The [AWS CLI](https://aws.amazon.com/cli/) installed and configured on your machine.
3. Login using the AWS CLI.

#### For BYOC Mode (Kubernetes)
1. A Kubernetes cluster with Ray already deployed
- Can be local (minikube/kind), cloud-managed (EKS/GKE/AKS), or on-premise.
- See our [BYOC setup guides](./docs/byoc/README.md) for detailed instructions
2. Ray cluster running in your Kubernetes cluster
- Must be installed and configured using Helm
- See provider-specific guides for installation steps
3. Daft installed on the Ray cluster
4. `kubectl` installed and configured with the correct context
5. Appropriate permissions to access the namespace where Ray is deployed (a quick sanity check for both is sketched just below)
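
A quick way to check prerequisites 4 and 5 before running any `daft` commands (the `default` namespace below is just an example; use whatever namespace Ray was deployed into):

```bash
# Confirm kubectl is pointing at the intended cluster
kubectl config current-context

# Confirm the Ray head and worker pods are running
kubectl get pods -n default
```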

### SSH Key Setup for Provisioned Mode

To enable SSH access and port forwarding for provisioned clusters, you need to:

1. Create an SSH key pair (if you don't already have one):
```bash
# Generate a new key pair
ssh-keygen -t rsa -b 2048 -f ~/.ssh/daft-key

# This will create:
# ~/.ssh/daft-key (private key)
# ~/.ssh/daft-key.pub (public key)
```

2. Import the public key to AWS:
```bash
# Import the public key to AWS
aws ec2 import-key-pair \
--key-name "daft-key" \
--public-key-material fileb://~/.ssh/daft-key.pub
```

3. Set proper permissions on your private key:
```bash
chmod 600 ~/.ssh/daft-key
```

4. Update your daft configuration to use this key:
```toml
[setup.provisioned]
# ... other config ...
ssh-private-key = "~/.ssh/daft-key" # Path to your private key
ssh-user = "ubuntu" # User depends on the AMI (ubuntu for Ubuntu AMIs)
```

Notes:
- The key name in AWS must match the name of your key file (without the extension)
- The private key must be readable only by you (hence the chmod 600)
- Different AMIs use different default users:
- Ubuntu AMIs: use "ubuntu"
- Amazon Linux AMIs: use "ec2-user"
- Make sure this matches your `ssh-user` configuration
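
Once a cluster is up, you can also verify the key setup directly with plain SSH before relying on `daft provisioned ssh` or `daft provisioned connect` (the head node address below is a placeholder; take the real one from the AWS console):

```bash
# Manual connectivity check using the same key the launcher is configured with
ssh -i ~/.ssh/daft-key ubuntu@<head-node-public-ip>
```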

### Installation

Using `uv`:

```bash
# create project
@@ -64,32 +143,92 @@
source .venv/bin/activate
uv pip install daft-launcher
```
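
If you'd rather not use `uv`, a plain virtual environment plus `pip` should work as well, since `daft-launcher` is installed as a regular Python package either way (a sketch, not an officially documented path):

```bash
python -m venv .venv
source .venv/bin/activate
pip install daft-launcher
```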

### Example Usage

All interactions with Daft Launcher are primarily communicated via a configuration file.
By default, Daft Launcher will look inside your `$CWD` for a file named `.daft.toml`.
You can override this behaviour by specifying a custom configuration file.

#### Provisioned Mode (AWS)

```bash
# Initialize a new provisioned mode configuration
daft config init --provider provisioned
# or use the default provider (provisioned)
daft config init

# Cluster management
daft provisioned up
daft provisioned list
daft provisioned connect
daft provisioned ssh
daft provisioned down
daft provisioned kill

# Job management (works in both modes)
daft job submit example-job
daft job status example-job
daft job logs example-job

# Configuration management
daft config check
daft config export
```
`daft config init` creates a configuration file for you; feel free to modify its values before running the cluster and job commands above.
If you're unsure about a value, you can always run `daft config check` to validate the syntax and schema of your configuration file.

#### BYOC Mode (Kubernetes)

```bash
# Initialize a new BYOC mode configuration
daft config init --provider byoc
```

### Configuration Files

You can specify a custom configuration file path with the `-c` flag:
```bash
daft -c my-config.toml job submit example-job
```

Example Provisioned mode configuration:
```toml
[setup]
name = "my-daft-cluster"
version = "0.1.0"
provider = "provisioned"
dependencies = [] # Optional additional Python packages to install

[setup.provisioned]
region = "us-west-2"
number-of-workers = 4
ssh-user = "ubuntu"
ssh-private-key = "~/.ssh/daft-key"
instance-type = "i3.2xlarge"
image-id = "ami-04dd23e62ed049936"
iam-instance-profile-name = "YourInstanceProfileName" # Optional

[run]
pre-setup-commands = []
post-setup-commands = []

[[job]]
name = "example-job"
command = "python my_script.py"
working-dir = "~/my_project"
```

Example BYOC mode configuration:
```toml
[setup]
name = "my-daft-cluster"
version = "0.1.0"
provider = "byoc"
dependencies = [] # Optional additional Python packages to install

[setup.byoc]
namespace = "default" # Optional, defaults to "default"

[[job]]
name = "example-job"
command = "python my_script.py"
working-dir = "~/my_project"
```
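
With a BYOC configuration like the one above (and `kubectl` pointed at the right cluster), submitting work looks much like provisioned mode, with SQL also available; the exact `sql` invocation below is inferred from the command-group table earlier in this README, so treat it as a sketch:

```bash
# Submit the job defined in the [[job]] section above
daft job submit example-job

# Run a raw SQL query through Daft on the cluster
daft job sql "SELECT * FROM my_table WHERE column = 'value'"
```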
16 changes: 16 additions & 0 deletions docs/byoc/README.md
@@ -0,0 +1,16 @@
# BYOC (Bring Your Own Cluster) Mode Setup for Daft

This directory contains guides for setting up Ray and Daft on various Kubernetes environments for BYOC mode:

- [Local Development](./local.md) - Setting up a local Kubernetes cluster for development
- [Cloud Providers](./cloud.md) - Instructions for EKS, GKE, and AKS setups
- [On-Premises](./on-prem.md) - Guide for on-premises Kubernetes deployments

## Prerequisites

Before using `daft-launcher` in BYOC mode with Kubernetes, you must:
1. Have a running Kubernetes cluster (local, cloud-managed, or on-premise)
2. Install and configure Ray on your Kubernetes cluster
3. Install Daft on your cluster

Please follow the appropriate guide above for your environment.
50 changes: 50 additions & 0 deletions docs/byoc/cloud.md
@@ -0,0 +1,50 @@
# Cloud Provider Kubernetes Setup

This guide covers using Ray and Daft with managed Kubernetes services from major cloud providers.

## Prerequisites

### General Requirements
- `kubectl` installed and configured
- `helm` installed
- A running Kubernetes cluster in one of the following cloud providers:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Azure Kubernetes Service (AKS)

### Cloud-Specific Requirements

#### For AWS EKS
- AWS CLI installed and configured
- Access to an existing EKS cluster
- `kubectl` configured for your EKS cluster:
```bash
aws eks update-kubeconfig --name your-cluster-name --region your-region
```

#### For Google GKE
- Google Cloud SDK installed
- Access to an existing GKE cluster
- `kubectl` configured for your GKE cluster:
```bash
gcloud container clusters get-credentials your-cluster-name --zone your-zone
```

#### For Azure AKS
- Azure CLI installed
- Access to an existing AKS cluster
- `kubectl` configured for your AKS cluster:
```bash
az aks get-credentials --resource-group your-resource-group --name your-cluster-name
```

## Installing Ray and Daft

Once your cloud Kubernetes cluster is running and `kubectl` is configured, follow the [Ray Installation Guide](./ray-installation.md) to:
1. Install KubeRay Operator
2. Deploy Ray cluster
3. Install Daft
4. Set up port forwarding
5. Submit test jobs

> **Note**: For cloud providers, you'll typically use x86/AMD64 images unless you're specifically using ARM-based instances (like AWS Graviton).
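
For orientation, installing the KubeRay operator with Helm generally looks like the following; the linked Ray Installation Guide is the authoritative set of steps for this repo, so treat this as a sketch based on the upstream KubeRay chart:

```bash
# Add the KubeRay Helm repository and install the operator
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator
```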
