# ray_quick
The Ray appliance includes a built-in chat application that can be easily deployed using a pre-trained model. This guide shows how to deploy and serve this application:
- **Download the Appliance.** Retrieve the Ray appliance from the OpenNebula marketplace using the following command:

  ```shell
  $ onemarketapp export 'Service Ray' Ray --datastore default
  ```
- **(Optional) Configure the Ray VM Template.** Depending on your specific application requirements, you may need to modify the VM template to adjust resources such as `VCPU` or `MEMORY`, or to add GPU cards for enhanced model serving capabilities.
- **Instantiate the Template.** Upon instantiation, you will be prompted to configure model-specific parameters, such as the model ID and temperature, as well as provide your Hugging Face token if required. For example, deploying the `meta-llama/Llama-3.2-3B-Instruct` model results in the following `CONTEXT` and capacity attributes:

  ```text
  MEMORY="8192"
  VCPU="4"
  ...
  CONTEXT=[
    DISK_ID="1",
    ETH0_DNS="172.20.0.1",
    ...
    ONEAPP_RAY_AI_FRAMEWORK="RAY",
    ONEAPP_RAY_API_OPENAI="NO",
    ONEAPP_RAY_API_PORT="8000",
    ONEAPP_RAY_API_WEB="YES",
    ONEAPP_RAY_CHATBOT_CPUS="4",
    ONEAPP_RAY_MAX_NEW_TOKENS="1024",
    ONEAPP_RAY_MODEL_ID="meta-llama/Llama-3.2-3B-Instruct",
    ONEAPP_RAY_MODEL_PROMPT="You are a helpful assistant. Answer the question.",
    ONEAPP_RAY_MODEL_QUANTIZATION="0",
    ONEAPP_RAY_MODEL_TEMPERATURE="0.1",
    ONEAPP_RAY_MODEL_TOKEN="hf_eDJEEeq*****************",
    ... ]
  ```

  Note: The number of CPUs allocated to the application is automatically derived from the available virtual CPUs.
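Instead of answering the interactive prompts, the capacity and model parameters above can also be supplied non-interactively as extra template attributes. The following is a minimal sketch, assuming the template is named `Ray` as in the download step; the attribute names are taken from the `CONTEXT` example above, and the values are illustrative:

```shell
# Write the attributes to merge into the template at instantiation time
# (values here are examples, adjust to your model and capacity needs).
cat > ray_extra.txt <<'EOF'
VCPU = "4"
MEMORY = "8192"
CONTEXT = [
  ONEAPP_RAY_MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct",
  ONEAPP_RAY_MODEL_TEMPERATURE = "0.1" ]
EOF

# Instantiate the template, merging the extra attributes from the file:
# onetemplate instantiate Ray ray_extra.txt --name ray-chatbot
```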
- **Deploy the Application.** The deployment process may take several minutes as it downloads the model and required dependencies (e.g., PyTorch and FastAPI). You can monitor the status by logging into the VM:

  - Access the VM via SSH:

    ```shell
    $ onevm ssh 71
    Warning: Permanently added '172.20.0.5' (ED25519) to the list of known hosts.
    Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-127-generic x86_64)

     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/pro

     System information as of Thu Jan  2 12:01:28 UTC 2025

      System load:  0.16               Processes:             130
      Usage of /:   10.5% of 96.73GB   Users logged in:       0
      Memory usage: 89%                IPv4 address for eth0: 172.20.0.5
      Swap usage:   0%

    Expanded Security Maintenance for Applications is not enabled.

    8 updates can be applied immediately.
    To see these additional updates run: apt list --upgradable

    Enable ESM Apps to receive additional future security updates.
    See https://ubuntu.com/esm or run: sudo pro status

        ___   _ __    ___
       / _ \ | '_ \  / _ \   OpenNebula Service Appliance
      | (_) || | | ||  __/
       \___/ |_| |_| \___|

    All set and ready to serve 8)
    ```

  - Verify the Ray cluster status:

    ```shell
    root@chatbot-71:~# . ./ray_env/bin/activate
    (ray_env) root@chatbot-71:~# ray status
    ======== Autoscaler status: 2025-01-02 12:01:36.792794 ========
    Node status
    ---------------------------------------------------------------
    Active:
     1 node_4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186
    Pending:
     (no pending nodes)
    Recent failures:
     (no failures)

    Resources
    ---------------------------------------------------------------
    Usage:
     4.0/4.0 CPU
     0B/4.42GiB memory
     0B/2.21GiB object_store_memory

    Demands:
     (no resource demands)
    ```

  - Confirm the application deployment:

    ```shell
    (ray_env) root@chatbot-71:~# serve status
    proxies:
      4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186: HEALTHY
    applications:
      app1:
        status: RUNNING
        message: ''
        last_deployed_time_s: 1735817946.9661372
        deployments:
          ChatBot:
            status: HEALTHY
            status_trigger: CONFIG_UPDATE_COMPLETED
            replica_states:
              RUNNING: 1
            message: ''
    target_capacity: null
    ```
- **Test the Inference Endpoint.** If OneGate is enabled in your OpenNebula installation, the inference endpoint URL is added to the VM information. Alternatively, you can use the VM's IP address and port 8000:

  ```shell
  $ onevm show 71 | grep RAY.*CHATBOT
  ONEAPP_RAY_CHATBOT_API="http://172.20.0.5:8000/chat"
  ONEAPP_RAY_CHATBOT_WEB="http://172.20.0.5:5000"
  ```

  A simple `client.py` script is available for testing the default application included in the appliance:

  ```shell
  $ python3 ./client.py http://172.20.0.5:8000/chat
  Chat interface started. Type 'exit' to quit.
  You: Hello
  Server: Hello! How can I assist you today?
  You: What is Cloud Computing?
  Server: Cloud computing refers to the delivery of computing services over the internet, such as storage, servers, databases...
  ```

  Alternatively, you can use a web browser to access the built-in web interface: point it to the `ONEAPP_RAY_CHATBOT_WEB` URL, in this example http://172.20.0.5:5000.
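The chat exchange above can also be reproduced with a few lines of standard-library Python. This is a minimal sketch: the `{"message": ...}` JSON payload shape is an assumption, so check the bundled `client.py` for the exact format the endpoint expects:

```python
import json
import urllib.request


def build_payload(message: str) -> bytes:
    """Encode a chat message as a JSON request body.

    The {"message": ...} shape is an assumption; the appliance's
    client.py is authoritative for the real payload format.
    """
    return json.dumps({"message": message}).encode("utf-8")


def ask_chatbot(endpoint: str, message: str) -> str:
    """POST a message to the inference endpoint and return the raw reply."""
    req = urllib.request.Request(
        endpoint,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example (requires a running appliance; endpoint taken from onevm show):
# print(ask_chatbot("http://172.20.0.5:8000/chat", "Hello"))
```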
