# ray_quick
The Ray appliance includes a built-in chat application that can be easily deployed using a pre-trained model. This guide shows how to deploy and serve this application:
- **Download the Appliance.** Retrieve the Ray appliance from the OpenNebula marketplace using the following command:

  ```shell
  $ onemarketapp export 'Service Ray' Ray --datastore default
  ```
- **(Optional) Configure the Ray VM Template.** Depending on your specific application requirements, you may need to modify the VM template to adjust resources such as `VCPU` or `MEMORY`, or to add GPU cards for enhanced model serving capabilities.
- **Instantiate the Template.** Upon instantiation, you will be prompted to configure model-specific parameters, such as the model ID and temperature, as well as provide your Hugging Face token if required. For example, deploying the `meta-llama/Llama-3.2-3B-Instruct` model results in the following `CONTEXT` and capacity attributes:

  ```text
  MEMORY="8192"
  VCPU="4"
  ...
  CONTEXT=[
    DISK_ID="1",
    ETH0_DNS="172.20.0.1",
    ...
    ONEAPP_RAY_AI_FRAMEWORK="RAY",
    ONEAPP_RAY_API_OPENAI="NO",
    ONEAPP_RAY_API_PORT="8000",
    ONEAPP_RAY_API_WEB="YES",
    ONEAPP_RAY_CHATBOT_CPUS="4",
    ONEAPP_RAY_MAX_NEW_TOKENS="1024",
    ONEAPP_RAY_MODEL_ID="meta-llama/Llama-3.2-3B-Instruct",
    ONEAPP_RAY_MODEL_PROMPT="You are a helpful assistant. Answer the question.",
    ONEAPP_RAY_MODEL_QUANTIZATION="0",
    ONEAPP_RAY_MODEL_TEMPERATURE="0.1",
    ONEAPP_RAY_MODEL_TOKEN="hf_eDJEEeq*****************",
    ... ]
  ```

  Note: The number of CPUs allocated to the application is automatically derived from the available virtual CPUs.
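Instead of answering the interactive prompts, the capacity and model parameters above can also be supplied non-interactively as extra template attributes. The following is a minimal sketch, assuming the template is named `Ray` as in the download step; the attribute names are taken from the `CONTEXT` example above, and the values are illustrative:

```shell
# Write the attributes to merge into the template at instantiation time
# (values here are examples, adjust to your model and capacity needs).
cat > ray_extra.txt <<'EOF'
VCPU = "4"
MEMORY = "8192"
CONTEXT = [
  ONEAPP_RAY_MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct",
  ONEAPP_RAY_MODEL_TEMPERATURE = "0.1" ]
EOF

# Instantiate the template, merging the extra attributes from the file:
# onetemplate instantiate Ray ray_extra.txt --name ray-chatbot
```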
- **Deploy the Application.** The deployment process may take several minutes as it downloads the model and required dependencies (e.g., PyTorch and FastAPI). You can monitor the status by logging into the VM:

  - Access the VM via SSH:

    ```shell
    $ onevm ssh 71
    Warning: Permanently added '172.20.0.5' (ED25519) to the list of known hosts.
    Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-127-generic x86_64)

     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/pro

     System information as of Thu Jan  2 12:01:28 UTC 2025

      System load:  0.16               Processes:             130
      Usage of /:   10.5% of 96.73GB   Users logged in:       0
      Memory usage: 89%                IPv4 address for eth0: 172.20.0.5
      Swap usage:   0%

    Expanded Security Maintenance for Applications is not enabled.

    8 updates can be applied immediately.
    To see these additional updates run: apt list --upgradable

    Enable ESM Apps to receive additional future security updates.
    See https://ubuntu.com/esm or run: sudo pro status

        ___   _ __    ___
       / _ \ | '_ \  / _ \   OpenNebula Service Appliance
      | (_) || | | ||  __/
       \___/ |_| |_| \___|

    All set and ready to serve 8)
    ```

  - Verify the Ray cluster status:

    ```shell
    root@chatbot-71:~# . ./ray_env/bin/activate
    (ray_env) root@chatbot-71:~# ray status
    ======== Autoscaler status: 2025-01-02 12:01:36.792794 ========
    Node status
    ---------------------------------------------------------------
    Active:
     1 node_4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186
    Pending:
     (no pending nodes)
    Recent failures:
     (no failures)

    Resources
    ---------------------------------------------------------------
    Usage:
     4.0/4.0 CPU
     0B/4.42GiB memory
     0B/2.21GiB object_store_memory

    Demands:
     (no resource demands)
    ```

  - Confirm the application deployment:

    ```shell
    (ray_env) root@chatbot-71:~# serve status
    proxies:
      4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186: HEALTHY
    applications:
      app1:
        status: RUNNING
        message: ''
        last_deployed_time_s: 1735817946.9661372
        deployments:
          ChatBot:
            status: HEALTHY
            status_trigger: CONFIG_UPDATE_COMPLETED
            replica_states:
              RUNNING: 1
            message: ''
    target_capacity: null
    ```
- **Test the Inference Endpoint.** If OneGate is enabled in your OpenNebula installation, the inference endpoint URL is added to the VM information. Alternatively, you can use the VM's IP address and port 8000:

  ```shell
  $ onevm show 71 | grep RAY.*CHATBOT
  ONEAPP_RAY_CHATBOT_API="http://172.20.0.5:8000/chat"
  ONEAPP_RAY_CHATBOT_WEB="http://172.20.0.5:5000"
  ```

  A simple `client.py` script is available for testing the default application included in the appliance:

  ```shell
  $ python3 ./client.py http://172.20.0.5:8000/chat
  Chat interface started. Type 'exit' to quit.
  You: Hello
  Server: Hello! How can I assist you today?
  You: What is Cloud Computing?
  Server: Cloud computing refers to the delivery of computing services over the internet, such as storage, servers, databases...
  ```

  Alternatively, you can use a web browser to access the built-in web interface: point it to the `ONEAPP_RAY_CHATBOT_WEB` URL, in this example http://172.20.0.5:5000.
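The chat exchange above can also be reproduced with a few lines of standard-library Python. This is a minimal sketch: the `{"message": ...}` JSON payload shape is an assumption, so check the bundled `client.py` for the exact format the endpoint expects:

```python
import json
import urllib.request


def build_payload(message: str) -> bytes:
    """Encode a chat message as a JSON request body.

    The {"message": ...} shape is an assumption; the appliance's
    client.py is authoritative for the real payload format.
    """
    return json.dumps({"message": message}).encode("utf-8")


def ask_chatbot(endpoint: str, message: str) -> str:
    """POST a message to the inference endpoint and return the raw reply."""
    req = urllib.request.Request(
        endpoint,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Example (requires a running appliance; endpoint taken from onevm show):
# print(ask_chatbot("http://172.20.0.5:8000/chat", "Hello"))
```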
