Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add a tutorial for running model locally in a Python extension #90

Merged
merged 3 commits into from
Oct 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
108 changes: 108 additions & 0 deletions docs/tutorials/how_to_run_local_model_in_python_extensions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
---
layout:
title:
visible: true
description:
visible: false
tableOfContents:
visible: true
outline:
visible: true
pagination:
visible: true
---

# How to Run a Local AI Model in Python Extensions

In the TEN framework, extensions can utilize third-party AI services or run AI models locally to improve performance and reduce costs. This tutorial explains how to run a local AI model in a Python extension and how to interact with it within the extension.

## Step 1: Check Hardware Requirements

Before running an AI model locally, ensure that your hardware meets the necessary requirements. Key components to verify include:

- **CPU/GPU**: Check if the model requires specific processing power.
- **Memory**: Ensure sufficient memory to load and run the model.

Verify that your system can support the model’s demands to ensure smooth operation.

## Step 2: Install Necessary Software and Dependencies

Once your hardware is ready, install the required software and dependencies. Follow these steps:

1. **Operating System**: Ensure compatibility with your model. Most AI frameworks support Windows, macOS, and Linux, though specific versions may be required.
2. **Python Version**: Ensure compatibility with the TEN Python runtime and the model.
3. **Required Libraries**: Install necessary libraries such as:

- TensorFlow
- PyTorch
- Numpy
- vllm

You can list the required dependencies in a `requirements.txt` file for easy installation.

4. **Download the Model**: Obtain the local version of the AI model you plan to run.

## Step 3: Implement Your Python Extension

Below is an example of how to implement a basic text generation feature using the `vllm` inference engine in a Python extension.

First, initialize the local model within the extension:

```python
from ten import (
Extension,
TenEnv,
Cmd,
CmdResult,
)
from vllm import LLM

class TextGenerationExtension(Extension):
def on_init(self, ten_env: TenEnv) -> None:
self.llm = LLM(model="<model_path>")
ten_env.on_init_done()
```

Next, implement the `on_cmd` method to handle text generation based on the provided input:

```python
def on_cmd(self, ten_env: TenEnv, cmd: Cmd) -> None:
prompt = cmd.get_property_string("prompt")

outputs = self.llm.generate(prompt)
generated_text = outputs[0].outputs[0].text

cmd_result = CmdResult.create(StatusCode.OK)
cmd_result.set_property_string("result", generated_text)
ten_env.return_result(cmd_result, cmd)
```

In this code, the `on_cmd` method retrieves the `prompt`, generates text using the model, and returns the generated text as the command result.

You can adapt this approach to implement other functionalities such as image recognition or speech-to-text by processing the relevant input types.

## Step 4: Unload the Model

It’s important to unload the model during extension cleanup to free resources:

```python
import gc
import torch

class TextGenerationExtension(Extension):
...

def on_deinit(self, ten_env: TenEnv) -> None:
del self.llm
gc.collect()
torch.cuda.empty_cache()
torch.distributed.destroy_process_group()
print("Successfully deleted the LLM pipeline and freed GPU memory!")
ten_env.on_deinit_done()
```

This ensures efficient memory management, especially when working with GPU resources.

## Summary

Running a local model in a TEN Python extension is similar to native Python development. By loading and unloading the model in the appropriate extension lifecycle methods, you can easily integrate local AI models and interact with them effectively.