40 changes: 40 additions & 0 deletions src/content/docs/en/pages/guides/ai-inference/quick-start.mdx
@@ -0,0 +1,40 @@
---
title: How to build a simple agent with AI Inference
description: Learn how to call an AI model from a Function using AI Inference.
meta_tags: >-
building, onboarding, create resources, Azion Web Platform, import from
GitHub
namespace: docs_guides_ai_inference_build_agent
permalink: /documentation/products/guides/ai-inference-agent/
menu_namespace: AIInferenceMenu

---



## Usage

AI Inference can be used in a [Function](/en/documentation/products/build/applications/functions/).

This function receives a POST request, forwards it to the desired AI model, and returns the model's response.


```javascript
const modelResponse = await Azion.AI.run("Qwen/Qwen3-30B-A3B-Instruct-2507-FP8", {
  stream: true,
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant."
    },
    {
      role: "user",
      content: "Name the European capitals"
    }
  ]
});

return modelResponse;
```

This example uses the Qwen3 model. You can change the model and the request parameters according to your preferences. Check the [AI models reference](/en/documentation/products/ai/ai-inference/models/) for more information about the available models and how to use them in your application.
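For context, a fuller sketch of what such a Function might look like, assuming a fetch-style handler around the `Azion.AI.run` call shown above. `handleRequest` and the `prompt` field are illustrative names, not part of the Azion API:

```javascript
// Hypothetical Function entry point: accepts a POST with a JSON body
// like {"prompt": "..."} and returns the model response as JSON.
// `Azion.AI.run` is provided by the runtime, as in the snippet above.
async function handleRequest(request) {
  // Only accept POST, as the guide assumes.
  if (request.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405 });
  }
  const { prompt } = await request.json();
  const modelResponse = await Azion.AI.run("Qwen/Qwen3-30B-A3B-Instruct-2507-FP8", {
    stream: false, // non-streaming variant, for a simple JSON response
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt }
    ]
  });
  return new Response(JSON.stringify(modelResponse), {
    headers: { "Content-Type": "application/json" }
  });
}
```

The handler validates the method before touching the body, so malformed calls fail fast with a 405 instead of an opaque runtime error.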

@@ -4,6 +4,7 @@ description: >-
Qwen3-30B-A3B-Instruct-2507-FP8 is an instruction-tuned 30B-parameter FP8 causal language model for long-context (256K) text generation and reasoning, supporting chat/QA, summarization, multilingual tasks, math/science problem solving, coding, and tool-augmented workflows.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_3_30ba3b
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/qwen3-30ba3b/
---

@@ -4,6 +4,7 @@ description: >-
BAAI/bge-reranker-v2-m3 is a lightweight reranker model with strong multilingual capabilities.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing'
namespace: docs_edge_ai_models_baai_bge_reranker_v2_m3
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/baai-bge-reranker-v2-m3/
---

@@ -4,6 +4,7 @@ description: >-
InternVL3 is an advanced multimodal large language model with capabilities to encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing'
namespace: docs_edge_ai_models_internvl3
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/internvl3/
---

@@ -4,6 +4,7 @@ description: >-
Mistral 3 Small provides a range of capabilities, including text generation, image analysis, embeddings, and more.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, mistral'
namespace: docs_edge_ai_models_mistral_3_small
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/mistral-3-small/
---

@@ -4,6 +4,7 @@ description: >-
Nanonets-OCR-s is an OCR model that converts document images to structured Markdown, preserving layout (headings, lists, tables) and basic tags. The output is easy to parse and feed into LLM pipelines.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, qwen'
namespace: docs_edge_ai_models_nanonets_ocr_s
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/nanonets-ocr-s/
---

@@ -4,6 +4,7 @@ description: >-
Qwen2.5 VL AWQ 3B is a vision-language model that supports 3 billion parameters and offers advanced capabilities such as visual analysis, agentic reasoning, long video comprehension, visual localization, and structured output generation.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_2_5_vl_3b
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/qwen-2-5-vl-3b/
---

@@ -4,6 +4,7 @@ description: >-
Qwen2.5 VL AWQ 7B is a vision-language model that supports 7 billion parameters, offering advanced capabilities such as visual analysis, agentic reasoning, long video comprehension, visual localization, and structured output generation.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_2_5_vl_7b
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/qwen-2-5-vl-7b/
---

@@ -4,6 +4,7 @@ description: >-
Qwen3 Embedding 4B is a 4B-parameter multilingual embedding model (36 layers, 32K context) that outputs 2560‑dim vectors for text/code retrieval, classification, clustering, and bitext mining. It supports instruction-conditioned embeddings and is optimized for efficient, cross-lingual representation learning.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_3_embedding_4b
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/qwen3-embedding-4b/
---

@@ -1,59 +1,76 @@
---
title: AI Inference
description: >-
Azion AI Inference empowers you to build and deploy intelligent applications that process data close to where it is generated.
meta_tags: 'ai inference, artificial intelligence, edge computing'
AI Inference enables you to run AI models directly on Azion’s highly distributed infrastructure.
meta_tags: 'ai inference, artificial intelligence, edge computing, ai assistant, ai agent'
namespace: docs_edge_ai_reference
permalink: /documentation/products/ai/ai-inference/
menu_namespace: AIInferenceMenu

---

import LinkButton from 'azion-webkit/linkbutton';

**AI Inference** empowers you to build and deploy intelligent applications that process data close to where it is generated. By combining artificial intelligence with edge computing, it eliminates the complexities of scaling and infrastructure management, enabling real-time decision-making and enhanced performance.
**AI Inference** enables you to run AI models directly on Azion’s highly distributed infrastructure.

With Azion AI Inference, you can seamlessly integrate AI capabilities into your applications, leveraging tools like Edge Functions, Edge Application, and the Azion API to create scalable, secure, and efficient solutions.
With Azion AI Inference, you can integrate AI capabilities into your applications, leveraging tools like **Functions**, **Applications**, **Vector Search**, and the Azion API to create scalable, secure, and efficient solutions.

AI Inference gives you access to:
Get started by deploying the AI Inference Starter Kit Template:

- **Run AI models on Edge Runtime**, enabling advanced AI architectures to execute directly at the edge for minimal latency and maximum performance.
- **Deploy autonomous AI agents** that analyze data and make decisions at the edge.
- **Real-time processing** with reduced latency and enhanced efficiency.
- All as part of a **complete platform**, including Edge Applications, Edge Functions, Edge SQL vector search, and more.
<LinkButton
label="Deploy"
link="https://console.azion.com/create/azion/starter-kit-edge-ai"
icon="ai ai-azion"
icon-pos="left"
/>

---

## Features

### Available Models
### OpenAI-Compatible API

Connect applications using Azion’s OpenAI-compatible endpoint format.
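As a sketch, a client could call such an endpoint with plain `fetch`. The base URL, path, and auth scheme below are placeholders, so check your account for the actual endpoint details:

```javascript
// Hedged sketch of an OpenAI-compatible chat completion call.
// `baseUrl` and the Bearer-token auth are assumptions for illustration.
async function chatCompletion(baseUrl, apiKey, messages) {
  const response = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`
    },
    body: JSON.stringify({
      model: "Qwen/Qwen3-30B-A3B-Instruct-2507-FP8",
      messages
    })
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
}
```

Because the request and response shapes follow the OpenAI convention, existing OpenAI client libraries can typically be pointed at the endpoint by overriding their base URL.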

### Run Edge-Optimized Models

Access our catalog of open-source AI models that you can run directly on Azion Runtime. These models are optimized for edge deployment with minimal resource requirements.
- Run AI models on Azion’s globally distributed edge to minimize latency and enable real-time inference.
- Access a curated catalog of open-source models, ready to run on Azion Runtime and optimized for distributed deployment with low resource footprints.
- Native inference support for large language models (LLMs) and vision-language models (VLMs).

<LinkButton link="/en/documentation/products/ai/ai-inference/models/" label="See Available Models" severity="secondary" />

### Model customization
### Fine-Tune Models with LoRA

AI Inference allows you to fine-tune, train, and specialize models using **Low-Rank Adaptation (LoRA)**. This capability enables you to optimize models for specific tasks, ensuring they are both efficient and accurate for your business needs.
AI Inference allows you to fine-tune, train, and specialize models with your own data and parameters using **Low-Rank Adaptation (LoRA)**. This capability enables you to optimize models for specific tasks, ensuring they are both efficient and accurate for your business needs.

### AI Agents
---

AI Inference supports deploying AI agents like ReAct (Reasoning + Acting) at the edge, enabling advanced tasks such as context-aware responses, semantic search, and intelligent data processing.
## Examples of what you can build with AI Inference

### Integration with Edge SQL
- **AI Assistants**: Build and deploy AI assistants that serve thousands of users simultaneously with low latency, delivering real-time support, dynamic FAQs, and customer assistance without cloud overload.

Integrate with **Edge SQL** to enable vector search capabilities, allowing for semantic queries and hybrid search. This integration enhances AI-powered applications by providing precise, contextually relevant results and supporting efficient Retrieval-Augmented Generation (RAG) implementations.
- **AI Agents**: Build AI agents that automate multi-step workflows, collapse days of manual effort into minutes, and free teams for higher-value work, boosting productivity across operations.

---
- **Automate Threat Detection and Takedown with AI**: Combine LLMs and vision-language models (VLMs) to monitor digital assets, spot phishing/abuse patterns in text and imagery, and automate threat classification and takedown across distributed environments.

## Related products
## Integration with SQL Database

- [Edge Application](/en/documentation/products/build/edge-application/): build applications that run directly on Azion's distributed network, delivering exceptional performance and customization options.
- [Edge Functions](/en/documentation/products/build/edge-application/edge-functions/): execute code closer to end users, enhancing performance and enabling custom logic for handling requests and responses.
- [Edge SQL](/en/documentation/products/store/edge-sql/): an edge-native SQL solution designed for serverless applications, providing data storage and querying capabilities at the edge.
- [Vector Search](/en/documentation/products/store/edge-sql/vector-search/): enable semantic search engines and AI-powered recommendations through vector embeddings at the edge.
Integrate your application with **SQL Database** to enable [vector search](/en/documentation/products/store/sql-database/vector-search/) capabilities, allowing for semantic queries and hybrid search. This integration enhances AI-powered applications by providing precise, contextually relevant results and supporting efficient Retrieval-Augmented Generation (RAG) implementations.
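The retrieve-then-generate (RAG) flow described above can be sketched as follows. This is a hedged illustration: `runQuery` stands in for whatever SQL client your Function uses, and the `vector_distance_cos` function and `documents` schema are assumptions for the sketch, not a documented Azion API:

```javascript
// RAG sketch: retrieve similar documents via vector search, then ask
// the model to answer using them as context. `runQuery(sql, params)`
// is a hypothetical SQL client; adapt it to your actual database API.
async function answerWithContext(runQuery, question, questionEmbedding) {
  // 1. Retrieve the most similar documents by cosine distance.
  const rows = await runQuery(
    "SELECT content FROM documents ORDER BY vector_distance_cos(embedding, ?) LIMIT 3",
    [JSON.stringify(questionEmbedding)]
  );
  const context = rows.map((r) => r.content).join("\n");
  // 2. Generate an answer grounded in the retrieved context.
  return Azion.AI.run("Qwen/Qwen3-30B-A3B-Instruct-2507-FP8", {
    stream: false,
    messages: [
      { role: "system", content: `Answer using this context:\n${context}` },
      { role: "user", content: question }
    ]
  });
}
```

Grounding the system prompt in retrieved rows is what keeps the model's answers tied to your data rather than its general training.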

## Limits

These are the **default limits**:

| Scope | Limit |
| ----- | ----- |
| Requests per minute | 300 |
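A common client-side pattern for staying within this limit is exponential backoff on HTTP 429 responses. The helper below is a generic sketch of that pattern, not an Azion API:

```javascript
// Retries a request with exponential backoff when the server answers
// 429 (rate limited). Delays double on each attempt: 1s, 2s, 4s, ...
async function fetchWithBackoff(url, options, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options);
    // Return any non-429 response, or give up after maxRetries.
    if (response.status !== 429 || attempt === maxRetries) return response;
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
}
```

If the platform returns a `Retry-After` header, honoring it instead of a fixed schedule is preferable.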

---

## Related products

- [Applications](/en/documentation/products/build/applications/): build applications that run directly on Azion's distributed infrastructure, delivering exceptional performance and customization options.
- [Functions](/en/documentation/products/build/applications/functions/): execute code closer to end users, enhancing performance and enabling custom logic for handling requests and responses.
- [SQL Database](/en/documentation/products/store/sql-database/): an edge-native SQL solution designed for serverless applications, providing data storage and querying capabilities at the edge. Also enables [Vector Search](/en/documentation/products/store/sql-database/vector-search/) for performing semantic search and AI-powered recommendations through vector embedding.

Explore practical examples of how to implement AI solutions with Azion:

<LinkButton link="/en/documentation/architectures/artificial-intelligence/ai-agent-copilot-assistant/" label="Go to Copilot Assistant architecture" severity="secondary" />
<LinkButton link="/en/documentation/products/guides/langgraph-ai-agent-boilerplate/" label="Go to the LangGraph AI Agent template guide" severity="secondary" />
@@ -4,14 +4,15 @@ description: >-
Edge AI offers a diverse range of edge-optimized models for various AI domains, ensuring efficient deployment and performance.
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing'
namespace: docs_edge_ai_models
menu_namespace: AIInferenceMenu
permalink: /documentation/products/ai/ai-inference/models/
---

import LinkButton from 'azion-webkit/linkbutton';

Azion's edge-optimized models span multiple AI domains including text generation, image analysis, embeddings, and more. Each model is designed to balance performance and resource efficiency for edge deployment.
Azion's edge-optimized models span multiple AI domains including text generation, image analysis, embeddings, and more. Each model is designed to balance performance and resource efficiency for distributed deployment.

This page provides a list of models available for use with **Edge AI**. To learn more about it, visit the [Edge AI Reference](/en/documentation/products/ai/ai-inference/).
This page provides a list of models available for use with **AI Inference**. To learn more about it, visit the [AI Inference Reference](/en/documentation/products/ai/ai-inference/).

## Available Models

40 changes: 40 additions & 0 deletions src/content/docs/pt-br/pages/guias/ai-inference/quick-start.mdx
@@ -0,0 +1,40 @@
---
title: How to build a simple agent with AI Inference
description: Learn how to call an AI model from a Function using AI Inference.
meta_tags: >-
building, onboarding, create resources, Azion Web Platform, import from
GitHub
namespace: docs_guides_ai_inference_build_agent
permalink: /documentacao/produtos/guias/ai-inference-agent/
menu_namespace: AIInferenceMenu

---



## Usage

AI Inference can be used in a [Function](/en/documentation/products/build/applications/functions/).

This function receives a POST request, forwards it to the desired AI model, and returns the model's response.


```javascript
const modelResponse = await Azion.AI.run("Qwen/Qwen3-30B-A3B-Instruct-2507-FP8", {
  stream: true,
  messages: [
    {
      role: "system",
      content: "You are a helpful assistant."
    },
    {
      role: "user",
      content: "Name the European capitals"
    }
  ]
});

return modelResponse;
```

This example uses the Qwen3 model. You can change the model and the request parameters according to your preferences. Check the [AI models reference](/en/documentation/products/ai/ai-inference/models/) for more information about the available models and how to use them in your application.

@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, ai models, inteligência artificial, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_3_30ba3b
permalink: /documentacao/produtos/ai/ai-inference/modelos/qwen3-30ba3b/
menu_namespace: AIInferenceMenu
---

**Qwen3-30B-A3B-Instruct-2507-FP8** is an instruction-tuned 30-billion-parameter FP8 causal language model for long-context (256K) text generation and reasoning, supporting chat/QA, summarization, multilingual tasks, math/science problem solving, coding, and tool-augmented workflows.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, ai models, artificial intelligence, edge computing'
namespace: docs_edge_ai_models_baai_bge_reranker_v2_m3
permalink: /documentacao/produtos/ai/ai-inference/modelos/baai-bge-reranker-v2-m3/
menu_namespace: AIInferenceMenu
---

**BAAI/bge-reranker-v2-m3** is a lightweight reranking model with strong multilingual capabilities. It is easy to deploy and offers fast inference.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, modelos de ia, inteligência artificial, edge computing'
namespace: docs_edge_ai_models_internvl3
permalink: /documentacao/produtos/ai/ai-inference/modelos/internvl3/
menu_namespace: AIInferenceMenu
---

**InternVL3** is an advanced Multimodal Large Language Model (MLLM) with capabilities spanning tool calling, GUI agents, industrial image analysis, 3D vision perception, and more.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, modelos ai, inteligência artificial, computação edge, mistral'
namespace: docs_edge_ai_models_mistral_3_small
permalink: /documentacao/produtos/ai/ai-inference/modelos/mistral-3-small/
menu_namespace: AIInferenceMenu
---

**Mistral 3 Small** is a language model that, despite being compact, offers capabilities comparable to those of larger models. It is ideal for conversational agents, function calling, fine-tuning, and local inference with sensitive data.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, ai models, inteligência artificial, edge computing, qwen'
namespace: docs_edge_ai_models_nanonets_ocr_s
permalink: /documentacao/produtos/ai/ai-inference/modelos/nanonets-ocr-s/
menu_namespace: AIInferenceMenu
---

**Nanonets-OCR-s** is an OCR model that converts document images into structured Markdown, preserving the layout (headings, lists, tables) and basic tags. The output is easy to parse and feed into LLM pipelines.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, modelos ai, inteligência artificial, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_2_5_vl_3b
permalink: /documentacao/produtos/ai/ai-inference/modelos/qwen-2-5-vl-3b/
menu_namespace: AIInferenceMenu
---

**Qwen 2.5 VL AWQ 3B** is a vision-language model that offers advanced capabilities such as visual analysis, agentic reasoning, long video comprehension, visual localization, and structured output generation. It supports 3 billion parameters.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, modelos ai, inteligência artificial, computação edge, qwen'
namespace: docs_edge_ai_models_qwen_2_5_vl_7b
permalink: /documentacao/produtos/ai/ai-inference/modelos/qwen-2-5-vl-7b/
menu_namespace: AIInferenceMenu
---

**Qwen 2.5 VL AWQ 7B** is a vision-language model that supports 7 billion parameters, offering advanced capabilities such as visual analysis, agentic reasoning, long video comprehension, visual localization, and structured output generation.
@@ -5,6 +5,7 @@ description: >-
meta_tags: 'ai inference, ai modelos, inteligência artificial, edge computing, qwen'
namespace: docs_edge_ai_models_qwen_3_embedding_4b
permalink: /documentacao/produtos/ai/ai-inference/modelos/qwen3-embedding-4b/
menu_namespace: AIInferenceMenu
---

**Qwen3 Embedding 4B** is a 4-billion-parameter multilingual embedding model (36 layers, 32K context) that generates 2560-dimension vectors for text/code retrieval, classification, clustering, and bitext mining. It supports instruction-conditioned embeddings and is optimized for efficient, multilingual representation learning.