Documentation - API Reference - Changelog - Bug reports - Discord
⚠️ Cortex is currently in Development: Expect breaking changes and bugs!
Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using ONNX, TensorRT-LLM, and llama.cpp engines. Cortex can function as a standalone server or be integrated as a library.
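To make the Docker analogy concrete, here is a brief sketch using CLI commands that appear later in this README (the model ID is illustrative):

```bash
cortex pull mistral   # download a model, similar to `docker pull`
cortex run mistral    # start a model, similar to `docker run`
cortex ps             # list running models, similar to `docker ps`
```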
Cortex supports the following engines:
- `cortex.llamacpp`: a C++ inference library that can be dynamically loaded by any server at runtime. We use this engine to support inference with GGUF models. `llama.cpp` is optimized for performance on both CPU and GPU.
- `cortex.onnx`: a C++ inference library for Windows that leverages `onnxruntime-genai` and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
- `cortex.tensorrt-llm`: a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA's TensorRT-LLM for GPU-accelerated inference.
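Each engine can be inspected and installed from the Cortex CLI itself; a minimal sketch using the `cortex engines` subcommands listed later in this README (the exact engine names accepted by the installer are assumed here):

```bash
# List available engines and their status
cortex engines list

# Show details for one engine
cortex engines get cortex.llamacpp

# Install an additional engine, e.g. the TensorRT-LLM engine
cortex engines install cortex.tensorrt-llm
```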
```bash
# macOS
brew install cortex-engine

# Windows
winget install cortex-engine

# Linux
sudo apt install cortex-engine
```

Coming Soon!
To install Cortex from the source, follow the steps below:
- Clone the Cortex repository here.
- Navigate to the `platform` folder.
- Open the terminal and run the following command to build the Cortex project:

```bash
npx nest build
```

- Make the `command.js` executable:

```bash
chmod +x '[path-to]/cortex/platform/dist/src/command.js'
```

- Link the package globally:

```bash
npm link
```

To run and chat with a model in Cortex:

```bash
# Start the Cortex server
cortex

# Start a model
cortex run [model_id]

# Chat with a model
cortex chat [model_id]
```

Cortex supports a list of models available on Cortex Hub.
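For instance, once the server is running, a specific model variant from Cortex Hub can be started and queried directly by its ID (the ID below is one of the options listed in the tables that follow):

```bash
# Start a specific GGUF variant from Cortex Hub
cortex run mistral:7b-gguf

# Send a message to it (message passed inline, per the CLI reference below)
cortex chat mistral:7b-gguf "Hello"
```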
Here are examples of models that you can use based on each supported engine:
GGUF models (`cortex.llamacpp`):

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| codestral | 22b-gguf | 22B | cortex run codestral:22b-gguf |
| command-r | 35b-gguf | 35B | cortex run command-r:35b-gguf |
| gemma | 7b-gguf | 7B | cortex run gemma:7b-gguf |
| llama3 | gguf | 8B | cortex run llama3:gguf |
| llama3.1 | gguf | 8B | cortex run llama3.1:gguf |
| mistral | 7b-gguf | 7B | cortex run mistral:7b-gguf |
| mixtral | 7x8b-gguf | 46.7B | cortex run mixtral:7x8b-gguf |
| openhermes-2.5 | 7b-gguf | 7B | cortex run openhermes-2.5:7b-gguf |
| phi3 | medium-gguf | 14B - 4k ctx len | cortex run phi3:medium-gguf |
| phi3 | mini-gguf | 3.82B - 4k ctx len | cortex run phi3:mini-gguf |
| qwen2 | 7b-gguf | 7B | cortex run qwen2:7b-gguf |
| tinyllama | 1b-gguf | 1.1B | cortex run tinyllama:1b-gguf |
ONNX models (`cortex.onnx`):

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| gemma | 7b-onnx | 7B | cortex run gemma:7b-onnx |
| llama3 | onnx | 8B | cortex run llama3:onnx |
| mistral | 7b-onnx | 7B | cortex run mistral:7b-onnx |
| openhermes-2.5 | 7b-onnx | 7B | cortex run openhermes-2.5:7b-onnx |
| phi3 | mini-onnx | 3.82B - 4k ctx len | cortex run phi3:mini-onnx |
| phi3 | medium-onnx | 14B - 4k ctx len | cortex run phi3:medium-onnx |
TensorRT-LLM models (`cortex.tensorrt-llm`):

| Model ID | Variant (Branch) | Model size | CLI command |
|---|---|---|---|
| llama3 | 8b-tensorrt-llm-windows-ampere | 8B | cortex run llama3:8b-tensorrt-llm-windows-ampere |
| llama3 | 8b-tensorrt-llm-linux-ampere | 8B | cortex run llama3:8b-tensorrt-llm-linux-ampere |
| llama3 | 8b-tensorrt-llm-linux-ada | 8B | cortex run llama3:8b-tensorrt-llm-linux-ada |
| llama3 | 8b-tensorrt-llm-windows-ada | 8B | cortex run llama3:8b-tensorrt-llm-windows-ada |
| mistral | 7b-tensorrt-llm-linux-ampere | 7B | cortex run mistral:7b-tensorrt-llm-linux-ampere |
| mistral | 7b-tensorrt-llm-windows-ampere | 7B | cortex run mistral:7b-tensorrt-llm-windows-ampere |
| mistral | 7b-tensorrt-llm-linux-ada | 7B | cortex run mistral:7b-tensorrt-llm-linux-ada |
| mistral | 7b-tensorrt-llm-windows-ada | 7B | cortex run mistral:7b-tensorrt-llm-windows-ada |
| openhermes-2.5 | 7b-tensorrt-llm-windows-ampere | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere |
| openhermes-2.5 | 7b-tensorrt-llm-windows-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada |
| openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
Note: For more detailed CLI reference documentation, please see here.
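As a rough illustration of how the commands listed below fit together, here is a sketch of a typical session (the model ID is illustrative and taken from the tables above; whether the bare ID or the `model:branch` form is required can depend on the model):

```bash
# Start the Cortex server
cortex

# Download a model (this command can also pull Hugging Face models)
cortex pull mistral

# Start the model and chat with it
cortex models start mistral
cortex chat mistral "What is GGUF?"

# Check what is running, then stop the model
cortex ps
cortex models stop mistral
```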
- `cortex`
- `cortex chat [options] [model_id] [message]`
- `cortex embeddings [options] [model_id] [message]`
- `cortex pull <model_id>` (this command can also pull Hugging Face models)
- `cortex run [options] [model_id]:[engine]`
- `cortex models get <model_id>`
- `cortex models list [options]`
- `cortex models remove <model_id>`
- `cortex models start [model_id]`
- `cortex models stop <model_id>`
- `cortex models update [options] <model_id>`
- `cortex engines get <engine_name>`
- `cortex engines install <engine_name> [options]`
- `cortex engines list [options]`
- `cortex engines set <engine_name> <config> <value>`
- `cortex ps`

Cortex has a REST API that runs at localhost:1337.
```bash
# Pull a model
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/pull
```

```bash
# Start a model
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/start \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
    "stop": [],
    "ngl": 4096,
    "ctx_len": 4096,
    "cpu_threads": 10,
    "n_batch": 2048,
    "caching_enabled": true,
    "grp_attn_n": 1,
    "grp_attn_w": 512,
    "mlock": false,
    "flash_attn": true,
    "cache_type": "f16",
    "use_mmap": true,
    "engine": "cortex.llamacpp"
  }'
```

```bash
# Chat with a model
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ],
    "stream": true,
    "max_tokens": 1,
    "stop": [
      null
    ],
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "temperature": 1,
    "top_p": 1
  }'
```

```bash
# Stop a model
curl --request POST \
  --url http://localhost:1337/v1/models/mistral/stop
```

Note: Check our API documentation for a full list of available endpoints.
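Putting the endpoints above together, a minimal scripted session might look like the sketch below. The model ID and parameters are illustrative, only the endpoints shown above are used, and it is assumed that the start endpoint accepts a reduced parameter set; in practice you would also wait for the pull to finish before starting the model.

```bash
#!/usr/bin/env bash
set -euo pipefail

MODEL="mistral"
BASE_URL="http://localhost:1337/v1"

# Pull and start the model (assumes a reduced parameter set is accepted; see the full start example above)
curl --request POST --url "$BASE_URL/models/$MODEL/pull"
curl --request POST --url "$BASE_URL/models/$MODEL/start" \
  --header 'Content-Type: application/json' \
  --data '{"engine": "cortex.llamacpp", "ctx_len": 4096}'

# Ask a single question via the chat completions endpoint
curl "$BASE_URL/chat/completions" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"

# Stop the model when done
curl --request POST --url "$BASE_URL/models/$MODEL/stop"
```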
- For support, please file a GitHub ticket.
- For questions, join our Discord here.
- For long-form inquiries, please email hello@jan.ai.
