Maestro is a tool for managing and running AI agents and workflows.
- Python 3.11, 3.12, or 3.13
pip install git+https://github.com/AI4quantum/maestro.git@v0.7.0
Note: If using scoring or crewai agents, install:
pip install "maestro[crewai] @ git+https://github.com/AI4quantum/maestro.git@v0.7.0"
Python Version Note: While Maestro core supports Python 3.11-3.13, some demos and examples are tested primarily with Python 3.12. For the most stable experience with demos, we recommend using Python 3.12.
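For example, a minimal install into a dedicated Python 3.12 virtual environment could look like the following sketch (it assumes python3.12 is available on your PATH):

```bash
# Create and activate a Python 3.12 virtual environment (assumes python3.12 is on PATH)
python3.12 -m venv .venv
source .venv/bin/activate

# Install Maestro (add the [crewai] extra if you use scoring or crewai agents)
pip install git+https://github.com/AI4quantum/maestro.git@v0.7.0
```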
- Run a workflow:
maestro run <workflow_path>
- Create an agent:
maestro create <agent_path>
- Validate a workflow or agent:
maestro validate <path>
- Serve workflows with streaming:
maestro serve <agents_file> <workflow_file>
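For example, a typical session might validate the definitions and then run or serve the workflow. This is only a sketch; agents.yaml and workflow.yaml are placeholder file names:

```bash
# Validate the agent and workflow definitions (placeholder file names)
maestro validate agents.yaml
maestro validate workflow.yaml

# Run the workflow once validation passes
maestro run workflow.yaml

# Or serve it with streaming
maestro serve agents.yaml workflow.yaml
```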
Maestro provides real-time streaming capabilities for workflows.
# Start streaming server
maestro serve agents.yaml workflow.yaml
# Test streaming
curl -X POST "http://localhost:8000/chat/stream" \
-H "Content-Type: application/json" \
-d '{"prompt": "Your prompt"}' \
--no-buffer
- Clone the repository:
git clone https://github.com/AI4quantum/maestro.git
cd maestro
- Install development dependencies:
uv sync --all-extras
- Run tests:
uv run pytest
- Run the formatter:
uv run ruff format
- Run the linter:
uv run ruff check --fix
Maestro includes automatic evaluation capabilities using IBM's watsonx governance platform:
Note: This feature is optional and disabled by default. To opt in, set the environment variable MAESTRO_AUTO_EVALUATION=true
when running workflows. If unset or set to anything else, evaluation will be skipped.
# Enable optional evaluation (opt-in)
export MAESTRO_AUTO_EVALUATION=true
maestro run <agents_file> <workflow_file>
Alternatively, you can enable evaluation via the CLI flag:
maestro run <agents_file> <workflow_file> --evaluate
- Reference: IBM watsonx governance Agentic AI Evaluation SDK
- Prerequisites: IBM Cloud account, a valid WATSONX_APIKEY, service access to watsonx.governance (usage may incur costs), and a Python 3.11 evaluation environment (.venv-eval).
- Agent model: Your choice of inference model for agents is independent of evaluation. For example, you can run a local Ollama model such as llama3.1:latest for generation while using watsonx for evaluation.
For setup and usage, see the detailed guide: Watsonx Evaluation README
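As a rough sketch of the environment side of that setup (the SDK installation steps themselves are covered in the linked guide; python3.11 on your PATH and a placeholder API key are assumed):

```bash
# Create the dedicated Python 3.11 evaluation environment (assumes python3.11 is on PATH)
python3.11 -m venv .venv-eval

# Provide your IBM Cloud API key for watsonx.governance (placeholder value)
export WATSONX_APIKEY="<your-ibm-cloud-api-key>"

# Opt in to automatic evaluation when running workflows
export MAESTRO_AUTO_EVALUATION=true
```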
- Automatic Evaluation: No code changes required
- Multiple Metrics: Answer Relevance, Faithfulness, Context Relevance, Answer Similarity
- Real Scores: Actual numerical metrics (0.0-1.0 scale)
- Transparent Integration: Works with any existing agent
- Dedicated Environment: Uses .venv-eval (Python 3.11) for watsonx compatibility
For detailed documentation, see Watsonx Evaluation README.
The Maestro Builder (web interface) has been moved to a separate repository: maestro-builder
Example use cases are also in a separate repository: maestro-demos
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the Apache License - see the LICENSE file for details.