Merged
2 changes: 1 addition & 1 deletion Makefile
@@ -10,7 +10,7 @@ TESTS_PATH := tests
-include .env

ifndef UV_VERSION
-UV_VERSION := 0.9.0
+UV_VERSION := 0.9.4
endif

.PHONY: uv_check venv sync update format lint test docs docs-server docs-format docs-lint release
19 changes: 13 additions & 6 deletions docs/guides/ComprehensiveEvaluation.md
@@ -1,12 +1,17 @@
# Comprehensive Evaluation Framework

-Use Draive's evaluation primitives to score model outputs consistently and keep quality criteria transparent. This guide walks through evaluators, scenarios, suites, and supporting patterns for building end-to-end evaluation flows.
+Use Draive's evaluation primitives to score model outputs consistently and keep quality criteria
+transparent. This guide walks through evaluators, scenarios, suites, and supporting patterns for
+building end-to-end evaluation flows.

## Evaluator Basics

-- Evaluators are async callables decorated with `@evaluator` that return an `EvaluationScore` or a compatible numeric value.
-- Thresholds determine whether an evaluation passes; named levels (`"perfect"`, `"excellent"`, `"good"`, `"fair"`, `"poor"`) are easier to reason about than raw floats.
-- `EvaluationScore.performance` is reported as a percentage and can exceed 100 when a score comfortably beats its threshold.
+- Evaluators are async callables decorated with `@evaluator` that return an `EvaluationScore` or a
+compatible numeric value.
+- Thresholds determine whether an evaluation passes; named levels (`"perfect"`, `"excellent"`,
+`"good"`, `"fair"`, `"poor"`) are easier to reason about than raw floats.
+- `EvaluationScore.performance` is reported as a percentage and can exceed 100 when a score
+comfortably beats its threshold.

### Working with `EvaluationScore`

@@ -58,7 +63,8 @@ async def evaluate_response_quality(value: str, context: str) -> Sequence[Evalua
)
```

-`evaluate` can run evaluators concurrently. Limit concurrency when evaluators hit rate-limited services.
+`evaluate` can run evaluators concurrently. Limit concurrency when evaluators hit rate-limited
+services.

```python
async def evaluate_response_quality_parallel(value: str, context: str) -> Sequence[EvaluatorResult]:
@@ -273,4 +279,5 @@ cases = await suite.generate_cases(
- Reporting helpers for insight into failures and regressions
- Concurrent execution to balance latency and throughput

-Use evaluators for quick checks, scenarios for logical groupings, and suites for comprehensive regression coverage backed by persistent cases and automated generation.
+Use evaluators for quick checks, scenarios for logical groupings, and suites for comprehensive
+regression coverage backed by persistent cases and automated generation.
46 changes: 23 additions & 23 deletions docs/guides/Postgres.md
@@ -15,14 +15,14 @@ from draive.postgres import (
Postgres,
PostgresConnectionPool,
PostgresConfigurationRepository,
-PostgresInstructionsRepository,
PostgresModelMemory,
+PostgresTemplatesRepository,
)

async with ctx.scope(
"postgres-demo",
PostgresConfigurationRepository(), # use postgres configurations
-PostgresInstructionsRepository(), # use postgres instructions
+PostgresTemplatesRepository(), # use postgres templates
disposables=(
PostgresConnectionPool.of(dsn="postgresql://draive:secret@localhost:5432/draive"),
),
@@ -62,36 +62,36 @@ Key capabilities:
Tune memory pressure through `cache_limit` and `cache_expiration` arguments when instantiating the
repository.

-## InstructionsRepository implementation
+## TemplatesRepository implementation

-`PostgresInstructionsRepository` mirrors the behaviour of the in-memory instructions repository
-while persisting values in a dedicated `instructions` table:
+`PostgresTemplatesRepository` mirrors the behaviour of the file-backed templates repository while
+storing revisions inside a dedicated `templates` table:

+See the [Templates](./Templates.md) guide for authoring patterns and runtime resolution examples.

```sql
-CREATE TABLE instructions (
-name TEXT NOT NULL,
+CREATE TABLE templates (
+identifier TEXT NOT NULL,
description TEXT DEFAULT NULL,
content TEXT NOT NULL,
-arguments JSONB NOT NULL DEFAULT '[]'::jsonb,
+variables JSONB NOT NULL DEFAULT '{}'::jsonb,
meta JSONB NOT NULL DEFAULT '{}'::jsonb,
created TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
-PRIMARY KEY (name, created)
+PRIMARY KEY (identifier, created)
);
```

-Highlights:
+Capabilities:

-- `available_instructions()` returns structured `InstructionsDeclaration` objects with cached
-results for quick catalog views.
-- `resolve(instructions, arguments)` resolves the latest instruction body, leveraging a dedicated
-cache keyed by name and utilizing the provided arguments.
-- `load(instructions)` loads the raw latest instructions keyed by name.
-- `define(instructions, content)` stores new revisions and invalidates caches so subsequent reads
-return the fresh version.
-- `remove(instructions)` removes all revisions for the instruction and drops relevant cache entries.
+- `templates()` returns cached `TemplateDeclaration` objects reflecting the newest revision per
+identifier.
+- `resolve(template)` and `resolve_str(template)` reuse a cached loader keyed by identifier to
+pull the latest template body before rendering arguments.
+- `define(template, content)` persists a new revision, invalidates caches, and ensures subsequent
+reads see the updated payload.

-This adapter is ideal when you author system prompts and tool manifests centrally and want version
-history per instruction.
+Use this adapter whenever your multimodal templates live alongside other structured content in
+Postgres and you want on-demand caching with revision history.
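
The revision layout of the `templates` table can be tried out without a running Postgres instance. The following is a minimal sketch, not the adapter's actual query: it assumes Python's built-in `sqlite3` as a stand-in, trims the table to three columns, and stores `created` as an integer. The `latest_revisions` helper and the `LATEST` query are hypothetical names introduced only for illustration. It shows how the composite `(identifier, created)` key allows the newest revision per identifier to be selected.

```python
import sqlite3

# Stand-in schema: mirrors the identifier/content/created columns of the
# `templates` table; SQLite types replace TIMESTAMPTZ and JSONB.
SCHEMA = """
CREATE TABLE templates (
    identifier TEXT NOT NULL,
    content TEXT NOT NULL,
    created INTEGER NOT NULL,
    PRIMARY KEY (identifier, created)
);
"""

# Hypothetical query: pick the newest revision for each identifier.
LATEST = """
SELECT t.identifier, t.content
FROM templates AS t
JOIN (
    SELECT identifier, MAX(created) AS created
    FROM templates
    GROUP BY identifier
) AS newest
  ON newest.identifier = t.identifier AND newest.created = t.created
ORDER BY t.identifier;
"""


def latest_revisions(rows: list[tuple[str, str, int]]) -> dict[str, str]:
    """Store revisions in an in-memory database and return the newest content per identifier."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.executescript(SCHEMA)
        connection.executemany(
            "INSERT INTO templates (identifier, content, created) VALUES (?, ?, ?)",
            rows,
        )
        return dict(connection.execute(LATEST).fetchall())
    finally:
        connection.close()


revisions = [
    ("welcome-email", "Hello {% user %}", 1),
    ("welcome-email", "Hi {% user %}, welcome!", 2),
    ("farewell", "Bye {% user %}", 1),
]
print(latest_revisions(revisions))
```

Older revisions stay in the table, so history is preserved while reads only see the latest row for each identifier.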

## ModelMemory implementation

@@ -214,9 +214,9 @@ pgvector. Set `rerank=False` to return rows ordered solely by the database simil
### Payload filtering and requirements

Search and deletion accept `AttributeRequirement` instances which are evaluated against the stored
-payload JSON. Requirements are translated to SQL expressions (for example, `AttributeRequirement.equal`
-becomes `payload #>> '{text}' = $2`). Unsupported operators raise `NotImplementedError`, ensuring the
-query surface remains explicit.
+payload JSON. Requirements are translated to SQL expressions (for example,
+`AttributeRequirement.equal` becomes `payload #>> '{text}' = $2`). Unsupported operators raise
+`NotImplementedError`, ensuring the query surface remains explicit.
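
To make the translation step concrete, here is a rough sketch of how an equality requirement could be turned into the SQL fragment quoted above. It is not the adapter's implementation: `requirement_to_sql`, its parameters, and the dotted-path convention are all assumptions made for illustration, and only the resulting fragment shape and the `NotImplementedError` behaviour come from the text.

```python
from typing import Any


def requirement_to_sql(
    operator: str,
    path: str,
    value: Any,
    placeholder: int,
) -> tuple[str, str]:
    """Translate one requirement into a SQL fragment plus its bound parameter (hypothetical helper)."""
    segments = path.split(".")
    # Only plain identifiers are interpolated into the SQL text; the value is always bound.
    if not all(segment.isidentifier() for segment in segments):
        raise ValueError(f"Unsupported payload path: {path!r}")
    json_path = "{" + ",".join(segments) + "}"
    if operator == "equal":
        # `#>>` extracts the value at the path as text, so the parameter is bound as text.
        return f"payload #>> '{json_path}' = ${placeholder}", str(value)
    # Anything not handled explicitly is rejected rather than guessed.
    raise NotImplementedError(f"Unsupported operator: {operator}")


print(requirement_to_sql("equal", "text", "hello", 2))
```

Keeping the value as a bound parameter means only the validated path ever reaches the SQL string.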

## Putting it together

122 changes: 122 additions & 0 deletions docs/guides/Templates.md
@@ -0,0 +1,122 @@
# Templates

Templates provide a typed, multimodal-friendly replacement for the legacy instructions system. They
let you author reusable prompt fragments with parameter placeholders, resolve them at runtime, and
back them with any storage supported by `TemplatesRepository` (in-memory, file-backed, Postgres, or
your own adapter).

## Template Basics

```python
from draive import Template

welcome = Template.of(
    "welcome-email",
    arguments={"audience": "developers"},
)

personalised = welcome.with_arguments(product="Draive 2.0")
```

- `Template.of(...)` creates an immutable handle identified by `identifier`.
- `arguments` holds default values for `{% placeholders %}` embedded in the template source.
- Use `.with_arguments(...)` to merge additional arguments without mutating the original object.

Templates support multimodal values, so an argument can be plain text, `MultimodalContent`, or any
other part accepted by `MultimodalContent.of(...)`. When rendered, placeholders keep the modality of
the argument.
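
The "merge without mutating" behaviour of `.with_arguments(...)` can be pictured with a small frozen dataclass. This is only an illustrative sketch: `TemplateHandle` is a hypothetical stand-in, not Draive's `Template` class, and it handles plain strings rather than multimodal values.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TemplateHandle:
    """Hypothetical immutable handle: an identifier plus default arguments."""

    identifier: str
    arguments: Mapping[str, str] = field(default_factory=dict)

    def with_arguments(self, **arguments: str) -> TemplateHandle:
        # Build a new handle; later values win over existing defaults.
        return replace(self, arguments={**self.arguments, **arguments})


welcome = TemplateHandle("welcome-email", {"audience": "developers"})
personalised = welcome.with_arguments(product="Draive 2.0")
print(welcome.arguments)
print(personalised.arguments)
```

Because each call returns a fresh object, one base handle can be shared safely between concurrent tasks.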

## Storing Templates

`TemplatesRepository` is the state that knows how to list, load, and define templates. You can pick
a storage backend depending on your workflow:

```python
from pathlib import Path
from draive import TemplatesRepository

file_repository = TemplatesRepository.file(Path("templates"))
volatile_repository = TemplatesRepository.volatile(
    onboarding="Hello {% user %}!",
)
```

- `TemplatesRepository.file(...)` reads templates from `.tmpl` files on disk. It automatically
infers variables by scanning for `{% variable %}` markers.
- `TemplatesRepository.volatile(...)` keeps definitions in memory, ideal for tests or quick demos.
- Custom backends only need to provide the `listing`, `loading`, and `defining` callables. See
`PostgresTemplatesRepository` for a production-ready example.
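
Variable inference by scanning for `{% variable %}` markers can be approximated with a regular expression. The sketch below is an assumption about how such scanning might work, not the file repository's actual code; `PLACEHOLDER` and `discover_variables` are hypothetical names, and the accepted placeholder grammar (identifier-like names, optional whitespace) is a guess.

```python
import re

# Assumed grammar: `{%`, optional whitespace, an identifier-like name, optional whitespace, `%}`.
PLACEHOLDER = re.compile(r"\{%\s*([A-Za-z_][A-Za-z0-9_]*)\s*%\}")


def discover_variables(source: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in PLACEHOLDER.finditer(source):
        seen.setdefault(match.group(1), None)
    return list(seen)


print(discover_variables("Hello {% user %}! Try {% product %}, {%user%}."))
```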

After constructing the repository, make it available in your Haiway context:

```python
from draive import ctx

async with ctx.scope(
    "demo",
    file_repository,
):
    ...
```

Any coroutine running inside that scope can now resolve templates through the active repository
state.

## Resolving Templates at Runtime

```python
from draive import Template, TemplatesRepository

async def render_welcome(user_name: str) -> str:
    # Attach the per-call argument, then render through the active repository.
    template = Template.of("welcome-email").with_arguments(user=user_name)
    return await TemplatesRepository.resolve_str(template)
```

- `resolve(...)` returns a `MultimodalContent` instance, keeping non-text arguments intact.
- `resolve_str(...)` flattens everything into text, useful for providers that only understand text.
- Pass `default="..."` to fall back to inline content when the template is missing in storage.
- If neither the storage nor `default` can satisfy a request, `TemplateMissing` is raised.

You can override argument values call-by-call:

```python
await TemplatesRepository.resolve_str(
    template,
    arguments={"cta": "Join the beta"},
)
```

Custom arguments are merged on top of any defaults stored in the `Template` instance.
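
The resolution rules described in this section (storage lookup, `default` fallback, a missing-template error, and per-call arguments layered over defaults) can be summarised in a text-only sketch. Everything here is hypothetical: `resolve_text`, `MissingTemplateError`, and the dictionary-backed storage are stand-ins for illustration, and leaving unmatched placeholders untouched is an assumption rather than documented behaviour.

```python
from __future__ import annotations

import re
from collections.abc import Mapping

# Assumed placeholder grammar; the exact syntax Draive accepts may differ.
PLACEHOLDER = re.compile(r"\{%\s*([A-Za-z_][A-Za-z0-9_]*)\s*%\}")


class MissingTemplateError(LookupError):
    """Hypothetical stand-in for the `TemplateMissing` error."""


def resolve_text(
    storage: Mapping[str, str],
    identifier: str,
    defaults: Mapping[str, str],
    overrides: Mapping[str, str] | None = None,
    default: str | None = None,
) -> str:
    """Render a template source, merging per-call overrides on top of defaults."""
    # Fall back to the inline default only when storage has no entry.
    source = storage.get(identifier, default)
    if source is None:
        raise MissingTemplateError(identifier)
    arguments = {**defaults, **(overrides or {})}
    # Assumption: placeholders without a matching argument are left untouched.
    return PLACEHOLDER.sub(lambda m: arguments.get(m.group(1), m.group(0)), source)


storage = {"welcome-email": "Hi {% user %}: {% cta %}"}
print(
    resolve_text(
        storage,
        "welcome-email",
        defaults={"user": "Ada", "cta": "Sign up"},
        overrides={"cta": "Join the beta"},
    )
)
```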

## Listing and Managing Templates

```python
declarations = await TemplatesRepository.templates()
for declaration in declarations:
    print(declaration.identifier, declaration.variables)
```

- `templates()` returns `TemplateDeclaration` objects containing the identifier, optional
description, discovered variables, and metadata.
- `TemplatesRepository.define(...)` (available on custom backends) persists a new revision and
invalidates caches. File/volatile repositories expose it automatically through the state.

When defining templates programmatically, pass `variables={"user": "User name"}` to document
expected arguments. This metadata is surfaced in listings and downstream tooling.

## Migrating from InstructionsRepository

`TemplatesRepository` fully supersedes the deprecated `InstructionsRepository`. When updating older
code:

- Replace instruction names with template identifiers (`InstructionsDeclaration` →
`TemplateDeclaration`).
- Swap `InstructionsRepository.resolve(...)` with `TemplatesRepository.resolve_str(...)` or
`resolve(...)` if you now need multimodal payloads.
- Update placeholders from legacy `{{ variable }}` markers to `{% variable %}`. The new syntax
distinguishes literal braces from arguments and supports multimodal values.
- Remove instruction-specific argument lists; template arguments are simple mappings keyed by the
placeholder name.
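
For the placeholder change, a one-off conversion script may save manual editing. The sketch below is an assumption about what such a script could look like, not a tool shipped with Draive; `LEGACY` and `migrate_placeholders` are hypothetical names, and it only rewrites markers whose content is a single identifier-like name.

```python
import re

# Legacy markers of the form `{{ name }}`; anything else inside double braces is left alone.
LEGACY = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def migrate_placeholders(source: str) -> str:
    """Rewrite legacy `{{ name }}` markers to the `{% name %}` syntax."""
    return LEGACY.sub(lambda match: "{% " + match.group(1) + " %}", source)


print(migrate_placeholders("Hello {{ user }}, welcome to {{product}}!"))
```

Review the output before committing it, since text that legitimately contains double braces around a single word would also be rewritten.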

Combining Templates with `PostgresTemplatesRepository` or other storage adapters gives you revision
history, cache controls, and shared access across services while keeping the runtime API consistent.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -98,6 +98,7 @@ nav:
- Basic Stage Usage: guides/BasicStageUsage.md
- Advanced State: guides/AdvancedState.md
- Multimodal Content: guides/MultimodalContent.md
+- Templates: guides/Templates.md
- Postgres: guides/Postgres.md
- Basics: guides/Basics.md
- Cookbooks:
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -5,7 +5,7 @@ build-backend = "uv_build"
[project]
name = "draive"
description = "Framework designed to simplify and accelerate the development of LLM-based applications."
-version = "0.89.5"
+version = "0.90.0"
readme = "README.md"
maintainers = [
{ name = "Kacper Kaliński", email = "kacper.kalinski@miquido.com" },
32 changes: 8 additions & 24 deletions src/draive/__init__.py
@@ -121,16 +121,6 @@
from draive.models import (
FunctionTool,
GenerativeModel,
-Instructions,
-InstructionsArgumentDeclaration,
-InstructionsDeclaration,
-InstructionsDefining,
-InstructionsListing,
-InstructionsLoading,
-InstructionsMissing,
-InstructionsRemoving,
-InstructionsRepository,
-InstructionsTemplate,
ModelContext,
ModelContextElement,
ModelException,
@@ -168,7 +158,6 @@
ModelToolSpecification,
ModelToolsSelection,
RealtimeGenerativeModel,
-ResolveableInstructions,
Tool,
ToolAvailabilityChecking,
Toolbox,
@@ -178,7 +167,6 @@
ToolsLoading,
ToolsProvider,
ToolsSuggesting,
-instructions,
tool,
)
from draive.multimodal import (
@@ -187,6 +175,10 @@
MultimodalContent,
MultimodalContentPart,
MultimodalTag,
+Template,
+TemplateDeclaration,
+TemplateMissing,
+TemplatesRepository,
TextContent,
)
from draive.parameters import (
@@ -275,16 +267,6 @@
"ImageEmbedding",
"ImageGeneration",
"Immutable",
-"Instructions",
-"InstructionsArgumentDeclaration",
-"InstructionsDeclaration",
-"InstructionsDefining",
-"InstructionsListing",
-"InstructionsLoading",
-"InstructionsMissing",
-"InstructionsRemoving",
-"InstructionsRepository",
-"InstructionsTemplate",
"LoggerObservability",
"Map",
"Memory",
@@ -344,7 +326,6 @@
"RealtimeConversation",
"RealtimeConversationSession",
"RealtimeGenerativeModel",
-"ResolveableInstructions",
"Resource",
"ResourceAvailabilityCheck",
"ResourceContent",
@@ -366,6 +347,10 @@
"StageState",
"State",
"StateContext",
+"Template",
+"TemplateDeclaration",
+"TemplateMissing",
+"TemplatesRepository",
"TextContent",
"TextEmbedding",
"TextGeneration",
@@ -403,7 +388,6 @@
"getenv_float",
"getenv_int",
"getenv_str",
-"instructions",
"is_missing",
"load_env",
"mmr_vector_similarity_search",