Conversation
|
Caution Review failedThe pull request is closed. WalkthroughThis PR standardizes observability across multiple model providers (Anthropic, Bedrock, Gemini, Mistral, Ollama, vLLM) by moving verbose fields (model.instructions, model.tools, model.context) out of INFO records into adjacent DEBUG records, and updating INFO records to include concise metrics: model.tools.count, model.tools.selection (replacing model.tool_selection), and model.stream (replacing model.streaming). Some providers also surface model.temperature and model.output in INFO. Separately, Stage.result_evaluation and Stage.context_evaluation gain a keyword-only raises: bool parameter; when raises=True a failed evaluation raises StageException (default remains non-raising). Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
9596b60 to
1b83fb6
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/draive/bedrock/converse.py (1)
249-273: Add INFO/DEBUG records for streaming fallback to keep dashboards accurate.currently streaming attempts only log a warning and then call the non‑streaming path, which records
model.stream=False. This makes stream attempts indistinguishable in metrics. Emit a lightweight INFO/DEBUG pair here withmodel.stream=Truebefore falling back.Apply this diff near the start of
_completion_stream:async def _completion_stream( @@ - async with ctx.scope("model.completion.stream"): - ctx.log_warning( + async with ctx.scope("model.completion.stream"): + ctx.record( + ObservabilityLevel.INFO, + attributes={ + "model.provider": "bedrock", + "model.name": config.model, + "model.temperature": config.temperature, + "model.output": str(output), + "model.tools.count": len(tools.specifications), + "model.tools.selection": tools.selection, + "model.stream": True, + }, + ) + ctx.record( + ObservabilityLevel.DEBUG, + attributes={ + "model.instructions": instructions, + "model.tools": [tool.name for tool in tools.specifications], + "model.context": [element.to_str() for element in context], + }, + ) + ctx.log_warning( "Bedrock completion streaming is not supported yet, using regular response instead." )src/draive/ollama/chat.py (1)
201-214: Consider emitting INFO/DEBUG for streaming fallback like other providers.Add a small INFO/DEBUG pair with
model.stream=Truebefore warning/fallback to preserve accurate stream attempt metrics.async def _completion_stream( @@ - async with ctx.scope("model.completion.stream"): - ctx.log_warning( + async with ctx.scope("model.completion.stream"): + ctx.record( + ObservabilityLevel.INFO, + attributes={ + "model.provider": "ollama", + "model.name": config.model, + "model.temperature": config.temperature, + "model.output": str(output), + "model.tools.count": len(tools.specifications), + "model.tools.selection": tools.selection, + "model.stream": True, + }, + ) + ctx.record( + ObservabilityLevel.DEBUG, + attributes={ + "model.instructions": instructions, + "model.tools": [tool.name for tool in tools.specifications], + "model.context": [element.to_str() for element in context], + }, + ) + ctx.log_warning( "ollama completion streaming is not supported yet, using regular response instead." )src/draive/mistral/completions.py (2)
236-239: Typo in comment (“dafault”).Minor nit: fix spelling.
- # "null" is a dafault value... + # "null" is a default value...
183-199: Unify streaming usage metrics with the global naming (model.*).Only code occurrences of
lmm.are the two metrics in src/draive/mistral/completions.py; other matches are docstrings in client files.- ctx.record( - ObservabilityLevel.INFO, - metric="lmm.input_tokens", - value=usage.prompt_tokens or 0, - unit="tokens", - kind="counter", - attributes={"lmm.model": event.data.model}, - ) - ctx.record( - ObservabilityLevel.INFO, - metric="lmm.output_tokens", - value=usage.completion_tokens or 0, - unit="tokens", - kind="counter", - attributes={"lmm.model": event.data.model}, - ) + ctx.record( + ObservabilityLevel.INFO, + metric="model.input_tokens", + value=usage.prompt_tokens or 0, + unit="tokens", + kind="counter", + attributes={"model.provider": "mistral", "model.name": event.data.model}, + ) + ctx.record( + ObservabilityLevel.INFO, + metric="model.output_tokens", + value=usage.completion_tokens or 0, + unit="tokens", + kind="counter", + attributes={"model.provider": "mistral", "model.name": event.data.model}, + )src/draive/vllm/messages.py (1)
637-642: Consider not sending tools when selection is “none”.To reduce payload and avoid provider-side ambiguity, return NOT_GIVEN for tools in this case (similar to non‑stream path logic elsewhere).
- if tool_selection == "none": - return ("none", tools_list) + if tool_selection == "none": + return ("none", NOT_GIVEN)Confirm vLLM/OpenAI compatibility: some backends accept
tool_choice="none"with tools present, others prefer omitting tools.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
src/draive/anthropic/messages.py(1 hunks)src/draive/bedrock/converse.py(1 hunks)src/draive/gemini/generating.py(2 hunks)src/draive/mistral/completions.py(2 hunks)src/draive/ollama/chat.py(1 hunks)src/draive/stages/stage.py(6 hunks)src/draive/vllm/messages.py(2 hunks)tests/test_stage.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use absolute imports from the draive package (e.g., from draive import ...), not relative imports
Follow Ruff import ordering: standard library, third-party, then local imports
Use Python 3.12+ typing features (e.g., | unions, PEP 695 generics)
Prefer abstract/base collection types (Sequence, Mapping, Set, Iterable) over concrete list, dict, set in type hints
Define and raise custom exception types for specific error cases
Use NumPy docstring convention for all functions, classes, and methods
Skip module-level docstrings unless explicitly requested
Docstrings should include Parameters, Returns, Raises, and Notes (if needed) sectionsFormat all Python code with Ruff formatter (run via
make format); do not use any other formatter
Files:
src/draive/bedrock/converse.pysrc/draive/mistral/completions.pysrc/draive/ollama/chat.pysrc/draive/vllm/messages.pytests/test_stage.pysrc/draive/anthropic/messages.pysrc/draive/gemini/generating.pysrc/draive/stages/stage.py
src/draive/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/draive/**/*.py: Import Haiway symbols directly asfrom haiway import State, ctx
Usectx.scope(...)to bind activeStateinstances; avoid global state
Route all logging throughctx.log_*and never use
Enforce strict typing on public APIs; avoid untyped definitions andAnyexcept at verified third‑party boundaries
Prefer explicit attribute access with static types; avoid dynamicgetattrexcept at narrow boundaries
Use abstract immutable protocols (Mapping,Sequence,Iterable) instead of concretedict/list/setin public types
Usefinalwhere applicable; prefer composition over complex inheritance
Use precise unions (|), narrow withmatch/isinstance, and avoidcastunless provably safe and localized
Model immutable configuration/data ashaiway.State; avoid in‑place mutation and preferState.updated(...)/functional builders
Access active state viahaiway.ctxwithin async scopes (ctx.scope(...))
Public state methods that dispatch on the active instance must use@statemethod
Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics viactx.recordwhere applicable
All I/O is async; keep boundaries async, usectx.spawnfor detached tasks, and avoid custom threading
Translate provider/SDK errors into typed exceptions; never raise bareException; preserve context
Compose multimodal content withMultimodalContent.of(...); useResourceContent/Referencefor blobs; wrap custom data inArtifactContent(use hidden when needed)
Public symbols require NumPy-style docstrings with Parameters/Returns/Raises and rationale when non-obvious
Avoid docstrings for internal helpers; prefer self-explanatory names; skip module-level docstrings
Never log secrets or full request bodies containing keys/tokens
Read credentials from environment variables; resolve via helpers likegetenv_str
Files:
src/draive/bedrock/converse.pysrc/draive/mistral/completions.pysrc/draive/ollama/chat.pysrc/draive/vllm/messages.pysrc/draive/anthropic/messages.pysrc/draive/gemini/generating.pysrc/draive/stages/stage.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Place tests in the tests/ directory and use pytest (with async support)
Mark asynchronous tests with pytest.mark.asyncio
Mock dependencies within test scope using stubbed functionality state (e.g., via ctx.scope)
tests/**/*.py: Unit tests must not perform network I/O; mock providers/HTTP
Usepytest-asyncio(@pytest.mark.asyncio) for coroutine tests
Prefer scoping withctx.scope(...)in tests and bind requiredStateinstances explicitly
Avoid real I/O in tests; stub provider calls and HTTP
Files:
tests/test_stage.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: miquido/draive#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T15:22:15.967Z
Learning: Applies to src/draive/**/*.py : Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics via `ctx.record` where applicable
📚 Learning: 2025-09-11T15:22:15.967Z
Learnt from: CR
PR: miquido/draive#0
File: AGENTS.md:0-0
Timestamp: 2025-09-11T15:22:15.967Z
Learning: Applies to src/draive/**/*.py : Add structured, concise logs around generation calls, tool dispatch, and provider requests/responses (without secrets); add metrics via `ctx.record` where applicable
Applied to files:
src/draive/bedrock/converse.pysrc/draive/mistral/completions.pysrc/draive/anthropic/messages.pysrc/draive/gemini/generating.py
🧬 Code graph analysis (6)
src/draive/bedrock/converse.py (5)
src/draive/models/types.py (2)
tools(465-467)tools(633-635)src/draive/multimodal/artifact.py (1)
to_str(82-96)src/draive/multimodal/content.py (2)
to_str(288-297)to_str(651-664)src/draive/multimodal/text.py (1)
to_str(61-70)src/draive/resources/types.py (1)
to_str(181-203)
src/draive/mistral/completions.py (6)
src/draive/models/types.py (2)
tools(465-467)tools(633-635)src/draive/models/tools/types.py (1)
name(73-73)src/draive/multimodal/artifact.py (1)
to_str(82-96)src/draive/multimodal/content.py (2)
to_str(288-297)to_str(651-664)src/draive/multimodal/text.py (1)
to_str(61-70)src/draive/resources/types.py (1)
to_str(181-203)
tests/test_stage.py (2)
src/draive/stages/stage.py (2)
Stage(76-2040)result_evaluation(929-990)src/draive/stages/types.py (1)
StageException(272-303)
src/draive/anthropic/messages.py (5)
src/draive/models/types.py (2)
tools(465-467)tools(633-635)src/draive/multimodal/artifact.py (1)
to_str(82-96)src/draive/multimodal/content.py (2)
to_str(288-297)to_str(651-664)src/draive/multimodal/text.py (1)
to_str(61-70)src/draive/resources/types.py (1)
to_str(181-203)
src/draive/gemini/generating.py (4)
src/draive/multimodal/artifact.py (1)
to_str(82-96)src/draive/multimodal/content.py (2)
to_str(288-297)to_str(651-664)src/draive/multimodal/text.py (1)
to_str(61-70)src/draive/resources/types.py (1)
to_str(181-203)
src/draive/stages/stage.py (3)
src/draive/evaluation/scenario.py (1)
passed(48-59)src/draive/evaluation/suite.py (2)
passed(76-87)passed(188-189)src/draive/evaluation/evaluator.py (1)
passed(98-107)
🔇 Additional comments (14)
src/draive/bedrock/converse.py (1)
131-143: INFO/DEBUG split looks good; concise INFO, verbose DEBUG.The new attributes (tools.count/selection, stream=False) at INFO and moving instructions/tools/context to DEBUG align with the PR intent and logging guidelines.
src/draive/ollama/chat.py (2)
118-124: Good INFO payload: adds temperature/output/tools/stream.This matches the new observability schema and keeps INFO concise.
125-132: DEBUG payload is appropriate.Verbose fields moved to DEBUG; tool names only, and context via safe
to_str().src/draive/anthropic/messages.py (1)
154-166: Observability rework is correct and consistent.
- INFO: tools.count/selection and stream flag included.
- DEBUG: instructions/tools/context only.
This adheres to the new schema and avoids logging bodies or secrets at INFO.Please confirm Anthropic instructions may include sensitive prompts; keeping them at DEBUG is acceptable, but ensure any deployment that ships DEBUG to external sinks is gated appropriately.
src/draive/gemini/generating.py (2)
141-153: Non‑stream INFO/DEBUG instrumentation looks good.Concise INFO with tool metrics and stream flag; detailed fields in DEBUG. Matches PR intent.
291-303: Stream path instrumentation is consistent.Accurate
model.stream=Trueplus verbose DEBUG fields. Good.src/draive/mistral/completions.py (2)
130-150: Stream path adds correct INFO/DEBUG attributes.Nice parity with non‑stream path and other providers.
318-330: Non‑stream INFO/DEBUG instrumentation is correct.Matches the PR’s schema changes and is aligned with streaming path.
src/draive/stages/stage.py (2)
935-936: API addition LGTM:raises: bool = False.Clear opt‑in failure behavior without breaking defaults.
973-985: Failure path logic is correct and preserves useful meta.Early return when passed or non‑raising; otherwise raises with performance/report in meta.
tests/test_stage.py (1)
973-977: Test updated correctly for raises=True behavior.Asserts StageException and verifies evaluation_performance meta; matches the new API.
src/draive/vllm/messages.py (3)
129-136: INFO attributes look good and consistent with schema.Includes temperature/max tokens/output/tools metrics and
model.stream=False.
137-144: DEBUG attributes are appropriate.Verbose fields moved to DEBUG; safe serialization with
to_str().
282-289: Stream path instrumentation matches the pattern.Accurate
model.stream=Trueand verbose DEBUG fields.Also applies to: 291-297
| raises: bool = False | ||
| Defaines if exception raises on failing evaluation. | ||
| meta: Meta | MetaValues | None = None |
There was a problem hiding this comment.
🧹 Nitpick
Docstring wording/typos; add explicit Raises section.
Fix “Defaines” and document when StageException is raised per guidelines.
- raises: bool = False
- Defaines if exception raises on failing evaluation.
+ raises : bool, default False
+ Determines whether to raise ``StageException`` when the evaluation fails.
+ When ``False``, the stage returns the input state unchanged on failure.
@@
- raises: bool = False
- Defaines if exception raises on failing evaluation.
+ raises : bool, default False
+ Determines whether to raise ``StageException`` when the evaluation fails.
+ When ``False``, the stage returns the input state unchanged on failure.Also consider appending a “Raises” section to both docstrings:
Returns
-------
Self
A new Stage instance that evaluates the result.
+ Raises
+ ------
+ StageException
+ When ``raises`` is True and evaluation fails.Also applies to: 1012-1013
🤖 Prompt for AI Agents
In src/draive/stages/stage.py around lines 949-951 (and similarly at 1012-1013),
fix the docstring typo "Defaines" to "Defines" for the raises parameter and
reword the parameter description to be clearer (e.g., "raises: bool = False —
defines whether a StageException is raised when evaluation fails"). Add an
explicit "Raises" section to the docstring that states: "Raises StageException
when evaluation fails and raises is True" (include any relevant context such as
which method or condition triggers the exception). Ensure the wording matches
project docstring style and apply the same changes to the other docstring at
lines 1012-1013.
1b83fb6 to
60e2dae
Compare
No description provided.