
@Anxhela21 (Contributor) commented Dec 3, 2025

Description

This PR introduces several changes to the Lightspeed Stack query and streaming query pipeline with a focus on Solr vector I/O support.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added direct vector database querying with support for Solr filtering parameters in search requests.
    • Introduced enhanced document referencing with proper URL construction and metadata extraction for offline/online modes.
  • Bug Fixes

    • Improved response reliability with safer defaults for tool-related fields and RAG chunk handling.
  • Configuration

    • Expanded multi-provider setup with vector store, embedding models, and safety configurations.
    • Enabled library client mode for stack integration.
  • Improvements

    • Enhanced streaming responses to include referenced documents and merged RAG results.

openshift-ci bot commented Dec 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai bot (Contributor) commented Dec 3, 2025

Walkthrough

This PR introduces direct vector database querying for RAG functionality and restructures the configuration and query endpoints. Changes include expanded run.yaml provider/backend configuration, a new direct vector DB query path in query_v2 and streaming_query_v2 endpoints, updated request/response models to expose referenced documents, and constants updates for RAG tool defaults and document URL references.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & Setup<br>run.yaml, lightspeed-stack.yaml | Expanded multi-provider configuration with inference, files, safety, scoring providers; restructured storage backends and metadata stores; switched to library client config path; added transcript collection settings. |
| Query Endpoint v1<br>src/app/endpoints/query.py, src/app/endpoints/streaming_query.py | Added module-level OFFLINE flag; changed RAG chunk assembly to use converted rag_chunks_dict; added safe defaults (empty lists) for tool_calls and tool_results instead of None propagation. |
| Query Endpoint v2<br>src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query_v2.py | Implemented direct vector DB querying path with document extraction and URL construction; merged RAG chunks from vector queries with Responses API results; updated function signatures to accept Solr params and skip_rag_tools flags; exposed merged referenced documents in return values. |
| Request/Response Models<br>src/models/requests.py | Added optional solr field to QueryRequest for vector query filtering; added typing imports for Self and Any. |
| Constants & Utilities<br>src/constants.py, src/utils/endpoints.py, src/app/endpoints/shields.py, src/app/main.py | Updated DEFAULT_RAG_TOOL from "knowledge_search" to "file_search"; added MIMIR_DOC_URL constant; enhanced RAG chunk processing skip logic for "_search" sources; added fault handler for SIGUSR1; added non-functional commented deregistration block. |
| Unit Tests<br>tests/unit/app/endpoints/test_query_v2.py, tests/unit/app/endpoints/test_streaming_query_v2.py | Updated test expectations to account for direct vector querying (mocked vector_io.query); adjusted tool type expectations from {"file_search", "mcp"} to {"mcp"} when RAG tools are skipped; added Solr params handling in mock assertions. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant QueryEndpoint as Query Endpoint (v2)
    participant VectorDB as Vector DB
    participant ResponsesAPI as Responses API
    participant LLMBackend as LLM Backend

    Client->>QueryEndpoint: send query request + optional solr filters
    
    alt Direct Vector Querying Path
        QueryEndpoint->>VectorDB: query(vector_store_id, k=5, solr_params)
        VectorDB-->>QueryEndpoint: return chunks with metadata & doc IDs
        QueryEndpoint->>QueryEndpoint: extract doc IDs, URLs, construct RAGChunk objects
        QueryEndpoint->>QueryEndpoint: build rag_context from top 5 chunks
    end
    
    QueryEndpoint->>ResponsesAPI: call with input_text + rag_context (injected) + skip_rag_tools=True
    ResponsesAPI->>LLMBackend: forward request to LLM
    LLMBackend-->>ResponsesAPI: return response
    ResponsesAPI-->>QueryEndpoint: return response + parsed referenced documents
    
    QueryEndpoint->>QueryEndpoint: merge all_rag_chunks (vector query + response API)
    QueryEndpoint->>QueryEndpoint: merge all_referenced_documents
    QueryEndpoint-->>Client: return response with merged RAG chunks & documents
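To make the first step in the diagram concrete, here is a hedged client-side sketch. It assumes the v2 query endpoint is exposed at /v2/query on localhost:8080 without authentication, and that the request/response field names (query, solr, response, referenced_documents) match this PR's summary; none of these specifics are confirmed by the diff itself.

```python
import requests

payload = {
    "query": "How do I configure the ingress controller?",
    # Solr filter parameters are passed through verbatim to vector_io.query
    "solr": {"fq": ["product:*openshift*", "product_version:*4.16*"]},
}

# Endpoint path, host, and port are assumptions for illustration only.
response = requests.post("http://localhost:8080/v2/query", json=payload, timeout=60)
response.raise_for_status()

body = response.json()
# The answer comes back together with the merged RAG chunks and referenced documents.
print(body.get("response"))
print(body.get("referenced_documents"))
```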

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • are-ces
  • tisnik
  • asimurka
🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main feature: adding Solr Vector I/O Provider support to LCORE. It directly reflects the primary changes across configuration, query handling, and vector database integration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.45% which is sufficient. The required threshold is 80.00%. |


@Anxhela21 Anxhela21 marked this pull request as ready for review December 8, 2025 16:59
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/utils/checks.py (1)

60-78: Update docstring to reflect new parent directory validation.

The docstring doesn't describe the new behavior introduced in lines 82-91. When must_exists=False and must_be_writable=True, the function now validates parent directory existence and writability.

Apply this diff to update the docstring:

 def directory_check(
     path: FilePath, must_exists: bool, must_be_writable: bool, desc: str
 ) -> None:
     """
-    Ensure the given path is an existing directory.
+    Ensure the given path is a valid directory.
 
-    If the path is not a directory, raises InvalidConfigurationError.
+    If must_exists is True, verifies the directory exists. If must_be_writable
+    is True and the directory doesn't exist, validates that its parent directory
+    exists and is writable.
 
     Parameters:
         path (FilePath): Filesystem path to validate.
-        must_exists (bool): Should the directory exists?
-        must_be_writable (bool): Should the check test if directory is writable?
+        must_exists (bool): If True, the directory must exist.
+        must_be_writable (bool): If True, the directory (or its parent if the
+            directory doesn't exist) must be writable.
         desc (str): Short description of the value being checked; used in error
         messages.
 
     Raises:
         InvalidConfigurationError: If `path` does not point to a directory or
-        is not writable when required.
+        is not writable when required, or if parent directory validation fails.
     """
src/app/endpoints/query.py (1)

746-769: Avoid relying on vector_db_ids being undefined when no_tools=True

In retrieve_response, vector_db_ids is only populated in the else branch (when no_tools is False), but it’s later read inside the RAG block:

try:
    if vector_db_ids:
        ...

When no_tools is True, this condition currently triggers an UnboundLocalError that is caught by the blanket except, producing misleading “Failed to query vector database for chunks” warnings even though RAG is intentionally disabled.

Define vector_db_ids up front and use it consistently so the RAG path is skipped cleanly when no_tools is set:

@@
-    agent, conversation_id, session_id = await get_agent(
+    agent, conversation_id, session_id = await get_agent(
         client,
         model_id,
         system_prompt,
         available_input_shields,
         available_output_shields,
         query_request.conversation_id,
         query_request.no_tools or False,
     )
@@
-    logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id)
-    # bypass tools and MCP servers if no_tools is True
-    if query_request.no_tools:
+    logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id)
+
+    # Track available vector DBs (used for toolgroups and RAG); default to empty
+    vector_db_ids: list[str] = []
+
+    # bypass tools and MCP servers if no_tools is True
+    if query_request.no_tools:
         mcp_headers = {}
         agent.extra_headers = {}
         toolgroups = None
     else:
@@
-        # Include RAG toolgroups when vector DBs are available
-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
+        # Include RAG toolgroups when vector DBs are available
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids = [vdb.identifier for vdb in vector_dbs]

With this change, the later if vector_db_ids: guard in the RAG block works as intended without spurious exceptions/logs when no_tools=True.

🧹 Nitpick comments (11)
tests/unit/cache/test_sqlite_cache.py (1)

394-415: Stronger field-level checks look good; final assertions are slightly redundant

The new field-by-field assertions and the str(...) comparison for doc_url correctly exercise the serialization/deserialization path and the widened doc_url type. Functionally this test now does a much better job of pinning down the expected shape of CacheEntry.

You could trim a bit of redundancy by either:

  • Dropping the later assert retrieved_entries[0].referenced_documents is not None / title check, since they’re implied by the new assertions, or
  • Reusing retrieved_entry everywhere for consistency.

This is purely a readability/maintenance tweak; behavior is fine as-is.

src/models/requests.py (1)

3-168: Clarify solr field shape and avoid set literal in examples

The new solr: Optional[dict[str, Any]] field is a reasonable flexible hook for Solr filters and matches how it’s consumed in vector_io.query. Two small improvements:

  • The example currently uses a Python set:
examples=[
    {"fq": {"product:*openshift*", "product_version:*4.16*"}},
]

which is unusual in a JSON-ish example. A list is clearer and serializes more predictably:

-        examples=[
-            {"fq": {"product:*openshift*", "product_version:*4.16*"}},
-        ],
+        examples=[
+            {"fq": ["product:*openshift*", "product_version:*4.16*"]},
+        ],
  • Consider briefly documenting in the docstring or description that solr is an opaque bag of Solr parameters (e.g., fq, q.op, etc.) so clients know they’re passing them through verbatim.
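For instance, a minimal sketch of such a field-level description (the model shown here is trimmed to two fields for illustration; the description wording is a suggestion, not the project's actual text):

```python
from typing import Any, Optional

from pydantic import BaseModel, Field


class QueryRequest(BaseModel):
    """Trimmed-down illustration of the request model; the real model has more fields."""

    query: str
    solr: Optional[dict[str, Any]] = Field(
        default=None,
        description=(
            "Opaque bag of Solr query parameters (e.g. 'fq', 'q.op') that is passed "
            "through verbatim to vector_io.query; individual keys are not validated."
        ),
        examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
    )
```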
src/models/context.py (1)

8-51: New RAG fields on ResponseGeneratorContext look good; update docstring

vector_io_rag_chunks: list[RAGChunk] | None and vector_io_referenced_docs: list[ReferencedDocument] | None are well-typed and align with the response models.

To keep documentation in sync, consider extending the class docstring’s Attributes section to include brief descriptions of these two new fields.

src/app/endpoints/query.py (1)

771-897: Refine RAG integration: config for OFFLINE source URL and reduce verbose payload logging

The new vector-DB/RAG flow is a good step, but two aspects are worth tightening up:

  1. Hard‑coded internal base URL and OFFLINE flag

    • OFFLINE = True and the inline "https://mimir.corp.redhat.com" base URL (both for doc_url and source) effectively bake an environment-specific detail into the endpoint.
    • This makes it hard to run in other environments and leaks internal domain knowledge in code.

    Consider:

-# TODO: move this setting to a higher level configuration
-OFFLINE = True
+OFFLINE = True # TODO: drive from configuration instead of a module constant


and pushing the base URL into configuration (or deriving it from chunk metadata) so you don’t need `"https://mimir.corp.redhat.com"` literals here.

2. Logging entire vector_io query_response at info level

  • `logger.info(f"The query response total payload: {query_response}")` will dump all retrieved chunks and metadata on every query.
  • This can be very noisy and may expose large or sensitive content in logs.

Recommend downgrading and summarizing:

-            logger.info(f"The query response total payload: {query_response}")
+            logger.debug("vector_io.query response: %s", query_response)
+            logger.info(
+                "vector_io.query returned %d chunks for vector_db_id=%s",
+                len(getattr(query_response, "chunks", []) or []),
+                vector_db_id,
+            )

This keeps observability while avoiding heavy, content-rich logs at info level.

run.yaml (1)

117-130: Consider parameterizing the Solr URL for production deployments.

The hardcoded localhost:8983 URL will not work in containerized or distributed environments. Consider using an environment variable reference similar to line 19.

   vector_io:
     - provider_id: solr-vector
       provider_type: remote::solr_vector_io
       config:
-        solr_url: "http://localhost:8983/solr"
+        solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
         collection_name: "portal-rag"
src/app/endpoints/streaming_query.py (6)

81-84: Hardcoded OFFLINE flag should be moved to configuration.

The TODO acknowledges this needs to be addressed. This module-level constant affects source URL resolution behavior and should be configurable, especially for toggling between environments.

Would you like me to help implement this as part of the configuration system, or open an issue to track this task?
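In the meantime, a minimal sketch of driving the flag from the environment rather than a module constant (the variable name LCORE_OFFLINE_DOCS and the helper are assumptions; a real fix would more likely live in the project's configuration classes):

```python
import os


def _env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment ('1', 'true', 'yes' count as truthy)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes"}


# Falls back to the current hardcoded behavior when the variable is unset.
OFFLINE = _env_flag("LCORE_OFFLINE_DOCS", default=True)
```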


165-171: Update docstring to document the new vector_io_referenced_docs parameter.

The function signature now includes vector_io_referenced_docs, but the docstring's Parameters section doesn't document it.

     Parameters:
         metadata_map (dict): A mapping containing metadata about
         referenced documents.
         summary (TurnSummary): Summary of the conversation turn.
         token_usage (TokenCounter): Token usage information.
         media_type (str): The media type for the response format.
+        vector_io_referenced_docs (list[ReferencedDocument] | None): Referenced
+        documents from vector_io query.
 
     Returns:

487-515: Reconsider emitting "No Violation" token for successful shield validations.

Yielding "No Violation" as a LLM_TOKEN_EVENT for every successful shield validation may create noise in the client stream. Clients typically only need to know about violations, not successful passes. Consider yielding nothing or using a different event type (e.g., debug/info level).

         else:
-            # Yield "No Violation" message for successful shield validations
-            yield stream_event(
-                data={
-                    "id": chunk_id,
-                    "token": "No Violation",
-                },
-                event_type=LLM_TOKEN_EVENT,
-                media_type=media_type,
-            )
+            # Successful shield validations are silently passed
+            pass

1230-1231: Use more Pythonic empty list check.

Comparing with == [] is less idiomatic than using truthiness.

         # Convert empty list to None for consistency with existing behavior
-        if toolgroups == []:
+        if not toolgroups:
             toolgroups = None

1296-1298: Missing pylint: disable=broad-except comment for consistency.

Line 1086 has the pylint disable comment for the broad except, but line 1296 does not, which may trigger linting warnings.

-    except Exception as e:
+    except Exception as e:  # pylint: disable=broad-except
         logger.warning("Failed to query vector database for chunks: %s", e)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())

1304-1307: RAG context formatting could benefit from a helper function.

The RAG context formatting logic could be extracted for reusability and testability.
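A hedged sketch of what such a helper might look like (the function name, prompt wording, and the project-internal import path are assumptions; RAGChunk fields follow src/models/responses.py as referenced in this review):

```python
from models.responses import RAGChunk  # import path assumed for illustration


def format_rag_context(chunks: list[RAGChunk], max_chunks: int = 5) -> str:
    """Format retrieved RAG chunks into a single context block for the LLM prompt."""
    sections: list[str] = []
    for index, chunk in enumerate(chunks[:max_chunks], start=1):
        source = chunk.source or "unknown source"
        sections.append(f"[{index}] (source: {source})\n{chunk.content}")
    return "\n\n".join(sections)
```

A helper along these lines could be unit-tested in isolation and reused by both the blocking and streaming endpoints.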

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0550f02 and eaf66dd.

📒 Files selected for processing (11)
  • run.yaml (3 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/streaming_query.py (14 hunks)
  • src/models/context.py (2 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • src/utils/checks.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
  • tests/unit/cache/test_postgres_cache.py (1 hunks)
  • tests/unit/cache/test_sqlite_cache.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/cache/test_postgres_cache.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/cache/test_sqlite_cache.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/cache/test_postgres_cache.py
  • tests/unit/app/endpoints/test_query.py
  • tests/unit/cache/test_sqlite_cache.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/models/context.py
  • src/utils/checks.py
  • src/models/responses.py
  • src/models/requests.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/context.py
  • src/models/responses.py
  • src/models/requests.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/requests.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (3)
src/models/context.py (1)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/app/endpoints/query.py (1)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/app/endpoints/streaming_query.py (3)
src/models/responses.py (2)
  • RAGChunk (25-30)
  • ReferencedDocument (326-336)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (950-977)
🪛 GitHub Actions: E2E Tests
src/utils/checks.py

[error] 87-87: InvalidConfigurationError: Check directory to store feedback '/tmp/data/feedback' cannot be created - parent directory does not exist

🔇 Additional comments (10)
src/utils/checks.py (1)

82-91: Unable to verify review comment claims due to repository access constraints.

The review comment makes specific claims about a pipeline failure and broken E2E tests that require access to the full codebase, test files, and pipeline logs to verify. Without repository access, I cannot:

  • Confirm the reported pipeline failure with /tmp/data/feedback
  • Verify all usages of directory_check and assess impact
  • Review test code to understand breaking changes
  • Determine if the parent directory validation is the root cause

Manual verification is required by a developer with repository access.

src/models/responses.py (1)

326-336: doc_url union type change looks correct

Allowing doc_url: AnyUrl | str | None aligns with the new tests and preserves backwards compatibility for existing AnyUrl usage; no issues seen here.

tests/unit/app/endpoints/test_conversations_v2.py (1)

93-116: Updated expectation for doc_url string is appropriate

Asserting str(ref_docs[0]["doc_url"]) == "http://example.com" makes the test compatible with doc_url being either an AnyUrl or a plain string, matching the updated model typing.

tests/unit/cache/test_postgres_cache.py (1)

516-537: More granular CacheEntry assertions improve robustness

The revised test that checks each CacheEntry field and stringifies doc_url before comparison is clearer and more resilient to representation differences than relying on whole-object equality.

tests/unit/app/endpoints/test_query.py (1)

1007-1102: Tests correctly capture new doc_url semantics (including malformed URLs)

The updated expectations:

  • Using str(doc.doc_url) / str(docs[i].doc_url) for valid URLs, and
  • Asserting that malformed URLs yield a non-None ReferencedDocument with doc.doc_url == "not a valid url"

are consistent with ReferencedDocument.doc_url now allowing raw strings as well as AnyUrl. This gives better resilience to imperfect metadata without dropping referenced documents.

src/app/endpoints/query.py (1)

289-423: RAG chunk propagation into transcripts and QueryResponse is wired correctly

Unpacking summary.rag_chunks, converting them once to rag_chunks_dict for store_transcript, and exposing rag_chunks=summary.rag_chunks or [] via QueryResponse keeps the same source of truth for RAG data across storage and API response. The additional referenced_documents handling remains consistent with that contract.

run.yaml (2)

19-19: LGTM!

Using environment variable reference for external_providers_dir improves configurability across different deployment environments.


144-160: LGTM!

The vector_dbs and embedding model configurations are consistent with the new Solr provider setup. The 384 embedding dimension correctly matches the granite-embedding-30m-english model specification.

src/app/endpoints/streaming_query.py (2)

78-78: LGTM!

The logger correctly uses the shared logger name "app.endpoints.handlers" as per the project convention for unified logging across endpoint handlers. Based on learnings.


1218-1227: Potential TypeError when concatenating toolgroups.

If get_rag_toolgroups returns None (when vector_db_ids is empty) and mcp_toolgroups is a list, line 1225 could in principle attempt None + list and raise a TypeError.

The logic is correct, because the if vector_db_ids check on line 1224 guards that case before concatenation, but the guard is easy to miss and the flow could be made clearer.

@tisnik tisnik requested review from are-ces and asimurka December 8, 2025 17:17
@asimurka (Contributor) left a comment

After rebase, you will need to change vector_db to vector_stores, change messages from UserMessage format to plain strings, and copy everything to the query v2 implementations that use the Responses API. Overall LGTM, great work!

@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch from eaf66dd to 60c3c4f Compare December 9, 2025 21:43
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/database.py (1)

116-136: Add a default case to handle unsupported database types.

The match statement lacks a default case. If db_config.db_type is neither "sqlite" nor "postgres", engine remains None and Line 136 creates a sessionmaker with bind=None. This could lead to confusing runtime errors later.

         case "postgres":
             logger.info("Initialize PostgreSQL database")
             postgres_config = db_config.config
             logger.debug("Configuration: %s", postgres_config)
             if not isinstance(postgres_config, PostgreSQLDatabaseConfiguration):
                 raise TypeError(
                     f"Expected PostgreSQLDatabaseConfiguration, got {type(postgres_config)}"
                 )
             engine = _create_postgres_engine(postgres_config, **create_engine_kwargs)
+        case _:
+            raise ValueError(f"Unsupported database type: {db_config.db_type}")

     session_local = sessionmaker(autocommit=False, autoflush=False, bind=engine)
♻️ Duplicate comments (4)
src/app/endpoints/streaming_query.py (4)

1046-1051: Hardcoded URL should be extracted to configuration - still present.

The URL https://mimir.corp.redhat.com is hardcoded in multiple places. This was flagged in a previous review.

Also applies to: 1067-1069, 1275-1277


905-908: RAG chunks queried twice per streaming request - still present.

The vector DB is queried at line 906-908 via query_vector_io_for_chunks, and again within retrieve_response at lines 1243-1298. This doubles latency and load. Consider passing the pre-fetched chunks to retrieve_response or consolidating the query logic.

Also applies to: 1243-1298


1022-1022: Avoid logging entire query response payloads - still present.

This was flagged in a previous review. Logging the full query_response payload at INFO level may expose sensitive document content.

-            logger.info("The query response total payload: %s", query_response)
+            logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)

1025-1033: Redundant rag_chunks creation - still present.

The rag_chunks list created on lines 1025-1033 is never used; it's overwritten by final_rag_chunks on lines 1059-1082. Remove the unused code.

             if query_response.chunks:
-                rag_chunks = [
-                    RAGChunk(
-                        content=str(chunk.content),  # Convert to string if needed
-                        source=getattr(chunk, "doc_id", None)
-                        or getattr(chunk, "source", None),
-                        score=getattr(chunk, "score", None),
-                    )
-                    for chunk in query_response.chunks[:5]  # Limit to top 5 chunks
-                ]
-                logger.info("Retrieved %d chunks from vector DB", len(rag_chunks))
-
                 # Extract doc_ids from chunks for referenced_documents

Also applies to: 1059-1084

🧹 Nitpick comments (41)
tests/e2e/features/steps/common_http.py (1)

123-132: Richer status-code assertion is good; consider a small style tweak

Including the parsed JSON (or text fallback) in the failure message is very helpful for debugging and keeps behavior correct. The if ...: assert False, ... pattern is slightly unconventional, though; consider raising AssertionError directly for clarity and to avoid the dummy False:

-    if context.response.status_code != status:
-        # Include response body in error message for debugging
-        try:
-            error_body = context.response.json()
-        except Exception:
-            error_body = context.response.text
-        assert False, (
-            f"Status code is {context.response.status_code}, expected {status}. "
-            f"Response: {error_body}"
-        )
+    if context.response.status_code != status:
+        # Include response body in error message for debugging
+        try:
+            error_body = context.response.json()
+        except Exception:
+            error_body = context.response.text
+        raise AssertionError(
+            f"Status code is {context.response.status_code}, expected {status}. "
+            f"Response: {error_body}"
+        )

Optionally, you could also narrow the exception to ValueError/json.JSONDecodeError instead of catching Exception, but that’s not critical here.

tests/e2e/features/conversation_cache_v2.feature (1)

13-29: TODO: Track and address the skipped test for empty vector DB bug.

This scenario is currently skipped due to a known bug where an empty vector database causes a 500 error. The test should be re-enabled once the underlying issue is resolved.

Do you want me to create an issue to track this bug and the test re-enablement, or is this already tracked under one of the referenced stories (LCORE-1036, LCORE-935, LCORE-882)?

scripts/gen_doc.py (2)

10-10: Adding tests directories to DIRECTORIES may overwrite existing READMEs

Including "tests/unit", "tests/integration", and "tests/e2e" means this script will now (re)generate README.md in every one of those directories and their subdirectories, overwriting any manually maintained READMEs there. If that’s intentional, this is fine; if some test folders have curated documentation, you may want to either narrow the list or skip directories that already contain a non‑generated README.


53-63: Harden main loop against missing directories and reduce brittleness of path handling

As written, generate_documentation_on_path(f"{directory}/") will raise if any directory in DIRECTORIES doesn’t exist (e.g., tests/e2e in an older or partial checkout). You can make this more robust by checking for existence once and reusing a Path object:

 def main():
     """Entry point to this script, regenerates documentation in all directories."""
-    for directory in DIRECTORIES:
-        generate_documentation_on_path(f"{directory}/")
-        for path in Path(directory).rglob("*"):
-            if path.is_dir():
-                if (
-                    path.name == "lightspeed_stack.egg-info"
-                    or path.name == "__pycache__"
-                    or ".ruff_cache" in str(path)
-                ):
-                    continue
-                generate_documentation_on_path(path)
+    for directory in DIRECTORIES:
+        base = Path(directory)
+        if not base.exists():
+            print(f"[gendoc] Skipping missing directory: {base}")
+            continue
+
+        generate_documentation_on_path(base)
+        for path in base.rglob("*"):
+            if not path.is_dir():
+                continue
+            if (
+                path.name == "lightspeed_stack.egg-info"
+                or path.name == "__pycache__"
+                or ".ruff_cache" in str(path)
+            ):
+                continue
+            generate_documentation_on_path(path)

This keeps behavior the same when directories are present, but avoids failures (and makes the assumption about directory existence explicit) when they’re not.

docs/demos/lcore/lcore.md (3)

34-37: Improve language clarity.

The phrase "It's a real framework independent on programming language" is awkward. Consider revising to: "It's a language-independent framework with providers for RAG, quota control, guardrails, metrics, etc."


23-104: Consider the scope and audience for the extensive Llama Stack content.

The PR introduces significant Llama Stack framework documentation (lines 23-104) comprising roughly 40% of the file's content. While this provides valuable context, verify that this level of detail is appropriate for a Lightspeed Core demo presentation. The new Solr Vector I/O Provider, streaming query support, and keyword filtering features are indirectly covered under RAG and LLM Inference sections, but not explicitly highlighted.

If this is intended as an architectural overview that positions Lightspeed Core within the broader Llama Stack ecosystem, the current approach works well. However, if the primary audience expects practical Solr-specific details or streaming implementation guidance, consider adding a dedicated technical section.


75-104: Consider emoji accessibility for documentation portability.

The section headers use emoji (🤖, 🛡️, 🔧, 📚, 🎯) for visual appeal. While this works well for browser/HTML presentation, verify that emoji render correctly across all planned distribution formats (PDF, markdown viewers, screen readers, etc.). If the documentation will be widely shared or translated, consider adding text fallback descriptions or using a more neutral header style.

src/app/endpoints/tools.py (4)

40-59: Docstring now slightly under-describes builtin toolgroups

The implementation clearly returns tools from both builtin toolgroups and MCP server toolgroups (distinguished via server_source), but the docstring still frames this primarily as “from all configured MCP servers.” Consider updating the description (and possibly the Raises section) to reflect that builtin toolgroups are also included and that only upstream connectivity issues currently surface as HTTP errors while missing toolgroups are skipped.


68-76: Good upstream connection handling; consider redacting low-level error details

Mapping connection failures to a 503 using ServiceUnavailableResponse and raising HTTPException(**response.model_dump()) gives a clean, centralized error path and aligns with the “handle Llama Stack connection errors” guidance. The only concern is that cause=str(e) may expose low‑level connection details to clients (hosts, ports, internal messages). If you want a stricter separation between internal diagnostics and public API, consider logging the full exception but returning a more generic cause (or omitting it) in the response body.

If you want to double‑check, please confirm in the Llama Stack client docs that the connection error you’re catching here is the one you intend to surface and that its message doesn’t contain sensitive data.


85-97: BadRequestError is silently converted into partial success

For each toolgroup, a BadRequestError is logged and then skipped, resulting in a 200 response with that toolgroup’s tools omitted. This may be exactly what you want (best‑effort aggregation), but it means misconfigured/removed toolgroups never surface to clients as errors, only as missing tools and log noise.

If misconfiguration is something operators or clients should notice, consider one of:

  • Downgrading log level to warning if this is expected and transient, or keeping error but adding metrics; and/or
  • Surfacing partial‑failure information in ToolsResponse (e.g., a “skipped_toolgroups” field) instead of only logs (see the sketch after this comment).

This keeps the endpoint resilient while making failures more visible and debuggable.

You may want to confirm in the Llama Stack client docs whether BadRequestError here strictly means “toolgroup not found” vs broader 4xx classes, to decide how user‑visible this should be.
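As a sketch of the second option above, the aggregation loop could record the skipped toolgroups and hand them back to the caller; the collect_tools wrapper, the list_tools callable, and a skipped_toolgroups response field are all hypothetical names used for illustration:

```python
from typing import Any, Awaitable, Callable


async def collect_tools(
    toolgroup_ids: list[str],
    list_tools: Callable[[str], Awaitable[list[Any]]],
) -> tuple[list[Any], list[str]]:
    """Aggregate tools across toolgroups, returning (tools, skipped_toolgroup_ids)."""
    tools: list[Any] = []
    skipped: list[str] = []
    for toolgroup_id in toolgroup_ids:
        try:
            tools.extend(await list_tools(toolgroup_id))
        except Exception:  # in the endpoint this would be the client's BadRequestError
            skipped.append(toolgroup_id)
    return tools, skipped
```

The endpoint could then either log the skipped list or expose it on the response model so partial failures are visible to clients.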


78-83: Aggregation and server_source derivation are correct; possible micro-refactors

The aggregation over toolgroups_response, derivation of server_source (builtin vs MCP URL), and the final summary logging all look logically sound. A couple of optional refinements if this endpoint becomes hot or mcp_servers grows:

  • Instead of recomputing next(s for s in configuration.mcp_servers ...) for each tool, prebuild a {name: server} dict once so lookup is O(1) (see the sketch below).
  • The builtin vs MCP counts in the info log require two full passes over consolidated_tools; if needed, you could maintain these counters while appending tools.

These are minor and only worth doing if this endpoint turns into a high‑traffic or large‑dataset path.

Also applies to: 99-127, 135-140
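For the first bullet, a hedged sketch of the lookup-table idea (the helper names are hypothetical, and the server objects are assumed to expose name and url attributes as implied by the review):

```python
from typing import Any


def build_mcp_server_index(mcp_servers: list[Any]) -> dict[str, Any]:
    """Index MCP servers by name so per-tool lookups are O(1) instead of a linear scan."""
    return {server.name: server for server in mcp_servers}


def resolve_server_source(index: dict[str, Any], toolgroup_name: str) -> str:
    """Return the MCP server URL for a toolgroup, or 'builtin' when it is not an MCP group."""
    server = index.get(toolgroup_name)
    return server.url if server is not None else "builtin"
```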

scripts/llama_stack_tutorial.sh (1)

1-24: Consider adding error handling.

The script lacks error handling directives like set -e which would cause it to exit on errors. While this might be intentional for a tutorial script (to continue even if some commands fail), consider adding set -e or at least documenting this behavior.

Add error handling at the top of the script:

 #!/bin/bash
+set -e  # Exit on error
 
 # Llama Stack Tutorial - Interactive Guide 

Or if you want the tutorial to continue even on errors, add a comment explaining this:

 #!/bin/bash
+# Note: Error handling is intentionally omitted to allow the tutorial to continue
+# even if some endpoints are unavailable
 
 # Llama Stack Tutorial - Interactive Guide 
src/authentication/k8s.py (1)

261-284: Variable shadowing: response is reused with different types.

The variable response is first assigned the SAR result (line 272), then conditionally reassigned to a ForbiddenResponse (line 283). This shadowing makes the code harder to follow and could lead to bugs if the logic changes.

         try:
             authorization_api = K8sClientSingleton.get_authz_api()
             sar = kubernetes.client.V1SubjectAccessReview(
                 spec=kubernetes.client.V1SubjectAccessReviewSpec(
                     user=user_info.user.username,
                     groups=user_info.user.groups,
                     non_resource_attributes=kubernetes.client.V1NonResourceAttributes(
                         path=self.virtual_path, verb="get"
                     ),
                 )
             )
-            response = authorization_api.create_subject_access_review(sar)
+            sar_response = authorization_api.create_subject_access_review(sar)
 
         except Exception as e:
             logger.error("API exception during SubjectAccessReview: %s", e)
             response = ServiceUnavailableResponse(
                 backend_name="Kubernetes API",
                 cause="Unable to perform authorization check",
             )
             raise HTTPException(**response.model_dump()) from e
 
-        if not response.status.allowed:
+        if not sar_response.status.allowed:
             response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
             raise HTTPException(**response.model_dump())
src/authentication/jwk_token.py (1)

37-46: TODO: Missing error handling for connection issues.

The function has a TODO comment (line 42) about handling connection errors and timeouts. The resp.raise_for_status() call on line 44 raises aiohttp.ClientResponseError, which is a subclass of ClientError and is therefore already caught in the caller (lines 134-139).

However, consider adding an explicit timeout to prevent hanging on slow/unresponsive servers:

 async def get_jwk_set(url: str) -> KeySet:
     """Fetch the JWK set from the cache, or fetch it from the URL if not cached."""
     async with _jwk_cache_lock:
         if url not in _jwk_cache:
-            async with aiohttp.ClientSession() as session:
-                # TODO(omertuc): handle connection errors, timeouts, etc.
-                async with session.get(url) as resp:
+            timeout = aiohttp.ClientTimeout(total=10)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.get(url) as resp:
                     resp.raise_for_status()
                     _jwk_cache[url] = JsonWebKey.import_key_set(await resp.json())
         return _jwk_cache[url]
src/configuration.py (1)

22-22: Type rename from ConversationCacheConfiguration to ConversationHistoryConfiguration.

The import and return type annotation have been updated to use ConversationHistoryConfiguration, aligning with the broader rename across the codebase.

Consider renaming the method to match the new type.

The property method name conversation_cache_configuration still contains "cache" while the return type is now ConversationHistoryConfiguration. For consistency, consider renaming to conversation_history_configuration in a future refactor.

Also applies to: 141-141

docs/streaming_query_endpoint.puml (2)

27-28: Consider clarifying shield event behavior.

Based on the AI summary, shield events are only emitted on violations (silent on success). The diagram shows unconditional "emit shield validation event" which may be misleading. Consider adding a note or updating the alt text to clarify this conditional behavior.

     else Chunk Type: shield
-        EventHandler->>SSE: emit shield validation event
+        EventHandler->>SSE: emit shield violation event (if violation detected)

13-14: Consider documenting RAG retrieval in the flow.

The AI summary indicates the streaming flow incorporates RAG data via query_vector_io_for_chunks, producing rag_chunks and vector_io_referenced_docs that are included in end-stream payloads. Consider adding this to the diagram for completeness.

src/constants.py (1)

102-104: New authentication module constants are properly defined.

The constants follow the existing naming pattern and use descriptive string values. Consider adding brief comments describing each module's purpose, consistent with other constants in this file (e.g., lines 93-98 have comments).

As per coding guidelines, constants should have descriptive comments:

 AUTH_MOD_NOOP_WITH_TOKEN = "noop-with-token"
+# API key based authentication module
 AUTH_MOD_APIKEY_TOKEN = "api-key-token"
 AUTH_MOD_JWK_TOKEN = "jwk-token"
+# Red Hat Identity header authentication module
 AUTH_MOD_RH_IDENTITY = "rh-identity"
src/cache/cache_factory.py (1)

4-4: Config type rename wiring is correct; optional docstring polish

Switching conversation_cache to accept ConversationHistoryConfiguration and extending the invalid type error message to include CACHE_TYPE_NOOP are consistent with the new configuration model; behavior of the factory remains the same. You may optionally update the function docstring to mention NoopCache as a possible return type for completeness, but it's not functionally required.

Also applies to: 20-20, 45-49

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

1-32: Verify auth_enabled for noop-with-token e2e config

This config is named for the noop-with-token auth flow and sets authentication.module: "noop-with-token", but service.auth_enabled is false. If the intent is to exercise token-based auth in the e2e suite (Authorized-tag scenarios), you may want auth_enabled: true; otherwise this config might not actually enforce the auth module. Please double-check against how other noop-with-token configs are used in the tests.
Based on learnings, noop-with-token tests typically rely on a dedicated config when the Authorized tag is present.

src/authorization/middleware.py (1)

35-45: Confirm APIKEY_TOKEN → Noop resolvers mapping and structured error usage

The switch to InternalServerErrorResponse/ForbiddenResponse for raising HTTP exceptions via response.model_dump() looks consistent with the new response model pattern and keeps error shapes centralized.

For AUTH_MOD_APIKEY_TOKEN, get_authorization_resolvers now returns NoopRolesResolver and NoopAccessResolver, which effectively disables role-based checks—any request that passes API key authentication will satisfy check_access(action, user_roles). If API keys are meant to grant blanket access, this is fine; if you intend to keep per-Action authorization semantics even for API keys, you may want a dedicated resolver instead of the noop pair. Please confirm the intended behavior.

Also applies to: 68-69, 85-86, 94-95

src/app/endpoints/rags.py (2)

53-99: Consider log level for configuration details.

Line 82 logs the entire llama_stack_configuration object at info level. This could potentially expose sensitive configuration details in production logs. Consider using debug level instead, or logging only non-sensitive attributes.

-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)

101-154: Same log level concern applies here.

Line 130 has the same pattern—consider using debug level to avoid leaking configuration details in production logs.

-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)
tests/e2e/features/environment.py (1)

84-102: Consider using elif for mutually exclusive config scenarios.

Lines 92-102 use separate if statements for InvalidFeedbackStorageConfig and NoCacheConfig. If a scenario were tagged with both, NoCacheConfig would overwrite the config and trigger an immediate restart. If these are mutually exclusive, using elif would make this explicit and prevent unexpected behavior.

     if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
         context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
+    elif "NoCacheConfig" in scenario.effective_tags:
         context.scenario_config = (
             f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
         )
src/app/endpoints/root.py (2)

779-784: Consider adding 200 response for OpenAPI consistency.

Unlike other endpoints in this PR (e.g., shields_responses, models_responses), root_responses doesn't include a 200 response entry. While the response_class=HTMLResponse handles the success case, adding an explicit 200 entry would maintain consistency across endpoints.


787-808: Parameter order differs from other endpoints.

The signature has auth before request, while other endpoints in this PR (e.g., rags.py, shields.py, models.py) use request before auth. Consider reordering for consistency.

 async def root_endpoint_handler(
-    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
     request: Request,
+    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
 ) -> HTMLResponse:
src/app/endpoints/metrics.py (1)

64-64: Unnecessary str() cast on CONTENT_TYPE_LATEST.

CONTENT_TYPE_LATEST from prometheus_client is already a string constant. The explicit str() conversion is redundant.

-    return PlainTextResponse(generate_latest(), media_type=str(CONTENT_TYPE_LATEST))
+    return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST)
.github/workflows/e2e_tests.yaml (1)

188-191: Consider using health check polling instead of fixed sleep.

A fixed 20-second sleep may be insufficient under load or excessive when services start quickly. Consider polling the health endpoint with a timeout.

      - name: Wait for services
        run: |
          echo "Waiting for services to be healthy..."
-          sleep 20
+          for i in {1..30}; do
+            if curl -sf http://localhost:8080/health > /dev/null 2>&1; then
+              echo "Services healthy after $i seconds"
+              break
+            fi
+            sleep 1
+          done
docs/config.json (1)

156-162: Minor: Consider minimum value of 1 for embedding_dimension.

An embedding dimension of 0 is technically allowed by the schema but would be semantically invalid.

           "embedding_dimension": {
             "default": 768,
             "description": "Dimensionality of embedding vectors.",
-            "minimum": 0,
+            "minimum": 1,
             "title": "Embedding dimension",
             "type": "integer"
           },
src/utils/endpoints.py (2)

637-659: Consider early return for disabled transcripts.

The nested if-else structure can be simplified with an early return pattern for better readability.

     # Store transcript if enabled
-    if not is_transcripts_enabled_func():
-        logger.debug("Transcript collection is disabled in the configuration")
-    else:
-        # Prepare attachments
-        attachments = query_request.attachments or []
-
-        # Determine rag_chunks: use provided value or empty list
-        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
-
-        store_transcript_func(
-            user_id=user_id,
-            conversation_id=conversation_id,
-            model_id=model_id,
-            provider_id=provider_id,
-            query_is_valid=True,
-            query=query_request.query,
-            query_request=query_request,
-            summary=summary,
-            rag_chunks=transcript_rag_chunks,
-            truncated=False,
-            attachments=attachments,
-        )
+    if is_transcripts_enabled_func():
+        attachments = query_request.attachments or []
+        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
+        store_transcript_func(
+            user_id=user_id,
+            conversation_id=conversation_id,
+            model_id=model_id,
+            provider_id=provider_id,
+            query_is_valid=True,
+            query=query_request.query,
+            query_request=query_request,
+            summary=summary,
+            rag_chunks=transcript_rag_chunks,
+            truncated=False,
+            attachments=attachments,
+        )
+    else:
+        logger.debug("Transcript collection is disabled in the configuration")

591-610: Consider refactoring cleanup_after_streaming to reduce parameter count.

This function has 19 parameters (triggering pylint R0913/R0917). Consider grouping related parameters into a dataclass or TypedDict to improve maintainability and reduce the chance of argument ordering errors.

Example approach:

@dataclass
class StreamingCleanupContext:
    user_id: str
    conversation_id: str
    model_id: str
    provider_id: str
    llama_stack_model_id: str
    query_request: QueryRequest
    summary: TurnSummary
    metadata_map: dict[str, Any]
    started_at: str
    skip_userid_check: bool
    rag_chunks: list[dict[str, Any]] | None = None

async def cleanup_after_streaming(
    context: StreamingCleanupContext,
    client: AsyncLlamaStackClient,
    config: AppConfig,
    callbacks: StreamingCleanupCallbacks,  # another dataclass for the 4 funcs
) -> None:
src/app/endpoints/conversations_v2.py (1)

229-241: Redundant cache type check at line 232.

The check_conversation_existence function checks if configuration.conversation_cache_configuration.type is None but this is already validated in the calling handlers before this function is invoked. Consider removing the redundant check or documenting why it's needed for safety.

 def check_conversation_existence(user_id: str, conversation_id: str) -> None:
     """Check if conversation exists."""
-    # checked already, but we need to make pyright happy
-    if configuration.conversation_cache_configuration.type is None:
-        return
+    # Callers must validate cache availability before calling this function
+    assert configuration.conversation_cache_configuration.type is not None
     conversations = configuration.conversation_cache.list(user_id, False)

Alternatively, if keeping for type narrowing, the comment already explains the intent which is acceptable.

src/app/endpoints/query.py (4)

79-82: Configuration TODO noted - consider addressing before merge.

The OFFLINE flag controls chunk source resolution but is currently hardcoded. The TODO comment acknowledges this should be moved to configuration. Consider prioritizing this to avoid environment-specific issues.


783-790: Use lazy % formatting for logger calls instead of f-strings.

f-strings are evaluated even when the log level is disabled, causing unnecessary overhead. Use %s placeholders with logger for deferred formatting.

-            logger.info(f"Initial params: {params}")
-            logger.info(f"query_request.solr: {query_request.solr}")
+            logger.info("Initial params: %s", params)
+            logger.info("query_request.solr: %s", query_request.solr)
             if query_request.solr:
                 # Pass the entire solr dict under the 'solr' key
                 params["solr"] = query_request.solr
-                logger.info(f"Final params with solr filters: {params}")
+                logger.info("Final params with solr filters: %s", params)
             else:
                 logger.info("No solr filters provided")
-            logger.info(f"Final params being sent to vector_io.query: {params}")
+            logger.info("Final params being sent to vector_io.query: %s", params)

816-821: Hardcoded URL should be extracted to configuration.

The URL https://mimir.corp.redhat.com is hardcoded. This should be configurable for different environments.

Consider adding a configuration value like mimir_base_url:

# In configuration
mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com"

# Then use:
doc_url=urljoin(mimir_base_url, reference_doc),

827-830: Consider narrowing the exception type or logging the exception class.

Catching bare Exception can mask unexpected errors. Consider catching more specific exceptions or at minimum logging the exception type for debugging.

-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())
src/app/endpoints/streaming_query.py (2)

505-515: Inconsistent behavior: "No Violation" token emitted for successful shield validations.

For successful shield validations (no violations), the code now emits a "No Violation" token to the stream. This may clutter the response with unnecessary tokens. Consider whether this should be silently skipped instead.

         else:
-            # Yield "No Violation" message for successful shield validations
-            yield stream_event(
-                data={
-                    "id": chunk_id,
-                    "token": "No Violation",
-                },
-                event_type=LLM_TOKEN_EVENT,
-                media_type=media_type,
-            )
+            # Successful shield validations are silently ignored
+            pass

1296-1298: Bare Exception catch should log exception type.

Consider logging the exception type for better debugging.

     except Exception as e:
-        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())
src/app/endpoints/streaming_query_v2.py (2)

56-56: Use __name__ for module logger pattern.

Per coding guidelines, use logger = logging.getLogger(__name__) instead of a hardcoded string. This ensures consistent logger naming across the codebase.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

101-103: Consider specifying a more precise return type.

The return type Any is vague. Consider using a Callable type hint for better documentation and IDE support.

-def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
-    context: ResponseGeneratorContext,
-) -> Any:
+from collections.abc import Callable
+
+def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
+    context: ResponseGeneratorContext,
+) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
src/models/config.py (1)

841-850: Redundant validation check.

The check if self.api_key_config.api_key.get_secret_value() is None on line 847 is redundant. The APIKeyTokenConfiguration class already has min_length=1 on the api_key field, which would fail validation before this check is reached. Additionally, get_secret_value() returns a string, not None, for a valid SecretStr.

Consider simplifying or removing this check:

         if self.module == constants.AUTH_MOD_APIKEY_TOKEN:
             if self.api_key_config is None:
                 raise ValueError(
                     "API Key configuration section must be specified "
                     "when using API Key token authentication"
                 )
-            if self.api_key_config.api_key.get_secret_value() is None:
-                raise ValueError(
-                    "api_key parameter must be specified when using API_KEY token authentication"
-                )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eaf66dd and 60c3c4f.

⛔ Files ignored due to path filters (12)
  • docs/architecture.png is excluded by !**/*.png
  • docs/architecture.svg is excluded by !**/*.svg
  • docs/config.png is excluded by !**/*.png
  • docs/config.svg is excluded by !**/*.svg
  • docs/convesation_history.svg is excluded by !**/*.svg
  • docs/demos/lcore/images/journey.png is excluded by !**/*.png
  • docs/demos/lcore/images/llama_stack_as_library.svg is excluded by !**/*.svg
  • docs/demos/lcore/images/llama_stack_as_service.svg is excluded by !**/*.svg
  • docs/query_endpoint.svg is excluded by !**/*.svg
  • docs/rest_api.svg is excluded by !**/*.svg
  • docs/streaming_query_endpoint.svg is excluded by !**/*.svg
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (107)
  • .coderabbit.yaml (1 hunks)
  • .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
  • .github/workflows/e2e_tests.yaml (7 hunks)
  • .tekton/lightspeed-stack-pull-request.yaml (3 hunks)
  • .tekton/lightspeed-stack-push.yaml (3 hunks)
  • CONTRIBUTING.md (6 hunks)
  • Containerfile (2 hunks)
  • Makefile (3 hunks)
  • README.md (12 hunks)
  • docker-compose-library.yaml (1 hunks)
  • docker-compose.yaml (1 hunks)
  • docs/auth.md (1 hunks)
  • docs/config.html (1 hunks)
  • docs/config.json (1 hunks)
  • docs/config.md (1 hunks)
  • docs/config.puml (9 hunks)
  • docs/demos/lcore/lcore.md (5 hunks)
  • docs/query_endpoint.puml (1 hunks)
  • docs/rag_guide.md (2 hunks)
  • docs/streaming_query_endpoint.puml (1 hunks)
  • examples/lightspeed-stack-api-key-auth.yaml (1 hunks)
  • examples/lightspeed-stack-rh-identity.yaml (1 hunks)
  • pyproject.toml (2 hunks)
  • requirements.hermetic.txt (1 hunks)
  • requirements.torch.txt (1 hunks)
  • rpms.in.yaml (1 hunks)
  • rpms.lock.yaml (1 hunks)
  • scripts/gen_doc.py (2 hunks)
  • scripts/llama_stack_tutorial.sh (1 hunks)
  • scripts/remove_torch_deps.sh (1 hunks)
  • src/app/database.py (2 hunks)
  • src/app/endpoints/README.md (2 hunks)
  • src/app/endpoints/authorized.py (1 hunks)
  • src/app/endpoints/config.py (3 hunks)
  • src/app/endpoints/conversations.py (11 hunks)
  • src/app/endpoints/conversations_v2.py (8 hunks)
  • src/app/endpoints/feedback.py (6 hunks)
  • src/app/endpoints/health.py (2 hunks)
  • src/app/endpoints/info.py (4 hunks)
  • src/app/endpoints/metrics.py (3 hunks)
  • src/app/endpoints/models.py (3 hunks)
  • src/app/endpoints/providers.py (3 hunks)
  • src/app/endpoints/query.py (12 hunks)
  • src/app/endpoints/query_v2.py (9 hunks)
  • src/app/endpoints/rags.py (1 hunks)
  • src/app/endpoints/root.py (3 hunks)
  • src/app/endpoints/shields.py (3 hunks)
  • src/app/endpoints/streaming_query.py (13 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/endpoints/tools.py (3 hunks)
  • src/app/main.py (3 hunks)
  • src/app/routers.py (3 hunks)
  • src/authentication/README.md (2 hunks)
  • src/authentication/__init__.py (2 hunks)
  • src/authentication/api_key_token.py (1 hunks)
  • src/authentication/jwk_token.py (2 hunks)
  • src/authentication/k8s.py (4 hunks)
  • src/authentication/rh_identity.py (1 hunks)
  • src/authentication/utils.py (1 hunks)
  • src/authorization/middleware.py (5 hunks)
  • src/cache/cache_factory.py (3 hunks)
  • src/configuration.py (2 hunks)
  • src/constants.py (2 hunks)
  • src/metrics/utils.py (2 hunks)
  • src/models/README.md (1 hunks)
  • src/models/config.py (21 hunks)
  • src/models/context.py (1 hunks)
  • src/models/requests.py (4 hunks)
  • src/models/responses.py (17 hunks)
  • src/quota/__init__.py (1 hunks)
  • src/quota/quota_limiter.py (2 hunks)
  • src/quota/user_quota_limiter.py (1 hunks)
  • src/utils/README.md (1 hunks)
  • src/utils/checks.py (1 hunks)
  • src/utils/common.py (3 hunks)
  • src/utils/endpoints.py (6 hunks)
  • src/utils/quota.py (2 hunks)
  • src/utils/responses.py (1 hunks)
  • src/utils/shields.py (1 hunks)
  • test.containerfile (1 hunks)
  • tests/configuration/rh-identity-config.yaml (1 hunks)
  • tests/e2e/README.md (1 hunks)
  • tests/e2e/__init__.py (1 hunks)
  • tests/e2e/configs/README.md (1 hunks)
  • tests/e2e/configs/run-azure.yaml (1 hunks)
  • tests/e2e/configs/run-ci.yaml (3 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
  • tests/e2e/configuration/README.md (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/features/README.md (1 hunks)
  • tests/e2e/features/authorized_noop_token.feature (2 hunks)
  • tests/e2e/features/conversation_cache_v2.feature (1 hunks)
  • tests/e2e/features/conversations.feature (5 hunks)
  • tests/e2e/features/environment.py (4 hunks)
  • tests/e2e/features/feedback.feature (4 hunks)
  • tests/e2e/features/health.feature (2 hunks)
  • tests/e2e/features/info.feature (4 hunks)
  • tests/e2e/features/query.feature (3 hunks)
  • tests/e2e/features/steps/README.md (1 hunks)
  • tests/e2e/features/steps/__init__.py (1 hunks)
  • tests/e2e/features/steps/common_http.py (2 hunks)
  • tests/e2e/features/steps/conversation.py (5 hunks)
⛔ Files not processed due to max files limit (25)
  • tests/e2e/features/steps/llm_query_response.py
  • tests/e2e/features/streaming_query.feature
  • tests/e2e/test_api.py
  • tests/e2e/utils/README.md
  • tests/integration/README.md
  • tests/integration/endpoints/README.md
  • tests/integration/endpoints/test_config_integration.py
  • tests/integration/endpoints/test_health_integration.py
  • tests/integration/endpoints/test_info_integration.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/integration/test_middleware_integration.py
  • tests/integration/test_openapi_json.py
  • tests/integration/test_rh_identity_integration.py
  • tests/unit/README.md
  • tests/unit/app/README.md
  • tests/unit/app/endpoints/README.md
  • tests/unit/app/endpoints/test_authorized.py
  • tests/unit/app/endpoints/test_config.py
  • tests/unit/app/endpoints/test_conversations.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_feedback.py
  • tests/unit/app/endpoints/test_health.py
  • tests/unit/app/endpoints/test_info.py
  • tests/unit/app/endpoints/test_metrics.py
  • tests/unit/app/endpoints/test_models.py
✅ Files skipped from review due to trivial changes (16)
  • src/models/README.md
  • rpms.in.yaml
  • tests/e2e/features/steps/README.md
  • .github/PULL_REQUEST_TEMPLATE.md
  • src/quota/__init__.py
  • examples/lightspeed-stack-api-key-auth.yaml
  • tests/e2e/README.md
  • tests/e2e/configuration/README.md
  • docs/config.md
  • tests/e2e/configs/README.md
  • tests/e2e/__init__.py
  • Containerfile
  • src/quota/user_quota_limiter.py
  • tests/e2e/features/README.md
  • tests/configuration/rh-identity-config.yaml
  • requirements.hermetic.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/checks.py
🧰 Additional context used
📓 Path-based instructions (10)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/info.feature
  • tests/e2e/features/feedback.feature
  • tests/e2e/features/health.feature
  • tests/e2e/features/conversation_cache_v2.feature
  • tests/e2e/features/query.feature
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/features/conversations.feature
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/steps/__init__.py
  • tests/e2e/features/steps/common_http.py
  • tests/e2e/features/steps/conversation.py
  • tests/e2e/features/environment.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/main.py
  • src/metrics/utils.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/app/endpoints/models.py
  • src/utils/endpoints.py
  • src/cache/cache_factory.py
  • src/models/context.py
  • src/app/endpoints/tools.py
  • src/authentication/jwk_token.py
  • src/configuration.py
  • src/app/routers.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/authentication/rh_identity.py
  • src/utils/shields.py
  • src/app/endpoints/health.py
  • src/models/requests.py
  • src/app/endpoints/query.py
  • src/utils/common.py
  • src/authentication/k8s.py
  • src/app/endpoints/providers.py
  • src/utils/responses.py
  • src/utils/quota.py
  • src/app/endpoints/info.py
  • src/app/database.py
  • src/constants.py
  • src/authentication/api_key_token.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/authentication/__init__.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/quota/quota_limiter.py
  • src/app/endpoints/root.py
  • src/models/responses.py
  • src/app/endpoints/query_v2.py
  • src/models/config.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/info.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/root.py
  • src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/feedback.py
  • src/app/endpoints/rags.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/providers.py
  • src/app/endpoints/info.py
  • src/app/endpoints/metrics.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/config.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/shields.py
  • src/app/endpoints/authorized.py
  • src/app/endpoints/root.py
  • src/app/endpoints/query_v2.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Configure pylint with source-roots = "src"
Exclude src/auth/k8s.py from pyright type checking

Files:

  • pyproject.toml
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/context.py
  • src/models/requests.py
  • src/models/responses.py
  • src/models/config.py
src/**/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define shared constants in central constants.py file with descriptive comments

Files:

  • src/constants.py
src/**/__init__.py

📄 CodeRabbit inference engine (CLAUDE.md)

Package __init__.py files must contain brief package descriptions

Files:

  • src/authentication/__init__.py
src/models/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/config.py: All configuration must use Pydantic models extending ConfigurationBase with extra="forbid" to reject unknown fields
Use type hints Optional[FilePath], PositiveInt, SecretStr for Pydantic configuration models

Files:

  • src/models/config.py
🧠 Learnings (19)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • tests/e2e/features/info.feature
  • src/app/main.py
  • src/metrics/utils.py
  • src/app/endpoints/models.py
  • src/app/endpoints/tools.py
  • src/app/endpoints/health.py
  • src/app/endpoints/query.py
  • src/app/endpoints/info.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/shields.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/lightspeed-stack-rh-identity.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • src/authentication/README.md
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
  • docs/auth.md
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/features/authorized_noop_token.feature
  • tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/main.py
  • src/authentication/utils.py
  • src/authorization/middleware.py
  • src/app/endpoints/models.py
  • src/utils/endpoints.py
  • src/app/endpoints/feedback.py
  • src/utils/quota.py
  • src/app/endpoints/info.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Run `uv run make verify` to run all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion

Applied to files:

  • Makefile
  • CONTRIBUTING.md
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests

Applied to files:

  • Makefile
  • .github/workflows/e2e_tests.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • tests/e2e/features/health.feature
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

  • Makefile
  • .github/workflows/e2e_tests.yaml
  • tests/e2e/features/environment.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • docker-compose-library.yaml
  • README.md
  • docs/demos/lcore/lcore.md
📚 Learning: 2025-07-23T14:26:40.340Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 278
File: src/app/endpoints/feedback.py:113-118
Timestamp: 2025-07-23T14:26:40.340Z
Learning: In the lightspeed-stack project after PR #278, the `UserDataCollection.feedback_storage` property dynamically constructs paths from `user_data_dir` and will always return a valid string, making fallback logic like `or Path("")` redundant.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Always check `pyproject.toml` for existing dependencies and versions before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use `uv sync --group dev --group llslibdev` to install development dependencies

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Applied to files:

  • CONTRIBUTING.md
📚 Learning: 2025-08-18T11:45:59.961Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:77-77
Timestamp: 2025-08-18T11:45:59.961Z
Learning: torch==2.7.1 is available on PyPI and is a valid version that can be used in dependency specifications.

Applied to files:

  • requirements.torch.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models

Applied to files:

  • src/models/requests.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.

Applied to files:

  • src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields

Applied to files:

  • docs/config.puml
  • src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models

Applied to files:

  • docs/config.puml
  • src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`

Applied to files:

  • docs/config.puml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (16)
tests/integration/endpoints/test_query_v2_integration.py (2)

357-375: Resolve merge conflict in mock_tool_output.results structure.

Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent mock structures for tool search results:

  • HEAD: Inline list of mocks with file_id, filename, and score attributes only
  • upstream/main: Separate mock_result object with additional attributes dict containing doc_url and link fields

Choose the structure that matches the actual Llama Stack API response format. The upstream/main version appears more complete with URL metadata.

-<<<<<<< HEAD
-    mock_tool_output.results = [
-        mocker.MagicMock(
-            file_id="doc-1",
-            filename="ansible-docs.txt",
-            score=0.95,
-        )
-    ]
-=======
-    mock_result = mocker.MagicMock()
-    mock_result.file_id = "doc-1"
-    mock_result.filename = "ansible-docs.txt"
-    mock_result.score = 0.95
-    mock_result.attributes = {
-        "doc_url": "https://example.com/ansible-docs.txt",
-        "link": "https://example.com/ansible-docs.txt",
-    }
-    mock_tool_output.results = [mock_result]
->>>>>>> upstream/main
+    mock_result = mocker.MagicMock()
+    mock_result.file_id = "doc-1"
+    mock_result.filename = "ansible-docs.txt"
+    mock_result.score = 0.95
+    mock_result.attributes = {
+        "doc_url": "https://example.com/ansible-docs.txt",
+        "link": "https://example.com/ansible-docs.txt",
+    }
+    mock_tool_output.results = [mock_result]

1244-1275: Resolve merge conflict in transcript behavior test.

Unresolved merge conflict markers prevent the file from being parsed. The conflict affects the function signature and transcript mocking:

  • HEAD: No mocker parameter, no mocking of store_transcript (may create actual files during tests)
  • upstream/main: Includes mocker parameter and mocks store_transcript to prevent file creation

The upstream/main approach is preferred for integration tests, as it prevents unwanted side effects (actual file I/O) while still verifying the integration logic.

 async def test_query_v2_endpoint_transcript_behavior(
     test_config: AppConfig,
     mock_llama_stack_client: AsyncMockType,
     test_request: Request,
     test_auth: AuthTuple,
     patch_db_session: Session,
-<<<<<<< HEAD
-=======
+    mocker: MockerFixture,
->>>>>>> upstream/main
 ) -> None:
     """Test transcript storage behavior based on configuration.
 
     This integration test verifies:
     - Endpoint succeeds with transcripts enabled
     - Endpoint succeeds with transcripts disabled
     - Conversation is persisted regardless of transcript setting
     - Integration between query handler and transcript configuration
 
     Args:
         test_config: Test configuration
         mock_llama_stack_client: Mocked Llama Stack client
         test_request: FastAPI request
         test_auth: noop authentication tuple
         patch_db_session: Test database session
-<<<<<<< HEAD
-    """
-    _ = mock_llama_stack_client
-
-=======
+        mocker: pytest-mock fixture
+    """
+    _ = mock_llama_stack_client
+
+    # Mock store_transcript to prevent file creation
+    mocker.patch("app.endpoints.query.store_transcript")
+
->>>>>>> upstream/main
     test_config.user_data_collection_configuration.transcripts_enabled = True
tests/e2e/features/environment.py (1)

92-112: Harden NoCacheConfig config swapping and restore logic

The new immediate switch/restart for NoCacheConfig in before_scenario is fine, but the restore path in after_scenario has a couple of pitfalls:

  • after_scenario always restores using context.feature_config for NoCacheConfig, but context.feature_config is only set in before_feature for Authorized features. If a NoCacheConfig scenario ever appears in a non‑Authorized feature, this will raise an AttributeError and leave lightspeed-stack.yaml in the no‑cache state.
  • Unlike the feature‑level Authorized path, there is no dedicated backup of lightspeed-stack.yaml when you overwrite it for NoCacheConfig, so the restore logic depends on knowing the original config path instead of having a real snapshot of the active config.

You can make this path robust and self‑contained by introducing a per‑scenario backup for NoCacheConfig and cleaning it up afterwards, e.g.:

@@ def before_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
-        context.scenario_config = (
-            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
-        )
-        # Switch config and restart immediately
-        switch_config(
-            context.scenario_config
-        )  # Copies to default lightspeed-stack.yaml
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
+        )
+    if "NoCacheConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
+        )
+        # Backup current config, switch to no-cache, and restart immediately
+        context.scenario_config_backup = create_config_backup("lightspeed-stack.yaml")
+        switch_config(context.scenario_config)  # Copies to default lightspeed-stack.yaml
+        restart_container("lightspeed-stack")
@@ def after_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
-    if "NoCacheConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        switch_config(context.feature_config)
+        restart_container("lightspeed-stack")
+    if "NoCacheConfig" in scenario.effective_tags:
+        # Restore the backed-up config for this scenario if available
+        if hasattr(context, "scenario_config_backup"):
+            switch_config(context.scenario_config_backup)
+            restart_container("lightspeed-stack")
+            remove_config_backup(context.scenario_config_backup)
+            delattr(context, "scenario_config_backup")

This removes the dependency on context.feature_config for NoCacheConfig restores, avoids AttributeError when the tag is used outside Authorized features, and ensures per‑scenario state is properly cleaned up so behavior doesn’t leak across scenarios. Based on learnings, Behave’s Context persists across scenarios, so this cleanup is important for test isolation.

tests/unit/app/endpoints/test_rags.py (1)

192-206: Resolve merge conflict in RagInfo mock class.

The RagInfo.__init__ method contains unresolved merge conflict markers, causing multiple pipeline failures:

  • IndentationError (syntax error)
  • Ruff, Black, and Pylint parsing failures

Remove the conflict markers and decide whether to include the verbose docstring from upstream/main or keep the minimal version from HEAD. Ensure the resulting code is syntactically valid.

requirements.aarch64.txt (4)

1-4197: Unresolved merge conflicts in requirements.aarch64.txt—regenerate via uv from pyproject.toml.

This file contains unresolved conflict markers (<<<<<<< HEAD, =======, >>>>>>>). Any marker will break pip/uv and CI. Do not hand-edit. Fix all conflicts in pyproject.toml, then regenerate all requirements-*.txt files using uv to ensure consistency across architectures.

The file includes conflicting pins for faiss-cpu, fastapi, greenlet, kubernetes, litellm, llama-stack, llama-stack-client, mcp, openai, opentelemetry packages (api, sdk, instrumentation, exporter variants, proto), protobuf, and urllib3. Additionally, python-jose (and ecdsa) appear only in the HEAD section and are absent from upstream/main—since authlib is present in both branches, python-jose is likely redundant and should be removed from pyproject.toml before regeneration.


2115-2213: Resolve the OpenTelemetry version conflict by selecting one coherent version set.

The requirements file contains an unresolved merge conflict with split OpenTelemetry versions: HEAD uses 1.38.0 family + instrumentation 0.59b0, while upstream/main uses 1.39.0 family + instrumentation 0.60b0. Select one coherent set (e.g., 1.39.0 family + 0.60b0 instrumentation) and regenerate the requirements file to ensure all opentelemetry* packages maintain version coherence throughout requirements.aarch64.txt and related requirement files.


654-696: Resolve the merge conflict in requirements.aarch64.txt and regenerate with uv.

The file contains unresolved conflict markers for faiss-cpu and fastapi. Accept the upstream/main versions (faiss-cpu==1.13.1 and fastapi==0.124.0), which match pyproject.toml. Then regenerate:

uv pip compile pyproject.toml -o requirements.aarch64.txt --generate-hashes --group llslibdev --python-platform aarch64-unknown-linux-gnu --torch-backend cpu --python-version 3.12

faiss-cpu is pinned at top-level to maintain consistent control across dependency graphs; keep it uniform across architecture files.


1000-1118: Resolve merge conflict: greenlet requires a single pinned version.

Remove the unresolved merge markers and choose one greenlet version (3.2.4 or 3.3.0). Both are compatible with SQLAlchemy on Python 3.12; 3.3.0 is newer with improved wheel packaging. After resolving the conflict, regenerate the requirements.aarch64.txt with uv to ensure consistency across all requirements files.

requirements.x86_64.txt (4)

654-696: Pick one: faiss-cpu 1.13.0 vs 1.13.1; keep policy pin.

Choose a single version (likely 1.13.1) and ensure it’s pinned in pyproject before compiling, then propagate into all platform requirement files. Based on learnings, keep faiss-cpu pinned at the top-level.



2115-2212: Resolve the OpenTelemetry package version conflict in requirements.x86_64.txt.

Git merge conflict markers show HEAD pinned to 1.38.x/0.59b0 while upstream/main uses 1.39.x/0.60b0. All opentelemetry packages (api, exporter-otlp, exporter-otlp-proto-*, instrumentation, proto, sdk, semantic-conventions) must be aligned to the same minor version. Select one version set, remove the conflict markers, update all affected packages consistently, and recompile to avoid compatibility issues.


1508-1531: litellm / llama-stack[‑client] bumps: verify API compatibility.

Upgrading to llama-stack 0.3.0 introduces breaking changes: tools parameter schema changed from "parameters" to { "input_schema", "output_schema" }, API routes reorganized (v1/, v1alpha/, v1beta/), and some endpoints removed/unpublished. If these versions are merged, code must be updated for tool definitions and API endpoint usage; otherwise, keep existing pins (1.80.7, 0.2.22) and regenerate.
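Purely as an illustration of the schema change described above, a tool definition would move roughly as follows (field names are taken from this comment, not verified against the llama-stack 0.3.0 SDK):

# Pre-0.3.0 style (illustrative)
tool_before = {
    "name": "search_docs",
    "description": "Search product documentation",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
}

# 0.3.0 style (illustrative)
tool_after = {
    "name": "search_docs",
    "description": "Search product documentation",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    "output_schema": {"type": "object", "properties": {"results": {"type": "array"}}},
}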


673-696: FastAPI and Starlette versions are incompatible.

FastAPI 0.124.0 requires Starlette >=0.40.0 and <0.49.0. Starlette 0.50.0 exceeds the upper bound and will not work with FastAPI 0.124.0. Either downgrade Starlette to <0.49.0 or select a different FastAPI version. Resolve the merge conflict and pin compatible versions in pyproject.toml.

Also applies to: 3607-3613

tests/unit/models/responses/test_error_responses.py (1)

9-36: Resolve merge conflicts and adopt PromptTooLongResponse tests

This test module currently contains unresolved merge conflict markers (<<<<<<< HEAD, =======, >>>>>>> upstream/main), which breaks parsing and unit tests.

You should:

  • Keep the upstream/main side that introduces PROMPT_TOO_LONG_DESCRIPTION and PromptTooLongResponse in the imports.
  • Drop the conflict markers and ensure the new TestPromptTooLongResponse class is present once.

For example:

-from models.responses import (
-    BAD_REQUEST_DESCRIPTION,
-    FORBIDDEN_DESCRIPTION,
-    INTERNAL_SERVER_ERROR_DESCRIPTION,
-    NOT_FOUND_DESCRIPTION,
-<<<<<<< HEAD
-=======
-    PROMPT_TOO_LONG_DESCRIPTION,
->>>>>>> upstream/main
-    QUOTA_EXCEEDED_DESCRIPTION,
+from models.responses import (
+    BAD_REQUEST_DESCRIPTION,
+    FORBIDDEN_DESCRIPTION,
+    INTERNAL_SERVER_ERROR_DESCRIPTION,
+    NOT_FOUND_DESCRIPTION,
+    PROMPT_TOO_LONG_DESCRIPTION,
+    QUOTA_EXCEEDED_DESCRIPTION,
@@
-    NotFoundResponse,
-<<<<<<< HEAD
-=======
-    PromptTooLongResponse,
->>>>>>> upstream/main
+    NotFoundResponse,
+    PromptTooLongResponse,
@@
-<<<<<<< HEAD
-=======
-class TestPromptTooLongResponse:
-    """Test cases for PromptTooLongResponse."""
-
-    def test_constructor_with_default_response(self) -> None:
-        """Test PromptTooLongResponse with default response."""
-        response = PromptTooLongResponse(
-            cause="The prompt exceeds the maximum allowed length."
-        )
-        assert isinstance(response, AbstractErrorResponse)
-        assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
-        assert isinstance(response.detail, DetailModel)
-        assert response.detail.response == "Prompt is too long"
-        assert response.detail.cause == "The prompt exceeds the maximum allowed length."
-
-    def test_openapi_response(self) -> None:
-        """Test PromptTooLongResponse.openapi_response() method."""
-        schema = PromptTooLongResponse.model_json_schema()
-        model_examples = schema.get("examples", [])
-        expected_count = len(model_examples)
-
-        result = PromptTooLongResponse.openapi_response()
-        assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION
-        assert result["model"] == PromptTooLongResponse
-        assert "examples" in result["content"]["application/json"]
-        examples = result["content"]["application/json"]["examples"]
-
-        # Verify example count matches schema examples count
-        assert len(examples) == expected_count
-        assert expected_count == 1
-
-        # Verify example structure
-        assert "prompt too long" in examples
-        prompt_example = examples["prompt too long"]
-        assert "value" in prompt_example
-        assert "detail" in prompt_example["value"]
-        assert prompt_example["value"]["detail"]["response"] == "Prompt is too long"
-        assert (
-            prompt_example["value"]["detail"]["cause"]
-            == "The prompt exceeds the maximum allowed length."
-        )
-
-    def test_openapi_response_with_explicit_examples(self) -> None:
-        """Test PromptTooLongResponse.openapi_response() with explicit examples."""
-        result = PromptTooLongResponse.openapi_response(examples=["prompt too long"])
-        examples = result["content"]["application/json"]["examples"]
-
-        # Verify only 1 example is returned when explicitly specified
-        assert len(examples) == 1
-        assert "prompt too long" in examples
-
-
->>>>>>> upstream/main
+class TestPromptTooLongResponse:
+    """Test cases for PromptTooLongResponse."""
+
+    def test_constructor_with_default_response(self) -> None:
+        """Test PromptTooLongResponse with default response."""
+        response = PromptTooLongResponse(
+            cause="The prompt exceeds the maximum allowed length."
+        )
+        assert isinstance(response, AbstractErrorResponse)
+        assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE
+        assert isinstance(response.detail, DetailModel)
+        assert response.detail.response == "Prompt is too long"
+        assert response.detail.cause == "The prompt exceeds the maximum allowed length."
+
+    def test_openapi_response(self) -> None:
+        """Test PromptTooLongResponse.openapi_response() method."""
+        schema = PromptTooLongResponse.model_json_schema()
+        model_examples = schema.get("examples", [])
+        expected_count = len(model_examples)
+
+        result = PromptTooLongResponse.openapi_response()
+        assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION
+        assert result["model"] == PromptTooLongResponse
+        assert "examples" in result["content"]["application/json"]
+        examples = result["content"]["application/json"]["examples"]
+
+        # Verify example count matches schema examples count
+        assert len(examples) == expected_count
+        assert expected_count == 1
+
+        # Verify example structure
+        assert "prompt too long" in examples
+        prompt_example = examples["prompt too long"]
+        assert "value" in prompt_example
+        assert "detail" in prompt_example["value"]
+        assert prompt_example["value"]["detail"]["response"] == "Prompt is too long"
+        assert (
+            prompt_example["value"]["detail"]["cause"]
+            == "The prompt exceeds the maximum allowed length."
+        )
+
+    def test_openapi_response_with_explicit_examples(self) -> None:
+        """Test PromptTooLongResponse.openapi_response() with explicit examples."""
+        result = PromptTooLongResponse.openapi_response(examples=["prompt too long"])
+        examples = result["content"]["application/json"]["examples"]
+
+        # Verify only 1 example is returned when explicitly specified
+        assert len(examples) == 1
+        assert "prompt too long" in examples

After this, the SyntaxError and Ruff “merge conflict markers detected” should go away.

Also applies to: 666-719

docs/config.html (1)

1-21: Resolve merge conflicts in generated config HTML (and regenerate from Markdown)

docs/config.html still contains merge conflict markers across multiple sections (<<<<<<< HEAD, =======, >>>>>>> upstream/main). This makes the file invalid HTML and breaks tooling (Ruff/Black are already complaining).

Given this file is clearly generated from docs/config.md, the safest approach is:

  • First, resolve the conflicts in docs/config.md (see separate comment) so that it contains a single, correct version of the configuration schema, including the new APIKeyTokenConfiguration section and api_key_config field in AuthenticationConfiguration.
  • Then, regenerate docs/config.html from the Markdown (e.g., via your existing pandoc or docs build pipeline), rather than hand‑editing the HTML.
  • Ensure the regenerated HTML no longer includes any conflict markers and that the tables (e.g., AuthenticationConfiguration) mention api_key_config alongside jwk_config and rh_identity_config.

Until these conflicts are resolved and the HTML is regenerated, this doc will remain broken.

Also applies to: 191-268, 443-721, 1009-1266, 1570-1815, 2275-2452

docs/config.md (1)

10-23: Resolve Markdown merge conflicts and keep the API key schema updates

docs/config.md also has unresolved merge conflict markers, which breaks the Markdown and the HTML generation:

  • At the top around APIKeyTokenConfiguration (Lines ~10‑23).
  • In AuthenticationConfiguration where api_key_config is conditionally present (Lines ~59‑63).
  • In narrative text for ModelContextProtocolServer, PostgreSQLDatabaseConfiguration, and ServiceConfiguration.

To fix:

  • Remove all <<<<<<< HEAD / ======= / >>>>>>> upstream/main markers.
  • Keep the upstream/main additions that introduce:
    • The ## APIKeyTokenConfiguration section with its api_key field table.
    • The api_key_config row in the AuthenticationConfiguration table.
    • The reflowed one‑line paragraphs (they are equivalent content, just wrapped differently).

Conceptually, the cleaned sections should look like:

## APIKeyTokenConfiguration

API Key Token configuration.

| Field   | Type   | Description |
|--------|--------|-------------|
| api_key | string |             |

...

| Field          | Type   | Description |
|----------------|--------|-------------|
| module         | string |             |
| skip_tls_verification | boolean |    |
| k8s_cluster_api       | string  |    |
| k8s_ca_cert_path      | string  |    |
| jwk_config            |        |     |
| api_key_config        |        |     |
| rh_identity_config    |        |     |

Once these conflicts are resolved, re‑run your docs build to regenerate docs/config.html and clear the markdownlint errors.

Also applies to: 59-63, 318-331, 351-359, 473-482

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

1-637: Critical: Resolve all merge conflicts before proceeding.

This file contains unresolved merge conflicts throughout (lines 1-5, 12-19, 50-56, 89-95, 148-152, 184-198, 220-224, 240-244, 263-289, 309-315, 348-354, 405-409, 501-505, 557-637), causing all pipeline checks to fail:

  • Unit tests: SyntaxError: invalid syntax
  • Ruff/Black/Pylint: Cannot parse due to conflict markers

Key decisions to make when resolving:

  1. Imports (lines 12-19): Choose between HTTPException (HEAD) or RateLimitError (upstream) — likely need both if keeping the quota test.

  2. stream_end_event signature (lines 148-152, 405-409, 501-505): Upstream uses lambda _m, _t, _aq, _rd, _media which aligns with new RAG chunk fields (rag_chunks, referenced_documents). HEAD uses lambda _m, _s, _t, _media. Choose based on the production signature in streaming_query_v2.py.

  3. Conversation mocking (lines 50-56, 89-95, 309-315, 348-354): Upstream adds mock_client.conversations.create mocks — include these if retrieve_response now creates conversations.

  4. Error handling pattern (lines 263-289): HEAD raises HTTPException; upstream returns StreamingResponse with error status code. Align with the production implementation.

  5. Conversation ID expectations (lines 220-224, 240-244): HEAD expects conv-xyz; upstream expects abc123def456. Align with how retrieve_response returns the conversation ID.

  6. Quota test (lines 557-637): Upstream adds a new test for RateLimitError — keep if this error handling exists in production.

♻️ Duplicate comments (5)
src/utils/checks.py (1)

90-91: Fix misleading error message.

The error message references path but the actual check is for parent_path writability, making debugging more difficult.

Apply this diff to clarify the error message:

             if not os.access(parent_path, os.W_OK):
-                raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+                raise InvalidConfigurationError(
+                    f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+                )

Based on past review comments, this issue was previously flagged but remains unresolved.

src/models/responses.py (1)

749-788: ConversationDeleteResponse.success should reflect deleted and examples must be updated

__init__ still hardcodes success=True (Line 763), so the response claims success even when deleted=False. The “not found” example also shows success: True with a failure message and uses the inconsistent phrase “can not be deleted”.

This is the same issue previously flagged; please wire success to deleted and fix the example payload.

Apply something like:

     def __init__(self, *, deleted: bool, conversation_id: str) -> None:
@@
         response_msg = (
             "Conversation deleted successfully"
             if deleted
             else "Conversation cannot be deleted"
         )
         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )
@@
                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
src/app/endpoints/query.py (2)

978-1005: Fix “attatchment” typos in validation errors

Both error messages use “attatchment” instead of “attachment”, and a previous review called this out. These strings surface in HTTP 422 responses and should be correct and consistent.

-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "

You may also want to align these messages with the example texts in UnprocessableEntityResponse (“Invalid attribute value”) if you care about strict consistency.


792-797: Avoid logging full vector DB payloads at INFO

Both vector DB queries log the entire query_response at INFO, which can contain sensitive content and is also noisy:

logger.info(f"The query response total payload: {query_response}")

Prefer logging lightweight metadata (e.g., chunk count) at DEBUG instead, as in the diff above. This aligns with the earlier review feedback on the same issue.

Also applies to: 890-897
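
A minimal sketch of the lighter-weight alternative (the chunks attribute name is assumed from the surrounding code; adjust to whatever the query response actually exposes):

logger.debug(
    "Vector DB query returned %d chunks from %d vector stores",
    len(query_response.chunks),
    len(vector_db_ids),
)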

tests/unit/models/responses/test_successful_responses.py (1)

673-737: Align ConversationDeleteResponse tests with corrected success semantics

Once ConversationDeleteResponse is fixed to set success=deleted, the tests here must be updated accordingly:

  • test_constructor_not_deleted should expect success is False.
  • The “not found” OpenAPI example should assert "success": False and the corrected message "Conversation cannot be deleted".

Proposed changes:

     def test_constructor_not_deleted(self) -> None:
         """Test ConversationDeleteResponse when conversation cannot be deleted."""
         response = ConversationDeleteResponse(deleted=False, conversation_id="conv-123")
-        assert response.success is True
-        assert response.response == "Conversation cannot be deleted"
+        assert response.success is False
+        assert response.response == "Conversation cannot be deleted"
@@
         not_found_example = examples["not found"]
@@
-        assert not_found_example["value"]["success"] is True
-        assert (
-            not_found_example["value"]["response"] == "Conversation can not be deleted"
-        )
+        assert not_found_example["value"]["success"] is False
+        assert (
+            not_found_example["value"]["response"] == "Conversation cannot be deleted"
+        )

This keeps the tests consistent with the corrected model and with the updated examples in ConversationDeleteResponse.model_config.

🧹 Nitpick comments (2)
src/models/responses.py (1)

25-30: Avoid duplicating RAGChunk model across modules

RAGChunk is now defined here and also in src/utils/types.py with the same fields. Maintaining two nearly identical models will drift over time and complicates typing (e.g., TurnSummary.rag_chunks vs QueryResponse.rag_chunks). Consider making a single canonical definition (e.g., in utils.types) and importing/re-exporting it here, or at least aliasing one to the other to keep them in sync.
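
A rough sketch of the re-export approach (the field list is illustrative; keep whichever definition the two modules currently share):

# src/utils/types.py keeps the canonical model
from pydantic import BaseModel


class RAGChunk(BaseModel):
    """A single retrieved RAG chunk."""

    content: str
    source: str | None = None
    score: float | None = None


# src/models/responses.py then re-exports it instead of redefining it:
# from utils.types import RAGChunk  # noqa: F401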

src/app/endpoints/query.py (1)

879-936: Second vector DB query duplicates work and repeats issues

The post‑agent block re‑lists vector DBs and re‑queries vector_io.query with almost identical parameters, again logging the full payload and re‑importing RAGChunk/ReferencedDocument locally. It also builds RAGChunk instances with content=str(chunk.content) and source from loosely typed attributes.

Even if you decide to keep the second query (e.g., for different params vs. Solr filtering), it should:

  • Reuse the module‑level RAGChunk / ReferencedDocument imports.
  • Avoid logging the full payload at INFO.
  • Normalize content and source types the same way as in the pre‑agent block (e.g., interleaved_content_as_str + str | None guarded from metadata).
  • Consider reusing the initial vector_db_ids instead of calling client.vector_dbs.list() again.

You may also want to reconsider whether two separate vector_io.query calls per request are necessary; a single call whose results are reused for both RAG context injection and final rag_chunks / referenced_documents would simplify the flow and reduce load on the vector backend.
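
A very rough sketch of the "query once, reuse twice" idea (names such as response.chunks, chunk.metadata, and the RAGChunk/ReferencedDocument fields are assumptions to be checked against the actual code):

rag_chunks = [
    RAGChunk(
        content=interleaved_content_as_str(chunk.content),
        source=(chunk.metadata or {}).get("source"),
    )
    for chunk in response.chunks
]
referenced_documents = [
    ReferencedDocument(doc_url=(chunk.metadata or {}).get("doc_url"))
    for chunk in response.chunks
    if (chunk.metadata or {}).get("doc_url")
]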

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60c3c4f and 02f8f19.

📒 Files selected for processing (18)
  • docs/config.html (12 hunks)
  • docs/config.md (6 hunks)
  • requirements.aarch64.txt (21 hunks)
  • requirements.hermetic.txt (1 hunks)
  • requirements.x86_64.txt (21 hunks)
  • src/app/endpoints/query.py (10 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (19 hunks)
  • src/utils/checks.py (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/features/conversation_cache_v2.feature (2 hunks)
  • tests/e2e/features/environment.py (1 hunks)
  • tests/integration/endpoints/test_query_v2_integration.py (10 hunks)
  • tests/unit/app/endpoints/test_rags.py (2 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (15 hunks)
  • tests/unit/authentication/test_api_key_token.py (3 hunks)
  • tests/unit/models/responses/test_error_responses.py (4 hunks)
  • tests/unit/models/responses/test_successful_responses.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/environment.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_rags.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/endpoints/test_rags.py
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/conversation_cache_v2.feature
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/utils/checks.py
  • src/app/endpoints/query.py
  • src/models/responses.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/responses.py
🧠 Learnings (9)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/features/environment.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/unit/authentication/test_api_key_token.py
📚 Learning: 2025-08-25T09:11:38.701Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.701Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.

Applied to files:

  • tests/e2e/features/environment.py
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-08-18T10:57:39.266Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:59-59
Timestamp: 2025-08-18T10:57:39.266Z
Learning: In the lightspeed-stack project, transitive dependencies like faiss-cpu are intentionally pinned as top-level dependencies to maintain better control over the dependency graph and avoid version conflicts when bundling ML/LLM tooling packages.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • requirements.aarch64.txt
  • requirements.x86_64.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use Python package manager `uv` with `uv run` prefix for all development commands

Applied to files:

  • requirements.hermetic.txt
🧬 Code graph analysis (5)
tests/e2e/features/environment.py (1)
tests/e2e/utils/utils.py (2)
  • switch_config (98-106)
  • restart_container (130-144)
tests/unit/models/responses/test_successful_responses.py (1)
src/models/responses.py (3)
  • RAGChunk (25-30)
  • ToolCall (33-38)
  • ToolsResponse (90-125)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
src/app/endpoints/streaming_query_v2.py (1)
  • streaming_query_endpoint_handler_v2 (318-345)
src/models/requests.py (1)
  • QueryRequest (73-275)
src/app/endpoints/query.py (2)
src/models/responses.py (4)
  • ToolCall (33-38)
  • model (1525-1529)
  • ReferencedDocument (326-339)
  • RAGChunk (25-30)
src/utils/types.py (2)
  • TurnSummary (135-234)
  • RAGChunk (127-132)
src/models/responses.py (2)
src/utils/types.py (1)
  • RAGChunk (127-132)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-38)
🪛 GitHub Actions: Black
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD

tests/unit/models/responses/test_successful_responses.py

[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD


[warning] Black would reformat this file. Run 'black --write' to fix code style issues.

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Cannot parse for target version Python 3.13: 14:0: <<<<<<< HEAD


[warning] Black would reformat this file. Run 'black --write' to fix code style issues.

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD

tests/unit/app/endpoints/test_rags.py

[error] 193-193: Cannot parse for target version Python 3.13: 193:0: <<<<<<< HEAD

src/app/endpoints/query.py

[error] Black would reformat this file. Run 'black --write' to fix code style issues.

src/models/responses.py

[error] 1-1: Black would reformat this file. Run 'black --write' to fix code style issues.

🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)

🪛 GitHub Actions: Pyright
src/app/endpoints/query.py

[error] 753-753: pyright: Operator '+' not supported for 'None' (reportOptionalOperand).


[error] 778-778: pyright: 'vector_db_ids' is possibly unbound (reportPossiblyUnboundVariable).


[error] 816-816: pyright: Argument missing for parameter 'doc_id' (reportCallIssue).


[error] 840-840: pyright: Argument of type 'float | str | List[object] | object | Literal[True]' cannot be assigned to parameter 'url' of type 'AnyStr@urljoin | None' in function 'urljoin'.


[error] 848-848: pyright: 'RAGChunk' is possibly unbound (reportPossiblyUnboundVariable).


[error] 849-849: pyright: Argument of type 'InterleavedContent' cannot be assigned to parameter 'content' of type 'str' in function 'init'.


[error] 850-850: pyright: Argument of type 'str | bool | float | List[object] | object | None' cannot be assigned to parameter 'source' of type 'str | None' in function 'init'.


[error] 875-875: pyright: Argument of type 'list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None' cannot be assigned to parameter 'toolgroups' of type 'List[Toolgroup] | None'.


[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).

🪛 GitHub Actions: Python linter
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'

tests/unit/models/responses/test_successful_responses.py

[error] 56-56: Pylint parsing failed: 'unmatched ')' (tests.unit.models.responses.test_successful_responses, line 56)'

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Pylint parsing failed: 'invalid syntax (tests.unit.models.responses.test_error_responses, line 14)'

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: Pylint parsing failed: 'invalid syntax (tests.unit.app.endpoints.test_streaming_query_v2, line 1)'

tests/unit/app/endpoints/test_rags.py

[error] 193-193: Pylint parsing failed: 'expected an indented block after function definition on line 192 (tests.unit.app.endpoints.test_rags, line 193)'

🪛 GitHub Actions: Ruff
tests/integration/endpoints/test_query_v2_integration.py

[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/models/responses/test_successful_responses.py

[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.

tests/unit/authentication/test_api_key_token.py

[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/models/responses/test_error_responses.py

[error] 14-14: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.

tests/unit/app/endpoints/test_rags.py

[error] 192-192: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

src/app/endpoints/query.py

[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.


[error] 1-1: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).

🪛 GitHub Actions: Type checks
src/app/endpoints/query.py

[error] 753-753: mypy: Unsupported left operand type for + ("None").


[error] 793-793: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | NotGiven'.


[error] 840-840: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'.

🪛 GitHub Actions: Unit tests
tests/unit/models/responses/test_error_responses.py

[error] 14-14: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file

tests/unit/app/endpoints/test_streaming_query_v2.py

[error] 1-1: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file

tests/unit/app/endpoints/test_rags.py

[error] 193-193: IndentationError / SyntaxError: invalid syntax due to merge conflict markers (e.g., '<<<<<<< HEAD') in test_rags.py

🪛 LanguageTool
docs/config.md

[style] ~323-~323: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. ======= MCP ...

(NOT_ABLE_PREMIUM)


[style] ~328-~328: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. >>>>>>> upst...

(NOT_ABLE_PREMIUM)

🪛 markdownlint-cli2 (0.18.1)
docs/config.md

10-10: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


59-59: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


60-60: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


61-61: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


318-318: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


351-351: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)


472-472: Heading style
Expected: atx; Actual: setext

(MD003, heading-style)

🪛 YAMLlint (1.37.1)
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml

[error] 19-19: syntax error: could not find expected ':'

(syntax)

🔇 Additional comments (20)
tests/unit/authentication/test_api_key_token.py (1)

134-139: Verify inconsistent detail format across error cases.

This test expects detail to be a plain string ("Invalid API Key"), while other error tests expect detail to be a dict with a "cause" key. Confirm this asymmetry is intentional in the implementation, or align the assertion style for consistency.
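
If alignment is chosen, the check could mirror the dict-based assertions used by the other error tests (fixture and dependency names below are illustrative; the message text comes from the existing test):

import pytest
from fastapi import HTTPException


@pytest.mark.asyncio
async def test_invalid_api_key_detail_shape(auth_dependency_under_test, request_with_bad_key):
    """Illustrative only: asserts the same detail shape as the other error cases."""
    with pytest.raises(HTTPException) as exc_info:
        await auth_dependency_under_test(request_with_bad_key)
    assert exc_info.value.status_code == 401
    assert exc_info.value.detail["cause"] == "Invalid API Key"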

tests/integration/endpoints/test_query_v2_integration.py (1)

398-405: Resolve merge conflicts for tool call field name assertions.

Unresolved merge conflict markers at three locations prevent the file from being parsed:

  • Line 398-405: test_query_v2_endpoint_with_tool_calls
  • Line 469-476: test_query_v2_endpoint_with_mcp_list_tools
  • Line 543-547: test_query_v2_endpoint_with_multiple_tool_types

HEAD uses tool_calls[0].tool_name, while upstream/main uses tool_calls[0].name. Determine the correct field name by checking the ToolCallSummary model definition, then resolve all three conflicts consistently. Also remove the conditional if response.rag_chunks: guard introduced in HEAD (lines 401-402) if it was not in upstream/main.

requirements.aarch64.txt (7)

1234-1236: Clean up conflict markers in requirements.aarch64.txt.

Lines 1234-1236 contain stray conflict markers (<<<<<<<, =======, >>>>>>>) around a comment. Regenerate the file with uv after resolving the conflict between upstream and HEAD.


2587-2610: Resolve protobuf version conflict in requirements.aarch64.txt (6.33.1 vs 6.33.2). Pick one version and ensure it's consistent with googleapis-common-protos and opentelemetry-proto dependencies. Update pyproject.toml if needed.


1508-1531: Versions shown are already in lockstep, not diverged.

The snippet shows versions moving together: litellm 1.80.7→1.80.8, llama-stack 0.2.22→0.3.0, and llama-stack-client 0.2.22→0.3.0. All three are synchronized. Verify the actual merged state of requirements.aarch64.txt to confirm this was resolved correctly.

Likely an incorrect or invalid review comment.


3001-3007: Verify python-jose necessity before merging; confirm it's not redundant with existing authlib/pyjwt stack.

The review raises a valid concern about duplicate JWT/cryptography dependencies—repo already has authlib and pyjwt. However, direct verification of python-jose usage in the codebase could not be completed due to technical constraints. Before merging:

  • Search the codebase for from jose import or import jose patterns to confirm if python-jose is actually used
  • If unused, remove python-jose and regenerate lock files
  • If required, document the rationale in code comments or dependency documentation
  • Ensure cryptography dependency pins remain consistent across the stack

Also check lines 639-644, 3525-3527, 3374-3379 for consistency.


1494-1501: Resolve kubernetes version conflict between 33.1.0 and 34.1.0.

This merge conflict requires a decision on which version to use. Consult the kubernetes-client/python CHANGELOG between v33.1.0 and v34.1.0 to identify any breaking changes, then verify compatibility with your kubernetes client usage patterns and regenerate with uv.


2098-2110: Resolve openai version drift (2.8.1 vs 2.9.0) by pinning a single version.

Version 2.9.0 includes streaming fixes (stream closure, response config handling) but maintains API compatibility with 2.8.1; no breaking changes to streaming behavior. However, verify jiter (Rust-based JSON parser dependency) compatibility in your target environments, as it can be problematic in some runtimes. Align the pin in pyproject.toml and recompile with uv.


1830-1837: Resolve the mcp version conflict (1.22.0 vs 1.23.1) in requirements.aarch64.txt.

This file contains a merge conflict. Determine which version is specified in pyproject.toml and update the requirements file accordingly, or manually choose the intended version and regenerate the locked requirements if this file is auto-generated.

requirements.x86_64.txt (5)

2639-2706: Retain psycopg2-binary pin (policy).

psycopg2-binary is correctly pinned; ensure it remains in pyproject.toml and survives re-compilation, since some llama-stack providers require it (per the earlier learnings on this repository).


3001-3007: Verify python-jose dependency requirement before resolving merge conflict.

This file shows a merge conflict where HEAD adds python-jose==3.5.0 while upstream removes it. Before resolving, confirm whether python-jose is actually used in the codebase. Search for jose imports and usage (including jose.jwt and related auth functions). If the new auth implementation doesn't depend on it, remove the dependency from both the requirements file and pyproject.toml. If it is required, ensure the pinned version and hashes are intentional.

Note: This concern also applies to the related dependency occurrences at lines 639-644, 2771-2773, 3525-3527, and 3374-3379.


605-607: The review comment addresses a critical issue (unresolved merge conflicts in lockfiles making them unusable), but verification of the specific technical claims requires direct access to the repository code, which is currently unavailable.


1000-1117: Resolve the greenlet merge conflict by choosing version 3.3.0.

Remove the conflicting versions (3.2.4 vs 3.3.0) and standardize on greenlet 3.3.0, which is compatible with SQLAlchemy 2.0.44 and includes wheels and improved support for CPython 3.14. Also applies to lines 3607-3613.


3808-3818: Prefer urllib3==2.6.1; both versions compatible with requests 2.32.5 and kubernetes client.

urllib3 2.6.1 and 2.3.0 are both compatible with requests 2.32.5 (which supports urllib3 >=1.21.1 and <3). The kubernetes Python client imposes no specific urllib3 version constraint and uses the environment's installed version. Accept the upstream version (2.6.1) as it is newer and includes recent fixes. Before merging, verify no other transitive dependencies in the environment pin urllib3<2.6 (e.g., older botocore versions can conflict), though this is unlikely given the upstream choice.

src/models/responses.py (3)

332-339: ReferencedDocument extension looks consistent

The addition of doc_id and widening doc_url to AnyUrl | str | None match how it’s used in query.py (vector DB metadata and Solr docs). No issues from a modeling perspective.


370-452: QueryResponse RAG/tool fields wiring is sound

Adding rag_chunks: list[RAGChunk] and tool_calls: list[ToolCall] | None plus updating the examples keeps the model aligned with the new query pipeline (TurnSummary + tool execution metadata). Defaults ([] / None) also match the usage in query.py.


1102-1143: Abstract error response OpenAPI helpers look correct

The new AbstractErrorResponse.__init__, get_description(), and openapi_response() implementations are consistent with the subclasses and the tests (labelled examples, description-from-class-variable, JSON examples under detail). No functional issues here.

src/app/endpoints/query.py (3)

427-445: Improved database and rate‑limit error handling looks good

Catching SQLAlchemyError with logger.exception and mapping it to InternalServerErrorResponse.database_error(), plus translating RateLimitError into QuotaExceededResponse.model(), is consistent with the error model and the FastAPI HTTPException pattern you’re using elsewhere.


1008-1035: get_rag_toolgroups helper matches Toolgroup typing, but consider simplifying call sites

The helper correctly returns list[Toolgroup] | None, using ToolgroupAgentToolGroupWithArgs for the RAG knowledge search tool. This is compatible with the llama‑stack types.

Given the call sites now need to deal with None and concatenate with MCP toolgroup names, using this helper together with the "extend, then `or None`" pattern (shown in the earlier diff) will avoid None + list issues and make the intent clearer.

No functional issues in the helper itself.


746-769: Fix vector DB / RAG setup: unbound vector_db_ids, type issues, and RAGChunk construction

This block has several correctness and typing issues that CI is flagging:

  • vector_db_ids is only assigned inside the else branch of the no_tools check, but it's used here unconditionally (if vector_db_ids:). When query_request.no_tools is True, the name is never bound, leading to the pyright "possibly unbound" diagnostic and a potential NameError at runtime.
  • toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups is problematic because get_rag_toolgroups can return None, and you're adding a list of str to a list of Toolgroup (type mismatch and None + list risk). This is what mypy/pyright are complaining about.
  • params = {"k": 5, "score_threshold": 0.0} does not match the union type expected by AsyncVectorIoResource.query; CI reports an incompatible dict type.
  • urljoin("https://mimir.corp.redhat.com", parent_id) is called with parent_id from chunk metadata, which is typed as a very broad object union; pyright rightfully complains about AnyStr incompatibility.
  • RAGChunk(content=chunk.content, source=source, score=score) passes an InterleavedContent and an untyped metadata value into fields typed as str / str | None, which triggers the content/source type errors.

Consider applying the suggested fixes to resolve the unbound variable, type mismatches for params, urljoin, and RAGChunk fields, and remove the INFO‑level logging of the full vector DB payload. Re‑run mypy/pyright after applying.
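
A hedged sketch tying those fixes together (variable names follow the snippets quoted above; the params keys, base_url, and the exact value union expected by vector_io.query are assumptions to be checked against the client types):

# Bind vector_db_ids on every path so it can never be "possibly unbound".
vector_db_ids: list[str] = []
if not query_request.no_tools:
    vector_db_ids = [db.identifier for db in await client.vector_dbs.list()]

# get_rag_toolgroups may return None; build the combined list explicitly
# instead of `get_rag_toolgroups(...) + mcp_toolgroups`.
toolgroups = list(get_rag_toolgroups(vector_db_ids) or [])
toolgroups.extend(mcp_toolgroups)
toolgroups = toolgroups or None  # the "extend, then or None" pattern suggested above

# Annotate the value type so the mapping matches what vector_io.query expects.
params: dict[str, object] = {"k": 5, "score_threshold": 0.0}

rag_chunks: list[RAGChunk] = []
for vector_db_id in vector_db_ids:
    response = await client.vector_io.query(
        vector_db_id=vector_db_id, query=query_request.query, params=params
    )
    for chunk, score in zip(response.chunks, response.scores):
        parent_id = (chunk.metadata or {}).get("parent_id")
        source = urljoin(base_url, parent_id) if isinstance(parent_id, str) else None
        rag_chunks.append(RAGChunk(content=str(chunk.content), source=source, score=score))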

@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch 2 times, most recently from 60c3c4f to c971439 Compare December 10, 2025 14:46
@Anxhela21 Anxhela21 changed the base branch from main to TamiTakamiya/async-agent December 10, 2025 14:54
@Anxhela21 Anxhela21 changed the base branch from TamiTakamiya/async-agent to main December 10, 2025 14:54
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
src/utils/types.py (1)

36-50: Align get_tool_calls signature, docstring, and guard condition.

The method is typed as get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall], but the docstring still documents CompletionMessage | None and you guard with if output_message and output_message.tool_calls:. Given the non‑optional type, the if output_message check is redundant and the docstring is misleading.

Consider tightening this to keep everything in sync:

-    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
+    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
@@
-        Parameters:
-            output_message (CompletionMessage | None): Completion
-            message potentially containing `tool_calls`.
+        Parameters:
+            output_message (CompletionMessage): Completion message potentially
+            containing `tool_calls`.
@@
-        if output_message and output_message.tool_calls:
-            return output_message.tool_calls
-        return []
+        return output_message.tool_calls or []

(or, if None is actually possible at call sites, make the parameter CompletionMessage | None and keep the guard.)

src/utils/endpoints.py (1)

345-375: Inconsistent return order between get_agent and get_temp_agent.

get_agent returns (agent, conversation_id, session_id) (line 342), but get_temp_agent returns (agent, session_id, conversation_id) (line 375). This inconsistency is error-prone for callers.

Additionally, the docstring (lines 359-360) says it returns tuple[AsyncAgent, str] with "(agent, session_id)" but the actual return type annotation shows tuple[AsyncAgent, str, str].

-    Returns:
-        tuple[AsyncAgent, str]: A tuple containing the agent and session_id.
+    Returns:
+        tuple[AsyncAgent, str, str]: A tuple containing (agent, conversation_id, session_id).
     """
     ...
-    return agent, session_id, conversation_id
+    return agent, conversation_id, session_id
tests/unit/app/endpoints/test_streaming_query.py (2)

147-172: Assertions inside pytest.raises block are never executed

In test_streaming_query_endpoint_handler_configuration_not_loaded, the status/detail assertions are placed inside the with pytest.raises(HTTPException) as e: block, after the awaited call. Once the exception is raised, code after the call inside the context is not executed, so those assertions never run.

Move the assertions outside the with block, e.g.:

-    with pytest.raises(HTTPException) as e:
-        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
+
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore

175-205: Connection‑error test likely not exercising the real failure path

test_streaming_query_endpoint_on_connection_error currently:

  • Patches mock_client.models.side_effect = APIConnectionError(...), and
  • Patches client.AsyncLlamaStackClientHolder.get_client twice on the same target, and
  • Only asserts that a StreamingResponse is returned (no check that an error event is actually streamed).

If the production code calls await client.models.list(...) (as other tests suggest by setting mock_client.models.list.return_value), the side effect on models will never be triggered, so this test will not verify the connection‑error behavior.

Consider:

-    mock_client = mocker.AsyncMock()
-    mock_client.models.side_effect = APIConnectionError(request=query_request)  # type: ignore
-    mock_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_lsc.return_value = mock_client
-    mock_async_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_async_lsc.return_value = mock_client
+    mock_client = mocker.AsyncMock()
+    mock_client.models.list = mocker.AsyncMock(
+        side_effect=APIConnectionError(request=query_request)  # type: ignore
+    )
+    mocker.patch(
+        "client.AsyncLlamaStackClientHolder.get_client",
+        return_value=mock_client,
+    )

and then asserting that the streamed content actually contains an error event (status 503, appropriate message) rather than only checking the response type.

docs/openapi.json (5)

1366-1373: 422 example contradicts schema: QueryRequest only requires 'query'.

Examples say missing ['query','model','provider'], but model/provider are optional in the schema. Update examples to only require 'query' (or change the schema if intentional).

- "cause": "Missing required attributes: ['query', 'model', 'provider']",
+ "cause": "Missing required attributes: ['query']",

Applies to v1 query 422, v2 query 422, and v2 streaming_query 422.

Also applies to: 3586-3593, 3868-3875


2608-2613: 'not found' delete example sets success=true; fix success=false and grammar.

A failed delete should not return success=true; also prefer “cannot”.

- "response": "Conversation can not be deleted",
- "success": true
+ "response": "Conversation cannot be deleted",
+ "success": false

Reflect the same in components/ConversationDeleteResponse examples.

Also applies to: 5005-5011


1374-1380: Typo: 'attatchment' → 'attachment'.

Correct the spelling across all error examples.

- "cause": "Invalid attatchment type: must be one of [...]
+ "cause": "Invalid attachment type: must be one of [...]

Also applies to: 1654-1661, 3594-3601, 3877-3881, 7702-7707


4284-4290: metrics error responses: text/plain uses JSON object schemas.

For 401/403/500/503 you declare text/plain but reference JSON object schemas. Either:

  • Return JSON only (application/json), or
  • Keep text/plain with a simple string schema.

Recommend JSON-only for errors to match the rest of the API.

- "text/plain": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+ // remove text/plain; keep application/json only

Repeat for 403/500/503 blocks.

Also applies to: 4306-4311, 4328-4333, 4351-4355


67-71: Root 401 text/html schema references a JSON object; use string.

Change text/html schema to a simple string for the error page (or remove if you always emit JSON for errors).

- "text/html": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+ "text/html": { "schema": { "type": "string" } }
src/app/endpoints/query_v2.py (1)

131-136: Potential AttributeError if id attribute is missing.

On line 132, getattr(output_item, "id") is called without a default value, which will raise AttributeError if the attribute doesn't exist. Other similar calls in this function use fallback patterns (e.g., line 90-92 for function_call).

Apply this diff to add a fallback:

         return ToolCallSummary(
-            id=str(getattr(output_item, "id")),
+            id=str(getattr(output_item, "id", "") or ""),
             name=DEFAULT_RAG_TOOL,
             args=args,
             response=response_payload,
         )
src/models/responses.py (1)

1-3: Black formatting check failed.

The pipeline indicates Black formatting issues in this file. Run black src/models/responses.py to fix formatting.

#!/bin/bash
# Check Black formatting issues
uv tool run black --check --diff src/models/responses.py 2>&1 | head -50
♻️ Duplicate comments (18)
tests/unit/authentication/test_api_key_token.py (1)

5-5: Tests read cleanly after conflict resolution; direct detail["cause"] checks are fine here.

The FastAPI imports are now normalized, and the updated assertions on exc_info.value.detail["cause"] keep the tests concise while still validating the exact error causes. Given that you control the exception construction in the auth dependency, this level of defensiveness is sufficient for unit tests.

Also applies to: 74-74, 97-97

requirements.hermetic.txt (1)

1-2: Clarify the rationale for pinning older versions of build tools, particularly pip 24.2 which contains known security vulnerabilities.

The versions pinned in requirements.hermetic.txt are indeed downgrades: uv 0.8.15 (September 3, 2025) is approximately 3 months older than uv 0.9.16 (December 6, 2025), and pip 24.2 (July 28, 2024) is over a year older than pip 25.3 (October 24-25, 2025).

The pip 24.2 downgrade is particularly concerning. This version contains multiple unpatched security vulnerabilities:

  • CVE-2025-8869: Symlink/tar extraction vulnerability allowing file extraction outside target directory (unfixed in pip <25.0)
  • Arbitrary code execution via self_outdated_check: pip loads untrusted code after installation (unfixed in pip <25.0)

If these downgrades are necessary for compatibility with specific components, that rationale should be documented in a comment or README. If not, consider upgrading to the latest stable versions.

src/utils/checks.py (1)

79-91: New parent-directory check is breaking tests; also keep error message aligned with what is checked

The added parent-directory checks change behavior in a way that’s currently failing integration tests:

  • When must_exists=False and must_be_writable=True, a missing parent now raises InvalidConfigurationError ("... cannot be created - parent directory does not exist"). The pipeline failure for '/tmp/data/feedback' shows this is hit in normal test configuration, so either:
    • tests/infra need to ensure the parent directory (e.g. /tmp/data) is created up front, or
    • this check should be relaxed or adjusted (e.g. test writability of the nearest existing ancestor, or let later code os.makedirs() the full path and surface that error instead).

Additionally, the branch that checks os.access(parent_path, os.W_OK) still reports "{desc} '{path}' is not writable", even though the actual check is on the parent directory. To keep the error self-explanatory, consider something like:

-            if not os.access(parent_path, os.W_OK):
-                raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+            if not os.access(parent_path, os.W_OK):
+                raise InvalidConfigurationError(
+                    f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+                )

This both fixes the message and makes it clear why creation of path would fail.
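
If the relaxed behaviour is chosen, a minimal stdlib sketch of testing the nearest existing ancestor instead of only the immediate parent (helper names are hypothetical):

import os
from pathlib import Path


def nearest_existing_ancestor(path: str) -> Path:
    """Walk upwards from `path` until a directory that actually exists is found."""
    current = Path(path).resolve()
    while not current.exists():
        current = current.parent
    return current


def can_create(path: str) -> bool:
    """Return True if `path` exists or could later be created via os.makedirs()."""
    return os.access(nearest_existing_ancestor(path), os.W_OK)

The error message could then name the ancestor that actually blocked creation, keeping the check and the reported cause aligned.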

requirements.aarch64.txt (1)

1409-1411: urllib3 downgrade reintroduces known security issues; resolve via authoritative constraints, not this lockfile.

You’ve moved to kubernetes==34.1.0 while downgrading urllib3 to 2.3.0. Earlier analysis on this PR already noted that 2.3.x carries known redirect/info‑disclosure and DoS‑style vulnerabilities that were fixed in later 2.x releases, and recommended landing on a secure urllib3 version that also satisfies the kubernetes client’s constraint.

Right now this looks like the compatibility problem was “fixed” by picking an insecure urllib3. Instead:

  • Decide the authoritative constraint (in pyproject.toml or your constraints file) for both kubernetes and urllib3 so you can use a version of urllib3 that includes the security fixes (per the earlier bot comment, a >=2.5.x/2.6.x line, or whatever the current secure guidance is) while remaining within the kubernetes client’s supported range.
  • Re‑run uv pip compile to regenerate this file instead of editing requirements.aarch64.txt by hand.

Until that’s done, this environment is exposed to the urllib3 issues that motivated the original upgrade.

To double‑check the current safe combo for this exact kubernetes client version, please run a web search like:

What urllib3 version range is officially supported by the Python kubernetes client version 34.1.0, and what is the earliest urllib3 2.x release that includes fixes for the 2.3.x redirect/info‑disclosure and decompression/chained-encoding DoS issues?

Also applies to: 3601-3603

src/app/endpoints/query.py (4)

796-796: Avoid logging entire query response payloads at INFO level.

This was flagged in a past review. Logging the full query_response payload may expose sensitive document content.
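
A sketch of the metadata-only alternative at DEBUG level (attribute names follow the chunks/scores fields referenced throughout this review):

logger.debug(
    "vector_io.query: db=%s chunks=%d top_score=%s",
    vector_db_id,
    len(query_response.chunks),
    query_response.scores[0] if query_response.scores else None,
)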


929-941: Fix typo: "attatchment" should be "attachment".

This was flagged in a past review. The typo appears in error messages at lines 930 and 940.


836-852: Fix type errors in RAGChunk construction.

Pipeline reports multiple type errors:

  • Line 840: urljoin receives object instead of str
  • Line 848: RAGChunk is possibly unbound (local import issue)
  • Line 849: chunk.content is InterleavedContent, not str
  • Line 850: source type mismatch

The RAGChunk import at lines 799-800 is inside a conditional block and may not execute. Also, chunk.metadata.get() returns object, not str.

-        if chunk.metadata:
-            if OFFLINE:
-                parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+        if chunk.metadata:
+            if OFFLINE:
+                parent_id = chunk.metadata.get("parent_id")
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
             else:
-                source = chunk.metadata.get("reference_url")
+                ref_url = chunk.metadata.get("reference_url")
+                source = ref_url if isinstance(ref_url, str) else None
         # ...
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
+                content=str(chunk.content) if chunk.content else "",
                 source=source,
                 score=score,
             )
         )

Also, move the RAGChunk import to module level (this was flagged in past review).


14-14: Remove unused import ValidationError.

Pipeline failure confirms this import is unused. Remove it to fix the Ruff F401 error.

-from pydantic import ValidationError
src/app/endpoints/streaming_query.py (5)

81-84: OFFLINE flag duplicated from query.py.

This was flagged in a past review. Extract to a shared constants module.
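
One way to de-duplicate it, sketched as a shared constant (the module path, the environment-variable name, and the env-driven mechanism itself are assumptions):

# src/constants.py (sketch)
import os

# Single source of truth for offline vs. online document URL handling;
# query.py and streaming_query.py would both do `from constants import OFFLINE`.
OFFLINE: bool = os.environ.get("LCS_OFFLINE_MODE", "false").lower() == "true"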


925-930: ResponseGeneratorContext does not have vector_io_rag_chunks or vector_io_referenced_docs attributes.

Pipeline reports Cannot assign to attribute "vector_io_rag_chunks" and "vector_io_referenced_docs". The hasattr check will always return False for a dataclass/Pydantic model without these fields defined.

This was flagged in a past review. Add these fields to ResponseGeneratorContext.
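
A minimal sketch of declaring the two fields, shown as a dataclass; if ResponseGeneratorContext is a Pydantic model, the equivalent Field(default_factory=list) applies:

from dataclasses import dataclass, field

from models.responses import RAGChunk, ReferencedDocument


@dataclass
class ResponseGeneratorContext:
    # ...existing fields elided...
    vector_io_rag_chunks: list[RAGChunk] = field(default_factory=list)
    vector_io_referenced_docs: list[ReferencedDocument] = field(default_factory=list)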


1022-1022: Avoid logging entire query response payloads.

This was flagged in a past review. Move to DEBUG level and log metadata only.

Also applies to: 1264-1264


1046-1051: Hardcoded URL should be extracted to configuration.

This was flagged in a past review. The URL https://mimir.corp.redhat.com appears multiple times.

Also applies to: 1067-1069


1243-1298: Significant code duplication with query_vector_io_for_chunks.

This was flagged in a past review. The vector DB querying logic is duplicated between query_vector_io_for_chunks (lines 982-1091) and retrieve_response (lines 1243-1298). Additionally, query_vector_io_for_chunks is called at line 906-908 AND retrieve_response queries vector DB again, resulting in duplicate queries per request.

src/app/endpoints/conversations_v2.py (1)

99-101: Typo "Converastion" still present.

The typo "Converastion" should be "Conversation" at lines 99, 126, and 161.

docs/config.html (1)

205-247: Clarify auth config references and RHIdentity semantics in schema doc

In the AuthenticationConfiguration table, the jwk_config and rh_identity_config rows have empty Type/Description cells, even though dedicated sections for JwkConfiguration and RHIdentityConfiguration exist below. Consider:

  • Setting the Type to the concrete schema name (JwkConfiguration, RHIdentityConfiguration), and
  • Adding a brief description (e.g., “JWT-based auth via JWK” / “Red Hat Identity-based auth”).

In the RHIdentityConfiguration section, it would also help to explicitly document what happens when required_entitlements is omitted or an empty list (e.g., whether entitlement checks are skipped vs. requests fail). This keeps runtime behavior unambiguous for operators.

Also applies to: 1086-1103

tests/e2e/configs/run-azure.yaml (1)

93-103: Parameterize Braintrust OpenAI API key via env var instead of literal

In the Azure E2E config, the braintrust scoring provider uses openai_api_key: '********'. Similar to the CI config, it’s preferable to reference an environment variable (or remove the provider if unused) so the key is not hard‑coded and can be provided securely at runtime, e.g.:

  scoring:
    - provider_id: braintrust
      provider_type: inline::braintrust
      config:
        openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}

This keeps the config consistent with how other credentials (e.g., AZURE_API_KEY) are handled.

src/models/responses.py (2)

746-762: success field still hardcoded to True.

This issue was flagged in a previous review but remains unfixed. When deleted=False, success should also be False.

         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )

775-782: Example inconsistency remains unfixed.

The "not found" example still shows success: True and uses "can not be deleted" (with space) while the code uses "cannot be deleted" (no space).

                 {
                     "label": "not found",
                     "value": {
                         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                        "success": True,
-                        "response": "Conversation can not be deleted",
+                        "success": False,
+                        "response": "Conversation cannot be deleted",
                     },
                 },
🧹 Nitpick comments (21)
tests/integration/test_openapi_json.py (2)

87-99: Type annotation on spec could be more precise/consistent

Using bare dict works, but other helpers in this file use dict[str, Any]. For consistency and stricter typing, consider annotating spec as dict[str, Any] or Mapping[str, Any] instead of plain dict.


185-187: Fixture argument type hints now use plain dict

Switching spec_from_file/spec_from_url annotations from dict[str, Any] to dict is harmless at runtime but slightly reduces type precision and differs from other helpers. Consider standardizing on one style (e.g., dict[str, Any] everywhere or Mapping[str, Any]) for readability.

Also applies to: 255-257

docker-compose.yaml (1)

26-26: Debug logging flag is reasonable; consider making it easily tunable.

Enabling LLAMA_STACK_LOGGING=all=debug on the llama‑stack service is helpful while stabilizing the new flows. If this compose file is also used beyond local dev, you might optionally gate this via an env default (e.g., ${LLAMA_STACK_LOGGING:-all=debug}) so you can dial it back without editing the file, and confirm that all=debug matches the logging syntax for the llama‑stack version you now support.
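
For example, in the service definition (service name assumed):

services:
  llama-stack:
    environment:
      # Verbose by default, but overridable per shell or CI run without editing the file.
      - LLAMA_STACK_LOGGING=${LLAMA_STACK_LOGGING:-all=debug}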

scripts/query_llm.py (1)

16-16: Docstring is correct but could be slightly more descriptive.

The new main() docstring is accurate but very generic. Optional: expand it to mention that it sends a single /v1/query request to the local LLM endpoint and prints the response plus timing, which would make the script’s intent clearer when read in isolation.
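
For instance (signature and wording are suggestions only):

def main() -> None:
    """Send a single /v1/query request to the local LLM endpoint and print the answer and timing."""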

scripts/generate_openapi_schema.py (1)

26-27: Consider retaining parameter and return value documentation in docstrings.

The docstrings for read_version_from_openapi and read_version_from_pyproject have been condensed to single-line descriptions. While concise, they now lack parameter and return type documentation that would help future maintainers understand the function contracts.

Per coding guidelines requiring Google Python docstring conventions, consider including Args and Returns sections:

 def read_version_from_openapi(filename: str) -> str:
-    """Read version from OpenAPI.json file."""
+    """Read version from OpenAPI.json file.
+    
+    Args:
+        filename: Path to the OpenAPI.json file.
+        
+    Returns:
+        Version string extracted from the 'info.version' field.
+    """

As per coding guidelines, all functions should have docstrings with brief descriptions following Google Python docstring conventions.

Also applies to: 38-39

docs/config.md (1)

300-304: Optional: Consider more concise wording.

The MCP server description has been updated for clarity. The static analysis tool suggests using "inaccessible" instead of "not accessible" on line 304 for more concise wording, though the current phrasing is perfectly clear and correct.

If you prefer more concise wording:

-Only MCP servers defined in the lightspeed-stack.yaml configuration are
-available to the agents. Tools configured in the llama-stack run.yaml
-are not accessible to lightspeed-core agents.
+Only MCP servers defined in the lightspeed-stack.yaml configuration are
+available to the agents. Tools configured in the llama-stack run.yaml
+are inaccessible to lightspeed-core agents.

However, the current wording is clear and acceptable as-is.

src/utils/types.py (2)

92-163: RAG chunk extraction silently drops some valid-but-unexpected JSON; consider a more forgiving fallback.

_extract_rag_chunks_from_response handles the expected shapes well (dict with "chunks", or list of dicts), and falls back to a single chunk when JSON parsing fails or when a structural exception is raised. But there are two notable edge cases where you currently drop the response entirely:

  • response_content parses as a dict without a "chunks" key (e.g. {"text": "foo"}) – no chunks are appended and no exception is raised, so the outer except doesn’t trigger.
  • It parses as a list of non‑dict items (e.g. ["foo", "bar"]) – they’re skipped by the isinstance(chunk, dict) guard without a fallback.

If you want to retain as much signal as possible, a small tweak like this would keep behavior robust while preserving the structured cases:

-                if isinstance(data, dict) and "chunks" in data:
-                    for chunk in data["chunks"]:
-                        self.rag_chunks.append(
-                            RAGChunk(
-                                content=chunk.get("content", ""),
-                                source=chunk.get("source"),
-                                score=chunk.get("score"),
-                            )
-                        )
-                elif isinstance(data, list):
-                    # Handle list of chunks
-                    for chunk in data:
-                        if isinstance(chunk, dict):
-                            self.rag_chunks.append(
-                                RAGChunk(
-                                    content=chunk.get("content", str(chunk)),
-                                    source=chunk.get("source"),
-                                    score=chunk.get("score"),
-                                )
-                            )
+                if isinstance(data, dict) and "chunks" in data:
+                    for chunk in data["chunks"]:
+                        self.rag_chunks.append(
+                            RAGChunk(
+                                content=chunk.get("content", ""),
+                                source=chunk.get("source"),
+                                score=chunk.get("score"),
+                            )
+                        )
+                elif isinstance(data, list):
+                    # Handle list of chunks (dicts or plain values)
+                    for chunk in data:
+                        if isinstance(chunk, dict):
+                            self.rag_chunks.append(
+                                RAGChunk(
+                                    content=chunk.get("content", str(chunk)),
+                                    source=chunk.get("source"),
+                                    score=chunk.get("score"),
+                                )
+                            )
+                        else:
+                            self.rag_chunks.append(
+                                RAGChunk(
+                                    content=str(chunk),
+                                    source=DEFAULT_RAG_TOOL,
+                                    score=None,
+                                )
+                            )
+                else:
+                    # Unknown JSON shape: keep the whole payload as a single chunk
+                    if response_content.strip():
+                        self.rag_chunks.append(
+                            RAGChunk(
+                                content=response_content,
+                                source=DEFAULT_RAG_TOOL,
+                                score=None,
+                            )
+                        )

That way, any non‑empty response still yields at least one RAGChunk, even when the tool payload format drifts slightly.


15-29: Minor guideline nits: logging and type hints on Singleton.__call__.

Two small style/guideline points:

  • Module has no logger = logging.getLogger(__name__) despite the project guideline to standardize logging per–module.
  • Singleton.__call__ is currently untyped (*args, **kwargs only, # type: ignore); guidelines call for complete type annotations on functions, even if the return type is just Singleton/object.

These don’t block functionality, but cleaning them up would keep this module aligned with the rest of the stack.

Also applies to: 89-96
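
A hedged sketch of both cleanups, assuming the usual metaclass-style singleton (the real class body may differ):

import logging
from typing import Any

logger = logging.getLogger(__name__)


class Singleton(type):
    """Metaclass that returns one cached instance per class."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]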

src/app/endpoints/query.py (1)

818-821: Hardcoded URL should be extracted to configuration.

The URL https://mimir.corp.redhat.com is hardcoded. This should be moved to configuration for flexibility across environments.

+# At module level or in configuration
+MIMIR_BASE_URL = configuration.mimir_base_url or "https://mimir.corp.redhat.com"
+
 # In the code:
-                            doc_url="https://mimir.corp.redhat.com"
-                            + reference_doc,
+                            doc_url=urljoin(MIMIR_BASE_URL, reference_doc),
src/app/endpoints/streaming_query.py (1)

1126-1126: Function retrieve_response has too many statements (75/50).

Pipeline reports this function exceeds the statement limit. Consider extracting the RAG context injection logic into a helper function.

async def _build_rag_context(
    client: AsyncLlamaStackClient,
    query_request: QueryRequest,
    vector_db_ids: list[str],
) -> tuple[list[RAGChunk], str]:
    """Build RAG context from vector DB query."""
    rag_chunks: list[RAGChunk] = []
    rag_context = ""
    # ... extracted logic from lines 1243-1313
    return rag_chunks, rag_context
run.yaml (1)

121-121: Hardcoded Solr URL may need environment configuration.

The solr_url: "http://localhost:8983/solr" is hardcoded. For production deployments across different environments, consider using an environment variable.

       config:
-        solr_url: "http://localhost:8983/solr"
+        solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
tests/integration/endpoints/test_query_v2_integration.py (1)

307-372: Re‑evaluate LCORE‑1025 skips for tool‑call integration tests

Both test_query_v2_endpoint_with_tool_calls and test_query_v2_endpoint_with_multiple_tool_types are permanently skipped via @pytest.mark.skip(reason="LCORE-1025: ToolCallSummary.response type mismatch"). These tests exercise the new RAG/tool-call surface and are valuable regression coverage.

If LCORE‑1025 is now resolved (the ToolCall / ToolCallSummary shapes look stabilized elsewhere in this PR), consider removing the skips so these paths are exercised again. If the issue is still open, keeping the skip is fine but it’s worth confirming the ticket status and planning to re‑enable once fixed.

Also applies to: 439-507

tests/unit/app/endpoints/test_streaming_query.py (1)

372-757: Inconsistent async test marking may cause tests to be skipped or mis‑run

Several async tests (test_retrieve_response_vector_db_available, test_retrieve_response_no_available_shields, test_retrieve_response_one_available_shields, test_retrieve_response_two_available_shields, test_retrieve_response_four_available_shields, and the attachment variants) are defined as async def but lack @pytest.mark.asyncio, while other async tests in this module are explicitly marked.

Depending on your pytest-asyncio configuration, these functions may not run as intended or may be skipped. For consistency and clarity, consider adding @pytest.mark.asyncio to these async tests as well.
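
For example (test name taken from the list above; the body is elided):

import pytest


@pytest.mark.asyncio
async def test_retrieve_response_vector_db_available(mocker):
    """Explicitly marked so the coroutine runs under any pytest-asyncio mode."""
    ...

Alternatively, setting asyncio_mode = "auto" in the pytest configuration makes the explicit marks unnecessary for the whole suite.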

tests/e2e/configs/run-ci.yaml (1)

99-102: Avoid hard‑coding Braintrust OpenAI API key placeholder in CI config

The braintrust scoring provider is configured with a literal openai_api_key: '********'. For CI/e2e configs this is better modeled as an environment‑driven value (or omitted entirely if Braintrust isn’t used in tests), e.g.:

scoring:
  - provider_id: braintrust
    provider_type: inline::braintrust
    config:
      openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}

This avoids committing even placeholder secret‑looking values and makes it clear where the real key should come from at runtime.

src/app/endpoints/streaming_query_v2.py (1)

154-162: Consider catching specific exception types.

The broad except Exception clause catches all exceptions, which can mask unexpected errors. Since you're accessing attributes, AttributeError would be more appropriate.

             # Emit start on response.created
             if event_type == "response.created":
                 try:
                     conv_id = getattr(chunk, "response").id
-                except Exception:  # pylint: disable=broad-except
+                except AttributeError:
                     logger.warning("Missing response id!")
                     conv_id = ""
                 yield stream_start_event(conv_id)
                 continue
docs/openapi.json (1)

6631-6645: Constrain media_type to known values.

Provide an enum for media_type to guide clients and validation.

 "media_type": {
-  "anyOf": [ { "type": "string" }, { "type": "null" } ],
+  "anyOf": [
+    { "type": "string", "enum": ["application/json", "text/plain"] },
+    { "type": "null" }
+  ],
   "title": "Media Type",
   "description": "Media type for the response format",
   "examples": ["application/json", "text/plain"]
 }
src/app/endpoints/query_v2.py (3)

138-145: Missing fallback for id attribute in web_search_call.

Similar to the file_search_call case, getattr(output_item, "id") on line 141 lacks a default value. For consistency with the function_call pattern (lines 90-92), consider adding a fallback.

     if item_type == "web_search_call":
         args = {"status": getattr(output_item, "status", None)}
+        call_id = getattr(output_item, "id", None) or getattr(output_item, "call_id", None)
         return ToolCallSummary(
-            id=str(getattr(output_item, "id")),
+            id=str(call_id),
             name="web_search",
             args=args,
             response=None,
         )

409-415: TODO: Document parsing not yet implemented.

The function returns an empty list with a clear TODO explaining the needed work. This is acceptable as a placeholder, but tracking the implementation would be valuable.

Would you like me to open an issue to track the implementation of document parsing from the Responses API?


519-539: Consider making max_num_results configurable.

The max_num_results: 10 is hardcoded. Consider making this configurable via the AppConfig if different use cases require different limits.
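
A minimal sketch of one way to wire this through configuration; the RAGSettings class and max_num_results field names are illustrative, not existing settings:

from pydantic import BaseModel, Field


class RAGSettings(BaseModel):
    """Hypothetical RAG tuning knobs exposed via AppConfig."""

    max_num_results: int = Field(
        default=10, ge=1, description="Maximum chunks returned per vector query"
    )

# At the call site, prefer the configured value over the literal:
# max_num_results = configuration.rag.max_num_results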

src/models/responses.py (1)

367-376: Inconsistent default patterns for rag_chunks vs tool_calls.

rag_chunks defaults to [] (empty list) while tool_calls defaults to None. Consider using consistent defaults for similar collection fields.

     rag_chunks: list[RAGChunk] = Field(
-        [],
+        default_factory=list,
         description="List of RAG chunks used to generate the response",
     )

     tool_calls: list[ToolCall] | None = Field(
-        None,
+        default_factory=list,
         description="List of tool calls made during response generation",
     )
src/models/config.py (1)

1281-1284: Consider adding error handling for file operations.

The dump method opens and writes to a file without exception handling. Consider wrapping in try-except for better error reporting.

     def dump(self, filename: str = "configuration.json") -> None:
         """Dump actual configuration into JSON file."""
-        with open(filename, "w", encoding="utf-8") as fout:
-            fout.write(self.model_dump_json(indent=4))
+        try:
+            with open(filename, "w", encoding="utf-8") as fout:
+                fout.write(self.model_dump_json(indent=4))
+        except OSError as e:
+            raise ValueError(f"Failed to dump configuration to {filename}: {e}") from e

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (7)
run.yaml (1)

140-143: Update provider_shield_id to reference a valid Llama-Guard model.

The provider_shield_id is set to "gpt-3.5-turbo", which is an OpenAI model, not a Llama-Guard model. For the llama-guard provider, provider_shield_id must be a Llama-Guard model identifier (e.g., "meta-llama/Llama-Guard-3-8B").

src/app/endpoints/query.py (3)

878-889: Add missing tool_results parameter to TurnSummary.

The TurnSummary constructor is missing the required tool_results parameter. This will cause a runtime TypeError.

     summary = TurnSummary(
         llm_response=(
             interleaved_content_as_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
+        tool_results=[],
         rag_chunks=rag_chunks,
     )

745-758: Fix type error: get_rag_toolgroups may return None.

Line 752 concatenates the result of get_rag_toolgroups(vector_db_ids) with mcp_toolgroups, but get_rag_toolgroups can return None. This will cause a TypeError at runtime.

         toolgroups = None
         if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
+            rag_toolgroups = get_rag_toolgroups(vector_db_ids)
+            toolgroups = (rag_toolgroups or []) + mcp_toolgroups
         elif mcp_toolgroups:
             toolgroups = mcp_toolgroups

798-799: Move imports to module level.

Importing RAGChunk and ReferencedDocument inside the function is inefficient. Move these to the module-level imports at the top of the file.

+from models.responses import (
+    ForbiddenResponse,
+    QueryResponse,
+    RAGChunk,
+    ReferencedDocument,
+    ToolCall,
+    UnauthorizedResponse,
+)
...
             if query_response.chunks:
-                from models.responses import RAGChunk, ReferencedDocument
-
                 retrieved_chunks = query_response.chunks
src/app/endpoints/streaming_query.py (3)

77-80: Extract OFFLINE flag to shared constants module.

This constant is duplicated from query.py. Extract it to a shared constants module (e.g., src/app/constants.py) and import from both files to ensure consistency and follow DRY principles.


1045-1046: Extract hardcoded Mimir base URL to configuration.

The URL https://mimir.corp.redhat.com is hardcoded here (and duplicated at lines 1065, 1241). Extract this to a configuration value for flexibility across environments.
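
Both this and the preceding OFFLINE-flag comment point at the same fix: a single shared module that reads the values from the environment. A minimal sketch, with illustrative names and defaults:

"""Shared RAG constants for query.py and streaming_query.py (names are illustrative)."""

import os

OFFLINE: bool = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"
DOC_BASE_URL: str = os.getenv("LIGHTSPEED_DOC_BASE_URL", "https://mimir.corp.redhat.com")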


1208-1275: Significant code duplication: vector DB querying logic.

Lines 1208-1275 duplicate the vector DB query logic from query_vector_io_for_chunks (lines 979-1088). Both blocks:

  • Query vector_io with identical params setup
  • Process chunks with OFFLINE flag logic
  • Extract metadata and build RAGChunk objects

Consider consolidating by having retrieve_response call query_vector_io_for_chunks to get the chunks, then format the RAG context from those results.

-    # Get RAG chunks before sending to LLM (reuse logic from query_vector_io_for_chunks)
-    rag_chunks = []
-    try:
-        if vector_db_ids:
-            # ... 60+ lines of duplicate logic
-    except Exception as e:
-        logger.warning("Failed to query vector database for chunks: %s", e)
+    # Reuse query_vector_io_for_chunks for consistency
+    rag_chunks, _ = await query_vector_io_for_chunks(client, query_request)
     
     # Format RAG context for injection into user message
     rag_context = ""
🧹 Nitpick comments (1)
lightspeed-stack.yaml (1)

24-28: Consider persistent storage for production deployments.

The conversation cache uses /tmp/data/conversation-cache.db which won't survive container restarts. This is fine for development/testing as indicated by the comment, but production deployments should use a persistent volume mount or external database.
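
For production, only the location needs to change; a sketch that keeps whatever keys the existing conversation_cache block already uses and points the database at a mounted volume (the /data mount and key names are assumptions):

conversation_cache:
  # same keys as the current block; only the path moves off /tmp so the cache
  # survives container restarts (assumes a PVC or host volume mounted at /data)
  db_path: /data/conversation-cache.db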

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c971439 and 4ca24a7.

📒 Files selected for processing (8)
  • lightspeed-stack.yaml (1 hunks)
  • run.yaml (3 hunks)
  • src/app/endpoints/query.py (8 hunks)
  • src/app/endpoints/streaming_query.py (14 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/requests.py
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_conversations_v2.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_query.py
  • tests/unit/app/endpoints/test_conversations_v2.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (4)
src/models/responses.py (2)
  • RAGChunk (149-154)
  • ReferencedDocument (179-191)
src/utils/types.py (1)
  • TurnSummary (89-163)
src/models/requests.py (1)
  • QueryRequest (73-233)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (951-978)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (5)
tests/unit/app/endpoints/test_conversations_v2.py (1)

103-103: LGTM! Test updated to match model type changes.

The assertion correctly uses str() to compare doc_url, accommodating the type change in ReferencedDocument where doc_url is now str | Optional[AnyUrl].

tests/unit/app/endpoints/test_query.py (1)

1014-1014: LGTM! Tests updated to match model type changes.

The test assertions correctly use str() for doc_url comparisons, aligning with the ReferencedDocument type change. The malformed URL test (lines 1041-1043) properly verifies that invalid URLs are stored as raw strings rather than being rejected.

Also applies to: 1041-1043, 1066-1066, 1093-1095

run.yaml (1)

155-160: LGTM! Granite embedding model configuration is consistent.

The granite-embedding model configuration correctly specifies the embedding dimension (384) which matches the vector_io configuration at line 125. The model identifier ibm-granite/granite-embedding-30m-english is properly formatted for the sentence-transformers provider.

src/app/endpoints/streaming_query.py (1)

174-246: LGTM! Streaming end event properly includes RAG data.

The stream_end_event function correctly includes both rag_chunks and vector_io_referenced_docs in the final streaming response. The implementation properly converts RAG chunks to dict format and combines metadata_map documents with vector_io documents.

src/app/endpoints/query.py (1)

856-867: LGTM! RAG context injection implementation.

The RAG context formatting and injection logic properly:

  • Limits to top 5 chunks
  • Formats each chunk with source attribution
  • Prepends relevant documentation to the user query
  • Logs the injection for debugging

This approach cleanly augments the user message with retrieved context before the LLM call.
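
A rough sketch of the injection shape being described here; variable names are illustrative and this is not the exact implementation:

# Build a compact context block from the top-ranked chunks
context_blocks = [
    f"[Source: {chunk.source or 'unknown'}]\n{chunk.content}" for chunk in rag_chunks[:5]
]
rag_context = "\n\n".join(context_blocks)

if rag_context:
    augmented_query = (
        f"Relevant documentation:\n{rag_context}\n\nUser question: {query_request.query}"
    )
    logger.info("Injected %d RAG chunks into the user message", len(context_blocks))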

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
run.yaml (1)

29-35: Fix duplicate top‑level models and vector_dbs keys in YAML.

You now have two models: mappings (lines 29–35 and 140–150) and two vector_dbs: mappings (lines 134–139 and 155), which YAMLlint flags as duplicate keys. Most parsers will silently let the last one win, so the earlier definitions are ignored.

Consolidate these into single blocks, e.g.:

  • Move all model declarations into the bottom models: list and delete the earlier models: at lines 29–35 (or vice‑versa).
  • Keep only one vector_dbs: mapping (likely the new portal-rag block) and remove the trailing vector_dbs: [].

Until this is fixed, the effective config will be ambiguous and may not match your intent.

Also applies to: 134-150, 154-154

src/app/endpoints/streaming_query.py (3)

54-71: Resolve RAGChunk/ReferencedDocument import issues and redefinitions.

  • models.responses apparently does not export RAGChunk (Ruff reports “no name 'RAGChunk' in module 'models.responses'”), and you also import ReferencedDocument again from utils.endpoints. This leads to F811/F821 redefinition errors.
  • You only need a single canonical ReferencedDocument type (and likely only from models.responses).

Suggestion:

-from models.responses import (
-    ForbiddenResponse,
-    RAGChunk,
-    ReferencedDocument,
-    UnauthorizedResponse,
-    ...
-)
-from utils.endpoints import (
-    ReferencedDocument,
-    check_configuration_loaded,
-    ...
-)
+from models.responses import (
+    ForbiddenResponse,
+    UnauthorizedResponse,
+    RAGChunk,            # if actually defined there; otherwise import from its real module
+    ReferencedDocument,
+    ...
+)
+from utils.endpoints import (
+    check_configuration_loaded,
+    cleanup_after_streaming,
+    create_rag_chunks_dict,
+    get_agent,
+    get_system_prompt,
+    validate_model_provider_override,
+)

Adjust the RAGChunk import to match where it is really defined. This will clear the redefinition/undefined-name warnings.


902-1064: Broken context wiring and dead nested response_generator block.

Inside streaming_query_endpoint_handler_base:

  • You define a second async response_generator (lines 926–1047) that references many undefined names (interleaved_content_as_str, get_session, create_referenced_documents_with_metadata, CacheEntry) and a context variable that is never created.
  • The old ResponseGeneratorContext initialization is fully commented out (1049–1060), yet you still call create_response_generator_func(context) (line 1063), which will raise NameError at runtime.
  • The nested response_generator is never used; the actual generator comes from create_response_generator_func.

This block needs to be either completed or removed. Given you already have create_agent_response_generator, the simplest fix is to delete the unused nested generator and reinstate a valid ResponseGeneratorContext:

-        metadata_map: dict[str, dict[str, Any]] = {}
-
-        async def response_generator(...):
-            ...
-            cache_entry = CacheEntry(...)
-        # Create context object for response generator
-#         context = ResponseGeneratorContext(
-#             ...
-#         )
+        metadata_map: dict[str, dict[str, Any]] = {}
+
+        context = ResponseGeneratorContext(
+            conversation_id=conversation_id,
+            user_id=user_id,
+            skip_userid_check=_skip_userid_check,
+            model_id=model_id,
+            provider_id=provider_id,
+            llama_stack_model_id=llama_stack_model_id,
+            query_request=query_request,
+            started_at=started_at,
+            client=client,
+            metadata_map=metadata_map,
+        )

…and delete the entire nested async def response_generator(...) block if it is not intended to be used anymore.

As written, this function will not work and is the main source of the context/undefined-name errors from Ruff and mypy.


1357-1455: Normalize vector DB toolgroup handling and RAG chunk typing in retrieve_response.

In retrieve_response (streaming path):

  • toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups will fail if get_rag_toolgroups returns None (mypy error and potential TypeError).
  • params and urljoin suffer from the same typing issues as query_vector_io_for_chunks.
  • RAGChunk(content=chunk.content, source=source, ...) again passes non‑string content and potentially non‑string source.

You can mirror the patterns used in the earlier suggestion:

-        vector_dbs = await client.vector_dbs.list()
-        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
-        mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
-
-        toolgroups = None
-        if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-        elif mcp_toolgroups:
-            toolgroups = mcp_toolgroups
+        vector_dbs = await client.vector_dbs.list()
+        vector_db_ids = [vdb.identifier for vdb in vector_dbs]
+        mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers]
+
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) if vector_db_ids else None
+        toolgroups = (rag_toolgroups or []) + mcp_toolgroups if mcp_toolgroups else rag_toolgroups
@@
-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
@@
-                            parent_id = chunk.metadata.get("parent_id")
-                            if parent_id:
-                                source = urljoin(
-                                    "https://mimir.corp.redhat.com", parent_id
-                                )
+                            parent_id = chunk.metadata.get("parent_id")
+                            if parent_id and isinstance(parent_id, str):
+                                source = urljoin(
+                                    "https://mimir.corp.redhat.com", parent_id
+                                )
@@
-                    rag_chunks.append(
-                        RAGChunk(
-                            content=chunk.content,
-                            source=source,
-                            score=score,
-                        )
-                    )
+                    rag_chunks.append(
+                        RAGChunk(
+                            content=str(chunk.content) if chunk.content else "",
+                            source=source if isinstance(source, str) else None,
+                            score=score,
+                        )
+                    )

This addresses the mypy errors and ensures RAGChunk instances have the expected types.

src/app/endpoints/query.py (1)

582-613: Fix return type and docstring of parse_metadata_from_text_item.

Two small issues:

  • The function is annotated as Optional[ReferencedDocument] but returns a list[ReferencedDocument] when text_item is not a TextContentItem:
docs: list[ReferencedDocument] = []
if not isinstance(text_item, TextContentItem):
    return docs  # wrong type
  • The docstring says it returns ReferencedDocument but doesn’t mention None.

Suggested fix:

-    docs: list[ReferencedDocument] = []
-    if not isinstance(text_item, TextContentItem):
-        return docs
+    if not isinstance(text_item, TextContentItem):
+        return None
@@
-            if url and title:
-                return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)
+            if url and title:
+                return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)

Optionally update the docstring “Returns” section to mention that None is returned when no valid metadata is found.

♻️ Duplicate comments (5)
src/app/endpoints/query.py (4)

78-82: Shared OFFLINE flag should be centralized (optional).

You now have an OFFLINE constant here and in streaming_query.py with identical semantics. Consider moving this to a shared config/constants module and reading from configuration/env rather than hard‑coding it in two places, to avoid divergence later.


874-885: Ensure TurnSummary is fully populated and referenced documents include RAG docs.

You now construct:

summary = TurnSummary(
    llm_response=...,
    tool_calls=[],
    rag_chunks=rag_chunks,
)
referenced_documents = parse_referenced_documents(response)
referenced_documents.extend(doc_ids_from_chunks)

Two points:

  • If TurnSummary also expects tool_results, confirm that its default is optional; otherwise add tool_results=[] explicitly (similar to other code paths in the repo).
  • Extending referenced_documents with doc_ids_from_chunks is correct, but ensure doc_ids_from_chunks elements are built with the same ReferencedDocument type (which they are) and that the “Mimir” base URL is eventually configurable rather than hard‑coded.

If tool_results is required, adjust:

 summary = TurnSummary(
     llm_response=...,
-    tool_calls=[],
-    rag_chunks=rag_chunks,
+    tool_calls=[],
+    tool_results=[],
+    rag_chunks=rag_chunks,
 )

Also applies to: 887-891


915-942: Fix “attatchment” typo in validation error messages.

The error messages in validate_attachments_metadata still contain the typo:

"Invalid attatchment type ..."
"Invalid attatchment content type ..."

This was previously flagged. Update to:

-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "

This improves API error clarity and keeps tests/docs consistent.


766-826: Fix vector IO params typing, logging, and exception message formatting in retrieve_response.

Issues in this block:

  • params = {"k": 5, "score_threshold": 0.0} is inferred as dict[str, float], which doesn’t match the more general dict[str, bool | float | str | Iterable | object | None] expected by vector_io.query (mypy error).
  • Logging uses f-strings with potentially large query_response payloads at INFO level (may leak content and is noisy).
  • Exception logging uses f-strings inside logger methods (logger.info(f"...")) instead of %s formatting (style, not correctness).

Suggested cleanup:

-            params = {"k": 5, "score_threshold": 0.0}
-            logger.info(f"Initial params: {params}")
-            logger.info(f"query_request.solr: {query_request.solr}")
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
+            logger.info("Initial params: %s", params)
+            logger.info("query_request.solr: %s", query_request.solr)
@@
-                params["solr"] = query_request.solr
-                logger.info(f"Final params with solr filters: {params}")
+                params["solr"] = query_request.solr
+                logger.info("Final params with solr filters: %s", params)
@@
-            logger.info(f"Final params being sent to vector_io.query: {params}")
+            logger.info("Final params being sent to vector_io.query: %s", params)
@@
-            logger.info(f"The query response total payload: {query_response}")
+            logger.debug(
+                "Vector IO query returned %d chunks",
+                len(query_response.chunks) if query_response.chunks else 0,
+            )
@@
-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())

This addresses the mypy error and improves logging hygiene.

src/app/endpoints/streaming_query.py (1)

1153-1262: Type and logging issues in query_vector_io_for_chunks.

Several problems here affect type‑checking and logging:

  • params = {"k": 5, "score_threshold": 0.0} is inferred as dict[str, float], which doesn’t match the vector_io params type.
  • parent_id = chunk.metadata.get("parent_id") is typed as object; calling urljoin(..., parent_id) without a guard causes mypy/pyright errors and could pass non‑strings at runtime.
  • RAGChunk(content=chunk.content, source=source, ...) passes an InterleavedContent (or similar) object into a str field, and source may not be a str.
  • logger.info("The query response total payload: %s", query_response) logs the entire payload at INFO, which is noisy and may leak content (similar to past comments).

Suggested fixes:

-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
@@
-            logger.info("The query response total payload: %s", query_response)
+            logger.debug(
+                "Vector IO query returned %d chunks",
+                len(query_response.chunks) if query_response.chunks else 0,
+            )
@@
-                        if OFFLINE:
-                            parent_id = chunk.metadata.get("parent_id")
-                            if parent_id:
-                                source = urljoin(
-                                    "https://mimir.corp.redhat.com", parent_id
-                                )
+                        if OFFLINE:
+                            parent_id = chunk.metadata.get("parent_id")
+                            if parent_id and isinstance(parent_id, str):
+                                source = urljoin(
+                                    "https://mimir.corp.redhat.com", parent_id
+                                )
@@
-                    final_rag_chunks.append(
-                        RAGChunk(
-                            content=chunk.content,
-                            source=source,
-                            score=score,
-                        )
-                    )
+                    final_rag_chunks.append(
+                        RAGChunk(
+                            content=str(chunk.content) if chunk.content else "",
+                            source=source if isinstance(source, str) else None,
+                            score=score,
+                        )
+                    )

Also consider moving the hard‑coded Mimir base URL into configuration (as prior reviews suggested).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ca24a7 and a534ece.

📒 Files selected for processing (7)
  • run.yaml (2 hunks)
  • src/app/endpoints/query.py (8 hunks)
  • src/app/endpoints/streaming_query.py (10 hunks)
  • src/models/requests.py (2 hunks)
  • src/models/responses.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/models/requests.py
  • src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).

Applied to files:

  • src/app/endpoints/streaming_query.py
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted.

🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py

[error] 1-1: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' while importing module 'app.endpoints.query'.

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[warning] 54-54: Reimport 'ForbiddenResponse' (imported line 54) (reimported)


[warning] 54-54: Reimport 'UnauthorizedResponse' (imported line 54) (reimported)


[error] 54-54: No name 'RAGChunk' in module 'models.responses' (no-name-in-module)


[warning] 70-70: Reimport 'ReferencedDocument' (imported line 54) (reimported)


[error] 159-159: Too many arguments (7/5) (too-many-arguments)


[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)


[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)


[error] 1016-1016: Undefined variable 'get_session' (undefined-variable)


[error] 1029-1029: Undefined variable 'create_referenced_documents_with_metadata' (undefined-variable)


[error] 1037-1037: Undefined variable 'CacheEntry' (undefined-variable)


[warning] 1015-1015: Unused variable 'topic_summary' (unused-variable)


[warning] 1037-1037: Unused variable 'cache_entry' (unused-variable)


[error] 1063-1063: Undefined variable 'context' (undefined-variable)


[warning] 1435-1435: Catching too general exception Exception (broad-exception-caught)


[error] 1265-1265: Too many statements (74/50) (too-many-statements)

🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py

[error] 55-55: F811: Redefinition of unused 'ForbiddenResponse'.


[error] 67-67: F811: Redefinition of unused 'UnauthorizedResponse'.


[error] 71-71: F821: Undefined or redefined name 'ReferencedDocument'.


[error] 960-960: F821: Undefined name 'interleaved_content_as_str'.


[error] 1016-1016: F821: Undefined name 'get_session'.


[error] 1023-1023: F841: Local variable 'topic_summary' assigned to but never used.


[error] 1029-1029: F821: Undefined name 'create_referenced_documents_with_metadata'.


[error] 1037-1037: F841: Local variable 'cache_entry' assigned to but never used.


[error] 1037-1037: F821: Undefined name 'CacheEntry'.


[error] 1063-1063: F811: Redefinition of unused 'response_generator'.


[error] 1063-1063: F821: Undefined name 'context'.

src/app/endpoints/query.py

[error] 22-22: F401: Remove unused import: 'Document' from llama_stack_client.types.agents.turn_create_params.

🪛 GitHub Actions: Type checks
tests/unit/app/endpoints/test_conversations_v2.py

[error] Process completed with exit code 1.

tests/unit/app/endpoints/test_query.py

[error] Process completed with exit code 1.

src/app/endpoints/streaming_query.py

[error] 809-809: mypy: Argument 4 to 'stream_end_event' has incompatible type 'Any | list[Any]'; expected 'dict[str, int]'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 960-960: mypy: Name 'interleaved_content_as_str' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1016-1016: mypy: Name 'get_session' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1029-1029: mypy: Name 'create_referenced_documents_with_metadata' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1037-1037: mypy: Name 'CacheEntry' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1063-1063: mypy: Name 'context' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1190-1190: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1238-1238: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1400-1400: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1400-1400: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 1414-1414: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] Process completed with exit code 1.

src/app/endpoints/query.py

[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 788-788: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')


[error] Process completed with exit code 1.

run.yaml

[error] Process completed with exit code 1.

🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py

[error] 22-22: ImportError while importing test module due to missing dependency ('llama_stack_client.types.agents').

src/app/endpoints/query.py

[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents'. This import error occurs while importing Turn from llama_stack_client.types.agents in app.endpoints.query, causing test collection to fail during the 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing' step.

🪛 YAMLlint (1.37.1)
run.yaml

[error] 140-140: duplication of key "models" in mapping

(key-duplicates)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: server mode / azure
🔇 Additional comments (4)
tests/unit/app/endpoints/test_conversations_v2.py (1)

93-97: Updated doc_url assertion matches widened type.

Using str(ref_docs[0]["doc_url"]) == "http://example.com" is consistent with ReferencedDocument.doc_url now being either a plain string or AnyUrl; the test is correct and resilient.

Also applies to: 111-116

tests/unit/app/endpoints/test_query.py (1)

1009-1023: Doc URL expectations correctly adapted to string/URL union.

The updated assertions (str(doc.doc_url) == ...) and the “malformed URL” test expecting a non‑None ReferencedDocument with raw doc_url align with the new ReferencedDocument.doc_url semantics. The test logic itself looks sound; remaining test failures stem from import errors in app.endpoints.query.

Also applies to: 1041-1051, 1070-1075, 1097-1103

src/app/endpoints/query.py (2)

299-310: query_endpoint_handler_base now depends on extended retrieve_response contract — ensure all call sites match.

You’ve updated the base handler to expect retrieve_response_func to return (summary, conversation_id, referenced_documents, token_usage) and to pass provider_id as a keyword argument. That’s correct for the new retrieve_response in this module, but any other callers (e.g., alternative backends) must maintain this contract.

Double‑check any other query_endpoint_handler_base(..., retrieve_response_func=...) usages to ensure they:

  • Accept provider_id as kwarg in their retrieve_response implementation.
  • Return the 4‑tuple so token_usage and referenced_documents are well‑defined.

Within this file, the wiring looks coherent.

Also applies to: 333-350
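
A conforming retrieve_response would look roughly like the following; the parameter list is only a sketch, and type names such as TokenUsage are taken from the description above and may differ from the actual models:

async def retrieve_response(
    client: AsyncLlamaStackClient,
    llama_stack_model_id: str,
    query_request: QueryRequest,
    token: str,
    *,
    provider_id: str = "",
) -> tuple[TurnSummary, str, list[ReferencedDocument], TokenUsage]:
    """Sketch of the contract expected by query_endpoint_handler_base."""
    ...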


7-25: The import paths in the current code are correct for llama-stack-client==0.3.0. According to official documentation, the imports should use llama_stack_client.types.agents (not the alpha namespace):

from llama_stack_client.types.agents import Turn, TurnCreateResponse

The current code's import structure aligns with this. The ModuleNotFoundError cited in the review comment is not due to incorrect import paths—reverting to types.alpha.agents would introduce the actual error. If CI is failing, the issue lies elsewhere (e.g., dependency installation, version mismatch, or a different module).

Regarding the unused Document import: if this is truly unused, it should be removed per code cleanliness standards, but this is a minor concern unrelated to the import path issue.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (8)
src/app/endpoints/query.py (7)

817-818: Remove redundant local import.

RAGChunk and ReferencedDocument are imported locally inside the function, which is inefficient. Import them at module level instead. This was previously flagged.


897-908: Include tool_results in TurnSummary construction.

The TurnSummary construction is missing the required tool_results parameter. This was previously flagged and will cause a TypeError at runtime.

🔎 Previously suggested fix
     summary = TurnSummary(
         llm_response=(
             content_to_str(response.output_message.content)
             if (
                 getattr(response, "output_message", None) is not None
                 and getattr(response.output_message, "content", None) is not None
             )
             else ""
         ),
         tool_calls=[],
+        tool_results=[],
         rag_chunks=rag_chunks,
     )

948-949: Fix typo: "attatchment" should be "attachment".

There's a typo in the error message at line 948 (and also line 958). This was previously flagged but remains unresolved.


764-777: Fix type error: get_rag_toolgroups can return None.

At line 771, get_rag_toolgroups(vector_db_ids) can return None, causing a TypeError when concatenating with mcp_toolgroups. This issue was previously flagged but remains unresolved.

🔎 Previously suggested fix
-        toolgroups = None
-        if vector_db_ids:
-            toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-        elif mcp_toolgroups:
-            toolgroups = mcp_toolgroups
+        toolgroups = None
+        if vector_db_ids:
+            rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+            toolgroups = rag_toolgroups + mcp_toolgroups
+        elif mcp_toolgroups:
+            toolgroups = mcp_toolgroups

814-814: Avoid logging entire query response payload at INFO level.

Logging the full query_response may expose sensitive document content. This was previously flagged but remains unresolved.


850-871: Fix type errors in RAGChunk construction.

Pipeline failures indicate:

  1. Line 858: parent_id is typed as object, causing a type error with urljoin which expects str.
  2. chunk.content may not be a primitive string type expected by RAGChunk.content.
🔎 Previously suggested fix
             if OFFLINE:
                 parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
@@
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
-                source=source,
+                content=str(chunk.content) if chunk.content else "",
+                source=source if isinstance(source, str) else None,
                 score=score,
             )
         )

799-812: Fix type incompatibility for params dictionary.

Pipeline reports that params at line 811 has type dict[str, float] but vector_io.query expects dict[str, bool | float | str | Iterable[object] | object | None] | Omit. The issue arises because params["solr"] is assigned a complex object (line 804).

🔎 Proposed fix

Explicitly type params to satisfy the broader type requirement:

-            params = {"k": 5, "score_threshold": 0.0}
+            params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
run.yaml (1)

101-113: Hardcoded Solr URL should use environment variable.

The solr_url: "http://localhost:8983/solr" at line 104 remains hardcoded and won't work in containerized or distributed deployments. This was previously flagged but has not been addressed.
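
Following the ${env.*} pattern already used elsewhere in run.yaml, the endpoint could be injected at deploy time; the surrounding provider keys below are placeholders, only the solr_url line is the point:

vector_io:
  - provider_id: solr-vector
    provider_type: remote::solr   # placeholder provider type
    config:
      solr_url: ${env.SOLR_URL}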

🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)

79-82: Move OFFLINE flag to configuration.

The module-level OFFLINE = True constant should be sourced from configuration or environment variables rather than hardcoded. This will allow runtime control without code changes.

🔎 Suggested approach

Add to your configuration class or read from environment:

# In configuration module
OFFLINE_MODE = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"

Then import and use in this module:

from configuration import configuration

# Later in code
if configuration.offline_mode:
    # ...
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a534ece and 35182dc.

📒 Files selected for processing (5)
  • run.yaml
  • src/app/endpoints/query.py
  • src/models/requests.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/models/requests.py
  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/app/endpoints/test_query.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.

Applied to files:

  • run.yaml
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/models/responses.py (1)
  • ReferencedDocument (328-340)
src/utils/types.py (1)
  • RAGChunk (128-133)
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py

[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.

🪛 GitHub Actions: Ruff
src/app/endpoints/query.py

[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.

🪛 GitHub Actions: Type checks
src/app/endpoints/query.py

[error] 771-771: mypy: Unsupported left operand type for + ("None").


[error] 811-811: mypy: Argument "params" to "query" of "AsyncVectorIoResource" has incompatible type "dict[str, float]"; expected "dict[str, bool | float | str | Iterable[object] | object | None] | Omit".


[error] 811-811: mypy: Note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance.


[error] 858-858: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object".

🪛 GitHub Actions: Unit tests
src/app/endpoints/query.py

[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / vertexai
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (1)
src/app/endpoints/query.py (1)

875-887: RAG context injection implementation looks good.

The logic correctly formats the top 5 RAG chunks and injects them into the user message. The approach of appending context to the query is appropriate for providing relevant documentation to the LLM.

@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch from 35182dc to e98ff61 Compare February 3, 2026 23:44
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query_v2.py (1)

636-644: ⚠️ Potential issue | 🔴 Critical

Missing third return element when moderation blocks the request.

When moderation_result.blocked is True, the function returns a 2-element tuple but the function signature expects a 3-element tuple including doc_ids_from_chunks. This will cause a runtime error when unpacking.

🐛 Proposed fix
     if moderation_result.blocked:
         violation_message = moderation_result.message or ""
         await append_turn_to_conversation(
             client, llama_stack_conv_id, input_text, violation_message
         )
         return (
             create_violation_stream(violation_message, moderation_result.shield_model),
             normalize_conversation_id(conversation_id),
+            [],  # Empty doc_ids_from_chunks for blocked requests
         )
🤖 Fix all issues with AI agents
In `@run.yaml`:
- Around line 155-157: The provider_model_id is hardcoded to a local absolute path, which will break on other machines; update run.yaml so the model path is read from an environment variable (e.g., replace the literal "/Users/acoba/..." with a reference such as ${MODEL_PATH}) and document a fallback or a clear error when the variable is missing, so other developers only need to set MODEL_PATH in their environment.
- Around line 164-168: The embedding_model entry under the vector_stores block (vector_store_id "portal-rag", provider_id "solr-vector") contains a hardcoded local filesystem path and an incorrect double-slash format; replace it with a portable value (e.g., a model identifier such as "sentence-transformers/<model-name>" or an environment/config variable), remove the extra slash so the string follows the provider's expected format, and make the value configurable (env var or shared config key) rather than an absolute path so it works across machines and CI.

In `@src/app/endpoints/query_v2.py`:
- Around line 437-440: Hardcoded offline=True and the hardcoded domain URL (https://mimir.corp.redhat.com) should be removed from src/app/endpoints/query_v2.py and made configurable, and the RAG extraction logic should be factored into a shared helper used by both query_v2.py and streaming_query_v2.py; create a config entry (env var or settings object) for the offline flag and domain, implement a shared function (e.g., determine_chunk_source or extract_rag_source) that encapsulates the "when offline use parent_id, else use reference_url" logic, and update the inline offline variable in query_v2.py (and streaming_query_v2.py) so both endpoints call that helper and read the values from config.
- Around line 1101-1106: The call to get_rag_tools in
prepare_tools_for_responses_api is missing the solr params, so Solr filters
aren't applied; update the call site where vector_store_ids is truthy to pass
query_request.solr as the second argument (e.g., get_rag_tools(vector_store_ids,
query_request.solr)) so that get_rag_tools receives the solr_params and can
apply Solr filtering when building rag_tools.

In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 529-540: The code hardcodes "https://mimir.corp.redhat.com" when building doc_url; replace that literal with a configurable base URL (e.g., BASE_DOC_URL or get_setting("BASE_DOC_URL")) and use it wherever doc_url is constructed in this block (variables: offline, reference_doc, reference_url, doc_id, doc_url). Update the logic so that the configurable base URL is prepended when reference_doc does not start with "http", and ensure the config has a sensible default and is loaded once (module constant or config accessor) so every environment (dev/staging/prod) can override it via config/env.
- Around line 443-446: The hardcoded flag offline = True in streaming_query_v2.py should be replaced with a configurable source; read offline from a higher-level configuration (e.g., a settings object, app config, environment variable, or a request parameter) instead of the literal, and use that value wherever offline is referenced (the variable controls chunk source selection and URL construction). Replace the assignment with a lookup such as settings.OFFLINE or an environment variable (or inject it as a function argument), normalize the value to a boolean, and preserve the current default behavior when the config is missing. Ensure all downstream uses of offline in the function reference this configurable value so no hardcoded occurrences remain.
🧹 Nitpick comments (4)
src/app/main.py (1)

62-66: Remove commented-out debug code before merging.

This commented-out block for deregistering a vector store appears to be leftover debugging code. Commented-out code reduces readability and should be removed. If this functionality is needed in the future, it can be retrieved from version control.

🧹 Proposed fix to remove dead code
-    # try:
-    #     await client.vector_stores.delete(vector_store_id="portal-rag")
-    #     logger.info("Successfully deregistered vector store: portal-rag")
-    # except Exception as e:
-    #     logger.warning("Failed to deregister vector store 'portal-rag': %s", e)
-
lightspeed-stack.yaml (1)

13-18: Remove commented-out configuration or add clarifying comments.

The commented-out use_as_library_client: false and URL/API key lines add clutter. If these are meant as documentation for alternative configurations, consider adding explicit comments explaining when to use each mode. Otherwise, remove them.

src/app/endpoints/streaming_query_v2.py (1)

437-599: Significant code duplication with query_v2.py.

The RAG chunk extraction, document ID parsing, and URL construction logic (lines 437-599) is nearly identical to the implementation in src/app/endpoints/query_v2.py. Consider extracting this into a shared utility function in src/utils/ to maintain DRY principles and ensure consistent behavior.
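A minimal sketch of what such a shared helper could look like, assuming a new module such as src/utils/rag_source.py and env-driven settings; the names RAG_OFFLINE, RAG_DOC_BASE_URL, and determine_chunk_source are illustrative placeholders, not existing code:

# src/utils/rag_source.py -- illustrative sketch; module path and names are assumptions
import os
from typing import Any, Optional
from urllib.parse import urljoin

# Read once at import time so query_v2 and streaming_query_v2 share the same values.
RAG_OFFLINE: bool = os.getenv("RAG_OFFLINE", "false").strip().lower() in ("1", "true", "yes")
RAG_DOC_BASE_URL: str = os.getenv("RAG_DOC_BASE_URL", "https://docs.example.invalid")


def determine_chunk_source(
    metadata: dict[str, Any],
    offline: bool = RAG_OFFLINE,
    base_url: str = RAG_DOC_BASE_URL,
) -> Optional[str]:
    """Return the chunk source URL: parent/doc path when offline, reference_url when online."""
    if offline:
        parent_id = metadata.get("parent_id") or metadata.get("doc_id")
        return urljoin(base_url, str(parent_id)) if parent_id else None
    reference = metadata.get("reference_url") or metadata.get("doc_id")
    if not reference:
        return None
    ref = str(reference)
    return ref if ref.startswith("http") else urljoin(base_url, ref)

Either endpoint could then call determine_chunk_source(chunk.metadata or {}) instead of repeating the branch logic inline.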

src/app/endpoints/query_v2.py (1)

1100-1113: Commented-out code should be removed.

The commented-out Solr filter logging code adds noise without providing value. If this functionality is needed, uncomment and use it; otherwise, remove it.

🧹 Proposed fix
         if vector_store_ids:
-            # logger.info("query_request.solr: %s", query_request.solr)
             rag_tools = get_rag_tools(vector_store_ids)
             if rag_tools:
                 logger.info("rag_tool are: %s", rag_tools)
                 toolgroups.extend(rag_tools)
-                # if query_request.solr:
-                #     logger.info(
-                #         "RAG tools configured with Solr filters: %s", query_request.solr
-                #     )
-                # else:
-                #     logger.info("RAG tools configured without Solr filters")
             else:
                 logger.info("No RAG tools configured")

@Anxhela21 Anxhela21 changed the title from "[LCORE-873] Support Solr Vector I/O Provider, Add Streaming Query Support, and Introduce Solr Keyword Filtering" to "[LCORE-873] Support Solr Vector I/O Provider in LCORE" on Feb 4, 2026
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
@Anxhela21 Anxhela21 force-pushed the anx/streaming_query branch from e98ff61 to 19bc8ca on February 4, 2026 16:39
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/streaming_query_v2.py (1)

641-644: ⚠️ Potential issue | 🔴 Critical

Fix return value mismatch in shield violation path.

The function signature returns tuple[..., str, list[ReferencedDocument]] but the violation path at line 641 returns only a 2-tuple. This causes a type error and will fail at runtime when callers unpack 3 values.

🐛 Proposed fix
         return (
             create_violation_stream(violation_message, moderation_result.shield_model),
             normalize_conversation_id(conversation_id),
+            [],  # Empty list of ReferencedDocument for violation case
         )
src/app/endpoints/query_v2.py (2)

12-24: ⚠️ Potential issue | 🟡 Minor

Group llama_stack_client imports to satisfy lint.

Pylint flags the split import; merging them avoids the CI error.

Proposed fix
-from llama_stack_client import APIConnectionError, APIStatusError
+from llama_stack_client import (
+    APIConnectionError,
+    APIStatusError,
+    AsyncLlamaStackClient,
+)
@@
-from llama_stack_client import AsyncLlamaStackClient

1009-1017: ⚠️ Potential issue | 🟡 Minor

Resolve pylint too-many-arguments on prepare_tools_for_responses_api.

CI is failing after the extra parameter. Either bundle options or suppress for this function.

Proposed minimal fix
-async def prepare_tools_for_responses_api(
+async def prepare_tools_for_responses_api(  # pylint: disable=too-many-arguments
🤖 Fix all issues with AI agents
In `@run.yaml`:
- Around line 166-181: There are two identical vector_stores blocks (both
defining vector_store_id: portal-rag, provider_id: solr-vector, embedding_model:
sentence-transformers/${env.EMBEDDING_MODEL_DIR}, embedding_dimension: 384);
remove the duplicate so only one vector_stores definition remains (delete the
redundant block outside/inside the registered_resources section as appropriate),
ensure YAML indentation and surrounding keys (registered_resources, tool_groups,
etc.) remain valid after removal and that vector_store_id: portal-rag stays in
the intended scope.
- Around line 162-165: The YAML sets shield_id: llama-guard but incorrectly uses
provider_shield_id: openai/gpt-4o-mini; change provider_shield_id to reference
the correct llama-guard shield identifier (e.g., provider_shield_id:
llama-guard) so the `shields` entry for the `llama-guard` provider uses a valid
llama-guard shield ID; update the `provider_shield_id` value in the same
`shields` mapping to match the `llama-guard` shield name.

In `@src/app/endpoints/query_v2.py`:
- Around line 482-534: The loop assumes query_response.chunks always exists and
concatenates doc_id in offline mode even when doc_id is None; fix by
early-guarding and iterating over retrieved_chunks (use the previously computed
retrieved_chunks or empty list) instead of accessing query_response.chunks
directly, and when building the offline doc_url only concatenate the base host
with a non-None doc_id (fall back to reference_url or skip/continue when neither
doc_id nor reference_url is present). Update references in the loop (chunk, md,
cm, doc_id, reference_url, offline) so you use the safe retrieved_chunks
variable and add a check before offline concatenation to avoid doc_id being
None.
- Around line 562-580: The code may pass non-string metadata and content into
urljoin and RAGChunk; ensure parent_id and reference_url are converted to
strings and chunk.content is a string before creating RAGChunk: when building
source, call urljoin("https://mimir.corp.redhat.com", str(parent_id)) and set
source = str(chunk.metadata.get("reference_url")) if reference_url exists (or
None if missing); when constructing the RAGChunk ensure content is a string
(e.g., use chunk.content.text or str(chunk.content)) and keep the existing score
logic using retrieved_scores.

In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 567-587: The RAGChunk constructor is getting wrong types: ensure
source is a str or None and content is a str before calling RAGChunk in the loop
over retrieved_chunks. For the offline branch, coerce parent_id to str (e.g.,
parent_id = str(parent_id)) before passing to urljoin; for the non-offline
branch ensure reference_url is cast to str; for content, replace chunk.content
with a string representation (e.g., use chunk.content.text if that attribute
exists, otherwise str(chunk.content)) so RAGChunk(content=...) receives a str;
update any uses of source and score to match the expected types in RAGChunk.
- Around line 525-549: The code building document URLs uses
doc_id/reference_url/reference_doc without guaranteeing they are strings,
causing TypeErrors; update the block that constructs doc_url (the variables
doc_id, reference_url, reference_doc, doc_url, metadata_doc_ids, and the
ReferencedDocument creation) to first ensure reference_doc is a non-None string
(e.g., skip if None or cast to str only after a non-None check), only call
startswith on a confirmed string, and when concatenating with
"https://mimir.corp.redhat.com" ensure the left operand is a string; in short,
add explicit None guards and string coercion before using startswith or '+' so
ReferencedDocument is only created with valid string doc_url/doc_title.

In `@tests/unit/app/endpoints/test_query_v2.py`:
- Around line 292-305: The new mock block has trailing whitespace and an overly
long line causing lint/black failures; edit the block that defines
mock_query_response and assigns mock_client.vector_io.query (and the
mocker.patch calls for get_system_prompt and run_shield_moderation) to remove
any trailing spaces and break or shorten the long line (e.g., assign the
AsyncMock to mock_client.vector_io.query on its own line or use a shorter
return_value variable) so the lines conform to Black's line length and no
trailing whitespace remains; keep the same mock objects: mock_query_response,
mock_client.vector_io.query, get_system_prompt, and run_shield_moderation.

In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 437-439: Remove the unused mock TurnSummary instances that linters
flag: delete the mock_summary variable assignments (instances of TurnSummary) in
the test module where they are not passed to stream_end_event; specifically
remove the lines creating mock_summary (TurnSummary(llm_response="",
tool_calls=[], tool_results=[], rag_chunks=[])) and any other identical unused
mock_summary declarations (also at the other occurrences noted), leaving the
rest of the test setup and calls to stream_end_event intact.
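For the test-formatting prompt above, one possible shape for those assignments (the mock names come from the prompt; the payload here is only a placeholder) is:

from unittest.mock import AsyncMock

# Placeholder objects; the real test would build the actual vector_io payload.
mock_client = AsyncMock()
mock_query_response = AsyncMock()
mock_query_response.chunks = []
mock_query_response.scores = []

# Assigning the AsyncMock on its own line keeps the statement within Black's line length.
mock_client.vector_io.query = AsyncMock(return_value=mock_query_response)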
🧹 Nitpick comments (3)
src/models/requests.py (1)

169-175: Update class docstring to include the new solr field.

The QueryRequest class docstring (lines 72-91) lists all attributes but doesn't include the newly added solr field. As per coding guidelines, all class attributes should be documented.

📝 Suggested docstring update

Add to the Attributes section around line 86:

         vector_store_ids: The optional list of specific vector store IDs to query for RAG.
+        solr: Optional Solr-specific query parameters including filter queries.
 
     Example:
src/app/endpoints/query.py (1)

585-614: Remove or uncomment the dead code block.

This large commented-out function get_rag_toolgroups adds noise. If it's needed for future reference, consider moving it to a separate branch or issue tracker instead of keeping it in the codebase.

src/app/endpoints/streaming_query_v2.py (1)

388-388: Consider refactoring retrieve_response to reduce complexity.

The function has too many branches (31/12) and statements (108/50) per linter. Extract the vector querying logic (lines 448-587) into a separate helper function.

♻️ Suggested extraction
async def _query_vector_store_for_rag_context(
    client: AsyncLlamaStackClient,
    query_request: QueryRequest,
    offline: bool,
) -> tuple[list[RAGChunk], list[ReferencedDocument]]:
    """Query vector stores and extract RAG chunks and referenced documents."""
    # Move lines 448-587 here
    ...

Then call from retrieve_response:

rag_chunks, doc_ids_from_chunks = await _query_vector_store_for_rag_context(
    client, query_request, offline
)

Comment on lines 162 to 165
shields:
- shield_id: llama-guard
  provider_id: llama-guard
  provider_shield_id: openai/gpt-4o-mini

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "provider_shield_id" -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 1306


🏁 Script executed:

sed -n '1,60p' dev-tools/test-configs/llama-stack-mcp-test.yaml

Repository: lightspeed-core/lightspeed-stack

Length of output: 1308


Invalid provider_shield_id for llama-guard

Lines 162-165: provider_shield_id is set to openai/gpt-4o-mini under the llama-guard provider; it must reference a llama-guard shield ID (e.g. llama-guard).

🤖 Prompt for AI Agents
In `@run.yaml` around lines 162 - 165, The YAML sets shield_id: llama-guard but
incorrectly uses provider_shield_id: openai/gpt-4o-mini; change
provider_shield_id to reference the correct llama-guard shield identifier (e.g.,
provider_shield_id: llama-guard) so the `shields` entry for the `llama-guard`
provider uses a valid llama-guard shield ID; update the `provider_shield_id`
value in the same `shields` mapping to match the `llama-guard` shield name.

Comment on lines +166 to +181
vector_stores:
- vector_store_id: portal-rag
provider_id: solr-vector
embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
embedding_dimension: 384
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
- toolgroup_id: builtin::rag # Register the RAG tool
provider_id: rag-runtime
vector_stores:
default_provider_id: faiss
default_embedding_model: # Define the default embedding model for RAG
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
vector_store_id: portal-rag
provider_id: solr-vector
embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
embedding_dimension: 384

⚠️ Potential issue | 🟡 Minor

Remove duplicate vector_stores definition.

vector_stores is defined both under registered_resources (lines 166-170) and at the top level (lines 177-181) with identical content. This duplication could cause confusion and maintenance issues.

🔧 Remove duplicate block
   tool_groups:
   - toolgroup_id: builtin::rag # Register the RAG tool
     provider_id: rag-runtime
-vector_stores:
-  vector_store_id: portal-rag
-  provider_id: solr-vector
-  embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
-  embedding_dimension: 384
 safety:
   default_shield_id: llama-guard
🤖 Prompt for AI Agents
In `@run.yaml` around lines 166 - 181, There are two identical vector_stores
blocks (both defining vector_store_id: portal-rag, provider_id: solr-vector,
embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR},
embedding_dimension: 384); remove the duplicate so only one vector_stores
definition remains (delete the redundant block outside/inside the
registered_resources section as appropriate), ensure YAML indentation and
surrounding keys (registered_resources, tool_groups, etc.) remain valid after
removal and that vector_store_id: portal-rag stays in the intended scope.

Comment on lines 482 to 534
            if query_response.chunks:
                retrieved_chunks = query_response.chunks
                retrieved_scores = (
                    query_response.scores if hasattr(query_response, "scores") else []
                )

                # Extract doc_ids from chunks for referenced_documents
            metadata_doc_ids = set()

            for chunk in query_response.chunks:
                logger.info("Extract doc ids from chunk: %s", chunk)

                # 1) dict metadata (what your code expects today)
                md = getattr(chunk, "metadata", None) or {}
                doc_id = md.get("doc_id") or md.get("document_id")
                title = md.get("title")

                # 2) typed chunk_metadata (what your provider/logs are actually populating)
                if not doc_id:
                    cm = getattr(chunk, "chunk_metadata", None)
                    if cm is not None:
                        # cm might be a pydantic model or a dict depending on caller
                        if isinstance(cm, dict):
                            doc_id = cm.get("doc_id") or cm.get("document_id")
                            title = title or cm.get("title")
                            reference_url = cm.get("reference_url")
                        else:
                            doc_id = getattr(cm, "doc_id", None) or getattr(
                                cm, "document_id", None
                            )
                            title = title or getattr(cm, "title", None)
                            reference_url = getattr(cm, "reference_url", None)
                    else:
                        reference_url = None
                else:
                    reference_url = md.get("reference_url")

                if not doc_id and not reference_url:
                    continue

                # Build URL based on offline flag
                if offline:
                    # Use parent/doc path
                    reference_doc = doc_id
                    doc_url = "https://mimir.corp.redhat.com" + reference_doc
                else:
                    # Use reference_url if online
                    reference_doc = reference_url or doc_id
                    doc_url = (
                        reference_doc
                        if reference_doc.startswith("http")
                        else ("https://mimir.corp.redhat.com" + reference_doc)
                    )

⚠️ Potential issue | 🟠 Major

Guard chunk parsing when chunks is missing and avoid offline doc_id None crashes.

The loop currently runs even if query_response.chunks is None, and offline mode concatenates doc_id even when only reference_url is available. Both can fail at runtime and correspond to the type-checker errors reported by the pipeline.

Proposed fix
-            if query_response.chunks:
-                retrieved_chunks = query_response.chunks
-                retrieved_scores = (
-                    query_response.scores if hasattr(query_response, "scores") else []
-                )
-
-                # Extract doc_ids from chunks for referenced_documents
-            metadata_doc_ids = set()
-
-            for chunk in query_response.chunks:
+            if query_response.chunks:
+                retrieved_chunks = query_response.chunks
+                retrieved_scores = (
+                    query_response.scores if hasattr(query_response, "scores") else []
+                )
+
+                # Extract doc_ids from chunks for referenced_documents
+                metadata_doc_ids: set[str] = set()
+
+                for chunk in query_response.chunks:
@@
-                if offline:
+                if offline:
+                    if not doc_id:
+                        continue
                     # Use parent/doc path
                     reference_doc = doc_id
                     doc_url = "https://mimir.corp.redhat.com" + reference_doc
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 526-526: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"


[error] 532-532: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)


[error] 533-533: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"

🪛 GitHub Actions: Type checks

[error] 526-526: Unsupported operand types for + ("str" and "None") [operator]


[error] 532-532: Item "None" of "Any | None" has no attribute "startswith" [union-attr]


[error] 533-533: Unsupported operand types for + ("str" and "None") [operator]

🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 482 - 534, The loop assumes
query_response.chunks always exists and concatenates doc_id in offline mode even
when doc_id is None; fix by early-guarding and iterating over retrieved_chunks
(use the previously computed retrieved_chunks or empty list) instead of
accessing query_response.chunks directly, and when building the offline doc_url
only concatenate the base host with a non-None doc_id (fall back to
reference_url or skip/continue when neither doc_id nor reference_url is
present). Update references in the loop (chunk, md, cm, doc_id, reference_url,
offline) so you use the safe retrieved_chunks variable and add a check before
offline concatenation to avoid doc_id being None.

Comment on lines 562 to 580
        # Extract source from chunk metadata based on offline flag
        source = None
        if chunk.metadata:
            if offline:
                parent_id = chunk.metadata.get("parent_id")
                if parent_id:
                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
            else:
                source = chunk.metadata.get("reference_url")

        # Get score from retrieved_scores list if available
        score = retrieved_scores[i] if i < len(retrieved_scores) else None

        rag_chunks.append(
            RAGChunk(
                content=chunk.content,
                source=source,
                score=score,
            )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate RAGChunk definition
rg --files-with-matches "class RAGChunk" -n .

Repository: lightspeed-core/lightspeed-stack

Length of output: 96


Convert metadata and content fields to str

  • Pass str(parent_id) to urljoin
  • Wrap reference_url in str(...) (or use None if missing)
  • Ensure chunk.content is a str (e.g. extract .text or str(...)) before creating RAGChunk
Diff
- source = urljoin("https://mimir.corp.redhat.com", parent_id)
+ source = urljoin("https://mimir.corp.redhat.com", str(parent_id))
- source = chunk.metadata.get("reference_url")
+ source = str(chunk.metadata.get("reference_url")) if chunk.metadata.get("reference_url") else None
- rag_chunks.append(RAGChunk(content=chunk.content, source=source, score=score))
+ content = chunk.content
+ if not isinstance(content, str):
+     content = getattr(content, "text", str(content))
+ rag_chunks.append(RAGChunk(content=content, source=source, score=score))
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 568-568: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin"


[error] 577-577: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "__init__"


[error] 578-578: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "__init__"

🪛 GitHub Actions: Type checks

[error] 568-568: Value of type variable "AnyStr" of "urljoin" cannot be "object" [type-var]

🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 562 - 580, The code may pass
non-string metadata and content into urljoin and RAGChunk; ensure parent_id and
reference_url are converted to strings and chunk.content is a string before
creating RAGChunk: when building source, call
urljoin("https://mimir.corp.redhat.com", str(parent_id)) and set source =
str(chunk.metadata.get("reference_url")) if reference_url exists (or None if
missing); when constructing the RAGChunk ensure content is a string (e.g., use
chunk.content.text or str(chunk.content)) and keep the existing score logic
using retrieved_scores.

Comment on lines 525 to 549
                if not doc_id and not reference_url:
                    continue

                # Build URL based on offline flag
                if offline:
                    # Use parent/doc path
                    reference_doc = doc_id
                    doc_url = "https://mimir.corp.redhat.com" + reference_doc
                else:
                    # Use reference_url if online
                    reference_doc = reference_url or doc_id
                    doc_url = (
                        reference_doc
                        if reference_doc.startswith("http")
                        else ("https://mimir.corp.redhat.com" + reference_doc)
                    )

                if reference_doc and reference_doc not in metadata_doc_ids:
                    metadata_doc_ids.add(reference_doc)
                    doc_ids_from_chunks.append(
                        ReferencedDocument(
                            doc_title=title,
                            doc_url=doc_url,
                        )
                    )

⚠️ Potential issue | 🔴 Critical

Fix type errors when building document URLs.

Pipeline reports multiple type errors because doc_id, reference_doc, and reference_url can be None. The string concatenation and startswith calls will fail at runtime.

🐛 Proposed fix for type safety
                 if not doc_id and not reference_url:
                     continue
 
                 # Build URL based on offline flag
+                doc_url: Optional[str] = None
                 if offline:
                     # Use parent/doc path
-                    reference_doc = doc_id
-                    doc_url = "https://mimir.corp.redhat.com" + reference_doc
+                    if doc_id:
+                        reference_doc = str(doc_id)
+                        doc_url = "https://mimir.corp.redhat.com" + reference_doc
+                    else:
+                        continue
                 else:
                     # Use reference_url if online
-                    reference_doc = reference_url or doc_id
-                    doc_url = (
-                        reference_doc
-                        if reference_doc.startswith("http")
-                        else ("https://mimir.corp.redhat.com" + reference_doc)
-                    )
+                    reference_doc = str(reference_url) if reference_url else (str(doc_id) if doc_id else None)
+                    if reference_doc:
+                        doc_url = (
+                            reference_doc
+                            if reference_doc.startswith("http")
+                            else ("https://mimir.corp.redhat.com" + reference_doc)
+                        )
+                    else:
+                        continue
 
-                if reference_doc and reference_doc not in metadata_doc_ids:
+                if reference_doc and doc_url and reference_doc not in metadata_doc_ids:
                     metadata_doc_ids.add(reference_doc)
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 532-532: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"


[error] 538-538: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)


[error] 539-539: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"

🪛 GitHub Actions: Type checks

[error] 532-532: Unsupported operand types for + ("str" and "None") [operator]


[error] 538-538: Item "None" of "Any | None" has no attribute "startswith" [union-attr]


[error] 539-539: Unsupported operand types for + ("str" and "None") [operator]

🤖 Prompt for AI Agents
In `@src/app/endpoints/streaming_query_v2.py` around lines 525 - 549, The code
building document URLs uses doc_id/reference_url/reference_doc without
guaranteeing they are strings, causing TypeErrors; update the block that
constructs doc_url (the variables doc_id, reference_url, reference_doc, doc_url,
metadata_doc_ids, and the ReferencedDocument creation) to first ensure
reference_doc is a non-None string (e.g., skip if None or cast to str only after
a non-None check), only call startswith on a confirmed string, and when
concatenating with "https://mimir.corp.redhat.com" ensure the left operand is a
string; in short, add explicit None guards and string coercion before using
startswith or '+' so ReferencedDocument is only created with valid string
doc_url/doc_title.

Comment on lines 567 to 587
    for i, chunk in enumerate(retrieved_chunks):
        # Extract source from chunk metadata based on offline flag
        source = None
        if chunk.metadata:
            if offline:
                parent_id = chunk.metadata.get("parent_id")
                if parent_id:
                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
            else:
                source = chunk.metadata.get("reference_url")

        # Get score from retrieved_scores list if available
        score = retrieved_scores[i] if i < len(retrieved_scores) else None

        rag_chunks.append(
            RAGChunk(
                content=chunk.content,
                source=source,
                score=score,
            )
        )

⚠️ Potential issue | 🔴 Critical

Fix type errors in RAGChunk construction.

Pipeline reports:

  • Line 574: urljoin receives object type instead of str
  • Line 583: chunk.content is ChunkContent, not str
  • Line 584: source type mismatch
🐛 Proposed fix
     for i, chunk in enumerate(retrieved_chunks):
         # Extract source from chunk metadata based on offline flag
         source = None
         if chunk.metadata:
             if offline:
                 parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
+                if parent_id and isinstance(parent_id, str):
                     source = urljoin("https://mimir.corp.redhat.com", parent_id)
             else:
-                source = chunk.metadata.get("reference_url")
+                ref_url = chunk.metadata.get("reference_url")
+                source = str(ref_url) if ref_url else None
 
         # Get score from retrieved_scores list if available
         score = retrieved_scores[i] if i < len(retrieved_scores) else None
 
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
-                source=source,
+                content=str(chunk.content) if chunk.content else "",
+                source=source if isinstance(source, str) else None,
                 score=score,
             )
         )
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 574-574: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin"


[error] 583-583: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "__init__"


[error] 584-584: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "__init__"

🪛 GitHub Actions: Type checks

[error] 574-574: Value of type variable "AnyStr" of "urljoin" cannot be "object" [type-var]

🤖 Prompt for AI Agents
In `@src/app/endpoints/streaming_query_v2.py` around lines 567 - 587, The RAGChunk
constructor is getting wrong types: ensure source is a str or None and content
is a str before calling RAGChunk in the loop over retrieved_chunks. For the
offline branch, coerce parent_id to str (e.g., parent_id = str(parent_id))
before passing to urljoin; for the non-offline branch ensure reference_url is
cast to str; for content, replace chunk.content with a string representation
(e.g., use chunk.content.text if that attribute exists, otherwise
str(chunk.content)) so RAGChunk(content=...) receives a str; update any uses of
source and score to match the expected types in RAGChunk.

Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query_v2.py (1)

636-644: ⚠️ Potential issue | 🔴 Critical

Return statement missing third element - incompatible with function signature.

Pipeline reports incompatible return type. The moderation-blocked path returns a 2-tuple but the function signature at line 394 declares a 3-tuple return type including list[ReferencedDocument].

🐛 Proposed fix
     if moderation_result.blocked:
         violation_message = moderation_result.message or ""
         await append_turn_to_conversation(
             client, llama_stack_conv_id, input_text, violation_message
         )
         return (
             create_violation_stream(violation_message, moderation_result.shield_model),
             normalize_conversation_id(conversation_id),
+            [],  # Empty list for doc_ids_from_chunks when blocked
         )
🧹 Nitpick comments (2)
src/app/endpoints/query_v2.py (2)

1009-1017: Consider reducing function arguments.

Pipeline flags too-many-arguments (6/5). Consider grouping related parameters into a dataclass or config object.

♻️ Example refactor using a context object
@dataclass
class ToolPrepContext:
    client: AsyncLlamaStackClient
    query_request: QueryRequest
    token: str
    config: AppConfig
    mcp_headers: Optional[dict[str, dict[str, str]]] = None
    skip_rag_tools: bool = False

async def prepare_tools_for_responses_api(
    context: ToolPrepContext,
) -> Optional[list[dict[str, Any]]]:
    ...

1-3: Module has grown large (1087 lines).

Pipeline flags the module exceeds 1000 lines. Consider extracting the RAG/vector querying logic (lines 431-597) into a separate utility module to improve maintainability and enable reuse with streaming_query_v2.py.

Signed-off-by: Anxhela Coba <acoba@redhat.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 561-581: The RAGChunk constructor is receiving wrong types:
urljoin is being passed a non-str parent_id, chunk.content is a ChunkContent
object not a str, and source must be Optional[str]; fix by coercing values to
the expected types before creating RAGChunk: when deriving parent_id from
chunk.metadata call str(parent_id) (or skip urljoin if parent_id is None) so
urljoin(MIMIR_DOC_URL, str(parent_id)) receives strings; for the content field
extract the string payload from chunk.content (e.g., chunk.content.text or
similar) or fallback to str(chunk.content) so content is a str; ensure source is
either a str or None (set to None if not present) and keep score as the numeric
value from retrieved_scores if available. Apply these changes around the
RAGChunk(...) creation in the loop that references retrieved_chunks,
chunk.metadata, urljoin, MIMIR_DOC_URL, retrieved_scores, and RAGChunk.
- Around line 519-534: The offline/online URL construction must avoid using None
for doc_id/reference_url: change the offline branch so it only assigns
reference_doc = doc_id if doc_id is truthy; otherwise fall back to using
reference_url (and build doc_url from reference_url with MIMIR_DOC_URL if
needed). In the online branch compute reference_doc = reference_url or doc_id
and ensure you check that reference_doc is truthy before calling
reference_doc.startswith("http") (i.e., guard the .startswith call and treat a
None/empty reference_doc as needing MIMIR_DOC_URL + (reference_url or doc_id)).
Update the variables reference_doc and doc_url accordingly so neither branch
calls methods on None.
🧹 Nitpick comments (3)
src/app/endpoints/query_v2.py (3)

12-12: Consolidate imports from llama_stack_client.

Pipeline flags ungrouped imports. Lines 12 and 24 both import from the same package and should be combined.

♻️ Proposed fix
-from llama_stack_client import APIConnectionError, APIStatusError
+from llama_stack_client import APIConnectionError, APIStatusError, AsyncLlamaStackClient
 from llama_stack_api.openai_responses import (
     ...
 )
-from llama_stack_client import AsyncLlamaStackClient

Also applies to: 24-24


1009-1017: Consider reducing function arguments.

Pipeline warning: function has 6 arguments (exceeds limit of 5). Consider grouping related parameters into a config object or using **kwargs for optional parameters.


1-3: Module exceeds 1000 lines (currently 1080).

Pipeline warning C0302: Too many lines in module. Consider extracting the direct vector DB querying logic (lines 431-598) and related helpers into a separate module (e.g., utils/vector_query.py) to improve maintainability.

Comment on lines +519 to +534
                if not doc_id and not reference_url:
                    continue

                # Build URL based on offline flag
                if offline:
                    # Use parent/doc path
                    reference_doc = doc_id
                    doc_url = MIMIR_DOC_URL + reference_doc
                else:
                    # Use reference_url if online
                    reference_doc = reference_url or doc_id
                    doc_url = (
                        reference_doc
                        if reference_doc.startswith("http")
                        else (MIMIR_DOC_URL + reference_doc)
                    )

⚠️ Potential issue | 🔴 Critical

Fix type errors in offline/online URL construction.

Pipeline reports multiple type errors. The guard at line 519 allows cases where doc_id is None but reference_url exists, causing the offline branch to fail. Additionally, the online branch calls .startswith() on a potentially None value.

🐛 Proposed fix
                 if not doc_id and not reference_url:
                     continue

                 # Build URL based on offline flag
                 if offline:
-                    # Use parent/doc path
-                    reference_doc = doc_id
-                    doc_url = MIMIR_DOC_URL + reference_doc
+                    # Use parent/doc path - skip if no doc_id in offline mode
+                    if not doc_id:
+                        continue
+                    reference_doc = str(doc_id)
+                    doc_url = MIMIR_DOC_URL + reference_doc
                 else:
                     # Use reference_url if online
-                    reference_doc = reference_url or doc_id
+                    reference_doc = str(reference_url) if reference_url else (str(doc_id) if doc_id else None)
+                    if not reference_doc:
+                        continue
                     doc_url = (
                         reference_doc
-                        if reference_doc.startswith("http")
+                        if reference_doc.startswith("http")
                         else (MIMIR_DOC_URL + reference_doc)
                     )
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 526-526: pyright: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None" (reportOperatorIssue)


[error] 532-532: pyright: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)


[error] 533-533: pyright: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None" (reportOperatorIssue)

🪛 GitHub Actions: Type checks

[error] 532-532: mypy error: Item "None" of "Any | None" has no attribute "startswith" [union-attr].

🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 519 - 534, The offline/online URL
construction must avoid using None for doc_id/reference_url: change the offline
branch so it only assigns reference_doc = doc_id if doc_id is truthy; otherwise
fall back to using reference_url (and build doc_url from reference_url with
MIMIR_DOC_URL if needed). In the online branch compute reference_doc =
reference_url or doc_id and ensure you check that reference_doc is truthy before
calling reference_doc.startswith("http") (i.e., guard the .startswith call and
treat a None/empty reference_doc as needing MIMIR_DOC_URL + (reference_url or
doc_id)). Update the variables reference_doc and doc_url accordingly so neither
branch calls methods on None.

Comment on lines +561 to +581
    for i, chunk in enumerate(retrieved_chunks):
        # Extract source from chunk metadata based on offline flag
        source = None
        if chunk.metadata:
            if offline:
                parent_id = chunk.metadata.get("parent_id")
                if parent_id:
                    source = urljoin(MIMIR_DOC_URL, parent_id)
            else:
                source = chunk.metadata.get("reference_url")

        # Get score from retrieved_scores list if available
        score = retrieved_scores[i] if i < len(retrieved_scores) else None

        rag_chunks.append(
            RAGChunk(
                content=chunk.content,
                source=source,
                score=score,
            )
        )

⚠️ Potential issue | 🔴 Critical

Fix type errors in RAGChunk creation.

Pipeline reports: urljoin receives object instead of str, chunk.content is ChunkContent not str, and source type doesn't match. Convert values explicitly.

🐛 Proposed fix
     # Convert retrieved chunks to RAGChunk format
     for i, chunk in enumerate(retrieved_chunks):
         # Extract source from chunk metadata based on offline flag
         source = None
         if chunk.metadata:
             if offline:
                 parent_id = chunk.metadata.get("parent_id")
                 if parent_id:
-                    source = urljoin(MIMIR_DOC_URL, parent_id)
+                    source = urljoin(MIMIR_DOC_URL, str(parent_id))
             else:
-                source = chunk.metadata.get("reference_url")
+                ref_url = chunk.metadata.get("reference_url")
+                source = str(ref_url) if ref_url else None

         # Get score from retrieved_scores list if available
         score = retrieved_scores[i] if i < len(retrieved_scores) else None

+        # Extract string content from chunk
+        content = chunk.content
+        if not isinstance(content, str):
+            content = getattr(content, "text", None) or str(content)
+
         rag_chunks.append(
             RAGChunk(
-                content=chunk.content,
+                content=content,
                 source=source,
                 score=score,
             )
         )
🧰 Tools
🪛 GitHub Actions: Pyright

[error] 568-568: pyright: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin" (reportArgumentType)


[error] 577-577: pyright: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "__init__" (reportArgumentType)


[error] 578-578: pyright: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "__init__" (reportArgumentType)

🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 561 - 581, The RAGChunk
constructor is receiving wrong types: urljoin is being passed a non-str
parent_id, chunk.content is a ChunkContent object not a str, and source must be
Optional[str]; fix by coercing values to the expected types before creating
RAGChunk: when deriving parent_id from chunk.metadata call str(parent_id) (or
skip urljoin if parent_id is None) so urljoin(MIMIR_DOC_URL, str(parent_id))
receives strings; for the content field extract the string payload from
chunk.content (e.g., chunk.content.text or similar) or fallback to
str(chunk.content) so content is a str; ensure source is either a str or None
(set to None if not present) and keep score as the numeric value from
retrieved_scores if available. Apply these changes around the RAGChunk(...)
creation in the loop that references retrieved_chunks, chunk.metadata, urljoin,
MIMIR_DOC_URL, retrieved_scores, and RAGChunk.
