[LCORE-873] Support Solr Vector I/O Provider in LCORE #868
base: main
Conversation
Skipping CI for Draft Pull Request.
Walkthrough

This PR introduces direct vector database querying for RAG functionality and restructures the configuration and query endpoints. Changes include expanded run.yaml provider/backend configuration, a new direct vector DB query path in query_v2 and streaming_query_v2 endpoints, updated request/response models to expose referenced documents, and constants updates for RAG tool defaults and document URL references.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant QueryEndpoint as Query Endpoint (v2)
    participant VectorDB as Vector DB
    participant ResponsesAPI as Responses API
    participant LLMBackend as LLM Backend

    Client->>QueryEndpoint: send query request + optional solr filters
    alt Direct Vector Querying Path
        QueryEndpoint->>VectorDB: query(vector_store_id, k=5, solr_params)
        VectorDB-->>QueryEndpoint: return chunks with metadata & doc IDs
        QueryEndpoint->>QueryEndpoint: extract doc IDs, URLs, construct RAGChunk objects
        QueryEndpoint->>QueryEndpoint: build rag_context from top 5 chunks
    end
    QueryEndpoint->>ResponsesAPI: call with input_text + rag_context (injected) + skip_rag_tools=True
    ResponsesAPI->>LLMBackend: forward request to LLM
    LLMBackend-->>ResponsesAPI: return response
    ResponsesAPI-->>QueryEndpoint: return response + parsed referenced documents
    QueryEndpoint->>QueryEndpoint: merge all_rag_chunks (vector query + response API)
    QueryEndpoint->>QueryEndpoint: merge all_referenced_documents
    QueryEndpoint-->>Client: return response with merged RAG chunks & documents
```
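In code, the Direct Vector Querying Path above corresponds roughly to the following sketch. The `vector_io.query` call and the `params["solr"]` pass-through mirror snippets quoted in the review comments below, while the helper name, the `k` parameter, and the exact signatures are assumptions:

```python
from typing import Any


async def build_rag_context(
    client: Any,
    vector_db_id: str,
    query: str,
    solr_params: dict[str, Any] | None = None,
) -> str:
    """Sketch: query the vector DB directly and join the top chunks."""
    params: dict[str, Any] = {"k": 5}  # assumed parameter name per the diagram
    if solr_params:
        params["solr"] = solr_params  # Solr filters are passed through verbatim
    query_response = await client.vector_io.query(
        vector_db_id=vector_db_id, query=query, params=params
    )
    top_chunks = (query_response.chunks or [])[:5]
    return "\n\n".join(str(chunk.content) for chunk in top_chunks)
```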
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/utils/checks.py (1)
60-78: Update docstring to reflect new parent directory validation.

The docstring doesn't describe the new behavior introduced in lines 82-91. When `must_exists=False` and `must_be_writable=True`, the function now validates parent directory existence and writability.

Apply this diff to update the docstring:

```diff
 def directory_check(
     path: FilePath, must_exists: bool, must_be_writable: bool, desc: str
 ) -> None:
     """
-    Ensure the given path is an existing directory.
+    Ensure the given path is a valid directory.
 
-    If the path is not a directory, raises InvalidConfigurationError.
+    If must_exists is True, verifies the directory exists. If must_be_writable
+    is True and the directory doesn't exist, validates that its parent directory
+    exists and is writable.
 
     Parameters:
         path (FilePath): Filesystem path to validate.
-        must_exists (bool): Should the directory exists?
-        must_be_writable (bool): Should the check test if directory is writable?
+        must_exists (bool): If True, the directory must exist.
+        must_be_writable (bool): If True, the directory (or its parent if the
+            directory doesn't exist) must be writable.
         desc (str): Short description of the value being checked; used in error
             messages.
 
     Raises:
         InvalidConfigurationError: If `path` does not point to a directory or
-            is not writable when required.
+            is not writable when required, or if parent directory validation fails.
     """
```

src/app/endpoints/query.py (1)
746-769: Avoid relying on vector_db_ids being undefined when no_tools=True

In `retrieve_response`, `vector_db_ids` is only populated in the `else` branch (when `no_tools` is False), but it's later read inside the RAG block:

```python
try:
    if vector_db_ids:
        ...
```

When `no_tools` is True, this condition currently triggers an `UnboundLocalError` that is caught by the blanket `except`, producing misleading "Failed to query vector database for chunks" warnings even though RAG is intentionally disabled.

Define `vector_db_ids` up front and use it consistently so the RAG path is skipped cleanly when no_tools is set:

```diff
@@
     agent, conversation_id, session_id = await get_agent(
         client,
         model_id,
         system_prompt,
         available_input_shields,
         available_output_shields,
         query_request.conversation_id,
         query_request.no_tools or False,
     )
@@
     logger.debug("Conversation ID: %s, session ID: %s", conversation_id, session_id)
+
+    # Track available vector DBs (used for toolgroups and RAG); default to empty
+    vector_db_ids: list[str] = []
+
     # bypass tools and MCP servers if no_tools is True
     if query_request.no_tools:
         mcp_headers = {}
         agent.extra_headers = {}
         toolgroups = None
     else:
@@
         # Include RAG toolgroups when vector DBs are available
         vector_dbs = await client.vector_dbs.list()
         vector_db_ids = [vdb.identifier for vdb in vector_dbs]
```

With this change, the later `if vector_db_ids:` guard in the RAG block works as intended without spurious exceptions/logs when `no_tools=True`.
🧹 Nitpick comments (11)
tests/unit/cache/test_sqlite_cache.py (1)
394-415: Stronger field-level checks look good; final assertions are slightly redundantThe new field-by-field assertions and the
str(...)comparison fordoc_urlcorrectly exercise the serialization/deserialization path and the wideneddoc_urltype. Functionally this test now does a much better job of pinning down the expected shape ofCacheEntry.You could trim a bit of redundancy by either:
- Dropping the later
assert retrieved_entries[0].referenced_documents is not None/ title check, since they’re implied by the new assertions, or- Reusing
retrieved_entryeverywhere for consistency.This is purely a readability/maintenance tweak; behavior is fine as-is.
src/models/requests.py (1)
3-168: Clarify solr field shape and avoid set literal in examples

The new `solr: Optional[dict[str, Any]]` field is a reasonable flexible hook for Solr filters and matches how it's consumed in `vector_io.query`. Two small improvements:

- The example currently uses a Python set:

```python
examples=[
    {"fq": {"product:*openshift*", "product_version:*4.16*"}},
]
```

which is unusual in a JSON-ish example. A list is clearer and serializes more predictably:

```diff
-        examples=[
-            {"fq": {"product:*openshift*", "product_version:*4.16*"}},
-        ],
+        examples=[
+            {"fq": ["product:*openshift*", "product_version:*4.16*"]},
+        ],
```

- Consider briefly documenting in the docstring or description that `solr` is an opaque bag of Solr parameters (e.g., `fq`, `q.op`, etc.) so clients know they're passing them through verbatim.
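A sketch of the second suggestion applied to the field definition (the surrounding model is trimmed; the description text is a proposal, not the current code):

```python
from typing import Any, Optional

from pydantic import BaseModel, Field


class QueryRequest(BaseModel):
    """Trimmed sketch of the request model; only the solr field is shown."""

    solr: Optional[dict[str, Any]] = Field(
        default=None,
        description=(
            "Opaque bag of Solr query parameters (e.g. 'fq', 'q.op') "
            "passed through verbatim to the vector_io provider."
        ),
        examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
    )
```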
src/models/context.py (1)

8-51: New RAG fields on ResponseGeneratorContext look good; update docstring

`vector_io_rag_chunks: list[RAGChunk] | None` and `vector_io_referenced_docs: list[ReferencedDocument] | None` are well-typed and align with the response models.

To keep documentation in sync, consider extending the class docstring's Attributes section to include brief descriptions of these two new fields.
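A possible Attributes addition, sketched with the field types named above (the rest of the class and its other attributes are elided; import paths follow the code graph below):

```python
from models.responses import RAGChunk, ReferencedDocument


class ResponseGeneratorContext:  # fragment: other attributes elided
    """Context shared with the response generator.

    Attributes:
        vector_io_rag_chunks: RAG chunks retrieved by the direct
            vector_io query, or None when no direct query ran.
        vector_io_referenced_docs: Referenced documents extracted from
            the vector_io results, or None when unavailable.
    """

    vector_io_rag_chunks: list[RAGChunk] | None = None
    vector_io_referenced_docs: list[ReferencedDocument] | None = None
```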
src/app/endpoints/query.py (1)
771-897: Refine RAG integration: config for OFFLINE source URL and reduce verbose payload logging

The new vector-DB/RAG flow is a good step, but two aspects are worth tightening up:

1. Hard-coded internal base URL and OFFLINE flag

   - `OFFLINE = True` and the inline `"https://mimir.corp.redhat.com"` base URL (both for `doc_url` and `source`) effectively bake an environment-specific detail into the endpoint.
   - This makes it hard to run in other environments and leaks internal domain knowledge in code.

   Consider:

```diff
-# TODO: move this setting to a higher level configuration
-OFFLINE = True
+OFFLINE = True  # TODO: drive from configuration instead of a module constant
```

   and pushing the base URL into configuration (or deriving it from chunk metadata) so you don't need `"https://mimir.corp.redhat.com"` literals here.

2. Logging entire vector_io query_response at info level

   - `logger.info(f"The query response total payload: {query_response}")` will dump all retrieved chunks and metadata on every query.
   - This can be very noisy and may expose large or sensitive content in logs.

   Recommend downgrading and summarizing:

```diff
-        logger.info(f"The query response total payload: {query_response}")
+        logger.debug("vector_io.query response: %s", query_response)
+        logger.info(
+            "vector_io.query returned %d chunks for vector_db_id=%s",
+            len(getattr(query_response, "chunks", []) or []),
+            vector_db_id,
+        )
```

This keeps observability while avoiding heavy, content-rich logs at info level.
run.yaml (1)
117-130: Consider parameterizing the Solr URL for production deployments.

The hardcoded `localhost:8983` URL will not work in containerized or distributed environments. Consider using an environment variable reference similar to line 19.

```diff
   vector_io:
   - provider_id: solr-vector
     provider_type: remote::solr_vector_io
     config:
-      solr_url: "http://localhost:8983/solr"
+      solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}
       collection_name: "portal-rag"
```

src/app/endpoints/streaming_query.py (6)
81-84: Hardcoded `OFFLINE` flag should be moved to configuration.

The TODO acknowledges this needs to be addressed. This module-level constant affects source URL resolution behavior and should be configurable, especially for toggling between environments.

Would you like me to help implement this as part of the configuration system, or open an issue to track this task?
165-171: Update docstring to document the new `vector_io_referenced_docs` parameter.

The function signature now includes `vector_io_referenced_docs`, but the docstring's Parameters section doesn't document it.

```diff
     Parameters:
         metadata_map (dict): A mapping containing metadata about referenced documents.
         summary (TurnSummary): Summary of the conversation turn.
         token_usage (TokenCounter): Token usage information.
         media_type (str): The media type for the response format.
+        vector_io_referenced_docs (list[ReferencedDocument] | None): Referenced
+            documents from vector_io query.
 
     Returns:
```
487-515: Reconsider emitting "No Violation" token for successful shield validations.

Yielding "No Violation" as a `LLM_TOKEN_EVENT` for every successful shield validation may create noise in the client stream. Clients typically only need to know about violations, not successful passes. Consider yielding nothing or using a different event type (e.g., debug/info level).

```diff
             else:
-                # Yield "No Violation" message for successful shield validations
-                yield stream_event(
-                    data={
-                        "id": chunk_id,
-                        "token": "No Violation",
-                    },
-                    event_type=LLM_TOKEN_EVENT,
-                    media_type=media_type,
-                )
+                # Successful shield validations are silently passed
+                pass
```
1230-1231: Use more Pythonic empty list check.

Comparing with `== []` is less idiomatic than using truthiness.

```diff
     # Convert empty list to None for consistency with existing behavior
-    if toolgroups == []:
+    if not toolgroups:
         toolgroups = None
```
1296-1298: Missing `pylint: disable=broad-except` comment for consistency.

Line 1086 has the pylint disable comment for the broad except, but line 1296 does not, which may trigger linting warnings.

```diff
-    except Exception as e:
+    except Exception as e:  # pylint: disable=broad-except
         logger.warning("Failed to query vector database for chunks: %s", e)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())
```
1304-1307: RAG context formatting could benefit from a helper function.

The RAG context formatting logic could be extracted for reusability and testability.
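A minimal sketch of such a helper, assuming the `RAGChunk` model from `src/models/responses.py`; the separator and numbering format are assumptions:

```python
from models.responses import RAGChunk


def format_rag_context(chunks: list[RAGChunk], max_chunks: int = 5) -> str:
    """Build the RAG context string injected into the prompt from the top chunks."""
    selected = chunks[:max_chunks]
    return "\n\n".join(
        f"[{i}] {chunk.content}" for i, chunk in enumerate(selected, start=1)
    )
```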
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- run.yaml (3 hunks)
- src/app/endpoints/query.py (6 hunks)
- src/app/endpoints/streaming_query.py (14 hunks)
- src/models/context.py (2 hunks)
- src/models/requests.py (2 hunks)
- src/models/responses.py (1 hunks)
- src/utils/checks.py (1 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
- tests/unit/app/endpoints/test_query.py (4 hunks)
- tests/unit/cache/test_postgres_cache.py (1 hunks)
- tests/unit/cache/test_sqlite_cache.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/cache/test_postgres_cache.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/cache/test_sqlite_cache.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/cache/test_postgres_cache.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/cache/test_sqlite_cache.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
- src/models/context.py
- src/utils/checks.py
- src/models/responses.py
- src/models/requests.py
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
- src/models/context.py
- src/models/responses.py
- src/models/requests.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/requests.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🧬 Code graph analysis (3)
src/models/context.py (1)
src/models/responses.py (2)
- RAGChunk (25-30)
- ReferencedDocument (326-336)
src/app/endpoints/query.py (1)
src/models/responses.py (2)
- RAGChunk (25-30)
- ReferencedDocument (326-336)
src/app/endpoints/streaming_query.py (3)
src/models/responses.py (2)
- RAGChunk (25-30)
- ReferencedDocument (326-336)

src/utils/types.py (1)
- TurnSummary (89-163)

src/app/endpoints/query.py (1)
- get_rag_toolgroups (950-977)
🪛 GitHub Actions: E2E Tests
src/utils/checks.py
[error] 87-87: InvalidConfigurationError: Check directory to store feedback '/tmp/data/feedback' cannot be created - parent directory does not exist
🔇 Additional comments (10)
src/utils/checks.py (1)
82-91: Unable to verify review comment claims due to repository access constraints.

The review comment makes specific claims about a pipeline failure and broken E2E tests that require access to the full codebase, test files, and pipeline logs to verify. Without repository access, I cannot:

- Confirm the reported pipeline failure with `/tmp/data/feedback`
- Verify all usages of `directory_check` and assess impact
- Review test code to understand breaking changes
- Determine if the parent directory validation is the root cause
Manual verification is required by a developer with repository access.
src/models/responses.py (1)
326-336: doc_url union type change looks correct

Allowing `doc_url: AnyUrl | str | None` aligns with the new tests and preserves backwards compatibility for existing AnyUrl usage; no issues seen here.
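For reference, a self-contained sketch of the widened union (the `doc_title` sibling field is illustrative); malformed URLs are preserved as raw strings instead of failing validation:

```python
from pydantic import AnyUrl, BaseModel


class ReferencedDocument(BaseModel):
    """Sketch of the widened doc_url union; doc_title is illustrative."""

    doc_url: AnyUrl | str | None = None
    doc_title: str | None = None


# Malformed URLs no longer raise a validation error; they stay as strings.
doc = ReferencedDocument(doc_url="not a valid url", doc_title="Broken link")
assert str(doc.doc_url) == "not a valid url"
```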
tests/unit/app/endpoints/test_conversations_v2.py (1)

93-116: Updated expectation for doc_url string is appropriate

Asserting `str(ref_docs[0]["doc_url"]) == "http://example.com"` makes the test compatible with `doc_url` being either an `AnyUrl` or a plain string, matching the updated model typing.

tests/unit/cache/test_postgres_cache.py (1)
516-537: More granular CacheEntry assertions improve robustness

The revised test that checks each CacheEntry field and stringifies `doc_url` before comparison is clearer and more resilient to representation differences than relying on whole-object equality.

tests/unit/app/endpoints/test_query.py (1)
1007-1102: Tests correctly capture new doc_url semantics (including malformed URLs)

The updated expectations:

- Using `str(doc.doc_url)` / `str(docs[i].doc_url)` for valid URLs, and
- Asserting that malformed URLs yield a non-`None` `ReferencedDocument` with `doc.doc_url == "not a valid url"`

are consistent with `ReferencedDocument.doc_url` now allowing raw strings as well as `AnyUrl`. This gives better resilience to imperfect metadata without dropping referenced documents.

src/app/endpoints/query.py (1)
289-423: RAG chunk propagation into transcripts and QueryResponse is wired correctly

Unpacking `summary.rag_chunks`, converting them once to `rag_chunks_dict` for `store_transcript`, and exposing `rag_chunks=summary.rag_chunks or []` via `QueryResponse` keeps the same source of truth for RAG data across storage and API response. The additional referenced_documents handling remains consistent with that contract.

run.yaml (2)
19-19: LGTM!

Using environment variable reference for `external_providers_dir` improves configurability across different deployment environments.
144-160: LGTM!

The vector_dbs and embedding model configurations are consistent with the new Solr provider setup. The 384 embedding dimension correctly matches the granite-embedding-30m-english model specification.
src/app/endpoints/streaming_query.py (2)
78-78: LGTM!

The logger correctly uses the shared logger name "app.endpoints.handlers" as per the project convention for unified logging across endpoint handlers. Based on learnings.
1218-1227: Potential `TypeError` when concatenating toolgroups.

If `get_rag_toolgroups` returns `None` (when `vector_db_ids` is empty), and `mcp_toolgroups` is a list, line 1225 could attempt `None + list` causing a `TypeError`. However, the condition on line 1224 guards this case.

The logic is correct but could be clearer. The current flow works because line 1224 checks `if vector_db_ids` before concatenation.
asimurka
left a comment
After rebase, you will need to change vector_db to vector_stores, messages from UserMessage format to pure string and copy everything to query v2 implementations that use responses api. Overall LGTM, great work!
Force-pushed: eaf66dd to 60c3c4f
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/database.py (1)
116-136: Add a default case to handle unsupported database types.

The `match` statement lacks a default case. If `db_config.db_type` is neither "sqlite" nor "postgres", `engine` remains `None` and line 136 creates a `sessionmaker` with `bind=None`. This could lead to confusing runtime errors later.

```diff
         case "postgres":
             logger.info("Initialize PostgreSQL database")
             postgres_config = db_config.config
             logger.debug("Configuration: %s", postgres_config)
             if not isinstance(postgres_config, PostgreSQLDatabaseConfiguration):
                 raise TypeError(
                     f"Expected PostgreSQLDatabaseConfiguration, got {type(postgres_config)}"
                 )
             engine = _create_postgres_engine(postgres_config, **create_engine_kwargs)
+        case _:
+            raise ValueError(f"Unsupported database type: {db_config.db_type}")
 
     session_local = sessionmaker(autocommit=False, autoflush=False, bind=engine)
```
♻️ Duplicate comments (4)
src/app/endpoints/streaming_query.py (4)
1046-1051: Hardcoded URL should be extracted to configuration - still present.

The URL `https://mimir.corp.redhat.com` is hardcoded in multiple places. This was flagged in a previous review.

Also applies to: 1067-1069, 1275-1277
905-908: RAG chunks queried twice per streaming request - still present.

The vector DB is queried at line 906-908 via `query_vector_io_for_chunks`, and again within `retrieve_response` at lines 1243-1298. This doubles latency and load. Consider passing the pre-fetched chunks to `retrieve_response` or consolidating the query logic (see the sketch below).

Also applies to: 1243-1298
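One consolidation option, sketched under the assumption that `retrieve_response` can accept the chunks already fetched via the module's `query_vector_io_for_chunks` helper; the new parameter name is hypothetical:

```python
from typing import Any


async def retrieve_response(
    client: Any,
    model_id: str,
    query_request: Any,
    prefetched_rag_chunks: list[Any] | None = None,  # hypothetical parameter
) -> None:
    """Fragment: reuse pre-fetched chunks instead of re-querying the vector DB."""
    if prefetched_rag_chunks is not None:
        rag_chunks = prefetched_rag_chunks  # no second vector_io.query
    else:
        # Fall back to querying here, as the current code does.
        rag_chunks = await query_vector_io_for_chunks(client, query_request)
    _ = rag_chunks  # ...rest of the existing flow unchanged
```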
1022-1022: Avoid logging entire query response payloads - still present.

This was flagged in a previous review. Logging the full `query_response` payload at INFO level may expose sensitive document content.

```diff
-        logger.info("The query response total payload: %s", query_response)
+        logger.debug("Query response chunk count: %d", len(query_response.chunks) if query_response.chunks else 0)
```
1025-1033: Redundant rag_chunks creation - still present.

The `rag_chunks` list created on lines 1025-1033 is never used; it's overwritten by `final_rag_chunks` on lines 1059-1082. Remove the unused code.

```diff
         if query_response.chunks:
-            rag_chunks = [
-                RAGChunk(
-                    content=str(chunk.content),  # Convert to string if needed
-                    source=getattr(chunk, "doc_id", None)
-                    or getattr(chunk, "source", None),
-                    score=getattr(chunk, "score", None),
-                )
-                for chunk in query_response.chunks[:5]  # Limit to top 5 chunks
-            ]
-            logger.info("Retrieved %d chunks from vector DB", len(rag_chunks))
-
             # Extract doc_ids from chunks for referenced_documents
```

Also applies to: 1059-1084
🧹 Nitpick comments (41)
tests/e2e/features/steps/common_http.py (1)
123-132: Richer status-code assertion is good; consider a small style tweak

Including the parsed JSON (or text fallback) in the failure message is very helpful for debugging and keeps behavior correct. The `if ...: assert False, ...` pattern is slightly unconventional, though; consider raising `AssertionError` directly for clarity and to avoid the dummy `False`:

```diff
     if context.response.status_code != status:
         # Include response body in error message for debugging
         try:
             error_body = context.response.json()
         except Exception:
             error_body = context.response.text
-        assert False, (
-            f"Status code is {context.response.status_code}, expected {status}. "
-            f"Response: {error_body}"
-        )
+        raise AssertionError(
+            f"Status code is {context.response.status_code}, expected {status}. "
+            f"Response: {error_body}"
+        )
```

Optionally, you could also narrow the exception to `ValueError`/`json.JSONDecodeError` instead of catching `Exception`, but that's not critical here.

tests/e2e/features/conversation_cache_v2.feature (1)
13-29: TODO: Track and address the skipped test for empty vector DB bug.

This scenario is currently skipped due to a known bug where an empty vector database causes a 500 error. The test should be re-enabled once the underlying issue is resolved.
Do you want me to create an issue to track this bug and the test re-enablement, or is this already tracked under one of the referenced stories (LCORE-1036, LCORE-935, LCORE-882)?
scripts/gen_doc.py (2)
10-10: Adding tests directories to `DIRECTORIES` may overwrite existing READMEs

Including `"tests/unit"`, `"tests/integration"`, and `"tests/e2e"` means this script will now (re)generate `README.md` in every one of those directories and their subdirectories, overwriting any manually maintained READMEs there. If that's intentional, this is fine; if some test folders have curated documentation, you may want to either narrow the list or skip directories that already contain a non-generated README.
53-63: Harden main loop against missing directories and reduce brittleness of path handling

As written, `generate_documentation_on_path(f"{directory}/")` will raise if any directory in `DIRECTORIES` doesn't exist (e.g., `tests/e2e` in an older or partial checkout). You can make this more robust by checking for existence once and reusing a `Path` object:

```diff
 def main():
     """Entry point to this script, regenerates documentation in all directories."""
-    for directory in DIRECTORIES:
-        generate_documentation_on_path(f"{directory}/")
-        for path in Path(directory).rglob("*"):
-            if path.is_dir():
-                if (
-                    path.name == "lightspeed_stack.egg-info"
-                    or path.name == "__pycache__"
-                    or ".ruff_cache" in str(path)
-                ):
-                    continue
-                generate_documentation_on_path(path)
+    for directory in DIRECTORIES:
+        base = Path(directory)
+        if not base.exists():
+            print(f"[gendoc] Skipping missing directory: {base}")
+            continue
+
+        generate_documentation_on_path(base)
+        for path in base.rglob("*"):
+            if not path.is_dir():
+                continue
+            if (
+                path.name == "lightspeed_stack.egg-info"
+                or path.name == "__pycache__"
+                or ".ruff_cache" in str(path)
+            ):
+                continue
+            generate_documentation_on_path(path)
```

This keeps behavior the same when directories are present, but avoids failures (and makes the assumption about directory existence explicit) when they're not.
docs/demos/lcore/lcore.md (3)
34-37: Improve language clarity.

The phrase "It's a real framework independent on programming language" is awkward. Consider revising to: "It's a language-independent framework with providers for RAG, quota control, guardrails, metrics, etc."
23-104: Consider the scope and audience for the extensive Llama Stack content.

The PR introduces significant Llama Stack framework documentation (lines 23-104) comprising roughly 40% of the file's content. While this provides valuable context, verify that this level of detail is appropriate for a Lightspeed Core demo presentation. The new Solr Vector I/O Provider, streaming query support, and keyword filtering features are indirectly covered under RAG and LLM Inference sections, but not explicitly highlighted.
If this is intended as an architectural overview that positions Lightspeed Core within the broader Llama Stack ecosystem, the current approach works well. However, if the primary audience expects practical Solr-specific details or streaming implementation guidance, consider adding a dedicated technical section.
75-104: Consider emoji accessibility for documentation portability.

The section headers use emoji (🤖, 🛡️, 🔧, 📚, 🎯) for visual appeal. While this works well for browser/HTML presentation, verify that emoji render correctly across all planned distribution formats (PDF, markdown viewers, screen readers, etc.). If the documentation will be widely shared or translated, consider adding text fallback descriptions or using a more neutral header style.
src/app/endpoints/tools.py (4)
40-59: Docstring now slightly under-describes builtin toolgroups

The implementation clearly returns tools from both builtin toolgroups and MCP server toolgroups (distinguished via `server_source`), but the docstring still frames this primarily as "from all configured MCP servers." Consider updating the description (and possibly the Raises section) to reflect that builtin toolgroups are also included and that only upstream connectivity issues currently surface as HTTP errors while missing toolgroups are skipped.
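A docstring sketch reflecting that behavior (handler parameters are elided; the wording is a suggestion, not the current text):

```python
async def tools_endpoint_handler() -> "ToolsResponse":  # parameters elided
    """List tools from builtin toolgroups and all configured MCP servers.

    Builtin toolgroups are reported with server_source="builtin"; MCP
    toolgroups carry their server URL. Toolgroups that fail with
    BadRequestError are skipped (best-effort aggregation).

    Raises:
        HTTPException: 503 when the Llama Stack backend is unreachable.
    """
    raise NotImplementedError
```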
68-76: Good upstream connection handling; consider redacting low-level error details

Mapping connection failures to a 503 using `ServiceUnavailableResponse` and raising `HTTPException(**response.model_dump())` gives a clean, centralized error path and aligns with the "handle Llama Stack connection errors" guidance. The only concern is that `cause=str(e)` may expose low-level connection details to clients (hosts, ports, internal messages). If you want a stricter separation between internal diagnostics and public API, consider logging the full exception but returning a more generic `cause` (or omitting it) in the response body.

If you want to double-check, please confirm in the Llama Stack client docs that the connection error you're catching here is the one you intend to surface and that its message doesn't contain sensitive data.
85-97: BadRequestError is silently converted into partial success

For each toolgroup, a `BadRequestError` is logged and then skipped, resulting in a 200 response with that toolgroup's tools omitted. This may be exactly what you want (best-effort aggregation), but it means misconfigured/removed toolgroups never surface to clients as errors, only as missing tools and log noise.

If misconfiguration is something operators or clients should notice, consider one of:

- Downgrading log level to `warning` if this is expected and transient, or keeping `error` but adding metrics; and/or
- Surfacing partial-failure information in `ToolsResponse` (e.g., a "skipped_toolgroups" field, sketched below) instead of only logs.

This keeps the endpoint resilient while making failures more visible and debuggable.

You may want to confirm in the Llama Stack client docs whether `BadRequestError` here strictly means "toolgroup not found" vs broader 4xx classes, to decide how user-visible this should be.
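The partial-failure field mentioned above could look like this; `skipped_toolgroups` is hypothetical and the `tools` item type is simplified:

```python
from pydantic import BaseModel, Field


class ToolsResponse(BaseModel):
    """Sketch: surface skipped toolgroups alongside the aggregated tools."""

    tools: list[dict] = Field(default_factory=list)
    skipped_toolgroups: list[str] = Field(
        default_factory=list,
        description="Toolgroup IDs skipped after a BadRequestError.",
    )
```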
78-83: Aggregation and server_source derivation are correct; possible micro-refactors

The aggregation over `toolgroups_response`, derivation of `server_source` (builtin vs MCP URL), and the final summary logging all look logically sound. A couple of optional refinements if this endpoint becomes hot or `mcp_servers` grows:

- Instead of recomputing `next(s for s in configuration.mcp_servers ...)` for each tool, prebuild a `{name: server}` dict once so lookup is O(1) (see the sketch after this list).
- The builtin vs MCP counts in the info log require two full passes over `consolidated_tools`; if needed, you could maintain these counters while appending tools.

These are minor and only worth doing if this endpoint turns into a high-traffic or large-dataset path.

Also applies to: 99-127, 135-140
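The prebuilt map from the first bullet, sketched with a Protocol standing in for the MCP server config type (attribute names are assumptions):

```python
from typing import Iterable, Protocol


class MCPServer(Protocol):
    """Structural stand-in for the configured MCP server objects."""

    name: str
    url: str


def build_server_index(mcp_servers: Iterable[MCPServer]) -> dict[str, MCPServer]:
    """Prebuild a name -> server map so per-tool lookups are O(1)."""
    return {server.name: server for server in mcp_servers}
```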
scripts/llama_stack_tutorial.sh (1)
1-24: Consider adding error handling.

The script lacks error handling directives like `set -e` which would cause it to exit on errors. While this might be intentional for a tutorial script (to continue even if some commands fail), consider adding `set -e` or at least documenting this behavior.

Add error handling at the top of the script:

```diff
 #!/bin/bash
+set -e  # Exit on error
 
 # Llama Stack Tutorial - Interactive Guide
```

Or if you want the tutorial to continue even on errors, add a comment explaining this:

```diff
 #!/bin/bash
+# Note: Error handling is intentionally omitted to allow the tutorial to continue
+# even if some endpoints are unavailable
 
 # Llama Stack Tutorial - Interactive Guide
```

src/authentication/k8s.py (1)
261-284: Variable shadowing: `response` is reused with different types.

The variable `response` is first assigned the SAR result (line 272), then conditionally reassigned to a `ForbiddenResponse` (line 283). This shadowing makes the code harder to follow and could lead to bugs if the logic changes.

```diff
     try:
         authorization_api = K8sClientSingleton.get_authz_api()
 
         sar = kubernetes.client.V1SubjectAccessReview(
             spec=kubernetes.client.V1SubjectAccessReviewSpec(
                 user=user_info.user.username,
                 groups=user_info.user.groups,
                 non_resource_attributes=kubernetes.client.V1NonResourceAttributes(
                     path=self.virtual_path, verb="get"
                 ),
             )
         )
-        response = authorization_api.create_subject_access_review(sar)
+        sar_response = authorization_api.create_subject_access_review(sar)
     except Exception as e:
         logger.error("API exception during SubjectAccessReview: %s", e)
         response = ServiceUnavailableResponse(
             backend_name="Kubernetes API",
             cause="Unable to perform authorization check",
         )
         raise HTTPException(**response.model_dump()) from e
 
-    if not response.status.allowed:
+    if not sar_response.status.allowed:
         response = ForbiddenResponse.endpoint(user_id=user_info.user.uid)
         raise HTTPException(**response.model_dump())
```

src/authentication/jwk_token.py (1)
37-46: TODO: Missing error handling for connection issues.

The function has a TODO comment (line 42) about handling connection errors and timeouts. While `aiohttp.ClientError` is caught in the caller (lines 134-139), the `resp.raise_for_status()` on line 44 will raise `aiohttp.ClientResponseError`, which is a subclass of `ClientError` and is handled.

However, consider adding an explicit timeout to prevent hanging on slow/unresponsive servers:

```diff
 async def get_jwk_set(url: str) -> KeySet:
     """Fetch the JWK set from the cache, or fetch it from the URL if not cached."""
     async with _jwk_cache_lock:
         if url not in _jwk_cache:
-            async with aiohttp.ClientSession() as session:
-                # TODO(omertuc): handle connection errors, timeouts, etc.
-                async with session.get(url) as resp:
+            timeout = aiohttp.ClientTimeout(total=10)
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.get(url) as resp:
                     resp.raise_for_status()
                     _jwk_cache[url] = JsonWebKey.import_key_set(await resp.json())
     return _jwk_cache[url]
```

src/configuration.py (1)
22-22: Type rename from ConversationCacheConfiguration to ConversationHistoryConfiguration.

The import and return type annotation have been updated to use `ConversationHistoryConfiguration`, aligning with the broader rename across the codebase.

Consider renaming the method to match the new type. The property method name `conversation_cache_configuration` still contains "cache" while the return type is now `ConversationHistoryConfiguration`. For consistency, consider renaming to `conversation_history_configuration` in a future refactor (a sketch follows).

Also applies to: 141-141
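A low-risk way to stage that rename is a delegating property (class context trimmed; names follow the review):

```python
class AppConfig:  # fragment: only the suggested alias is shown
    @property
    def conversation_history_configuration(self) -> "ConversationHistoryConfiguration":
        """New name matching the renamed type; delegates to the existing property."""
        return self.conversation_cache_configuration
```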
docs/streaming_query_endpoint.puml (2)
27-28: Consider clarifying shield event behavior.

Based on the AI summary, shield events are only emitted on violations (silent on success). The diagram shows unconditional "emit shield validation event" which may be misleading. Consider adding a note or updating the alt text to clarify this conditional behavior.

```diff
 else Chunk Type: shield
-    EventHandler->>SSE: emit shield validation event
+    EventHandler->>SSE: emit shield violation event (if violation detected)
```
13-14: Consider documenting RAG retrieval in the flow.

The AI summary indicates the streaming flow incorporates RAG data via `query_vector_io_for_chunks`, producing `rag_chunks` and `vector_io_referenced_docs` that are included in end-stream payloads. Consider adding this to the diagram for completeness.

src/constants.py (1)
102-104: New authentication module constants are properly defined.

The constants follow the existing naming pattern and use descriptive string values. Consider adding brief comments describing each module's purpose, consistent with other constants in this file (e.g., lines 93-98 have comments).
As per coding guidelines, constants should have descriptive comments:

```diff
 AUTH_MOD_NOOP_WITH_TOKEN = "noop-with-token"
+# API key based authentication module
 AUTH_MOD_APIKEY_TOKEN = "api-key-token"
 AUTH_MOD_JWK_TOKEN = "jwk-token"
+# Red Hat Identity header authentication module
 AUTH_MOD_RH_IDENTITY = "rh-identity"
```
src/cache/cache_factory.py (1)

4-4: Config type rename wiring is correct; optional docstring polish

Switching `conversation_cache` to accept `ConversationHistoryConfiguration` and extending the invalid type error message to include `CACHE_TYPE_NOOP` are consistent with the new configuration model; behavior of the factory remains the same. You may optionally update the function docstring to mention `NoopCache` as a possible return type for completeness, but it's not functionally required.

Also applies to: 20-20, 45-49
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)
1-32: Verify `auth_enabled` for noop-with-token e2e config

This config is named for the noop-with-token auth flow and sets `authentication.module: "noop-with-token"`, but `service.auth_enabled` is `false`. If the intent is to exercise token-based auth in the e2e suite (Authorized-tag scenarios), you may want `auth_enabled: true`; otherwise this config might not actually enforce the auth module. Please double-check against how other noop-with-token configs are used in the tests.
Based on learnings, noop-with-token tests typically rely on a dedicated config when the Authorized tag is present.

src/authorization/middleware.py (1)
35-45: Confirm APIKEY_TOKEN → Noop resolvers mapping and structured error usage

The switch to `InternalServerErrorResponse`/`ForbiddenResponse` for raising HTTP exceptions via `response.model_dump()` looks consistent with the new response model pattern and keeps error shapes centralized.

For `AUTH_MOD_APIKEY_TOKEN`, `get_authorization_resolvers` now returns `NoopRolesResolver` and `NoopAccessResolver`, which effectively disables role-based checks: any request that passes API key authentication will satisfy `check_access(action, user_roles)`. If API keys are meant to grant blanket access, this is fine; if you intend to keep per-`Action` authorization semantics even for API keys, you may want a dedicated resolver instead of the noop pair (sketched below). Please confirm the intended behavior.

Also applies to: 68-69, 85-86, 94-95
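If per-Action semantics should survive API-key auth, a dedicated resolver pair could replace the noop pair. A sketch with hypothetical names and a simplified interface; the real resolver protocol in this codebase may differ:

```python
class APIKeyRolesResolver:
    """Map any authenticated API-key request to a fixed role set."""

    async def resolve_roles(self, auth: tuple) -> set[str]:
        # All API-key callers share one role; adjust per deployment needs.
        return {"api-key-client"}


class APIKeyAccessResolver:
    """Allow only the actions granted to the api-key-client role."""

    async def check_access(self, action: str, user_roles: set[str]) -> bool:
        allowed_actions = {"query", "streaming_query"}  # illustrative
        return "api-key-client" in user_roles and action in allowed_actions
```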
src/app/endpoints/rags.py (2)
53-99: Consider log level for configuration details.

Line 82 logs the entire `llama_stack_configuration` object at `info` level. This could potentially expose sensitive configuration details in production logs. Consider using `debug` level instead, or logging only non-sensitive attributes.

```diff
-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)
```
101-154: Same log level concern applies here.

Line 130 has the same pattern; consider using `debug` level to avoid leaking configuration details in production logs.

```diff
-    logger.info("Llama stack config: %s", llama_stack_configuration)
+    logger.debug("Llama stack config: %s", llama_stack_configuration)
```

tests/e2e/features/environment.py (1)
84-102: Consider using `elif` for mutually exclusive config scenarios.

Lines 92-102 use separate `if` statements for `InvalidFeedbackStorageConfig` and `NoCacheConfig`. If a scenario were tagged with both, `NoCacheConfig` would overwrite the config and trigger an immediate restart. If these are mutually exclusive, using `elif` would make this explicit and prevent unexpected behavior.

```diff
     if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
         context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
+    elif "NoCacheConfig" in scenario.effective_tags:
         context.scenario_config = (
             f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
         )
```
779-784: Consider adding 200 response for OpenAPI consistency.Unlike other endpoints in this PR (e.g.,
shields_responses,models_responses),root_responsesdoesn't include a 200 response entry. While theresponse_class=HTMLResponsehandles the success case, adding an explicit 200 entry would maintain consistency across endpoints.
787-808: Parameter order differs from other endpoints.

The signature has `auth` before `request`, while other endpoints in this PR (e.g., `rags.py`, `shields.py`, `models.py`) use `request` before `auth`. Consider reordering for consistency.

```diff
 async def root_endpoint_handler(
-    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
     request: Request,
+    auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
 ) -> HTMLResponse:
```

src/app/endpoints/metrics.py (1)
64-64: Unnecessary `str()` cast on `CONTENT_TYPE_LATEST`.

`CONTENT_TYPE_LATEST` from `prometheus_client` is already a string constant. The explicit `str()` conversion is redundant.

```diff
-    return PlainTextResponse(generate_latest(), media_type=str(CONTENT_TYPE_LATEST))
+    return PlainTextResponse(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```

.github/workflows/e2e_tests.yaml (1)
188-191: Consider using health check polling instead of fixed sleep.

A fixed 20-second sleep may be insufficient under load or excessive when services start quickly. Consider polling the health endpoint with a timeout.

```diff
     - name: Wait for services
       run: |
         echo "Waiting for services to be healthy..."
-        sleep 20
+        for i in {1..30}; do
+          if curl -sf http://localhost:8080/health > /dev/null 2>&1; then
+            echo "Services healthy after $i seconds"
+            break
+          fi
+          sleep 1
+        done
```

docs/config.json (1)
156-162: Minor: Consider minimum value of 1 for embedding_dimension.

An embedding dimension of 0 is technically allowed by the schema but would be semantically invalid.

```diff
 "embedding_dimension": {
   "default": 768,
   "description": "Dimensionality of embedding vectors.",
-  "minimum": 0,
+  "minimum": 1,
   "title": "Embedding dimension",
   "type": "integer"
 },
```

src/utils/endpoints.py (2)
637-659: Consider early return for disabled transcripts.

The nested if-else structure can be simplified with an early return pattern for better readability.

```diff
     # Store transcript if enabled
-    if not is_transcripts_enabled_func():
-        logger.debug("Transcript collection is disabled in the configuration")
-    else:
-        # Prepare attachments
-        attachments = query_request.attachments or []
-
-        # Determine rag_chunks: use provided value or empty list
-        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
-
-        store_transcript_func(
-            user_id=user_id,
-            conversation_id=conversation_id,
-            model_id=model_id,
-            provider_id=provider_id,
-            query_is_valid=True,
-            query=query_request.query,
-            query_request=query_request,
-            summary=summary,
-            rag_chunks=transcript_rag_chunks,
-            truncated=False,
-            attachments=attachments,
-        )
+    if is_transcripts_enabled_func():
+        attachments = query_request.attachments or []
+        transcript_rag_chunks = rag_chunks if rag_chunks is not None else []
+        store_transcript_func(
+            user_id=user_id,
+            conversation_id=conversation_id,
+            model_id=model_id,
+            provider_id=provider_id,
+            query_is_valid=True,
+            query=query_request.query,
+            query_request=query_request,
+            summary=summary,
+            rag_chunks=transcript_rag_chunks,
+            truncated=False,
+            attachments=attachments,
+        )
+    else:
+        logger.debug("Transcript collection is disabled in the configuration")
```
591-610: Consider refactoring `cleanup_after_streaming` to reduce parameter count.

This function has 19 parameters (triggering pylint R0913/R0917). Consider grouping related parameters into a dataclass or TypedDict to improve maintainability and reduce the chance of argument ordering errors.

Example approach:

```python
@dataclass
class StreamingCleanupContext:
    user_id: str
    conversation_id: str
    model_id: str
    provider_id: str
    llama_stack_model_id: str
    query_request: QueryRequest
    summary: TurnSummary
    metadata_map: dict[str, Any]
    started_at: str
    skip_userid_check: bool
    rag_chunks: list[dict[str, Any]] | None = None


async def cleanup_after_streaming(
    context: StreamingCleanupContext,
    client: AsyncLlamaStackClient,
    config: AppConfig,
    callbacks: StreamingCleanupCallbacks,  # another dataclass for the 4 funcs
) -> None:
    ...
```

src/app/endpoints/conversations_v2.py (1)
229-241: Redundant cache type check at line 232.

The `check_conversation_existence` function checks if `configuration.conversation_cache_configuration.type is None`, but this is already validated in the calling handlers before this function is invoked. Consider removing the redundant check or documenting why it's needed for safety.

```diff
 def check_conversation_existence(user_id: str, conversation_id: str) -> None:
     """Check if conversation exists."""
-    # checked already, but we need to make pyright happy
-    if configuration.conversation_cache_configuration.type is None:
-        return
+    # Callers must validate cache availability before calling this function
+    assert configuration.conversation_cache_configuration.type is not None
 
     conversations = configuration.conversation_cache.list(user_id, False)
```

Alternatively, if keeping for type narrowing, the comment already explains the intent, which is acceptable.

src/app/endpoints/query.py (4)
79-82: Configuration TODO noted - consider addressing before merge.

The `OFFLINE` flag controls chunk source resolution but is currently hardcoded. The TODO comment acknowledges this should be moved to configuration. Consider prioritizing this to avoid environment-specific issues.
783-790: Use lazy % formatting for logger calls instead of f-strings.

f-strings are evaluated even when the log level is disabled, causing unnecessary overhead. Use `%s` placeholders with logger for deferred formatting.

```diff
-    logger.info(f"Initial params: {params}")
-    logger.info(f"query_request.solr: {query_request.solr}")
+    logger.info("Initial params: %s", params)
+    logger.info("query_request.solr: %s", query_request.solr)
 
     if query_request.solr:
         # Pass the entire solr dict under the 'solr' key
         params["solr"] = query_request.solr
-        logger.info(f"Final params with solr filters: {params}")
+        logger.info("Final params with solr filters: %s", params)
     else:
         logger.info("No solr filters provided")
 
-    logger.info(f"Final params being sent to vector_io.query: {params}")
+    logger.info("Final params being sent to vector_io.query: %s", params)
```
816-821: Hardcoded URL should be extracted to configuration.

The URL `https://mimir.corp.redhat.com` is hardcoded. This should be configurable for different environments.

Consider adding a configuration value like `mimir_base_url`:

```python
# In configuration
mimir_base_url = configuration.mimir_base_url or "https://mimir.corp.redhat.com"

# Then use:
doc_url=urljoin(mimir_base_url, reference_doc),
```
827-830: Consider narrowing the exception type or logging the exception class.

Catching bare `Exception` can mask unexpected errors. Consider catching more specific exceptions or, at minimum, logging the exception type for debugging.

```diff
-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())
```

src/app/endpoints/streaming_query.py (2)
505-515: Inconsistent behavior: "No Violation" token emitted for successful shield validations.

For successful shield validations (no violations), the code now emits a "No Violation" token to the stream. This may clutter the response with unnecessary tokens. Consider whether this should be silently skipped instead.

```diff
             else:
-                # Yield "No Violation" message for successful shield validations
-                yield stream_event(
-                    data={
-                        "id": chunk_id,
-                        "token": "No Violation",
-                    },
-                    event_type=LLM_TOKEN_EVENT,
-                    media_type=media_type,
-                )
+                # Successful shield validations are silently ignored
+                pass
```
1296-1298: Bare Exception catch should log exception type.

Consider logging the exception type for better debugging.

```diff
     except Exception as e:
-        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.warning("Failed to query vector database for chunks: %s (%s)", e, type(e).__name__)
         logger.debug("Vector DB query error details: %s", traceback.format_exc())
```

src/app/endpoints/streaming_query_v2.py (2)
56-56: Use `__name__` for module logger pattern.

Per coding guidelines, use `logger = logging.getLogger(__name__)` instead of a hardcoded string. This ensures consistent logger naming across the codebase.

```diff
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
```
101-103: Consider specifying a more precise return type.

The return type `Any` is vague. Consider using a `Callable` type hint for better documentation and IDE support.

```diff
-def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
-    context: ResponseGeneratorContext,
-) -> Any:
+from collections.abc import Callable
+
+def create_responses_response_generator(  # pylint: disable=too-many-locals,too-many-statements
+    context: ResponseGeneratorContext,
+) -> Callable[[AsyncIterator[OpenAIResponseObjectStream]], AsyncIterator[str]]:
```

src/models/config.py (1)
841-850: Redundant validation check.

The check `if self.api_key_config.api_key.get_secret_value() is None` on line 847 is redundant. The `APIKeyTokenConfiguration` class already has `min_length=1` on the `api_key` field, which would fail validation before this check is reached. Additionally, `get_secret_value()` returns a string, not `None`, for a valid `SecretStr`.

Consider simplifying or removing this check:

```diff
 if self.module == constants.AUTH_MOD_APIKEY_TOKEN:
     if self.api_key_config is None:
         raise ValueError(
             "API Key configuration section must be specified "
             "when using API Key token authentication"
         )
-    if self.api_key_config.api_key.get_secret_value() is None:
-        raise ValueError(
-            "api_key parameter must be specified when using API_KEY token authentication"
-        )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (12)
- docs/architecture.png is excluded by !**/*.png
- docs/architecture.svg is excluded by !**/*.svg
- docs/config.png is excluded by !**/*.png
- docs/config.svg is excluded by !**/*.svg
- docs/convesation_history.svg is excluded by !**/*.svg
- docs/demos/lcore/images/journey.png is excluded by !**/*.png
- docs/demos/lcore/images/llama_stack_as_library.svg is excluded by !**/*.svg
- docs/demos/lcore/images/llama_stack_as_service.svg is excluded by !**/*.svg
- docs/query_endpoint.svg is excluded by !**/*.svg
- docs/rest_api.svg is excluded by !**/*.svg
- docs/streaming_query_endpoint.svg is excluded by !**/*.svg
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (107)
- .coderabbit.yaml (1 hunks)
- .github/PULL_REQUEST_TEMPLATE.md (1 hunks)
- .github/workflows/e2e_tests.yaml (7 hunks)
- .tekton/lightspeed-stack-pull-request.yaml (3 hunks)
- .tekton/lightspeed-stack-push.yaml (3 hunks)
- CONTRIBUTING.md (6 hunks)
- Containerfile (2 hunks)
- Makefile (3 hunks)
- README.md (12 hunks)
- docker-compose-library.yaml (1 hunks)
- docker-compose.yaml (1 hunks)
- docs/auth.md (1 hunks)
- docs/config.html (1 hunks)
- docs/config.json (1 hunks)
- docs/config.md (1 hunks)
- docs/config.puml (9 hunks)
- docs/demos/lcore/lcore.md (5 hunks)
- docs/query_endpoint.puml (1 hunks)
- docs/rag_guide.md (2 hunks)
- docs/streaming_query_endpoint.puml (1 hunks)
- examples/lightspeed-stack-api-key-auth.yaml (1 hunks)
- examples/lightspeed-stack-rh-identity.yaml (1 hunks)
- pyproject.toml (2 hunks)
- requirements.hermetic.txt (1 hunks)
- requirements.torch.txt (1 hunks)
- rpms.in.yaml (1 hunks)
- rpms.lock.yaml (1 hunks)
- scripts/gen_doc.py (2 hunks)
- scripts/llama_stack_tutorial.sh (1 hunks)
- scripts/remove_torch_deps.sh (1 hunks)
- src/app/database.py (2 hunks)
- src/app/endpoints/README.md (2 hunks)
- src/app/endpoints/authorized.py (1 hunks)
- src/app/endpoints/config.py (3 hunks)
- src/app/endpoints/conversations.py (11 hunks)
- src/app/endpoints/conversations_v2.py (8 hunks)
- src/app/endpoints/feedback.py (6 hunks)
- src/app/endpoints/health.py (2 hunks)
- src/app/endpoints/info.py (4 hunks)
- src/app/endpoints/metrics.py (3 hunks)
- src/app/endpoints/models.py (3 hunks)
- src/app/endpoints/providers.py (3 hunks)
- src/app/endpoints/query.py (12 hunks)
- src/app/endpoints/query_v2.py (9 hunks)
- src/app/endpoints/rags.py (1 hunks)
- src/app/endpoints/root.py (3 hunks)
- src/app/endpoints/shields.py (3 hunks)
- src/app/endpoints/streaming_query.py (13 hunks)
- src/app/endpoints/streaming_query_v2.py (1 hunks)
- src/app/endpoints/tools.py (3 hunks)
- src/app/main.py (3 hunks)
- src/app/routers.py (3 hunks)
- src/authentication/README.md (2 hunks)
- src/authentication/__init__.py (2 hunks)
- src/authentication/api_key_token.py (1 hunks)
- src/authentication/jwk_token.py (2 hunks)
- src/authentication/k8s.py (4 hunks)
- src/authentication/rh_identity.py (1 hunks)
- src/authentication/utils.py (1 hunks)
- src/authorization/middleware.py (5 hunks)
- src/cache/cache_factory.py (3 hunks)
- src/configuration.py (2 hunks)
- src/constants.py (2 hunks)
- src/metrics/utils.py (2 hunks)
- src/models/README.md (1 hunks)
- src/models/config.py (21 hunks)
- src/models/context.py (1 hunks)
- src/models/requests.py (4 hunks)
- src/models/responses.py (17 hunks)
- src/quota/__init__.py (1 hunks)
- src/quota/quota_limiter.py (2 hunks)
- src/quota/user_quota_limiter.py (1 hunks)
- src/utils/README.md (1 hunks)
- src/utils/checks.py (1 hunks)
- src/utils/common.py (3 hunks)
- src/utils/endpoints.py (6 hunks)
- src/utils/quota.py (2 hunks)
- src/utils/responses.py (1 hunks)
- src/utils/shields.py (1 hunks)
- test.containerfile (1 hunks)
- tests/configuration/rh-identity-config.yaml (1 hunks)
- tests/e2e/README.md (1 hunks)
- tests/e2e/__init__.py (1 hunks)
- tests/e2e/configs/README.md (1 hunks)
- tests/e2e/configs/run-azure.yaml (1 hunks)
- tests/e2e/configs/run-ci.yaml (3 hunks)
- tests/e2e/configs/run-rhelai.yaml (1 hunks)
- tests/e2e/configuration/README.md (1 hunks)
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
- tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-no-cache.yaml (1 hunks)
- tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
- tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
- tests/e2e/features/authorized_noop_token.feature (2 hunks)
- tests/e2e/features/conversation_cache_v2.feature (1 hunks)
- tests/e2e/features/conversations.feature (5 hunks)
- tests/e2e/features/environment.py (4 hunks)
- tests/e2e/features/feedback.feature (4 hunks)
- tests/e2e/features/health.feature (2 hunks)
- tests/e2e/features/info.feature (4 hunks)
- tests/e2e/features/query.feature (3 hunks)
- tests/e2e/features/steps/README.md (1 hunks)
- tests/e2e/features/steps/__init__.py (1 hunks)
- tests/e2e/features/steps/common_http.py (2 hunks)
- tests/e2e/features/steps/conversation.py (5 hunks)
⛔ Files not processed due to max files limit (25)
- tests/e2e/features/steps/llm_query_response.py
- tests/e2e/features/streaming_query.feature
- tests/e2e/test_api.py
- tests/e2e/utils/README.md
- tests/integration/README.md
- tests/integration/endpoints/README.md
- tests/integration/endpoints/test_config_integration.py
- tests/integration/endpoints/test_health_integration.py
- tests/integration/endpoints/test_info_integration.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/integration/test_middleware_integration.py
- tests/integration/test_openapi_json.py
- tests/integration/test_rh_identity_integration.py
- tests/unit/README.md
- tests/unit/app/README.md
- tests/unit/app/endpoints/README.md
- tests/unit/app/endpoints/test_authorized.py
- tests/unit/app/endpoints/test_config.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/app/endpoints/test_feedback.py
- tests/unit/app/endpoints/test_health.py
- tests/unit/app/endpoints/test_info.py
- tests/unit/app/endpoints/test_metrics.py
- tests/unit/app/endpoints/test_models.py
✅ Files skipped from review due to trivial changes (16)
- src/models/README.md
- rpms.in.yaml
- tests/e2e/features/steps/README.md
- .github/PULL_REQUEST_TEMPLATE.md
- src/quota/init.py
- examples/lightspeed-stack-api-key-auth.yaml
- tests/e2e/README.md
- tests/e2e/configuration/README.md
- docs/config.md
- tests/e2e/configs/README.md
- tests/e2e/init.py
- Containerfile
- src/quota/user_quota_limiter.py
- tests/e2e/features/README.md
- tests/configuration/rh-identity-config.yaml
- requirements.hermetic.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/checks.py
🧰 Additional context used
📓 Path-based instructions (10)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/info.feature, tests/e2e/features/feedback.feature, tests/e2e/features/health.feature, tests/e2e/features/conversation_cache_v2.feature, tests/e2e/features/query.feature, tests/e2e/features/authorized_noop_token.feature, tests/e2e/features/conversations.feature
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/e2e/features/steps/__init__.py, tests/e2e/features/steps/common_http.py, tests/e2e/features/steps/conversation.py, tests/e2e/features/environment.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/main.py, src/metrics/utils.py, src/authentication/utils.py, src/authorization/middleware.py, src/app/endpoints/models.py, src/utils/endpoints.py, src/cache/cache_factory.py, src/models/context.py, src/app/endpoints/tools.py, src/authentication/jwk_token.py, src/configuration.py, src/app/routers.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/authentication/rh_identity.py, src/utils/shields.py, src/app/endpoints/health.py, src/models/requests.py, src/app/endpoints/query.py, src/utils/common.py, src/authentication/k8s.py, src/app/endpoints/providers.py, src/utils/responses.py, src/utils/quota.py, src/app/endpoints/info.py, src/app/database.py, src/constants.py, src/authentication/api_key_token.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/authentication/__init__.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/quota/quota_limiter.py, src/app/endpoints/root.py, src/models/responses.py, src/app/endpoints/query_v2.py, src/models/config.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/providers.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/app/endpoints/root.py, src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/feedback.py, src/app/endpoints/rags.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/providers.py, src/app/endpoints/info.py, src/app/endpoints/metrics.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py, src/app/endpoints/config.py, src/app/endpoints/conversations.py, src/app/endpoints/conversations_v2.py, src/app/endpoints/shields.py, src/app/endpoints/authorized.py, src/app/endpoints/root.py, src/app/endpoints/query_v2.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint with `source-roots = "src"`
Exclude `src/auth/k8s.py` from pyright type checking
Files:
pyproject.toml
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/context.py, src/models/requests.py, src/models/responses.py, src/models/config.py
src/**/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define shared constants in central `constants.py` file with descriptive comments
Files:
src/constants.py
src/**/__init__.py
📄 CodeRabbit inference engine (CLAUDE.md)
Package `__init__.py` files must contain brief package descriptions
Files:
src/authentication/__init__.py
src/models/config.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/config.py: All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields
Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models
Files:
src/models/config.py
🧠 Learnings (19)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
tests/e2e/features/info.feature, src/app/main.py, src/metrics/utils.py, src/app/endpoints/models.py, src/app/endpoints/tools.py, src/app/endpoints/health.py, src/app/endpoints/query.py, src/app/endpoints/info.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations.py, src/app/endpoints/shields.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml, examples/lightspeed-stack-rh-identity.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/lightspeed-stack-library-mode.yaml, src/authentication/README.md, tests/e2e/features/authorized_noop_token.feature, tests/e2e/configuration/lightspeed-stack-server-mode.yaml, docs/auth.md, tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/features/authorized_noop_token.feature, tests/e2e/configuration/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/main.py, src/authentication/utils.py, src/authorization/middleware.py, src/app/endpoints/models.py, src/utils/endpoints.py, src/app/endpoints/feedback.py, src/utils/quota.py, src/app/endpoints/info.py, src/app/endpoints/streaming_query.py, src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Run `uv run make verify` to run all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion
Applied to files:
Makefile, CONTRIBUTING.md
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests
Applied to files:
Makefile, .github/workflows/e2e_tests.yaml, tests/e2e/configuration/lightspeed-stack-library-mode.yaml, tests/e2e/features/health.feature
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Applied to files:
Makefile, .github/workflows/e2e_tests.yaml, tests/e2e/features/environment.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
docker-compose-library.yaml, README.md, docs/demos/lcore/lcore.md
📚 Learning: 2025-07-23T14:26:40.340Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 278
File: src/app/endpoints/feedback.py:113-118
Timestamp: 2025-07-23T14:26:40.340Z
Learning: In the lightspeed-stack project after PR #278, the `UserDataCollection.feedback_storage` property dynamically constructs paths from `user_data_dir` and will always return a valid string, making fallback logic like `or Path("")` redundant.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Always check `pyproject.toml` for existing dependencies and versions before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use `uv sync --group dev --group llslibdev` to install development dependencies
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Applied to files:
CONTRIBUTING.md
📚 Learning: 2025-08-18T11:45:59.961Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:77-77
Timestamp: 2025-08-18T11:45:59.961Z
Learning: torch==2.7.1 is available on PyPI and is a valid version that can be used in dependency specifications.
Applied to files:
requirements.torch.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Use `field_validator` and `model_validator` for custom validation in Pydantic models
Applied to files:
src/models/requests.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
src/app/endpoints/conversations.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : All configuration must use Pydantic models extending `ConfigurationBase` with `extra="forbid"` to reject unknown fields
Applied to files:
docs/config.puml, src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/config.py : Use type hints `Optional[FilePath]`, `PositiveInt`, `SecretStr` for Pydantic configuration models
Applied to files:
docs/config.puml, src/models/config.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/models/**/*.py : Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Applied to files:
docs/config.puml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (16)
tests/integration/endpoints/test_query_v2_integration.py (2)
357-375: Resolve merge conflict in mock_tool_output.results structure.

Unresolved merge conflict markers prevent the file from being parsed. The conflict shows divergent mock structures for tool search results:

- HEAD: Inline list of mocks with `file_id`, `filename`, and `score` attributes only
- upstream/main: Separate `mock_result` object with an additional `attributes` dict containing `doc_url` and `link` fields

Choose the structure that matches the actual Llama Stack API response format. The upstream/main version appears more complete with URL metadata.

```diff
-<<<<<<< HEAD
-    mock_tool_output.results = [
-        mocker.MagicMock(
-            file_id="doc-1",
-            filename="ansible-docs.txt",
-            score=0.95,
-        )
-    ]
-=======
-    mock_result = mocker.MagicMock()
-    mock_result.file_id = "doc-1"
-    mock_result.filename = "ansible-docs.txt"
-    mock_result.score = 0.95
-    mock_result.attributes = {
-        "doc_url": "https://example.com/ansible-docs.txt",
-        "link": "https://example.com/ansible-docs.txt",
-    }
-    mock_tool_output.results = [mock_result]
->>>>>>> upstream/main
+    mock_result = mocker.MagicMock()
+    mock_result.file_id = "doc-1"
+    mock_result.filename = "ansible-docs.txt"
+    mock_result.score = 0.95
+    mock_result.attributes = {
+        "doc_url": "https://example.com/ansible-docs.txt",
+        "link": "https://example.com/ansible-docs.txt",
+    }
+    mock_tool_output.results = [mock_result]
```
1244-1275: Resolve merge conflict in transcript behavior test.

Unresolved merge conflict markers prevent the file from being parsed. The conflict affects the function signature and transcript mocking:

- HEAD: No `mocker` parameter, no mocking of `store_transcript` (may create actual files during tests)
- upstream/main: Includes `mocker` parameter and mocks `store_transcript` to prevent file creation

The upstream/main approach is preferred for integration tests, as it prevents unwanted side effects (actual file I/O) while still verifying the integration logic.

```diff
 async def test_query_v2_endpoint_transcript_behavior(
     test_config: AppConfig,
     mock_llama_stack_client: AsyncMockType,
     test_request: Request,
     test_auth: AuthTuple,
     patch_db_session: Session,
-<<<<<<< HEAD
-=======
+    mocker: MockerFixture,
->>>>>>> upstream/main
 ) -> None:
     """Test transcript storage behavior based on configuration.

     This integration test verifies:
     - Endpoint succeeds with transcripts enabled
     - Endpoint succeeds with transcripts disabled
     - Conversation is persisted regardless of transcript setting
     - Integration between query handler and transcript configuration

     Args:
         test_config: Test configuration
         mock_llama_stack_client: Mocked Llama Stack client
         test_request: FastAPI request
         test_auth: noop authentication tuple
         patch_db_session: Test database session
-<<<<<<< HEAD
-    """
-    _ = mock_llama_stack_client
-
-=======
+    mocker: pytest-mock fixture
+    """
+    _ = mock_llama_stack_client
+
+    # Mock store_transcript to prevent file creation
+    mocker.patch("app.endpoints.query.store_transcript")
+
->>>>>>> upstream/main
     test_config.user_data_collection_configuration.transcripts_enabled = True
```

tests/e2e/features/environment.py (1)
92-112: Harden NoCacheConfig config swapping and restore logic

The new immediate switch/restart for `NoCacheConfig` in `before_scenario` is fine, but the restore path in `after_scenario` has a couple of pitfalls:

- `after_scenario` always restores using `context.feature_config` for `NoCacheConfig`, but `context.feature_config` is only set in `before_feature` for `Authorized` features. If a `NoCacheConfig` scenario ever appears in a non-Authorized feature, this will raise an `AttributeError` and leave `lightspeed-stack.yaml` in the no-cache state.
- Unlike the feature-level `Authorized` path, there is no dedicated backup of `lightspeed-stack.yaml` when you overwrite it for `NoCacheConfig`, so the restore logic depends on knowing the original config path instead of having a real snapshot of the active config.

You can make this path robust and self-contained by introducing a per-scenario backup for `NoCacheConfig` and cleaning it up afterwards, e.g.:

```diff
@@ def before_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        context.scenario_config = f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
-    if "NoCacheConfig" in scenario.effective_tags:
-        context.scenario_config = (
-            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
-        )
-        # Switch config and restart immediately
-        switch_config(
-            context.scenario_config
-        )  # Copies to default lightspeed-stack.yaml
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-invalid-feedback-storage.yaml"
+        )
+    if "NoCacheConfig" in scenario.effective_tags:
+        context.scenario_config = (
+            f"tests/e2e/configuration/{mode_dir}/lightspeed-stack-no-cache.yaml"
+        )
+        # Backup current config, switch to no-cache, and restart immediately
+        context.scenario_config_backup = create_config_backup("lightspeed-stack.yaml")
+        switch_config(context.scenario_config)  # Copies to default lightspeed-stack.yaml
+        restart_container("lightspeed-stack")
@@ def after_scenario(context: Context, scenario: Scenario) -> None:
-    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
-    if "NoCacheConfig" in scenario.effective_tags:
-        switch_config(context.feature_config)
-        restart_container("lightspeed-stack")
+    if "InvalidFeedbackStorageConfig" in scenario.effective_tags:
+        switch_config(context.feature_config)
+        restart_container("lightspeed-stack")
+    if "NoCacheConfig" in scenario.effective_tags:
+        # Restore the backed-up config for this scenario if available
+        if hasattr(context, "scenario_config_backup"):
+            switch_config(context.scenario_config_backup)
+            restart_container("lightspeed-stack")
+            remove_config_backup(context.scenario_config_backup)
+            delattr(context, "scenario_config_backup")
```

This removes the dependency on `context.feature_config` for `NoCacheConfig` restores, avoids `AttributeError` when the tag is used outside `Authorized` features, and ensures per-scenario state is properly cleaned up so behavior doesn't leak across scenarios. Based on learnings, Behave's `Context` persists across scenarios, so this cleanup is important for test isolation.

tests/unit/app/endpoints/test_rags.py (1)
192-206: Resolve merge conflict in RagInfo mock class.

The `RagInfo.__init__` method contains unresolved merge conflict markers, causing multiple pipeline failures:

- IndentationError (syntax error)
- Ruff, Black, and Pylint parsing failures

Remove the conflict markers and decide whether to include the verbose docstring from upstream/main or keep the minimal version from HEAD. Ensure the resulting code is syntactically valid.
requirements.aarch64.txt (4)
1-4197: Unresolved merge conflicts in requirements.aarch64.txt—regenerate via uv from pyproject.toml.This file contains unresolved conflict markers (<<<<<<< HEAD, =======, >>>>>>>). Any marker will break pip/uv and CI. Do not hand-edit. Fix all conflicts in pyproject.toml, then regenerate all requirements-*.txt files using uv to ensure consistency across architectures.
The file includes conflicting pins for faiss-cpu, fastapi, greenlet, kubernetes, litellm, llama-stack, llama-stack-client, mcp, openai, opentelemetry packages (api, sdk, instrumentation, exporter variants, proto), protobuf, and urllib3. Additionally, python-jose (and ecdsa) appear only in the HEAD section and are absent from upstream/main—since authlib is present in both branches, python-jose is likely redundant and should be removed from pyproject.toml before regeneration.
2115-2213: Resolve the OpenTelemetry version conflict by selecting one coherent version set.The requirements file contains an unresolved merge conflict with split OpenTelemetry versions: HEAD uses 1.38.0 family + instrumentation 0.59b0, while upstream/main uses 1.39.0 family + instrumentation 0.60b0. Select one coherent set (e.g., 1.39.0 family + 0.60b0 instrumentation) and regenerate the requirements file to ensure all opentelemetry* packages maintain version coherence throughout requirements.aarch64.txt and related requirement files.
654-696: Resolve the merge conflict in requirements.aarch64.txt and regenerate with uv.The file contains unresolved conflict markers for faiss-cpu and fastapi. Accept the upstream/main versions (faiss-cpu==1.13.1 and fastapi==0.124.0), which match pyproject.toml. Then regenerate:
```shell
uv pip compile pyproject.toml -o requirements.aarch64.txt --generate-hashes --group llslibdev --python-platform aarch64-unknown-linux-gnu --torch-backend cpu --python-version 3.12
```

faiss-cpu is pinned at top-level to maintain consistent control across dependency graphs; keep it uniform across architecture files.
1000-1118: Resolve merge conflict: greenlet requires a single pinned version.Remove the unresolved merge markers and choose one greenlet version (3.2.4 or 3.3.0). Both are compatible with SQLAlchemy on Python 3.12; 3.3.0 is newer with improved wheel packaging. After resolving the conflict, regenerate the requirements.aarch64.txt with uv to ensure consistency across all requirements files.
requirements.x86_64.txt (4)
654-696: Pick one: faiss-cpu 1.13.0 vs 1.13.1; keep policy pin.Choose a single version (likely 1.13.1) and ensure it’s pinned in pyproject before compiling, then propagate into all platform requirement files. Based on learnings, keep faiss-cpu pinned at the top-level.
Based on learnings, ...
2115-2212: Resolve the OpenTelemetry package version conflict in requirements.x86_64.txt.Git merge conflict markers show HEAD pinned to 1.38.x/0.59b0 while upstream/main uses 1.39.x/0.60b0. All opentelemetry packages (api, exporter-otlp, exporter-otlp-proto-*, instrumentation, proto, sdk, semantic-conventions) must be aligned to the same minor version. Select one version set, remove the conflict markers, update all affected packages consistently, and recompile to avoid compatibility issues.
1508-1531: litellm / llama-stack[‑client] bumps: verify API compatibility.Upgrading to llama-stack 0.3.0 introduces breaking changes: tools parameter schema changed from "parameters" to { "input_schema", "output_schema" }, API routes reorganized (v1/, v1alpha/, v1beta/), and some endpoints removed/unpublished. If these versions are merged, code must be updated for tool definitions and API endpoint usage; otherwise, keep existing pins (1.80.7, 0.2.22) and regenerate.
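To visualize the stated schema change, here is a purely hypothetical before/after tool definition; everything beyond the "parameters" → "input_schema"/"output_schema" rename named above is illustrative and should be checked against the llama-stack 0.3.0 release notes:

```python
# Hypothetical tool definitions illustrating the rename described above.
# llama-stack 0.2.x style: a single "parameters" schema
old_tool = {
    "name": "search_docs",  # illustrative tool name
    "description": "Search product documentation",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
}

# llama-stack 0.3.0 style: separate input and output schemas
new_tool = {
    "name": "search_docs",
    "description": "Search product documentation",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
    "output_schema": {"type": "object", "properties": {"results": {"type": "array"}}},
}
```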
673-696: FastAPI and Starlette versions are incompatible.FastAPI 0.124.0 requires Starlette >=0.40.0 and <0.49.0. Starlette 0.50.0 exceeds the upper bound and will not work with FastAPI 0.124.0. Either downgrade Starlette to <0.49.0 or select a different FastAPI version. Resolve the merge conflict and pin compatible versions in pyproject.toml.
Also applies to: 3607-3613
tests/unit/models/responses/test_error_responses.py (1)
9-36: Resolve merge conflicts and adopt `PromptTooLongResponse` tests

This test module currently contains unresolved merge conflict markers (`<<<<<<< HEAD`, `=======`, `>>>>>>> upstream/main`), which breaks parsing and unit tests.

You should:

- Keep the upstream/main side that introduces `PROMPT_TOO_LONG_DESCRIPTION` and `PromptTooLongResponse` in the imports.
- Drop the conflict markers and ensure the new `TestPromptTooLongResponse` class is present once.

For example:
-from models.responses import ( - BAD_REQUEST_DESCRIPTION, - FORBIDDEN_DESCRIPTION, - INTERNAL_SERVER_ERROR_DESCRIPTION, - NOT_FOUND_DESCRIPTION, -<<<<<<< HEAD -======= - PROMPT_TOO_LONG_DESCRIPTION, ->>>>>>> upstream/main - QUOTA_EXCEEDED_DESCRIPTION, +from models.responses import ( + BAD_REQUEST_DESCRIPTION, + FORBIDDEN_DESCRIPTION, + INTERNAL_SERVER_ERROR_DESCRIPTION, + NOT_FOUND_DESCRIPTION, + PROMPT_TOO_LONG_DESCRIPTION, + QUOTA_EXCEEDED_DESCRIPTION, @@ - NotFoundResponse, -<<<<<<< HEAD -======= - PromptTooLongResponse, ->>>>>>> upstream/main + NotFoundResponse, + PromptTooLongResponse, @@ -<<<<<<< HEAD -======= -class TestPromptTooLongResponse: - """Test cases for PromptTooLongResponse.""" - - def test_constructor_with_default_response(self) -> None: - """Test PromptTooLongResponse with default response.""" - response = PromptTooLongResponse( - cause="The prompt exceeds the maximum allowed length." - ) - assert isinstance(response, AbstractErrorResponse) - assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE - assert isinstance(response.detail, DetailModel) - assert response.detail.response == "Prompt is too long" - assert response.detail.cause == "The prompt exceeds the maximum allowed length." - - def test_openapi_response(self) -> None: - """Test PromptTooLongResponse.openapi_response() method.""" - schema = PromptTooLongResponse.model_json_schema() - model_examples = schema.get("examples", []) - expected_count = len(model_examples) - - result = PromptTooLongResponse.openapi_response() - assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION - assert result["model"] == PromptTooLongResponse - assert "examples" in result["content"]["application/json"] - examples = result["content"]["application/json"]["examples"] - - # Verify example count matches schema examples count - assert len(examples) == expected_count - assert expected_count == 1 - - # Verify example structure - assert "prompt too long" in examples - prompt_example = examples["prompt too long"] - assert "value" in prompt_example - assert "detail" in prompt_example["value"] - assert prompt_example["value"]["detail"]["response"] == "Prompt is too long" - assert ( - prompt_example["value"]["detail"]["cause"] - == "The prompt exceeds the maximum allowed length." - ) - - def test_openapi_response_with_explicit_examples(self) -> None: - """Test PromptTooLongResponse.openapi_response() with explicit examples.""" - result = PromptTooLongResponse.openapi_response(examples=["prompt too long"]) - examples = result["content"]["application/json"]["examples"] - - # Verify only 1 example is returned when explicitly specified - assert len(examples) == 1 - assert "prompt too long" in examples - - ->>>>>>> upstream/main +class TestPromptTooLongResponse: + """Test cases for PromptTooLongResponse.""" + + def test_constructor_with_default_response(self) -> None: + """Test PromptTooLongResponse with default response.""" + response = PromptTooLongResponse( + cause="The prompt exceeds the maximum allowed length." + ) + assert isinstance(response, AbstractErrorResponse) + assert response.status_code == status.HTTP_413_REQUEST_ENTITY_TOO_LARGE + assert isinstance(response.detail, DetailModel) + assert response.detail.response == "Prompt is too long" + assert response.detail.cause == "The prompt exceeds the maximum allowed length." 
+ + def test_openapi_response(self) -> None: + """Test PromptTooLongResponse.openapi_response() method.""" + schema = PromptTooLongResponse.model_json_schema() + model_examples = schema.get("examples", []) + expected_count = len(model_examples) + + result = PromptTooLongResponse.openapi_response() + assert result["description"] == PROMPT_TOO_LONG_DESCRIPTION + assert result["model"] == PromptTooLongResponse + assert "examples" in result["content"]["application/json"] + examples = result["content"]["application/json"]["examples"] + + # Verify example count matches schema examples count + assert len(examples) == expected_count + assert expected_count == 1 + + # Verify example structure + assert "prompt too long" in examples + prompt_example = examples["prompt too long"] + assert "value" in prompt_example + assert "detail" in prompt_example["value"] + assert prompt_example["value"]["detail"]["response"] == "Prompt is too long" + assert ( + prompt_example["value"]["detail"]["cause"] + == "The prompt exceeds the maximum allowed length." + ) + + def test_openapi_response_with_explicit_examples(self) -> None: + """Test PromptTooLongResponse.openapi_response() with explicit examples.""" + result = PromptTooLongResponse.openapi_response(examples=["prompt too long"]) + examples = result["content"]["application/json"]["examples"] + + # Verify only 1 example is returned when explicitly specified + assert len(examples) == 1 + assert "prompt too long" in examplesAfter this, the SyntaxError and Ruff “merge conflict markers detected” should go away.
Also applies to: 666-719
docs/config.html (1)
1-21: Resolve merge conflicts in generated config HTML (and regenerate from Markdown)

`docs/config.html` still contains merge conflict markers across multiple sections (`<<<<<<< HEAD`, `=======`, `>>>>>>> upstream/main`). This makes the file invalid HTML and breaks tooling (Ruff/Black are already complaining). Given this file is clearly generated from `docs/config.md`, the safest approach is:

- First, resolve the conflicts in `docs/config.md` (see separate comment) so that it contains a single, correct version of the configuration schema, including the new `APIKeyTokenConfiguration` section and `api_key_config` field in `AuthenticationConfiguration`.
- Then, regenerate `docs/config.html` from the Markdown (e.g., via your existing pandoc or docs build pipeline), rather than hand-editing the HTML.
- Ensure the regenerated HTML no longer includes any conflict markers and that the tables (e.g., AuthenticationConfiguration) mention `api_key_config` alongside `jwk_config` and `rh_identity_config`.

Until these conflicts are resolved and the HTML is regenerated, this doc will remain broken.
Also applies to: 191-268, 443-721, 1009-1266, 1570-1815, 2275-2452
docs/config.md (1)
10-23: Resolve Markdown merge conflicts and keep the API key schema updates

`docs/config.md` also has unresolved merge conflict markers, which breaks the Markdown and the HTML generation:

- At the top around `APIKeyTokenConfiguration` (Lines ~10-23).
- In `AuthenticationConfiguration` where `api_key_config` is conditionally present (Lines ~59-63).
- In narrative text for `ModelContextProtocolServer`, `PostgreSQLDatabaseConfiguration`, and `ServiceConfiguration`.

To fix:

- Remove all `<<<<<<< HEAD` / `=======` / `>>>>>>> upstream/main` markers.
- Keep the upstream/main additions that introduce:
  - The `## APIKeyTokenConfiguration` section with its `api_key` field table.
  - The `api_key_config` row in the `AuthenticationConfiguration` table.
  - The reflowed one-line paragraphs (they are equivalent content, just wrapped differently).

Conceptually, the cleaned sections should look like:
## APIKeyTokenConfiguration

API Key Token configuration.

| Field | Type | Description |
|--------|--------|-------------|
| api_key | string | |

...

| Field | Type | Description |
|----------------|--------|-------------|
| module | string | |
| skip_tls_verification | boolean | |
| k8s_cluster_api | string | |
| k8s_ca_cert_path | string | |
| jwk_config | | |
| api_key_config | | |
| rh_identity_config | | |
docs/config.htmland clear the markdownlint errors.Also applies to: 59-63, 318-331, 351-359, 473-482
tests/unit/app/endpoints/test_streaming_query_v2.py (1)
1-637: Critical: Resolve all merge conflicts before proceeding.

This file contains unresolved merge conflicts throughout (lines 1-5, 12-19, 50-56, 89-95, 148-152, 184-198, 220-224, 240-244, 263-289, 309-315, 348-354, 405-409, 501-505, 557-637), causing all pipeline checks to fail:

- Unit tests: `SyntaxError: invalid syntax`
- Ruff/Black/Pylint: Cannot parse due to conflict markers

Key decisions to make when resolving:

- Imports (lines 12-19): Choose between `HTTPException` (HEAD) or `RateLimitError` (upstream) — likely need both if keeping the quota test.
- `stream_end_event` signature (lines 148-152, 405-409, 501-505): Upstream uses `lambda _m, _t, _aq, _rd, _media`, which aligns with the new RAG chunk fields (`rag_chunks`, `referenced_documents`). HEAD uses `lambda _m, _s, _t, _media`. Choose based on the production signature in `streaming_query_v2.py`.
- Conversation mocking (lines 50-56, 89-95, 309-315, 348-354): Upstream adds `mock_client.conversations.create` mocks — include these if `retrieve_response` now creates conversations.
- Error handling pattern (lines 263-289): HEAD raises `HTTPException`; upstream returns `StreamingResponse` with an error status code. Align with the production implementation.
- Conversation ID expectations (lines 220-224, 240-244): HEAD expects `conv-xyz`; upstream expects `abc123def456`. Align with how `retrieve_response` returns the conversation ID.
- Quota test (lines 557-637): Upstream adds a new test for `RateLimitError` — keep if this error handling exists in production.
♻️ Duplicate comments (5)
src/utils/checks.py (1)
90-91: Fix misleading error message.

The error message references `path`, but the actual check is for `parent_path` writability, making debugging more difficult. Apply this diff to clarify the error message:

```diff
     if not os.access(parent_path, os.W_OK):
-        raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+        raise InvalidConfigurationError(
+            f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+        )
```

Based on past review comments, this issue was previously flagged but remains unresolved.
src/models/responses.py (1)
749-788: `ConversationDeleteResponse.success` should reflect `deleted` and examples must be updated

`__init__` still hardcodes `success=True` (Line 763), so the response claims success even when `deleted=False`. The "not found" example also shows `success: True` with a failure message and uses the inconsistent phrase "can not be deleted".

This is the same issue previously flagged; please wire `success` to `deleted` and fix the example payload.

Apply something like:

```diff
     def __init__(self, *, deleted: bool, conversation_id: str) -> None:
@@
         response_msg = (
             "Conversation deleted successfully"
             if deleted
             else "Conversation cannot be deleted"
         )
         super().__init__(
             conversation_id=conversation_id,  # type: ignore[call-arg]
-            success=True,  # type: ignore[call-arg]
+            success=deleted,  # type: ignore[call-arg]
             response=response_msg,  # type: ignore[call-arg]
         )
@@
             {
                 "label": "not found",
                 "value": {
                     "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-                    "success": True,
-                    "response": "Conversation can not be deleted",
+                    "success": False,
+                    "response": "Conversation cannot be deleted",
                 },
             },
```

src/app/endpoints/query.py (2)
978-1005: Fix "attatchment" typos in validation errors

Both error messages use "attatchment" instead of "attachment", and a previous review called this out. These strings surface in HTTP 422 responses and should be correct and consistent.

```diff
-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "
```

You may also want to align these messages with the example texts in `UnprocessableEntityResponse` ("Invalid attribute value") if you care about strict consistency.
792-797: Avoid logging full vector DB payloads at INFO

Both vector DB queries log the entire `query_response` at INFO, which can contain sensitive content and is also noisy:

```python
logger.info(f"The query response total payload: {query_response}")
```

Prefer logging lightweight metadata (e.g., chunk count) at DEBUG instead, as in the diff above. This aligns with the earlier review feedback on the same issue.
Also applies to: 890-897
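A minimal sketch of the suggested pattern, assuming the vector I/O response exposes a `chunks` list as the surrounding comments indicate (`query_response` and `logger` are the existing names from the snippet above):

```python
# Log only lightweight metadata at DEBUG instead of the full payload.
chunk_count = len(query_response.chunks) if query_response.chunks else 0
logger.debug("Vector DB query returned %d chunks", chunk_count)
```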
tests/unit/models/responses/test_successful_responses.py (1)
673-737: Align `ConversationDeleteResponse` tests with corrected `success` semantics

Once `ConversationDeleteResponse` is fixed to set `success=deleted`, the tests here must be updated accordingly:

- `test_constructor_not_deleted` should expect `success is False`.
- The "not found" OpenAPI example should assert `"success": False` and the corrected message `"Conversation cannot be deleted"`.

Proposed changes:

```diff
     def test_constructor_not_deleted(self) -> None:
         """Test ConversationDeleteResponse when conversation cannot be deleted."""
         response = ConversationDeleteResponse(deleted=False, conversation_id="conv-123")
-        assert response.success is True
-        assert response.response == "Conversation cannot be deleted"
+        assert response.success is False
+        assert response.response == "Conversation cannot be deleted"
@@
         not_found_example = examples["not found"]
@@
-        assert not_found_example["value"]["success"] is True
-        assert (
-            not_found_example["value"]["response"] == "Conversation can not be deleted"
-        )
+        assert not_found_example["value"]["success"] is False
+        assert (
+            not_found_example["value"]["response"] == "Conversation cannot be deleted"
+        )
```

This keeps the tests consistent with the corrected model and with the updated examples in `ConversationDeleteResponse.model_config`.
🧹 Nitpick comments (2)
src/models/responses.py (1)
25-30: Avoid duplicating `RAGChunk` model across modules

`RAGChunk` is now defined here and also in `src/utils/types.py` with the same fields. Maintaining two nearly identical models will drift over time and complicates typing (e.g., `TurnSummary.rag_chunks` vs `QueryResponse.rag_chunks`). Consider making a single canonical definition (e.g., in `utils.types`) and importing/re-exporting it here, or at least aliasing one to the other to keep them in sync.

src/app/endpoints/query.py (1)
879-936: Second vector DB query duplicates work and repeats issues

The post-agent block re-lists vector DBs and re-queries `vector_io.query` with almost identical parameters, again logging the full payload and re-importing `RAGChunk`/`ReferencedDocument` locally. It also builds `RAGChunk` instances with `content=str(chunk.content)` and `source` from loosely typed attributes.

Even if you decide to keep the second query (e.g., for different params vs. Solr filtering), it should:

- Reuse the module-level `RAGChunk`/`ReferencedDocument` imports.
- Avoid logging the full payload at INFO.
- Normalize `content` and `source` types the same way as in the pre-agent block (e.g., `interleaved_content_as_str` + `str | None` guarded from metadata).
- Consider reusing the initial `vector_db_ids` instead of calling `client.vector_dbs.list()` again.

You may also want to reconsider whether two separate `vector_io.query` calls per request are necessary; a single call whose results are reused for both RAG context injection and the final `rag_chunks`/`referenced_documents` would simplify the flow and reduce load on the vector backend, as in the sketch below.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- docs/config.html (12 hunks)
- docs/config.md (6 hunks)
- requirements.aarch64.txt (21 hunks)
- requirements.hermetic.txt (1 hunks)
- requirements.x86_64.txt (21 hunks)
- src/app/endpoints/query.py (10 hunks)
- src/models/requests.py (2 hunks)
- src/models/responses.py (19 hunks)
- src/utils/checks.py (1 hunks)
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
- tests/e2e/features/conversation_cache_v2.feature (2 hunks)
- tests/e2e/features/environment.py (1 hunks)
- tests/integration/endpoints/test_query_v2_integration.py (10 hunks)
- tests/unit/app/endpoints/test_rags.py (2 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (15 hunks)
- tests/unit/authentication/test_api_key_token.py (3 hunks)
- tests/unit/models/responses/test_error_responses.py (4 hunks)
- tests/unit/models/responses/test_successful_responses.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/e2e/features/environment.py, tests/integration/endpoints/test_query_v2_integration.py, tests/unit/models/responses/test_successful_responses.py, tests/unit/authentication/test_api_key_token.py, tests/unit/models/responses/test_error_responses.py, tests/unit/app/endpoints/test_streaming_query_v2.py, tests/unit/app/endpoints/test_rags.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/integration/endpoints/test_query_v2_integration.py, tests/unit/models/responses/test_successful_responses.py, tests/unit/authentication/test_api_key_token.py, tests/unit/models/responses/test_error_responses.py, tests/unit/app/endpoints/test_streaming_query_v2.py, tests/unit/app/endpoints/test_rags.py
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/conversation_cache_v2.feature
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/utils/checks.py, src/app/endpoints/query.py, src/models/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/query.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/responses.py
🧠 Learnings (9)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/features/environment.py, tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/unit/authentication/test_api_key_token.py
📚 Learning: 2025-08-25T09:11:38.701Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.701Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.
Applied to files:
tests/e2e/features/environment.py
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-08-18T10:57:39.266Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:59-59
Timestamp: 2025-08-18T10:57:39.266Z
Learning: In the lightspeed-stack project, transitive dependencies like faiss-cpu are intentionally pinned as top-level dependencies to maintain better control over the dependency graph and avoid version conflicts when bundling ML/LLM tooling packages.
Applied to files:
requirements.aarch64.txt, requirements.x86_64.txt
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.
Applied to files:
requirements.aarch64.txt, requirements.x86_64.txt
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
requirements.aarch64.txt, requirements.x86_64.txt
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
src/app/endpoints/query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Use Python package manager `uv` with `uv run` prefix for all development commands
Applied to files:
requirements.hermetic.txt
🧬 Code graph analysis (5)
tests/e2e/features/environment.py (1)
tests/e2e/utils/utils.py (2)
switch_config (98-106), restart_container (130-144)
tests/unit/models/responses/test_successful_responses.py (1)
src/models/responses.py (3)
RAGChunk (25-30), ToolCall (33-38), ToolsResponse (90-125)
tests/unit/app/endpoints/test_streaming_query_v2.py (2)
src/app/endpoints/streaming_query_v2.py (1)
streaming_query_endpoint_handler_v2 (318-345)
src/models/requests.py (1)
QueryRequest(73-275)
src/app/endpoints/query.py (2)
src/models/responses.py (4)
ToolCall (33-38), model (1525-1529), ReferencedDocument (326-339), RAGChunk (25-30)
src/utils/types.py (2)
TurnSummary (135-234), RAGChunk (127-132)
src/models/responses.py (2)
src/utils/types.py (1)
RAGChunk (127-132)
src/quota/quota_exceed_error.py (1)
QuotaExceedError(6-38)
🪛 GitHub Actions: Black
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Cannot parse for target version Python 3.13: 84:0: <<<<<<< HEAD
tests/unit/models/responses/test_successful_responses.py
[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD
[warning] Black would reformat this file. Run 'black --write' to fix code style issues.
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Cannot parse for target version Python 3.13: 5:0: <<<<<<< HEAD
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Cannot parse for target version Python 3.13: 14:0: <<<<<<< HEAD
[warning] Black would reformat this file. Run 'black --write' to fix code style issues.
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: Cannot parse for target version Python 3.13: 1:0: <<<<<<< HEAD
tests/unit/app/endpoints/test_rags.py
[error] 193-193: Cannot parse for target version Python 3.13: 193:0: <<<<<<< HEAD
src/app/endpoints/query.py
[error] Black would reformat this file. Run 'black --write' to fix code style issues.
src/models/responses.py
[error] 1-1: Black would reformat this file. Run 'black --write' to fix code style issues.
🪛 GitHub Actions: Integration tests
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: SyntaxError: invalid syntax due to merge conflict markers (<<<<<<< HEAD)
🪛 GitHub Actions: Pyright
src/app/endpoints/query.py
[error] 753-753: pyright: Operator '+' not supported for 'None' (reportOptionalOperand).
[error] 778-778: pyright: 'vector_db_ids' is possibly unbound (reportPossiblyUnboundVariable).
[error] 816-816: pyright: Argument missing for parameter 'doc_id' (reportCallIssue).
[error] 840-840: pyright: Argument of type 'float | str | List[object] | object | Literal[True]' cannot be assigned to parameter 'url' of type 'AnyStr@urljoin | None' in function 'urljoin'.
[error] 848-848: pyright: 'RAGChunk' is possibly unbound (reportPossiblyUnboundVariable).
[error] 849-849: pyright: Argument of type 'InterleavedContent' cannot be assigned to parameter 'content' of type 'str' in function 'init'.
[error] 850-850: pyright: Argument of type 'str | bool | float | List[object] | object | None' cannot be assigned to parameter 'source' of type 'str | None' in function 'init'.
[error] 875-875: pyright: Argument of type 'list[str | ToolgroupAgentToolGroupWithArgs] | Unknown | list[str] | None' cannot be assigned to parameter 'toolgroups' of type 'List[Toolgroup] | None'.
[error] 937-937: pyright: Argument missing for parameter 'tool_results' (reportCallIssue).
🪛 GitHub Actions: Python linter
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Pylint parsing failed: 'invalid syntax (tests.integration.endpoints.test_query_v2_integration, line 84)'
tests/unit/models/responses/test_successful_responses.py
[error] 56-56: Pylint parsing failed: 'unmatched ')' (tests.unit.models.responses.test_successful_responses, line 56)'
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Pylint parsing failed: 'invalid syntax (tests.unit.authentication.test_api_key_token, line 5)'
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Pylint parsing failed: 'invalid syntax (tests.unit.models.responses.test_error_responses, line 14)'
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: Pylint parsing failed: 'invalid syntax (tests.unit.app.endpoints.test_streaming_query_v2, line 1)'
tests/unit/app/endpoints/test_rags.py
[error] 193-193: Pylint parsing failed: 'expected an indented block after function definition on line 192 (tests.unit.app.endpoints.test_rags, line 193)'
🪛 GitHub Actions: Ruff
tests/integration/endpoints/test_query_v2_integration.py
[error] 84-84: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/models/responses/test_successful_responses.py
[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.
tests/unit/authentication/test_api_key_token.py
[error] 5-5: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/models/responses/test_error_responses.py
[error] 14-14: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-100: Merge conflict markers detected in file (<<<<<<< HEAD / ======= / >>>>>>> upstream/main). Resolve merge conflicts to restore valid syntax.
tests/unit/app/endpoints/test_rags.py
[error] 192-192: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
src/app/endpoints/query.py
[error] 14-14: F401: 'pydantic.ValidationError' imported but unused. Remove unused import.
[error] 1-1: Unresolved merge conflict markers detected in file (<<<<<<< HEAD).
🪛 GitHub Actions: Type checks
src/app/endpoints/query.py
[error] 753-753: mypy: Unsupported left operand type for + ("None").
[error] 793-793: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | NotGiven'.
[error] 840-840: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'.
🪛 GitHub Actions: Unit tests
tests/unit/models/responses/test_error_responses.py
[error] 14-14: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file
tests/unit/app/endpoints/test_streaming_query_v2.py
[error] 1-1: SyntaxError: invalid syntax due to merge conflict markers (<<<<< HEAD) in test file
tests/unit/app/endpoints/test_rags.py
[error] 193-193: IndentationError / SyntaxError: invalid syntax due to merge conflict markers (e.g., '<<<<<<< HEAD') in test_rags.py
🪛 LanguageTool
docs/config.md
[style] ~323-~323: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. ======= MCP ...
(NOT_ABLE_PREMIUM)
[style] ~328-~328: Consider using “inaccessible” to avoid wordiness.
Context: ...figured in the llama-stack run.yaml are not accessible to lightspeed-core agents. >>>>>>> upst...
(NOT_ABLE_PREMIUM)
🪛 markdownlint-cli2 (0.18.1)
docs/config.md
10-10: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
59-59: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
59-59: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
60-60: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
60-60: Table column count
Expected: 3; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
61-61: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
318-318: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
351-351: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
472-472: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
🪛 YAMLlint (1.37.1)
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
[error] 19-19: syntax error: could not find expected ':'
(syntax)
🔇 Additional comments (20)
tests/unit/authentication/test_api_key_token.py (1)
134-139: Verify inconsistent `detail` format across error cases.

This test expects `detail` to be a plain string ("Invalid API Key"), while other error tests expect `detail` to be a dict with a `"cause"` key. Confirm this asymmetry is intentional in the implementation, or align the assertion style for consistency.

tests/integration/endpoints/test_query_v2_integration.py (1)
398-405: Resolve merge conflicts for tool call field name assertions.

Unresolved merge conflict markers at three locations prevent the file from being parsed:

- Lines 398-405: `test_query_v2_endpoint_with_tool_calls`
- Lines 469-476: `test_query_v2_endpoint_with_mcp_list_tools`
- Lines 543-547: `test_query_v2_endpoint_with_multiple_tool_types`

HEAD uses `tool_calls[0].tool_name`, while upstream/main uses `tool_calls[0].name`. Determine the correct field name by checking the `ToolCallSummary` model definition, then resolve all three conflicts consistently. Also remove the conditional `if response.rag_chunks:` guard introduced in HEAD (lines 401-402) if it was not in upstream/main.

requirements.aarch64.txt (7)
1234-1236: Clean up conflict markers in requirements.aarch64.txt.

Lines 1234-1236 contain stray conflict markers (<<<<<<<, =======, >>>>>>>) around a comment. Regenerate the file with `uv` after resolving the conflict between upstream and HEAD.
2587-2610: Resolve protobuf version conflict in requirements.aarch64.txt (6.33.1 vs 6.33.2). Pick one version and ensure it's consistent with googleapis-common-protos and opentelemetry-proto dependencies. Update pyproject.toml if needed.
1508-1531: Versions shown are already in lockstep, not diverged.

The snippet shows versions moving together: litellm 1.80.7→1.80.8, llama-stack 0.2.22→0.3.0, and llama-stack-client 0.2.22→0.3.0. All three are synchronized. Verify the actual merged state of requirements.aarch64.txt to confirm this was resolved correctly.
Likely an incorrect or invalid review comment.
3001-3007: Verify python-jose necessity before merging; confirm it's not redundant with the existing authlib/pyjwt stack.

The review raises a valid concern about duplicate JWT/cryptography dependencies—the repo already has authlib and pyjwt. However, direct verification of python-jose usage in the codebase could not be completed due to technical constraints. Before merging:

- Search the codebase for `from jose import` or `import jose` patterns to confirm whether python-jose is actually used (a scan sketch follows this comment)
- If unused, remove python-jose and regenerate lock files
- If required, document the rationale in code comments or dependency documentation
- Ensure cryptography dependency pins remain consistent across the stack
Also check lines 639-644, 3525-3527, 3374-3379 for consistency.
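To make that search concrete, here is a minimal import-scan sketch; the regex and the assumption that sources live under a `src/` tree are illustrative, not taken from this PR:

```python
import re
from pathlib import Path

# Matches "import jose", "from jose import ...", and "from jose.jwt import ..."
JOSE_IMPORT = re.compile(r"^\s*(import\s+jose\b|from\s+jose(\.\w+)*\s+import\b)")

def find_jose_imports(root: str = "src") -> list[str]:
    """Return 'path:lineno' entries for every jose import under root."""
    hits: list[str] = []
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if JOSE_IMPORT.match(line):
                hits.append(f"{path}:{lineno}")
    return hits

if __name__ == "__main__":
    print("\n".join(find_jose_imports()) or "no jose imports found")
```

An empty result would support removing the pin and regenerating the lock files.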
1494-1501: Resolve kubernetes version conflict between 33.1.0 and 34.1.0.

This merge conflict requires a decision on which version to use. Consult the kubernetes-client/python CHANGELOG between v33.1.0 and v34.1.0 to identify any breaking changes, then verify compatibility with your kubernetes client usage patterns and regenerate with uv.
2098-2110: Resolve openai version drift (2.8.1 vs 2.9.0) by pinning a single version.

Version 2.9.0 includes streaming fixes (stream closure, response config handling) but maintains API compatibility with 2.8.1; no breaking changes to streaming behavior. However, verify jiter (Rust-based JSON parser dependency) compatibility in your target environments, as it can be problematic in some runtimes. Align the pin in pyproject.toml and recompile with uv.
1830-1837: Resolve the mcp version conflict (1.22.0 vs 1.23.1) in requirements.aarch64.txt.

This file contains a merge conflict. Determine which version is specified in pyproject.toml and update the requirements file accordingly, or manually choose the intended version and regenerate the locked requirements if this file is auto-generated.
requirements.x86_64.txt (5)
2639-2706: Retain psycopg2-binary pin (policy).

psycopg2-binary is correctly pinned; ensure it remains in pyproject and survives re-compile since some llama-stack providers require it. Based on learnings, ...
3001-3007: Verify the python-jose dependency requirement before resolving the merge conflict.

This file shows a merge conflict where HEAD adds python-jose==3.5.0 while upstream removes it. Before resolving, confirm whether python-jose is actually used in the codebase. Search for `jose` imports and usage (including `jose.jwt` and related auth functions). If the new auth implementation doesn't depend on it, remove the dependency from both the requirements file and pyproject.toml. If it is required, ensure the pinned version and hashes are intentional.

Note: This concern also applies to the related dependency occurrences at lines 639-644, 2771-2773, 3525-3527, and 3374-3379.
605-607: The review comment addresses a critical issue (unresolved merge conflicts in lockfiles making them unusable), but verification of the specific technical claims requires direct access to the repository code, which is currently unavailable.
1000-1117: Resolve the greenlet merge conflict by choosing version 3.3.0.

Remove the conflicting versions (3.2.4 vs 3.3.0) and standardize on greenlet 3.3.0, which is compatible with SQLAlchemy 2.0.44 and includes wheels and improved support for CPython 3.14. Also applies to lines 3607-3613.
3808-3818: Prefer urllib3==2.6.1; both versions are compatible with requests 2.32.5 and the kubernetes client.

urllib3 2.6.1 and 2.3.0 are both compatible with requests 2.32.5 (which supports urllib3 >=1.21.1 and <3). The kubernetes Python client imposes no specific urllib3 version constraint and uses the environment's installed version. Accept the upstream version (2.6.1) as it is newer and includes recent fixes. Before merging, verify no other transitive dependencies in the environment pin urllib3<2.6 (e.g., older botocore versions can conflict), though this is unlikely given the upstream choice.
src/models/responses.py (3)
332-339: `ReferencedDocument` extension looks consistent.

The addition of `doc_id` and widening `doc_url` to `AnyUrl | str | None` match how it's used in `query.py` (vector DB metadata and Solr docs). No issues from a modeling perspective.
370-452: `QueryResponse` RAG/tool fields wiring is sound.

Adding `rag_chunks: list[RAGChunk]` and `tool_calls: list[ToolCall] | None` plus updating the examples keeps the model aligned with the new query pipeline (TurnSummary + tool execution metadata). Defaults (`[]`/`None`) also match the usage in `query.py`.
1102-1143: Abstract error response OpenAPI helpers look correct.

The new `AbstractErrorResponse.__init__`, `get_description()`, and `openapi_response()` implementations are consistent with the subclasses and the tests (labelled examples, description-from-class-variable, JSON examples under `detail`). No functional issues here.

src/app/endpoints/query.py (3)
427-445: Improved database and rate-limit error handling looks good.

Catching `SQLAlchemyError` with `logger.exception` and mapping it to `InternalServerErrorResponse.database_error()`, plus translating `RateLimitError` into `QuotaExceededResponse.model()`, is consistent with the error model and the FastAPI `HTTPException` pattern you're using elsewhere.
1008-1035: `get_rag_toolgroups` helper matches `Toolgroup` typing, but consider simplifying call sites.

The helper correctly returns `list[Toolgroup] | None`, using `ToolgroupAgentToolGroupWithArgs` for the RAG knowledge search tool. This is compatible with the llama-stack types.

Given the call sites now need to deal with `None` and concatenate with MCP toolgroup names, using this helper together with the "extend and or None" pattern (shown in the earlier diff) will avoid `None + list` issues and make the intent clearer.

No functional issues in the helper itself.
746-769: Fix vector DB / RAG setup: unbound `vector_db_ids`, type issues, and `RAGChunk` construction.

This block has several correctness and typing issues that CI is flagging:

- `vector_db_ids` is only assigned inside the `else` branch of the `no_tools` check, but it's used here unconditionally (`if vector_db_ids:`). When `query_request.no_tools` is `True`, this will be empty/undefined, leading to the pyright "possibly unbound" and a potential runtime error.
- `toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups` is problematic because `get_rag_toolgroups` can return `None`, and you're adding a list of `str` to a list of `Toolgroup` (type mismatch and `None + list` risk). This is what mypy/pyright are complaining about.
- `params = {"k": 5, "score_threshold": 0.0}` does not match the union type expected by `AsyncVectorIoResource.query`; CI reports an incompatible dict type.
- `urljoin("https://mimir.corp.redhat.com", parent_id)` is called with `parent_id` from chunk metadata, which is typed as a very broad `object` union; pyright rightfully complains about `AnyStr` incompatibility.
- `RAGChunk(content=chunk.content, source=source, score=score)` passes an `InterleavedContent` and an untyped metadata value into fields typed as `str`/`str | None`, which triggers the content/source type errors.

Consider applying the suggested fixes to resolve the unbound variable and the type mismatches for `params`, `urljoin`, and the `RAGChunk` fields, and remove the INFO-level logging of the full vector DB payload. Re-run mypy/pyright after applying. A sketch of the None-safe wiring follows.
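A minimal sketch of that wiring, under the assumptions stated above; `combine_toolgroups` and `chunk_source` are hypothetical helper names, and the Mimir base URL is carried over from the review context rather than verified against the repo:

```python
from typing import Any
from urllib.parse import urljoin

def combine_toolgroups(
    rag_toolgroups: list[Any] | None, mcp_toolgroups: list[Any]
) -> list[Any] | None:
    """Concatenate RAG and MCP toolgroups without tripping over None + list."""
    combined = list(rag_toolgroups or []) + list(mcp_toolgroups)
    return combined or None

def chunk_source(metadata: dict[str, object], offline: bool) -> str | None:
    """Narrow untyped metadata values to str before building a RAGChunk source."""
    if offline:
        parent_id = metadata.get("parent_id")
        if isinstance(parent_id, str):
            return urljoin("https://mimir.corp.redhat.com", parent_id)
        return None
    ref_url = metadata.get("reference_url")
    return ref_url if isinstance(ref_url, str) else None
```

The `isinstance` narrowing is what satisfies both mypy and pyright here, since chunk metadata values are typed as `object`.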
60c3c4f to c971439
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
src/utils/types.py (1)
36-50: Align `get_tool_calls` signature, docstring, and guard condition.

The method is typed as `get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]`, but the docstring still documents `CompletionMessage | None` and you guard with `if output_message and output_message.tool_calls:`. Given the non-optional type, that `if output_message` is redundant and the docstring is misleading.

Consider tightening this to keep everything in sync:

```diff
-    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
+    def get_tool_calls(self, output_message: CompletionMessage) -> list[ToolCall]:
@@
-        Parameters:
-            output_message (CompletionMessage | None): Completion
-                message potentially containing `tool_calls`.
+        Parameters:
+            output_message (CompletionMessage): Completion message potentially
+                containing `tool_calls`.
@@
-        if output_message and output_message.tool_calls:
-            return output_message.tool_calls
-        return []
+        return output_message.tool_calls or []
```

(or, if `None` is actually possible at call sites, make the parameter `CompletionMessage | None` and keep the guard.)
345-375: Inconsistent return order betweenget_agentandget_temp_agent.
get_agentreturns(agent, conversation_id, session_id)(line 342), butget_temp_agentreturns(agent, session_id, conversation_id)(line 375). This inconsistency is error-prone for callers.Additionally, the docstring (lines 359-360) says it returns
tuple[AsyncAgent, str]with "(agent, session_id)" but the actual return type annotation showstuple[AsyncAgent, str, str].- Returns: - tuple[AsyncAgent, str]: A tuple containing the agent and session_id. + Returns: + tuple[AsyncAgent, str, str]: A tuple containing (agent, conversation_id, session_id). """ ... - return agent, session_id, conversation_id + return agent, conversation_id, session_idtests/unit/app/endpoints/test_streaming_query.py (2)
147-172: Assertions inside the `pytest.raises` block are never executed.

In `test_streaming_query_endpoint_handler_configuration_not_loaded`, the status/detail assertions are placed inside the `with pytest.raises(HTTPException) as e:` block, after the awaited call. Once the exception is raised, code after the call inside the context is not executed, so those assertions never run.

Move the assertions outside the `with` block, e.g.:

```diff
-    with pytest.raises(HTTPException) as e:
-        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
-        assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
-        assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
+    with pytest.raises(HTTPException) as e:
+        await streaming_query_endpoint_handler(request, query_request, auth=MOCK_AUTH)
+
+    assert e.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert e.value.detail["response"] == "Configuration is not loaded"  # type: ignore
```
175-205: Connection-error test likely not exercising the real failure path.

`test_streaming_query_endpoint_on_connection_error` currently:

- Patches `mock_client.models.side_effect = APIConnectionError(...)`, and
- Patches `client.AsyncLlamaStackClientHolder.get_client` twice on the same target, and
- Only asserts that a `StreamingResponse` is returned (no check that an error event is actually streamed).

If the production code calls `await client.models.list(...)` (as other tests suggest by setting `mock_client.models.list.return_value`), the side effect on `models` will never be triggered, so this test will not verify the connection-error behavior.

Consider:

```diff
-    mock_client = mocker.AsyncMock()
-    mock_client.models.side_effect = APIConnectionError(request=query_request)  # type: ignore
-    mock_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_lsc.return_value = mock_client
-    mock_async_lsc = mocker.patch("client.AsyncLlamaStackClientHolder.get_client")
-    mock_async_lsc.return_value = mock_client
+    mock_client = mocker.AsyncMock()
+    mock_client.models.list = mocker.AsyncMock(
+        side_effect=APIConnectionError(request=query_request)  # type: ignore
+    )
+    mocker.patch(
+        "client.AsyncLlamaStackClientHolder.get_client",
+        return_value=mock_client,
+    )
```

and then asserting that the streamed content actually contains an error event (status 503, appropriate message) rather than only checking the response type.
docs/openapi.json (5)
1366-1373: 422 example contradicts schema: QueryRequest only requires 'query'.

Examples say missing ['query','model','provider'], but model/provider are optional in the schema. Update examples to only require 'query' (or change the schema if intentional).

```diff
-  "cause": "Missing required attributes: ['query', 'model', 'provider']",
+  "cause": "Missing required attributes: ['query']",
```

Applies to v1 query 422, v2 query 422, and v2 streaming_query 422.
Also applies to: 3586-3593, 3868-3875
2608-2613: 'not found' delete example sets success=true; fix success=false and grammar.

A failed delete should not return success=true; also prefer "cannot".

```diff
-  "response": "Conversation can not be deleted",
-  "success": true
+  "response": "Conversation cannot be deleted",
+  "success": false
```

Reflect the same in components/ConversationDeleteResponse examples.
Also applies to: 5005-5011
1374-1380: Typo: 'attatchment' → 'attachment'. Correct the spelling across all error examples.

```diff
-  "cause": "Invalid attatchment type: must be one of [...]
+  "cause": "Invalid attachment type: must be one of [...]
```

Also applies to: 1654-1661, 3594-3601, 3877-3881, 7702-7707
4284-4290: metrics error responses: text/plain uses JSON object schemas.

For 401/403/500/503 you declare text/plain but reference JSON object schemas. Either:

- Return JSON only (application/json), or
- Keep text/plain with a simple string schema.

Recommend JSON-only for errors to match the rest of the API (a route-level sketch follows the line references below).

```diff
-  "text/plain": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+  // remove text/plain; keep application/json only
```

Repeat for 403/500/503 blocks.
Also applies to: 4306-4311, 4328-4333, 4351-4355
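For the JSON-only option, a sketch of how the route could declare error responses in FastAPI so the generated OpenAPI stays consistent; `UnauthorizedResponse` here is a stand-in for the project's model, and the endpoint body is invented for illustration:

```python
from fastapi import APIRouter
from fastapi.responses import PlainTextResponse
from pydantic import BaseModel

router = APIRouter()

class UnauthorizedResponse(BaseModel):
    """Stand-in for the project's 401/403 error model."""
    detail: dict[str, str]

@router.get(
    "/metrics",
    response_class=PlainTextResponse,
    responses={
        # Declaring only JSON models for errors keeps the OpenAPI schema and
        # the actual error payloads (JSON objects) in agreement.
        401: {"model": UnauthorizedResponse, "description": "Missing or invalid credentials"},
        403: {"model": UnauthorizedResponse, "description": "Insufficient permissions"},
    },
)
async def metrics() -> PlainTextResponse:
    """Successful responses stay text/plain; only errors are JSON."""
    return PlainTextResponse("lightspeed_requests_total 0")
```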
67-71: Root 401 text/html schema references a JSON object; use string.

Change text/html schema to a simple string for the error page (or remove if you always emit JSON for errors).

```diff
-  "text/html": { "schema": { "$ref": "#/components/schemas/UnauthorizedResponse" } }
+  "text/html": { "schema": { "type": "string" } }
```

src/app/endpoints/query_v2.py (1)
131-136: Potential `AttributeError` if the `id` attribute is missing.

On line 132, `getattr(output_item, "id")` is called without a default value, which will raise `AttributeError` if the attribute doesn't exist. Other similar calls in this function use fallback patterns (e.g., lines 90-92 for `function_call`).

Apply this diff to add a fallback:

```diff
     return ToolCallSummary(
-        id=str(getattr(output_item, "id")),
+        id=str(getattr(output_item, "id", "") or ""),
         name=DEFAULT_RAG_TOOL,
         args=args,
         response=response_payload,
     )
```

src/models/responses.py (1)
1-3: Black formatting check failed.

The pipeline indicates Black formatting issues in this file. Run `black src/models/responses.py` to fix formatting.

```bash
#!/bin/bash
# Check Black formatting issues
uv tool run black --check --diff src/models/responses.py 2>&1 | head -50
```
♻️ Duplicate comments (18)
tests/unit/authentication/test_api_key_token.py (1)
5-5: Tests read cleanly after conflict resolution; direct `detail["cause"]` checks are fine here.

The FastAPI imports are now normalized, and the updated assertions on `exc_info.value.detail["cause"]` keep the tests concise while still validating the exact error causes. Given that you control the exception construction in the auth dependency, this level of defensiveness is sufficient for unit tests.

Also applies to: 74-74, 97-97
requirements.hermetic.txt (1)
1-2: Clarify the rationale for pinning older versions of build tools, particularly pip 24.2 which contains known security vulnerabilities.The versions pinned in requirements.hermetic.txt are indeed downgrades: uv 0.8.15 (September 3, 2025) is approximately 3 months older than uv 0.9.16 (December 6, 2025), and pip 24.2 (July 28, 2024) is over a year older than pip 25.3 (October 24-25, 2025).
The pip 24.2 downgrade is particularly concerning. This version contains multiple unpatched security vulnerabilities:
- CVE-2025-8869: Symlink/tar extraction vulnerability allowing file extraction outside target directory (unfixed in pip <25.0)
- Arbitrary code execution via self_outdated_check: pip loads untrusted code after installation (unfixed in pip <25.0)
If these downgrades are necessary for compatibility with specific components, that rationale should be documented in a comment or README. If not, consider upgrading to the latest stable versions.
src/utils/checks.py (1)
79-91: New parent-directory check is breaking tests; also keep the error message aligned with what is checked.

The added parent-directory checks change behavior in a way that's currently failing integration tests:

- When `must_exists=False` and `must_be_writable=True`, a missing parent now raises `InvalidConfigurationError("... cannot be created - parent directory does not exist")`. The pipeline failure for `'/tmp/data/feedback'` shows this is hit in normal test configuration, so either:
  - tests/infra need to ensure the parent directory (e.g. `/tmp/data`) is created up front, or
  - this check should be relaxed or adjusted (e.g. test writability of the nearest existing ancestor, or let later code `os.makedirs()` the full path and surface that error instead).

Additionally, the branch that checks `os.access(parent_path, os.W_OK)` still reports `"{desc} '{path}' is not writable"`, even though the actual check is on the parent directory. To keep the error self-explanatory, consider something like:

```diff
-        if not os.access(parent_path, os.W_OK):
-            raise InvalidConfigurationError(f"{desc} '{path}' is not writable")
+        if not os.access(parent_path, os.W_OK):
+            raise InvalidConfigurationError(
+                f"{desc} '{path}' cannot be created - parent directory '{parent_path}' is not writable"
+            )
```

This both fixes the message and makes it clear why creation of `path` would fail. A sketch of the ancestor-walk variant follows.
1409-1411: urllib3 downgrade reintroduces known security issues; resolve via authoritative constraints, not this lockfile.You’ve moved to
kubernetes==34.1.0while downgradingurllib3to2.3.0. Earlier analysis on this PR already noted that 2.3.x carries known redirect/info‑disclosure and DoS‑style vulnerabilities that were fixed in later 2.x releases, and recommended landing on a secureurllib3version that also satisfies the kubernetes client’s constraint.Right now this looks like the compatibility problem was “fixed” by picking an insecure
urllib3. Instead:
- Decide the authoritative constraint (in
pyproject.tomlor your constraints file) for bothkubernetesandurllib3so you can use a version ofurllib3that includes the security fixes (per the earlier bot comment, a >=2.5.x/2.6.x line, or whatever the current secure guidance is) while remaining within the kubernetes client’s supported range.- Re‑run
uv pip compileto regenerate this file instead of editingrequirements.aarch64.txtby hand.Until that’s done, this environment is exposed to the urllib3 issues that motivated the original upgrade.
To double‑check the current safe combo for this exact kubernetes client version, please run a web search like:
What urllib3 version range is officially supported by the Python kubernetes client version 34.1.0, and what is the earliest urllib3 2.x release that includes fixes for the 2.3.x redirect/info‑disclosure and decompression/chained-encoding DoS issues?Also applies to: 3601-3603
src/app/endpoints/query.py (4)
796-796: Avoid logging entire query response payloads at INFO level.

This was flagged in a past review. Logging the full `query_response` payload may expose sensitive document content.
929-941: Fix typo: "attatchment" should be "attachment".This was flagged in a past review. The typo appears in error messages at lines 930 and 940.
836-852: Fix type errors in RAGChunk construction.

Pipeline reports multiple type errors:

- Line 840: `urljoin` receives `object` instead of `str`
- Line 848: `RAGChunk` is possibly unbound (local import issue)
- Line 849: `chunk.content` is `InterleavedContent`, not `str`
- Line 850: `source` type mismatch

The `RAGChunk` import at lines 799-800 is inside a conditional block and may not execute. Also, `chunk.metadata.get()` returns `object`, not `str`.

```diff
-                if chunk.metadata:
-                    if OFFLINE:
-                        parent_id = chunk.metadata.get("parent_id")
-                        if parent_id:
-                            source = urljoin("https://mimir.corp.redhat.com", parent_id)
+                if chunk.metadata:
+                    if OFFLINE:
+                        parent_id = chunk.metadata.get("parent_id")
+                        if parent_id and isinstance(parent_id, str):
+                            source = urljoin("https://mimir.corp.redhat.com", parent_id)
                     else:
-                        source = chunk.metadata.get("reference_url")
+                        ref_url = chunk.metadata.get("reference_url")
+                        source = ref_url if isinstance(ref_url, str) else None
     # ...
     rag_chunks.append(
         RAGChunk(
-            content=chunk.content,
+            content=str(chunk.content) if chunk.content else "",
             source=source,
             score=score,
         )
     )
```

Also, move the `RAGChunk` import to module level (this was flagged in past review).
-from pydantic import ValidationErrorsrc/app/endpoints/streaming_query.py (5)
81-84: OFFLINE flag duplicated from query.py.This was flagged in a past review. Extract to a shared constants module.
925-930:ResponseGeneratorContextdoes not havevector_io_rag_chunksorvector_io_referenced_docsattributes.Pipeline reports
Cannot assign to attribute "vector_io_rag_chunks"and"vector_io_referenced_docs". Thehasattrcheck will always returnFalsefor a dataclass/Pydantic model without these fields defined.This was flagged in a past review. Add these fields to
ResponseGeneratorContext.
1022-1022: Avoid logging entire query response payloads.This was flagged in a past review. Move to DEBUG level and log metadata only.
Also applies to: 1264-1264
1046-1051: Hardcoded URL should be extracted to configuration.This was flagged in a past review. The URL
https://mimir.corp.redhat.comappears multiple times.Also applies to: 1067-1069
1243-1298: Significant code duplication with `query_vector_io_for_chunks`.

This was flagged in a past review. The vector DB querying logic is duplicated between `query_vector_io_for_chunks` (lines 982-1091) and `retrieve_response` (lines 1243-1298). Additionally, `query_vector_io_for_chunks` is called at lines 906-908 AND `retrieve_response` queries the vector DB again, resulting in duplicate queries per request.
query_vector_io_for_chunks(lines 982-1091) andretrieve_response(lines 1243-1298). Additionally,query_vector_io_for_chunksis called at line 906-908 ANDretrieve_responsequeries vector DB again, resulting in duplicate queries per request.src/app/endpoints/conversations_v2.py (1)
99-101: Typo "Converastion" still present.The typo "Converastion" should be "Conversation" at lines 99, 126, and 161.
docs/config.html (1)
205-247: Clarify auth config references and RHIdentity semantics in the schema doc.

In the `AuthenticationConfiguration` table, the `jwk_config` and `rh_identity_config` rows have empty Type/Description cells, even though dedicated sections for `JwkConfiguration` and `RHIdentityConfiguration` exist below. Consider:

- Setting the Type to the concrete schema name (`JwkConfiguration`, `RHIdentityConfiguration`), and
- Adding a brief description (e.g., "JWT-based auth via JWK" / "Red Hat Identity-based auth").

In the `RHIdentityConfiguration` section, it would also help to explicitly document what happens when `required_entitlements` is omitted or an empty list (e.g., whether entitlement checks are skipped vs. requests fail). This keeps runtime behavior unambiguous for operators.

Also applies to: 1086-1103
93-103: Parameterize Braintrust OpenAI API key via env var instead of literalIn the Azure E2E config, the
braintrustscoring provider usesopenai_api_key: '********'. Similar to the CI config, it’s preferable to reference an environment variable (or remove the provider if unused) so the key is not hard‑coded and can be provided securely at runtime, e.g.:scoring: - provider_id: braintrust provider_type: inline::braintrust config: openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}This keeps the config consistent with how other credentials (e.g.,
AZURE_API_KEY) are handled.src/models/responses.py (2)
746-762:successfield still hardcoded toTrue.This issue was flagged in a previous review but remains unfixed. When
deleted=False,successshould also beFalse.super().__init__( conversation_id=conversation_id, # type: ignore[call-arg] - success=True, # type: ignore[call-arg] + success=deleted, # type: ignore[call-arg] response=response_msg, # type: ignore[call-arg] )
775-782: Example inconsistency remains unfixed.

The "not found" example still shows `success: True` and uses "can not be deleted" (with space) while the code uses "cannot be deleted" (no space).

```diff
     {
         "label": "not found",
         "value": {
             "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
-            "success": True,
-            "response": "Conversation can not be deleted",
+            "success": False,
+            "response": "Conversation cannot be deleted",
         },
     },
```
🧹 Nitpick comments (21)
tests/integration/test_openapi_json.py (2)
87-99: Type annotation on `spec` could be more precise/consistent.

Using bare `dict` works, but other helpers in this file use `dict[str, Any]`. For consistency and stricter typing, consider annotating `spec` as `dict[str, Any]` or `Mapping[str, Any]` instead of plain `dict`.
dictworks, but other helpers in this file usedict[str, Any]. For consistency and stricter typing, consider annotatingspecasdict[str, Any]orMapping[str, Any]instead of plaindict.
185-187: Fixture argument type hints now use plaindictSwitching
spec_from_file/spec_from_urlannotations fromdict[str, Any]todictis harmless at runtime but slightly reduces type precision and differs from other helpers. Consider standardizing on one style (e.g.,dict[str, Any]everywhere orMapping[str, Any]) for readability.Also applies to: 255-257
docker-compose.yaml (1)
26-26: Debug logging flag is reasonable; consider making it easily tunable.Enabling
LLAMA_STACK_LOGGING=all=debugon the llama‑stack service is helpful while stabilizing the new flows. If this compose file is also used beyond local dev, you might optionally gate this via an env default (e.g.,${LLAMA_STACK_LOGGING:-all=debug}) so you can dial it back without editing the file, and confirm thatall=debugmatches the logging syntax for the llama‑stack version you now support.scripts/query_llm.py (1)
16-16: Docstring is correct but could be slightly more descriptive.

The new `main()` docstring is accurate but very generic. Optional: expand it to mention that it sends a single `/v1/query` request to the local LLM endpoint and prints the response plus timing, which would make the script's intent clearer when read in isolation.
main()docstring is accurate but very generic. Optional: expand it to mention that it sends a single/v1/queryrequest to the local LLM endpoint and prints the response plus timing, which would make the script’s intent clearer when read in isolation.scripts/generate_openapi_schema.py (1)
26-27: Consider retaining parameter and return value documentation in docstrings.

The docstrings for `read_version_from_openapi` and `read_version_from_pyproject` have been condensed to single-line descriptions. While concise, they now lack parameter and return type documentation that would help future maintainers understand the function contracts.
read_version_from_openapiandread_version_from_pyprojecthave been condensed to single-line descriptions. While concise, they now lack parameter and return type documentation that would help future maintainers understand the function contracts.Per coding guidelines requiring Google Python docstring conventions, consider including Args and Returns sections:
def read_version_from_openapi(filename: str) -> str: - """Read version from OpenAPI.json file.""" + """Read version from OpenAPI.json file. + + Args: + filename: Path to the OpenAPI.json file. + + Returns: + Version string extracted from the 'info.version' field. + """As per coding guidelines, all functions should have docstrings with brief descriptions following Google Python docstring conventions.
Also applies to: 38-39
docs/config.md (1)
300-304: Optional: Consider more concise wording.The MCP server description has been updated for clarity. The static analysis tool suggests using "inaccessible" instead of "not accessible" on line 304 for more concise wording, though the current phrasing is perfectly clear and correct.
If you prefer more concise wording:
-Only MCP servers defined in the lightspeed-stack.yaml configuration are -available to the agents. Tools configured in the llama-stack run.yaml -are not accessible to lightspeed-core agents. +Only MCP servers defined in the lightspeed-stack.yaml configuration are +available to the agents. Tools configured in the llama-stack run.yaml +are inaccessible to lightspeed-core agents.However, the current wording is clear and acceptable as-is.
src/utils/types.py (2)
92-163: RAG chunk extraction silently drops some valid-but-unexpected JSON; consider a more forgiving fallback.
_extract_rag_chunks_from_responsehandles the expected shapes well (dict with"chunks", or list of dicts), and falls back to a single chunk when JSON parsing fails or when a structural exception is raised. But there are two notable edge cases where you currently drop the response entirely:
response_contentparses as a dict without a"chunks"key (e.g.{"text": "foo"}) – no chunks are appended and no exception is raised, so the outerexceptdoesn’t trigger.- It parses as a list of non‑dict items (e.g.
["foo", "bar"]) – they’re skipped by theisinstance(chunk, dict)guard without a fallback.If you want to retain as much signal as possible, a small tweak like this would keep behavior robust while preserving the structured cases:
- if isinstance(data, dict) and "chunks" in data: - for chunk in data["chunks"]: - self.rag_chunks.append( - RAGChunk( - content=chunk.get("content", ""), - source=chunk.get("source"), - score=chunk.get("score"), - ) - ) - elif isinstance(data, list): - # Handle list of chunks - for chunk in data: - if isinstance(chunk, dict): - self.rag_chunks.append( - RAGChunk( - content=chunk.get("content", str(chunk)), - source=chunk.get("source"), - score=chunk.get("score"), - ) - ) + if isinstance(data, dict) and "chunks" in data: + for chunk in data["chunks"]: + self.rag_chunks.append( + RAGChunk( + content=chunk.get("content", ""), + source=chunk.get("source"), + score=chunk.get("score"), + ) + ) + elif isinstance(data, list): + # Handle list of chunks (dicts or plain values) + for chunk in data: + if isinstance(chunk, dict): + self.rag_chunks.append( + RAGChunk( + content=chunk.get("content", str(chunk)), + source=chunk.get("source"), + score=chunk.get("score"), + ) + ) + else: + self.rag_chunks.append( + RAGChunk( + content=str(chunk), + source=DEFAULT_RAG_TOOL, + score=None, + ) + ) + else: + # Unknown JSON shape: keep the whole payload as a single chunk + if response_content.strip(): + self.rag_chunks.append( + RAGChunk( + content=response_content, + source=DEFAULT_RAG_TOOL, + score=None, + ) + )That way, any non‑empty response still yields at least one
RAGChunk, even when the tool payload format drifts slightly.
15-29: Minor guideline nits: logging and type hints onSingleton.__call__.Two small style/guideline points:
- Module has no
logger = logging.getLogger(__name__)despite the project guideline to standardize logging per–module.Singleton.__call__is currently untyped (*args, **kwargsonly,# type: ignore); guidelines call for complete type annotations on functions, even if the return type is justSingleton/object.These don’t block functionality, but cleaning them up would keep this module aligned with the rest of the stack.
Also applies to: 89-96
src/app/endpoints/query.py (1)
818-821: Hardcoded URL should be extracted to configuration.

The URL `https://mimir.corp.redhat.com` is hardcoded. This should be moved to configuration for flexibility across environments.

```diff
+# At module level or in configuration
+MIMIR_BASE_URL = configuration.mimir_base_url or "https://mimir.corp.redhat.com"
+
 # In the code:
-    doc_url="https://mimir.corp.redhat.com"
-    + reference_doc,
+    doc_url=urljoin(MIMIR_BASE_URL, reference_doc),
```
1126-1126: Functionretrieve_responsehas too many statements (75/50).Pipeline reports this function exceeds the statement limit. Consider extracting the RAG context injection logic into a helper function.
async def _build_rag_context( client: AsyncLlamaStackClient, query_request: QueryRequest, vector_db_ids: list[str], ) -> tuple[list[RAGChunk], str]: """Build RAG context from vector DB query.""" rag_chunks: list[RAGChunk] = [] rag_context = "" # ... extracted logic from lines 1243-1313 return rag_chunks, rag_contextrun.yaml (1)
121-121: Hardcoded Solr URL may need environment configuration.The
solr_url: "http://localhost:8983/solr"is hardcoded. For production deployments across different environments, consider using an environment variable.config: - solr_url: "http://localhost:8983/solr" + solr_url: ${env.SOLR_URL:-http://localhost:8983/solr}tests/integration/endpoints/test_query_v2_integration.py (1)
307-372: Re‑evaluate LCORE‑1025 skips for tool‑call integration testsBoth
test_query_v2_endpoint_with_tool_callsandtest_query_v2_endpoint_with_multiple_tool_typesare permanently skipped via@pytest.mark.skip(reason="LCORE-1025: ToolCallSummary.response type mismatch"). These tests exercise the new RAG/tool-call surface and are valuable regression coverage.If LCORE‑1025 is now resolved (the ToolCall / ToolCallSummary shapes look stabilized elsewhere in this PR), consider removing the skips so these paths are exercised again. If the issue is still open, keeping the skip is fine but it’s worth confirming the ticket status and planning to re‑enable once fixed.
Also applies to: 439-507
tests/unit/app/endpoints/test_streaming_query.py (1)
372-757: Inconsistent async test marking may cause tests to be skipped or mis-run.

Several async tests (`test_retrieve_response_vector_db_available`, `test_retrieve_response_no_available_shields`, `test_retrieve_response_one_available_shields`, `test_retrieve_response_two_available_shields`, `test_retrieve_response_four_available_shields`, and the attachment variants) are defined as `async def` but lack `@pytest.mark.asyncio`, while other async tests in this module are explicitly marked.

Depending on your `pytest-asyncio` configuration, these functions may not run as intended or may be skipped. For consistency and clarity, consider adding `@pytest.mark.asyncio` to these async tests as well.
99-102: Avoid hard‑coding Braintrust OpenAI API key placeholder in CI configThe
braintrustscoring provider is configured with a literalopenai_api_key: '********'. For CI/e2e configs this is better modeled as an environment‑driven value (or omitted entirely if Braintrust isn’t used in tests), e.g.:scoring: - provider_id: braintrust provider_type: inline::braintrust config: openai_api_key: ${env.BRAINTRUST_OPENAI_API_KEY}This avoids committing even placeholder secret‑looking values and makes it clear where the real key should come from at runtime.
src/app/endpoints/streaming_query_v2.py (1)
154-162: Consider catching specific exception types.

The broad `except Exception` clause catches all exceptions, which can mask unexpected errors. Since you're accessing attributes, `AttributeError` would be more appropriate.

```diff
     # Emit start on response.created
     if event_type == "response.created":
         try:
             conv_id = getattr(chunk, "response").id
-        except Exception:  # pylint: disable=broad-except
+        except AttributeError:
             logger.warning("Missing response id!")
             conv_id = ""
         yield stream_start_event(conv_id)
         continue
```
6631-6645: Constrain media_type to known values.Provide an enum for media_type to guide clients and validation.
"media_type": { - "anyOf": [ { "type": "string" }, { "type": "null" } ], + "anyOf": [ + { "type": "string", "enum": ["application/json", "text/plain"] }, + { "type": "null" } + ], "title": "Media Type", "description": "Media type for the response format", "examples": ["application/json", "text/plain"] }src/app/endpoints/query_v2.py (3)
138-145: Missing fallback foridattribute inweb_search_call.Similar to the
file_search_callcase,getattr(output_item, "id")on line 141 lacks a default value. For consistency with thefunction_callpattern (lines 90-92), consider adding a fallback.if item_type == "web_search_call": args = {"status": getattr(output_item, "status", None)} + call_id = getattr(output_item, "id", None) or getattr(output_item, "call_id", None) return ToolCallSummary( - id=str(getattr(output_item, "id")), + id=str(call_id), name="web_search", args=args, response=None, )
409-415: TODO: Document parsing not yet implemented.The function returns an empty list with a clear TODO explaining the needed work. This is acceptable as a placeholder, but tracking the implementation would be valuable.
Would you like me to open an issue to track the implementation of document parsing from the Responses API?
519-539: Consider making `max_num_results` configurable.

The `max_num_results: 10` is hardcoded. Consider making this configurable via the `AppConfig` if different use cases require different limits.
max_num_results: 10is hardcoded. Consider making this configurable via theAppConfigif different use cases require different limits.src/models/responses.py (1)
367-376: Inconsistent default patterns for `rag_chunks` vs `tool_calls`.

`rag_chunks` defaults to `[]` (empty list) while `tool_calls` defaults to `None`. Consider using consistent defaults for similar collection fields.

```diff
     rag_chunks: list[RAGChunk] = Field(
-        [],
+        default_factory=list,
         description="List of RAG chunks used to generate the response",
     )
     tool_calls: list[ToolCall] | None = Field(
-        None,
+        default_factory=list,
         description="List of tool calls made during response generation",
     )
```
1281-1284: Consider adding error handling for file operations.

The `dump` method opens and writes to a file without exception handling. Consider wrapping in try-except for better error reporting.

```diff
     def dump(self, filename: str = "configuration.json") -> None:
         """Dump actual configuration into JSON file."""
-        with open(filename, "w", encoding="utf-8") as fout:
-            fout.write(self.model_dump_json(indent=4))
+        try:
+            with open(filename, "w", encoding="utf-8") as fout:
+                fout.write(self.model_dump_json(indent=4))
+        except OSError as e:
+            raise ValueError(f"Failed to dump configuration to {filename}: {e}") from e
```
c971439 to 4ca24a7
Actionable comments posted: 1
♻️ Duplicate comments (7)
run.yaml (1)
140-143: Update `provider_shield_id` to reference a valid Llama-Guard model.

The `provider_shield_id` is set to `"gpt-3.5-turbo"`, which is an OpenAI model, not a Llama-Guard model. For the `llama-guard` provider, `provider_shield_id` must be a Llama-Guard model identifier (e.g., `"meta-llama/Llama-Guard-3-8B"`).
878-889: Add missingtool_resultsparameter toTurnSummary.The
TurnSummaryconstructor is missing the requiredtool_resultsparameter. This will cause a runtimeTypeError.summary = TurnSummary( llm_response=( interleaved_content_as_str(response.output_message.content) if ( getattr(response, "output_message", None) is not None and getattr(response.output_message, "content", None) is not None ) else "" ), tool_calls=[], + tool_results=[], rag_chunks=rag_chunks, )
745-758: Fix type error:get_rag_toolgroupsmay returnNone.Line 752 concatenates the result of
get_rag_toolgroups(vector_db_ids)withmcp_toolgroups, butget_rag_toolgroupscan returnNone. This will cause a TypeError at runtime.toolgroups = None if vector_db_ids: - toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups + rag_toolgroups = get_rag_toolgroups(vector_db_ids) + toolgroups = (rag_toolgroups or []) + mcp_toolgroups elif mcp_toolgroups: toolgroups = mcp_toolgroups
798-799: Move imports to module level.

Importing `RAGChunk` and `ReferencedDocument` inside the function is inefficient. Move these to the module-level imports at the top of the file.

```diff
+from models.responses import (
+    ForbiddenResponse,
+    QueryResponse,
+    RAGChunk,
+    ReferencedDocument,
+    ToolCall,
+    UnauthorizedResponse,
+)
 ...
     if query_response.chunks:
-        from models.responses import RAGChunk, ReferencedDocument
-
         retrieved_chunks = query_response.chunks
```
77-80: Extract OFFLINE flag to shared constants module.This constant is duplicated from query.py. Extract it to a shared constants module (e.g.,
src/app/constants.py) and import from both files to ensure consistency and follow DRY principles.
1045-1046: Extract hardcoded Mimir base URL to configuration.The URL
https://mimir.corp.redhat.comis hardcoded here (and duplicated at lines 1065, 1241). Extract this to a configuration value for flexibility across environments.
1208-1275: Significant code duplication: vector DB querying logic.

Lines 1208-1275 duplicate the vector DB query logic from `query_vector_io_for_chunks` (lines 979-1088). Both blocks:
query_vector_io_for_chunks(lines 979-1088). Both blocks:
- Query vector_io with identical params setup
- Process chunks with OFFLINE flag logic
- Extract metadata and build RAGChunk objects
Consider consolidating by having
retrieve_responsecallquery_vector_io_for_chunksto get the chunks, then format the RAG context from those results.- # Get RAG chunks before sending to LLM (reuse logic from query_vector_io_for_chunks) - rag_chunks = [] - try: - if vector_db_ids: - # ... 60+ lines of duplicate logic - except Exception as e: - logger.warning("Failed to query vector database for chunks: %s", e) + # Reuse query_vector_io_for_chunks for consistency + rag_chunks, _ = await query_vector_io_for_chunks(client, query_request) # Format RAG context for injection into user message rag_context = ""
🧹 Nitpick comments (1)
lightspeed-stack.yaml (1)
24-28: Consider persistent storage for production deployments.The conversation cache uses
/tmp/data/conversation-cache.dbwhich won't survive container restarts. This is fine for development/testing as indicated by the comment, but production deployments should use a persistent volume mount or external database.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- lightspeed-stack.yaml (1 hunks)
- run.yaml (3 hunks)
- src/app/endpoints/query.py (8 hunks)
- src/app/endpoints/streaming_query.py (14 hunks)
- src/models/requests.py (2 hunks)
- src/models/responses.py (1 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
- tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/models/requests.py
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_conversations_v2.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_query.py, tests/unit/app/endpoints/test_conversations_v2.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and the module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (4)
src/models/responses.py (2)
- RAGChunk (149-154)
- ReferencedDocument (179-191)

src/utils/types.py (1)
- TurnSummary (89-163)

src/models/requests.py (1)
- QueryRequest (73-233)

src/app/endpoints/query.py (1)
- get_rag_toolgroups (951-978)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (5)
tests/unit/app/endpoints/test_conversations_v2.py (1)
103-103: LGTM! Test updated to match model type changes.

The assertion correctly uses `str()` to compare doc_url, accommodating the type change in ReferencedDocument where doc_url is now `str | Optional[AnyUrl]`.
str()to compare doc_url, accommodating the type change in ReferencedDocument where doc_url is nowstr | Optional[AnyUrl].tests/unit/app/endpoints/test_query.py (1)
1014-1014: LGTM! Tests updated to match model type changes.

The test assertions correctly use `str()` for doc_url comparisons, aligning with the ReferencedDocument type change. The malformed URL test (lines 1041-1043) properly verifies that invalid URLs are stored as raw strings rather than being rejected.

Also applies to: 1041-1043, 1066-1066, 1093-1095
str()for doc_url comparisons, aligning with the ReferencedDocument type change. The malformed URL test (lines 1041-1043) properly verifies that invalid URLs are stored as raw strings rather than being rejected.Also applies to: 1041-1043, 1066-1066, 1093-1095
run.yaml (1)
155-160: LGTM! Granite embedding model configuration is consistent.The granite-embedding model configuration correctly specifies the embedding dimension (384) which matches the vector_io configuration at line 125. The model identifier
ibm-granite/granite-embedding-30m-englishis properly formatted for the sentence-transformers provider.src/app/endpoints/streaming_query.py (1)
174-246: LGTM! Streaming end event properly includes RAG data.The stream_end_event function correctly includes both
rag_chunksandvector_io_referenced_docsin the final streaming response. The implementation properly converts RAG chunks to dict format and combines metadata_map documents with vector_io documents.src/app/endpoints/query.py (1)
856-867: LGTM! RAG context injection implementation.The RAG context formatting and injection logic properly:
- Limits to top 5 chunks
- Formats each chunk with source attribution
- Prepends relevant documentation to the user query
- Logs the injection for debugging
This approach cleanly augments the user message with retrieved context before the LLM call.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
run.yaml (1)
29-35: Fix duplicate top-level `models` and `vector_dbs` keys in YAML.

You now have two `models:` mappings (lines 29-35 and 140-150) and two `vector_dbs:` mappings (lines 134-139 and 155), which YAMLlint flags as duplicate keys. Most parsers will silently let the last one win, so the earlier definitions are ignored.

Consolidate these into single blocks, e.g.:

- Move all model declarations into the bottom `models:` list and delete the earlier `models:` at lines 29-35 (or vice-versa).
- Keep only one `vector_dbs:` mapping (likely the new `portal-rag` block) and remove the trailing `vector_dbs: []`.

Until this is fixed, the effective config will be ambiguous and may not match your intent.

Also applies to: 134-150, 154-154
src/app/endpoints/streaming_query.py (3)
54-71: ResolveRAGChunk/ReferencedDocumentimport issues and redefinitions.
models.responsesapparently does not exportRAGChunk(Ruff reports “no name 'RAGChunk' in module 'models.responses'”), and you also importReferencedDocumentagain fromutils.endpoints. This leads to F811/F821 redefinition errors.- You only need a single canonical
ReferencedDocumenttype (and likely only frommodels.responses).Suggestion:
-from models.responses import ( - ForbiddenResponse, - RAGChunk, - ReferencedDocument, - UnauthorizedResponse, - ... -) -from utils.endpoints import ( - ReferencedDocument, - check_configuration_loaded, - ... -) +from models.responses import ( + ForbiddenResponse, + UnauthorizedResponse, + RAGChunk, # if actually defined there; otherwise import from its real module + ReferencedDocument, + ... +) +from utils.endpoints import ( + check_configuration_loaded, + cleanup_after_streaming, + create_rag_chunks_dict, + get_agent, + get_system_prompt, + validate_model_provider_override, +)Adjust the
RAGChunkimport to match where it is really defined. This will clear the redefinition/undefined-name warnings.
902-1064: Broken context wiring and dead nestedresponse_generatorblock.Inside
streaming_query_endpoint_handler_base:
- You define a second async
response_generator(lines 926–1047) that references many undefined names (interleaved_content_as_str,get_session,create_referenced_documents_with_metadata,CacheEntry) and acontextvariable that is never created.- The old
ResponseGeneratorContextinitialization is fully commented out (1049–1060), yet you still callcreate_response_generator_func(context)(line 1063), which will raiseNameErrorat runtime.- The nested
response_generatoris never used; the actual generator comes fromcreate_response_generator_func.This block needs to be either completed or removed. Given you already have
create_agent_response_generator, the simplest fix is to delete the unused nested generator and reinstate a validResponseGeneratorContext:- metadata_map: dict[str, dict[str, Any]] = {} - - async def response_generator(...): - ... - cache_entry = CacheEntry(...) - # Create context object for response generator -# context = ResponseGeneratorContext( -# ... -# ) + metadata_map: dict[str, dict[str, Any]] = {} + + context = ResponseGeneratorContext( + conversation_id=conversation_id, + user_id=user_id, + skip_userid_check=_skip_userid_check, + model_id=model_id, + provider_id=provider_id, + llama_stack_model_id=llama_stack_model_id, + query_request=query_request, + started_at=started_at, + client=client, + metadata_map=metadata_map, + )…and delete the entire nested
async def response_generator(...)block if it is not intended to be used anymore.As written, this function will not work and is the main source of the
context/undefined-name errors from Ruff and mypy.
1357-1455: Normalize vector DB toolgroup handling and RAG chunk typing inretrieve_response.In
retrieve_response(streaming path):
toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroupswill fail ifget_rag_toolgroupsreturnsNone(mypy error and potential TypeError).paramsandurljoinsuffer from the same typing issues asquery_vector_io_for_chunks.RAGChunk(content=chunk.content, source=source, ...)again passes non‑string content and potentially non‑string source.You can mirror the patterns used in the earlier suggestion:
- vector_dbs = await client.vector_dbs.list() - vector_db_ids = [vdb.identifier for vdb in vector_dbs] - mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers] - - toolgroups = None - if vector_db_ids: - toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups - elif mcp_toolgroups: - toolgroups = mcp_toolgroups + vector_dbs = await client.vector_dbs.list() + vector_db_ids = [vdb.identifier for vdb in vector_dbs] + mcp_toolgroups = [mcp_server.name for mcp_server in configuration.mcp_servers] + + rag_toolgroups = get_rag_toolgroups(vector_db_ids) if vector_db_ids else None + toolgroups = (rag_toolgroups or []) + mcp_toolgroups if mcp_toolgroups else rag_toolgroups @@ - params = {"k": 5, "score_threshold": 0.0} + params: dict[str, Any] = {"k": 5, "score_threshold": 0.0} @@ - parent_id = chunk.metadata.get("parent_id") - if parent_id: - source = urljoin( - "https://mimir.corp.redhat.com", parent_id - ) + parent_id = chunk.metadata.get("parent_id") + if parent_id and isinstance(parent_id, str): + source = urljoin( + "https://mimir.corp.redhat.com", parent_id + ) @@ - rag_chunks.append( - RAGChunk( - content=chunk.content, - source=source, - score=score, - ) - ) + rag_chunks.append( + RAGChunk( + content=str(chunk.content) if chunk.content else "", + source=source if isinstance(source, str) else None, + score=score, + ) + )This addresses the mypy errors and ensures
RAGChunkinstances have the expected types.src/app/endpoints/query.py (1)
582-613: Fix the return type and docstring of `parse_metadata_from_text_item`. Two small issues:
- The function is annotated as
Optional[ReferencedDocument]but returns alist[ReferencedDocument]whentext_itemis not aTextContentItem:docs: list[ReferencedDocument] = [] if not isinstance(text_item, TextContentItem): return docs # wrong type
- The docstring says it returns
ReferencedDocumentbut doesn’t mentionNone.Suggested fix:
- docs: list[ReferencedDocument] = [] - if not isinstance(text_item, TextContentItem): - return docs + if not isinstance(text_item, TextContentItem): + return None @@ - if url and title: - return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None) + if url and title: + return ReferencedDocument(doc_url=url, doc_title=title, doc_id=None)Optionally update the docstring “Returns” section to mention that
Noneis returned when no valid metadata is found.
♻️ Duplicate comments (5)
src/app/endpoints/query.py (4)
78-82: Shared OFFLINE flag should be centralized (optional).

You now have an `OFFLINE` constant here and in `streaming_query.py` with identical semantics. Consider moving this to a shared config/constants module and reading from configuration/env rather than hard-coding it in two places, to avoid divergence later.
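For instance, a minimal sketch (the module name, env var, and default are illustrative assumptions, not the project's actual config surface):

```python
# src/constants.py (hypothetical shared module)
import os

# Single source of truth for the offline flag, read once at import time.
OFFLINE: bool = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"
```

Both endpoint modules would then import this constant instead of defining their own copies.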
874-885: Ensure `TurnSummary` is fully populated and referenced documents include RAG docs.

You now construct:

```python
summary = TurnSummary(
    llm_response=...,
    tool_calls=[],
    rag_chunks=rag_chunks,
)
referenced_documents = parse_referenced_documents(response)
referenced_documents.extend(doc_ids_from_chunks)
```

Two points:

- If `TurnSummary` also expects `tool_results`, confirm that its default is optional; otherwise add `tool_results=[]` explicitly (similar to other code paths in the repo).
- Extending `referenced_documents` with `doc_ids_from_chunks` is correct, but ensure `doc_ids_from_chunks` elements are built with the same `ReferencedDocument` type (which they are) and that the "Mimir" base URL is eventually configurable rather than hard-coded.

If `tool_results` is required, adjust:

```diff
 summary = TurnSummary(
     llm_response=...,
     tool_calls=[],
+    tool_results=[],
     rag_chunks=rag_chunks,
 )
```

Also applies to: 887-891
915-942: Fix "attatchment" typo in validation error messages.

The error messages in `validate_attachments_metadata` still contain the typo:

```
"Invalid attatchment type ..."
"Invalid attatchment content type ..."
```

This was previously flagged. Update to:

```diff
-            message = (
-                f"Invalid attatchment type {attachment.attachment_type}: "
+            message = (
+                f"Invalid attachment type {attachment.attachment_type}: "
@@
-            message = (
-                f"Invalid attatchment content type {attachment.content_type}: "
+            message = (
+                f"Invalid attachment content type {attachment.content_type}: "
```

This improves API error clarity and keeps tests/docs consistent.
766-826: Fix vector IO `params` typing, logging, and exception message formatting in `retrieve_response`.

Issues in this block:

- `params = {"k": 5, "score_threshold": 0.0}` is inferred as `dict[str, float]`, which doesn't match the more general `dict[str, bool | float | str | Iterable | object | None]` expected by `vector_io.query` (mypy error).
- Logging uses f-strings with potentially large `query_response` payloads at INFO level (may leak content and is noisy).
- Exception logging uses f-strings inside logger methods (`logger.info(f"...")`) instead of `%s` formatting (style, not correctness).

Suggested cleanup:

```diff
-    params = {"k": 5, "score_threshold": 0.0}
-    logger.info(f"Initial params: {params}")
-    logger.info(f"query_request.solr: {query_request.solr}")
+    params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
+    logger.info("Initial params: %s", params)
+    logger.info("query_request.solr: %s", query_request.solr)
@@
-        params["solr"] = query_request.solr
-        logger.info(f"Final params with solr filters: {params}")
+        params["solr"] = query_request.solr
+        logger.info("Final params with solr filters: %s", params)
@@
-    logger.info(f"Final params being sent to vector_io.query: {params}")
+    logger.info("Final params being sent to vector_io.query: %s", params)
@@
-        logger.info(f"The query response total payload: {query_response}")
+        logger.debug(
+            "Vector IO query returned %d chunks",
+            len(query_response.chunks) if query_response.chunks else 0,
+        )
@@
-    except Exception as e:
-        logger.warning(f"Failed to query vector database for chunks: {e}")
-        logger.debug(f"Vector DB query error details: {traceback.format_exc()}")
+    except Exception as e:
+        logger.warning("Failed to query vector database for chunks: %s", e)
+        logger.debug("Vector DB query error details: %s", traceback.format_exc())
```

This addresses the mypy error and improves logging hygiene.
src/app/endpoints/streaming_query.py (1)
1153-1262: Type and logging issues in `query_vector_io_for_chunks`.

Several problems here affect type-checking and logging:

- `params = {"k": 5, "score_threshold": 0.0}` is inferred as `dict[str, float]`, which doesn't match the vector_io `params` type.
- `parent_id = chunk.metadata.get("parent_id")` is typed as `object`; calling `urljoin(..., parent_id)` without a guard causes mypy/pyright errors and could pass non-strings at runtime.
- `RAGChunk(content=chunk.content, source=source, ...)` passes an `InterleavedContent` (or similar) object into a `str` field, and `source` may not be a `str`.
- `logger.info("The query response total payload: %s", query_response)` logs the entire payload at INFO, which is noisy and may leak content (similar to past comments).

Suggested fixes:

```diff
-    params = {"k": 5, "score_threshold": 0.0}
+    params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
@@
-        logger.info("The query response total payload: %s", query_response)
+        logger.debug(
+            "Vector IO query returned %d chunks",
+            len(query_response.chunks) if query_response.chunks else 0,
+        )
@@
-                if OFFLINE:
-                    parent_id = chunk.metadata.get("parent_id")
-                    if parent_id:
-                        source = urljoin(
-                            "https://mimir.corp.redhat.com", parent_id
-                        )
+                if OFFLINE:
+                    parent_id = chunk.metadata.get("parent_id")
+                    if parent_id and isinstance(parent_id, str):
+                        source = urljoin(
+                            "https://mimir.corp.redhat.com", parent_id
+                        )
@@
-            final_rag_chunks.append(
-                RAGChunk(
-                    content=chunk.content,
-                    source=source,
-                    score=score,
-                )
-            )
+            final_rag_chunks.append(
+                RAGChunk(
+                    content=str(chunk.content) if chunk.content else "",
+                    source=source if isinstance(source, str) else None,
+                    score=score,
+                )
+            )
```

Also consider moving the hard-coded Mimir base URL into configuration (as prior reviews suggested).
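A minimal way to make that base URL configurable (the env var name is an assumption):

```python
# Hypothetical module-level setting; swap in the project's configuration object if preferred.
import os

MIMIR_BASE_URL: str = os.getenv("MIMIR_BASE_URL", "https://mimir.corp.redhat.com")
```

`urljoin(MIMIR_BASE_URL, parent_id)` then behaves identically in the default case while staying overridable per deployment.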
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- run.yaml (2 hunks)
- src/app/endpoints/query.py (8 hunks)
- src/app/endpoints/streaming_query.py (10 hunks)
- src/models/requests.py (2 hunks)
- src/models/responses.py (1 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
- tests/unit/app/endpoints/test_query.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/models/requests.py
- src/models/responses.py
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/streaming_query.py
src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-10-29T13:05:22.438Z
Learnt from: luis5tb
Repo: lightspeed-core/lightspeed-stack PR: 727
File: src/app/endpoints/a2a.py:43-43
Timestamp: 2025-10-29T13:05:22.438Z
Learning: In the lightspeed-stack repository, endpoint files in src/app/endpoints/ intentionally use a shared logger name "app.endpoints.handlers" rather than __name__, allowing unified logging configuration across all endpoint handlers (query.py, streaming_query.py, a2a.py).
Applied to files:
src/app/endpoints/streaming_query.py
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py
[error] 1-1: Black formatting check failed. 1 file would be reformatted.
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py
[error] 1-1: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' while importing module 'app.endpoints.query'.
🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py
[warning] 54-54: Reimport 'ForbiddenResponse' (imported line 54) (reimported)
[warning] 54-54: Reimport 'UnauthorizedResponse' (imported line 54) (reimported)
[error] 54-54: No name 'RAGChunk' in module 'models.responses' (no-name-in-module)
[warning] 70-70: Reimport 'ReferencedDocument' (imported line 54) (reimported)
[error] 159-159: Too many arguments (7/5) (too-many-arguments)
[error] 159-159: Too many positional arguments (7/5) (too-many-positional-arguments)
[warning] 164-164: Unused argument 'referenced_documents' (unused-argument)
[error] 1016-1016: Undefined variable 'get_session' (undefined-variable)
[error] 1029-1029: Undefined variable 'create_referenced_documents_with_metadata' (undefined-variable)
[error] 1037-1037: Undefined variable 'CacheEntry' (undefined-variable)
[warning] 1015-1015: Unused variable 'topic_summary' (unused-variable)
[warning] 1037-1037: Unused variable 'cache_entry' (unused-variable)
[error] 1063-1063: Undefined variable 'context' (undefined-variable)
[warning] 1435-1435: Catching too general exception Exception (broad-exception-caught)
[error] 1265-1265: Too many statements (74/50) (too-many-statements)
🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py
[error] 55-55: F811: Redefinition of unused 'ForbiddenResponse'.
[error] 67-67: F811: Redefinition of unused 'UnauthorizedResponse'.
[error] 71-71: F821: Undefined or redefined name 'ReferencedDocument'.
[error] 960-960: F821: Undefined name 'interleaved_content_as_str'.
[error] 1016-1016: F821: Undefined name 'get_session'.
[error] 1023-1023: F841: Local variable 'topic_summary' assigned to but never used.
[error] 1029-1029: F821: Undefined name 'create_referenced_documents_with_metadata'.
[error] 1037-1037: F841: Local variable 'cache_entry' assigned to but never used.
[error] 1037-1037: F821: Undefined name 'CacheEntry'.
[error] 1063-1063: F811: Redefinition of unused 'response_generator'.
[error] 1063-1063: F821: Undefined name 'context'.
src/app/endpoints/query.py
[error] 22-22: F401: Remove unused import: 'Document' from llama_stack_client.types.agents.turn_create_params.
🪛 GitHub Actions: Type checks
tests/unit/app/endpoints/test_conversations_v2.py
[error] Process completed with exit code 1.
tests/unit/app/endpoints/test_query.py
[error] Process completed with exit code 1.
src/app/endpoints/streaming_query.py
[error] 809-809: mypy: Argument 4 to 'stream_end_event' has incompatible type 'Any | list[Any]'; expected 'dict[str, int]'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 960-960: mypy: Name 'interleaved_content_as_str' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1016-1016: mypy: Name 'get_session' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1029-1029: mypy: Name 'create_referenced_documents_with_metadata' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1037-1037: mypy: Name 'CacheEntry' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1063-1063: mypy: Name 'context' is not defined. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1190-1190: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1238-1238: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1400-1400: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1400-1400: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 1414-1414: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] Process completed with exit code 1.
src/app/endpoints/query.py
[error] 748-748: mypy: Unsupported left operand type for + ('None'). (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 788-788: mypy: Argument 'params' to 'query' of 'AsyncVectorIoResource' has incompatible type 'dict[str, float]'; expected 'dict[str, bool | float | str | Iterable[object] | object | None] | Omit'. (Note: dict is invariant; consider using Mapping) (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] 835-835: mypy: Value of type variable 'AnyStr' of 'urljoin' cannot be 'object'. (Command: 'uv run mypy --explicit-package-bases --disallow-untyped-calls --disallow-untyped-defs --disallow-incomplete-defs --ignore-missing-imports --disable-error-code attr-defined src/')
[error] Process completed with exit code 1.
run.yaml
[error] Process completed with exit code 1.
🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_query.py
[error] 22-22: ImportError while importing test module due to missing dependency ('llama_stack_client.types.agents').
src/app/endpoints/query.py
[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents'. This import error occurs while importing Turn from llama_stack_client.types.agents in app.endpoints.query, causing test collection to fail during the 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing' step.
🪛 YAMLlint (1.37.1)
run.yaml
[error] 140-140: duplication of key "models" in mapping
(key-duplicates)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: server mode / azure
🔇 Additional comments (4)
tests/unit/app/endpoints/test_conversations_v2.py (1)
93-97: Updated `doc_url` assertion matches widened type.

Using `str(ref_docs[0]["doc_url"]) == "http://example.com"` is consistent with `ReferencedDocument.doc_url` now being either a plain string or AnyUrl; the test is correct and resilient.

Also applies to: 111-116
tests/unit/app/endpoints/test_query.py (1)
1009-1023: Doc URL expectations correctly adapted to string/URL union.

The updated assertions (`str(doc.doc_url) == ...`) and the "malformed URL" test expecting a non-None `ReferencedDocument` with raw `doc_url` align with the new `ReferencedDocument.doc_url` semantics. The test logic itself looks sound; remaining test failures stem from import errors in `app.endpoints.query`.

Also applies to: 1041-1051, 1070-1075, 1097-1103
src/app/endpoints/query.py (2)
299-310: `query_endpoint_handler_base` now depends on the extended `retrieve_response` contract; ensure all call sites match.

You've updated the base handler to expect `retrieve_response_func` to return `(summary, conversation_id, referenced_documents, token_usage)` and to pass `provider_id` as a keyword argument. That's correct for the new `retrieve_response` in this module, but any other callers (e.g., alternative backends) must maintain this contract.

Double-check any other `query_endpoint_handler_base(..., retrieve_response_func=...)` usages to ensure they:

- Accept `provider_id` as kwarg in their `retrieve_response` implementation.
- Return the 4-tuple so `token_usage` and `referenced_documents` are well-defined.

Within this file, the wiring looks coherent.
Also applies to: 333-350
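For reference, a conforming implementation would roughly match this shape (a sketch of the contract described above; parameter names and the token-usage type are assumptions):

```python
async def retrieve_response(
    client: AsyncLlamaStackClient,
    llama_stack_model_id: str,
    query_request: QueryRequest,
    token: str,
    *,
    provider_id: str = "",
) -> tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]:
    """Return (summary, conversation_id, referenced_documents, token_usage)."""
    ...
```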
7-25: The import paths in the current code are correct for `llama-stack-client==0.3.0`.

According to official documentation, the imports should use `llama_stack_client.types.agents` (not the `alpha` namespace):

```python
from llama_stack_client.types.agents import Turn, TurnCreateResponse
```

The current code's import structure aligns with this. The `ModuleNotFoundError` cited in the review comment is not due to incorrect import paths; reverting to `types.alpha.agents` would introduce the actual error. If CI is failing, the issue lies elsewhere (e.g., dependency installation, version mismatch, or a different module).

Regarding the unused `Document` import: if this is truly unused, it should be removed per code cleanliness standards, but this is a minor concern unrelated to the import path issue.

Likely an incorrect or invalid review comment.
Actionable comments posted: 2
♻️ Duplicate comments (8)
src/app/endpoints/query.py (7)
817-818: Remove redundant local import.

`RAGChunk` and `ReferencedDocument` are imported locally inside the function, which is inefficient. Import them at module level instead. This was previously flagged.
897-908: Include `tool_results` in `TurnSummary` construction.

The `TurnSummary` construction is missing the required `tool_results` parameter. This was previously flagged and will cause a `TypeError` at runtime.

🔎 Previously suggested fix

```diff
 summary = TurnSummary(
     llm_response=(
         content_to_str(response.output_message.content)
         if (
             getattr(response, "output_message", None) is not None
             and getattr(response.output_message, "content", None) is not None
         )
         else ""
     ),
     tool_calls=[],
+    tool_results=[],
     rag_chunks=rag_chunks,
 )
```
948-949: Fix typo: "attatchment" should be "attachment".

There's a typo in the error message at line 948 (and also line 958). This was previously flagged but remains unresolved.
764-777: Fix type error: `get_rag_toolgroups` can return `None`.

At line 771, `get_rag_toolgroups(vector_db_ids)` can return `None`, causing a `TypeError` when concatenating with `mcp_toolgroups`. This issue was previously flagged but remains unresolved.

🔎 Previously suggested fix

```diff
-    toolgroups = None
-    if vector_db_ids:
-        toolgroups = get_rag_toolgroups(vector_db_ids) + mcp_toolgroups
-    elif mcp_toolgroups:
-        toolgroups = mcp_toolgroups
+    toolgroups = None
+    if vector_db_ids:
+        rag_toolgroups = get_rag_toolgroups(vector_db_ids) or []
+        toolgroups = rag_toolgroups + mcp_toolgroups
+    elif mcp_toolgroups:
+        toolgroups = mcp_toolgroups
```
814-814: Avoid logging entire query response payload at INFO level.

Logging the full `query_response` may expose sensitive document content. This was previously flagged but remains unresolved.
850-871: Fix type errors in RAGChunk construction.

Pipeline failures indicate:

- Line 858: `parent_id` is typed as `object`, causing a type error with `urljoin`, which expects `str`.
- `chunk.content` may not be the primitive string type expected by `RAGChunk.content`.

🔎 Previously suggested fix

```diff
             if OFFLINE:
                 parent_id = chunk.metadata.get("parent_id")
-                if parent_id:
-                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
+                if parent_id and isinstance(parent_id, str):
+                    source = urljoin("https://mimir.corp.redhat.com", parent_id)
@@
             rag_chunks.append(
                 RAGChunk(
-                    content=chunk.content,
-                    source=source,
+                    content=str(chunk.content) if chunk.content else "",
+                    source=source if isinstance(source, str) else None,
                     score=score,
                 )
             )
```
799-812: Fix type incompatibility for `params` dictionary.

Pipeline reports that `params` at line 811 has type `dict[str, float]` but `vector_io.query` expects `dict[str, bool | float | str | Iterable[object] | object | None] | Omit`. The issue arises because `params["solr"]` is assigned a complex object (line 804).

🔎 Proposed fix

Explicitly type `params` to satisfy the broader type requirement:

```diff
-    params = {"k": 5, "score_threshold": 0.0}
+    params: dict[str, Any] = {"k": 5, "score_threshold": 0.0}
```

run.yaml (1)
101-113: Hardcoded Solr URL should use environment variable.

The `solr_url: "http://localhost:8983/solr"` at line 104 remains hardcoded and won't work in containerized or distributed deployments. This was previously flagged but has not been addressed.
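For example, using the same `${env.VAR}` substitution already applied to `EMBEDDING_MODEL_DIR` elsewhere in this file (the variable name `SOLR_URL` is an assumption; surrounding provider keys are elided):

```yaml
config:
  solr_url: ${env.SOLR_URL:=http://localhost:8983/solr}
```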
🧹 Nitpick comments (1)
src/app/endpoints/query.py (1)
79-82: Move OFFLINE flag to configuration.

The module-level `OFFLINE = True` constant should be sourced from configuration or environment variables rather than hardcoded. This will allow runtime control without code changes.

🔎 Suggested approach

Add to your configuration class or read from environment:

```python
# In configuration module
OFFLINE_MODE = os.getenv("LIGHTSPEED_OFFLINE_MODE", "true").lower() == "true"
```

Then import and use in this module:

```python
from configuration import configuration

# Later in code
if configuration.offline_mode:
    # ...
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- run.yaml
- src/app/endpoints/query.py
- src/models/requests.py
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/app/endpoints/test_query.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/models/requests.py
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/app/endpoints/test_query.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/query.py
🧠 Learnings (1)
📚 Learning: 2025-12-18T10:21:03.056Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:03.056Z
Learning: In run.yaml for Llama Stack 0.3.x, do not add a telemetry provider block under providers. To enable telemetry, set telemetry.enabled: true directly (no provider block required). This pattern applies specifically to run.yaml configuration in this repository.
Applied to files:
run.yaml
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/models/responses.py (1)
ReferencedDocument (328-340)

src/utils/types.py (1)
RAGChunk(128-133)
🪛 GitHub Actions: Integration tests
src/app/endpoints/query.py
[error] 20-20: ImportError: No module named 'llama_stack_client.types.agents'.
🪛 GitHub Actions: Ruff
src/app/endpoints/query.py
[error] 22-22: F401: Unused import Document from llama_stack_client.types.agents.turn_create_params. Remove unused import.
🪛 GitHub Actions: Type checks
src/app/endpoints/query.py
[error] 771-771: mypy: Unsupported left operand type for + ("None").
[error] 811-811: mypy: Argument "params" to "query" of "AsyncVectorIoResource" has incompatible type "dict[str, float]"; expected "dict[str, bool | float | str | Iterable[object] | object | None] | Omit".
[error] 811-811: mypy: Note: "dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance.
[error] 858-858: mypy: Value of type variable "AnyStr" of "urljoin" cannot be "object".
🪛 GitHub Actions: Unit tests
src/app/endpoints/query.py
[error] 20-20: ModuleNotFoundError: No module named 'llama_stack_client.types.agents' during pytest collection. Command: 'uv run pytest tests/unit --cov=src --cov=runner --cov-report term-missing'.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: server mode / azure
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (1)
src/app/endpoints/query.py (1)
875-887: RAG context injection implementation looks good.

The logic correctly formats the top 5 RAG chunks and injects them into the user message. The approach of appending context to the query is appropriate for providing relevant documentation to the LLM.
Force-pushed 35182dc to e98ff61 (Compare)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query_v2.py (1)
636-644: ⚠️ Potential issue | 🔴 Critical

Missing third return element when moderation blocks the request.

When `moderation_result.blocked` is True, the function returns a 2-element tuple but the function signature expects a 3-element tuple including `doc_ids_from_chunks`. This will cause a runtime error when unpacking.

🐛 Proposed fix

```diff
 if moderation_result.blocked:
     violation_message = moderation_result.message or ""
     await append_turn_to_conversation(
         client, llama_stack_conv_id, input_text, violation_message
     )
     return (
         create_violation_stream(violation_message, moderation_result.shield_model),
         normalize_conversation_id(conversation_id),
+        [],  # Empty doc_ids_from_chunks for blocked requests
     )
```
🤖 Fix all issues with AI agents
In `@run.yaml`:
- Around line 155-157: The provider_model_id is hardcoded to a local absolute
path which will break on other machines; update the run.yaml to read the model
path from an environment variable (e.g., replace the literal "/Users/acoba/..."
with a variable reference like ${MODEL_PATH} or similar) and document a fallback
or error if the env var is missing, ensuring the provider_model_id entry uses
that env var reference instead of the hardcoded string so other developers can
set MODEL_PATH in their environment.
- Around line 164-168: The embedding_model entry under the vector_stores block
(vector_store_id "portal-rag", provider_id "solr-vector") contains a hardcoded
local filesystem path and an incorrect double-slash format; replace it with a
portable value (e.g., a model identifier like
"sentence-transformers/<model-name>" or a reference to an environment/config
variable) and remove the extra slash so the string follows the provider's
expected format; make the value configurable (env var or shared config key)
rather than an absolute path so it works across machines and CI.
In `@src/app/endpoints/query_v2.py`:
- Around line 437-440: Hardcoded offline=True and the hardcoded domain URL
(https://mimir.corp.redhat.com) should be removed from
src/app/endpoints/query_v2.py and made configurable, and the RAG extraction
logic should be factored into a shared helper used by both query_v2.py and
streaming_query_v2.py; create a config entry (env or settings object) for the
offline flag and domain, implement a shared function (e.g.,
determine_chunk_source or extract_rag_source) that encapsulates "when offline
use parent_id, else use reference_url" logic, and replace the inline offline
variable in query_v2.py (and streaming_query_v2.py) to call that helper and read
from the config.
- Around line 1101-1106: The call to get_rag_tools in
prepare_tools_for_responses_api is missing the solr params, so Solr filters
aren't applied; update the call site where vector_store_ids is truthy to pass
query_request.solr as the second argument (e.g., get_rag_tools(vector_store_ids,
query_request.solr)) so that get_rag_tools receives the solr_params and can
apply Solr filtering when building rag_tools.
In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 529-540: The code hardcodes "https://mimir.corp.redhat.com" when
building doc_url; replace that literal with a configurable base URL (e.g.,
BASE_DOC_URL or get_setting("BASE_DOC_URL")) and use it wherever doc_url is
constructed in this block (variables: offline, reference_doc, reference_url,
doc_id, doc_url). Update the logic so that when reference_doc does not start
with "http" you prepend the configurable base URL, ensure the config has
sensible defaults and is loaded once (module constant or config accessor) so all
environments (dev/staging/prod) can override it via config/env.
- Around line 443-446: The hardcoded flag offline = True in
streaming_query_v2.py should be replaced with a configurable source; update the
function to read offline from a higher-level configuration (e.g., a settings
object, app config, environment variable, or a request parameter) instead of the
literal, and use that value wherever offline is referenced (the variable named
offline controls chunk source selection and URL construction). Locate the
assignment of offline and replace it with a lookup like settings.OFFLINE or
request.args.get('offline') (or inject it as a function argument),
validate/normalize the value to a boolean, and ensure fallback default behavior
is preserved when the config is missing. Ensure all downstream uses of offline
in the function continue to reference this configurable variable (no other
hardcoded occurrences remain).
🧹 Nitpick comments (4)
src/app/main.py (1)
62-66: Remove commented-out debug code before merging.

This commented-out block for deregistering a vector store appears to be leftover debugging code. Commented-out code reduces readability and should be removed. If this functionality is needed in the future, it can be retrieved from version control.

🧹 Proposed fix to remove dead code

```diff
-    # try:
-    #     await client.vector_stores.delete(vector_store_id="portal-rag")
-    #     logger.info("Successfully deregistered vector store: portal-rag")
-    # except Exception as e:
-    #     logger.warning("Failed to deregister vector store 'portal-rag': %s", e)
-
```

lightspeed-stack.yaml (1)
13-18: Remove commented-out configuration or add clarifying comments.

The commented-out `use_as_library_client: false` and URL/API key lines add clutter. If these are meant as documentation for alternative configurations, consider adding explicit comments explaining when to use each mode. Otherwise, remove them.

src/app/endpoints/streaming_query_v2.py (1)
437-599: Significant code duplication with `query_v2.py`.

The RAG chunk extraction, document ID parsing, and URL construction logic (lines 437-599) is nearly identical to the implementation in `src/app/endpoints/query_v2.py`. Consider extracting this into a shared utility function in `src/utils/` to maintain DRY principles and ensure consistent behavior, as sketched below.
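A possible shape for the shared helper (a sketch; the metadata keys mirror the duplicated code, while the module path and function name are illustrative):

```python
# src/utils/rag.py (hypothetical module)
from typing import Any
from urllib.parse import urljoin


def build_chunk_source(
    metadata: dict[str, Any] | None, offline: bool, base_url: str
) -> str | None:
    """Resolve a chunk's source URL from its metadata."""
    if not metadata:
        return None
    if offline:
        parent_id = metadata.get("parent_id")
        return urljoin(base_url, str(parent_id)) if parent_id else None
    reference_url = metadata.get("reference_url")
    return str(reference_url) if reference_url else None
```

Both endpoints could then delegate to it, keeping URL construction in one place.

src/app/endpoints/query_v2.py (1)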
1100-1113: Commented-out code should be removed.

The commented-out Solr filter logging code adds noise without providing value. If this functionality is needed, uncomment and use it; otherwise, remove it.

🧹 Proposed fix

```diff
     if vector_store_ids:
-        # logger.info("query_request.solr: %s", query_request.solr)
         rag_tools = get_rag_tools(vector_store_ids)
         if rag_tools:
             logger.info("rag_tool are: %s", rag_tools)
             toolgroups.extend(rag_tools)
-            # if query_request.solr:
-            #     logger.info(
-            #         "RAG tools configured with Solr filters: %s", query_request.solr
-            #     )
-            # else:
-            #     logger.info("RAG tools configured without Solr filters")
     else:
         logger.info("No RAG tools configured")
```
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Force-pushed e98ff61 to 19bc8ca (Compare)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/app/endpoints/streaming_query_v2.py (1)
641-644: ⚠️ Potential issue | 🔴 Critical

Fix return value mismatch in shield violation path.

The function signature returns `tuple[..., str, list[ReferencedDocument]]` but the violation path at line 641 returns only a 2-tuple. This causes a type error and will fail at runtime when callers unpack 3 values.

🐛 Proposed fix

```diff
 return (
     create_violation_stream(violation_message, moderation_result.shield_model),
     normalize_conversation_id(conversation_id),
+    [],  # Empty list of ReferencedDocument for violation case
 )
```
return ( create_violation_stream(violation_message, moderation_result.shield_model), normalize_conversation_id(conversation_id), + [], # Empty list of ReferencedDocument for violation case )src/app/endpoints/query_v2.py (2)
12-24: ⚠️ Potential issue | 🟡 Minor

Group `llama_stack_client` imports to satisfy lint.

Pylint flags the split import; merging them avoids the CI error.

Proposed fix

```diff
-from llama_stack_client import APIConnectionError, APIStatusError
+from llama_stack_client import (
+    APIConnectionError,
+    APIStatusError,
+    AsyncLlamaStackClient,
+)
@@
-from llama_stack_client import AsyncLlamaStackClient
```
1009-1017: ⚠️ Potential issue | 🟡 Minor

Resolve pylint `too-many-arguments` on `prepare_tools_for_responses_api`.

CI is failing after the extra parameter. Either bundle options or suppress for this function.

Proposed minimal fix

```diff
-async def prepare_tools_for_responses_api(
+async def prepare_tools_for_responses_api(  # pylint: disable=too-many-arguments
```
🤖 Fix all issues with AI agents
In `@run.yaml`:
- Around line 166-181: There are two identical vector_stores blocks (both
defining vector_store_id: portal-rag, provider_id: solr-vector, embedding_model:
sentence-transformers/${env.EMBEDDING_MODEL_DIR}, embedding_dimension: 384);
remove the duplicate so only one vector_stores definition remains (delete the
redundant block outside/inside the registered_resources section as appropriate),
ensure YAML indentation and surrounding keys (registered_resources, tool_groups,
etc.) remain valid after removal and that vector_store_id: portal-rag stays in
the intended scope.
- Around line 162-165: The YAML sets shield_id: llama-guard but incorrectly uses
provider_shield_id: openai/gpt-4o-mini; change provider_shield_id to reference
the correct llama-guard shield identifier (e.g., provider_shield_id:
llama-guard) so the `shields` entry for the `llama-guard` provider uses a valid
llama-guard shield ID; update the `provider_shield_id` value in the same
`shields` mapping to match the `llama-guard` shield name.
In `@src/app/endpoints/query_v2.py`:
- Around line 482-534: The loop assumes query_response.chunks always exists and
concatenates doc_id in offline mode even when doc_id is None; fix by
early-guarding and iterating over retrieved_chunks (use the previously computed
retrieved_chunks or empty list) instead of accessing query_response.chunks
directly, and when building the offline doc_url only concatenate the base host
with a non-None doc_id (fall back to reference_url or skip/continue when neither
doc_id nor reference_url is present). Update references in the loop (chunk, md,
cm, doc_id, reference_url, offline) so you use the safe retrieved_chunks
variable and add a check before offline concatenation to avoid doc_id being
None.
- Around line 562-580: The code may pass non-string metadata and content into
urljoin and RAGChunk; ensure parent_id and reference_url are converted to
strings and chunk.content is a string before creating RAGChunk: when building
source, call urljoin("https://mimir.corp.redhat.com", str(parent_id)) and set
source = str(chunk.metadata.get("reference_url")) if reference_url exists (or
None if missing); when constructing the RAGChunk ensure content is a string
(e.g., use chunk.content.text or str(chunk.content)) and keep the existing score
logic using retrieved_scores.
In `@src/app/endpoints/streaming_query_v2.py`:
- Around line 567-587: The RAGChunk constructor is getting wrong types: ensure
source is a str or None and content is a str before calling RAGChunk in the loop
over retrieved_chunks. For the offline branch, coerce parent_id to str (e.g.,
parent_id = str(parent_id)) before passing to urljoin; for the non-offline
branch ensure reference_url is cast to str; for content, replace chunk.content
with a string representation (e.g., use chunk.content.text if that attribute
exists, otherwise str(chunk.content)) so RAGChunk(content=...) receives a str;
update any uses of source and score to match the expected types in RAGChunk.
- Around line 525-549: The code building document URLs uses
doc_id/reference_url/reference_doc without guaranteeing they are strings,
causing TypeErrors; update the block that constructs doc_url (the variables
doc_id, reference_url, reference_doc, doc_url, metadata_doc_ids, and the
ReferencedDocument creation) to first ensure reference_doc is a non-None string
(e.g., skip if None or cast to str only after a non-None check), only call
startswith on a confirmed string, and when concatenating with
"https://mimir.corp.redhat.com" ensure the left operand is a string; in short,
add explicit None guards and string coercion before using startswith or '+' so
ReferencedDocument is only created with valid string doc_url/doc_title.
In `@tests/unit/app/endpoints/test_query_v2.py`:
- Around line 292-305: The new mock block has trailing whitespace and an overly
long line causing lint/black failures; edit the block that defines
mock_query_response and assigns mock_client.vector_io.query (and the
mocker.patch calls for get_system_prompt and run_shield_moderation) to remove
any trailing spaces and break or shorten the long line (e.g., assign the
AsyncMock to mock_client.vector_io.query on its own line or use a shorter
return_value variable) so the lines conform to Black's line length and no
trailing whitespace remains; keep the same mock objects: mock_query_response,
mock_client.vector_io.query, get_system_prompt, and run_shield_moderation.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 437-439: Remove the unused mock TurnSummary instances that linters
flag: delete the mock_summary variable assignments (instances of TurnSummary) in
the test module where they are not passed to stream_end_event; specifically
remove the lines creating mock_summary (TurnSummary(llm_response="",
tool_calls=[], tool_results=[], rag_chunks=[])) and any other identical unused
mock_summary declarations (also at the other occurrences noted), leaving the
rest of the test setup and calls to stream_end_event intact.
🧹 Nitpick comments (3)
src/models/requests.py (1)
169-175: Update class docstring to include the new `solr` field.

The `QueryRequest` class docstring (lines 72-91) lists all attributes but doesn't include the newly added `solr` field. As per coding guidelines, all class attributes should be documented.

📝 Suggested docstring update

Add to the Attributes section around line 86:

```diff
     vector_store_ids: The optional list of specific vector store IDs to query for RAG.
+    solr: Optional Solr-specific query parameters including filter queries. Example:
```

src/app/endpoints/query.py (1)
585-614: Remove or uncomment the dead code block.

This large commented-out function `get_rag_toolgroups` adds noise. If it's needed for future reference, consider moving it to a separate branch or issue tracker instead of keeping it in the codebase.

src/app/endpoints/streaming_query_v2.py (1)
388-388: Consider refactoring `retrieve_response` to reduce complexity.

The function has too many branches (31/12) and statements (108/50) per linter. Extract the vector querying logic (lines 448-587) into a separate helper function.

♻️ Suggested extraction

```python
async def _query_vector_store_for_rag_context(
    client: AsyncLlamaStackClient,
    query_request: QueryRequest,
    offline: bool,
) -> tuple[list[RAGChunk], list[ReferencedDocument]]:
    """Query vector stores and extract RAG chunks and referenced documents."""
    # Move lines 448-587 here
    ...
```

Then call from `retrieve_response`:

```python
rag_chunks, doc_ids_from_chunks = await _query_vector_store_for_rag_context(
    client, query_request, offline
)
```
```yaml
shields:
  - shield_id: llama-guard
    provider_id: llama-guard
    provider_shield_id: openai/gpt-4o-mini
```
🧩 Analysis chain
🏁 Script executed:
```shell
rg -n "provider_shield_id" -n
```

Repository: lightspeed-core/lightspeed-stack
Length of output: 1306
🏁 Script executed:
```shell
sed -n '1,60p' dev-tools/test-configs/llama-stack-mcp-test.yaml
```

Repository: lightspeed-core/lightspeed-stack
Length of output: 1308
Invalid provider_shield_id for llama-guard
Lines 162-165: provider_shield_id is set to openai/gpt-4o-mini under the llama-guard provider; it must reference a llama-guard shield ID (e.g. llama-guard).
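A corrected entry might look like this (the exact identifier depends on what the llama-guard provider registers; `llama-guard` is an assumption):

```yaml
shields:
  - shield_id: llama-guard
    provider_id: llama-guard
    provider_shield_id: llama-guard  # a llama-guard shield ID, not an OpenAI model
```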
🤖 Prompt for AI Agents
In `@run.yaml` around lines 162 - 165, The YAML sets shield_id: llama-guard but
incorrectly uses provider_shield_id: openai/gpt-4o-mini; change
provider_shield_id to reference the correct llama-guard shield identifier (e.g.,
provider_shield_id: llama-guard) so the `shields` entry for the `llama-guard`
provider uses a valid llama-guard shield ID; update the `provider_shield_id`
value in the same `shields` mapping to match the `llama-guard` shield name.
```yaml
vector_stores:
  - vector_store_id: portal-rag
    provider_id: solr-vector
    embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
    embedding_dimension: 384
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
  - toolgroup_id: builtin::rag  # Register the RAG tool
    provider_id: rag-runtime
vector_stores:
  default_provider_id: faiss
  default_embedding_model:  # Define the default embedding model for RAG
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
  vector_store_id: portal-rag
  provider_id: solr-vector
  embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
  embedding_dimension: 384
```
Remove duplicate vector_stores definition.
vector_stores is defined both under registered_resources (lines 166-170) and at the top level (lines 177-181) with identical content. This duplication could cause confusion and maintenance issues.
🔧 Remove duplicate block
```diff
 tool_groups:
   - toolgroup_id: builtin::rag  # Register the RAG tool
     provider_id: rag-runtime
-vector_stores:
-  vector_store_id: portal-rag
-  provider_id: solr-vector
-  embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
-  embedding_dimension: 384
 safety:
   default_shield_id: llama-guard
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
vector_stores:
  - vector_store_id: portal-rag
    provider_id: solr-vector
    embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR}
    embedding_dimension: 384
datasets: []
scoring_fns: []
benchmarks: []
tool_groups:
  - toolgroup_id: builtin::rag  # Register the RAG tool
    provider_id: rag-runtime
```
🤖 Prompt for AI Agents
In `@run.yaml` around lines 166 - 181, There are two identical vector_stores
blocks (both defining vector_store_id: portal-rag, provider_id: solr-vector,
embedding_model: sentence-transformers/${env.EMBEDDING_MODEL_DIR},
embedding_dimension: 384); remove the duplicate so only one vector_stores
definition remains (delete the redundant block outside/inside the
registered_resources section as appropriate), ensure YAML indentation and
surrounding keys (registered_resources, tool_groups, etc.) remain valid after
removal and that vector_store_id: portal-rag stays in the intended scope.
```python
if query_response.chunks:
    retrieved_chunks = query_response.chunks
    retrieved_scores = (
        query_response.scores if hasattr(query_response, "scores") else []
    )

    # Extract doc_ids from chunks for referenced_documents
    metadata_doc_ids = set()

    for chunk in query_response.chunks:
        logger.info("Extract doc ids from chunk: %s", chunk)

        # 1) dict metadata (what your code expects today)
        md = getattr(chunk, "metadata", None) or {}
        doc_id = md.get("doc_id") or md.get("document_id")
        title = md.get("title")

        # 2) typed chunk_metadata (what your provider/logs are actually populating)
        if not doc_id:
            cm = getattr(chunk, "chunk_metadata", None)
            if cm is not None:
                # cm might be a pydantic model or a dict depending on caller
                if isinstance(cm, dict):
                    doc_id = cm.get("doc_id") or cm.get("document_id")
                    title = title or cm.get("title")
                    reference_url = cm.get("reference_url")
                else:
                    doc_id = getattr(cm, "doc_id", None) or getattr(
                        cm, "document_id", None
                    )
                    title = title or getattr(cm, "title", None)
                    reference_url = getattr(cm, "reference_url", None)
            else:
                reference_url = None
        else:
            reference_url = md.get("reference_url")

        if not doc_id and not reference_url:
            continue

        # Build URL based on offline flag
        if offline:
            # Use parent/doc path
            reference_doc = doc_id
            doc_url = "https://mimir.corp.redhat.com" + reference_doc
        else:
            # Use reference_url if online
            reference_doc = reference_url or doc_id
            doc_url = (
                reference_doc
                if reference_doc.startswith("http")
                else ("https://mimir.corp.redhat.com" + reference_doc)
            )
```
Guard chunk parsing when chunks is missing and avoid offline doc_id None crashes.
The loop currently runs even if query_response.chunks is None, and offline mode concatenates doc_id even when only reference_url is available. Both can raise at runtime and match the type‑checker errors.
Proposed fix
```diff
-    if query_response.chunks:
-        retrieved_chunks = query_response.chunks
-        retrieved_scores = (
-            query_response.scores if hasattr(query_response, "scores") else []
-        )
-
-        # Extract doc_ids from chunks for referenced_documents
-        metadata_doc_ids = set()
-
-        for chunk in query_response.chunks:
+    if query_response.chunks:
+        retrieved_chunks = query_response.chunks
+        retrieved_scores = (
+            query_response.scores if hasattr(query_response, "scores") else []
+        )
+
+        # Extract doc_ids from chunks for referenced_documents
+        metadata_doc_ids: set[str] = set()
+
+        for chunk in query_response.chunks:
@@
-            if offline:
+            if offline:
+                if not doc_id:
+                    continue
                 # Use parent/doc path
                 reference_doc = doc_id
                 doc_url = "https://mimir.corp.redhat.com" + reference_doc
```
🪛 GitHub Actions: Pyright
[error] 526-526: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"
[error] 532-532: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)
[error] 533-533: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"
🪛 GitHub Actions: Type checks
[error] 526-526: Unsupported operand types for + ("str" and "None") [operator]
[error] 532-532: Item "None" of "Any | None" has no attribute "startswith" [union-attr]
[error] 533-533: Unsupported operand types for + ("str" and "None") [operator]
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 482 - 534, The loop assumes
query_response.chunks always exists and concatenates doc_id in offline mode even
when doc_id is None; fix by early-guarding and iterating over retrieved_chunks
(use the previously computed retrieved_chunks or empty list) instead of
accessing query_response.chunks directly, and when building the offline doc_url
only concatenate the base host with a non-None doc_id (fall back to
reference_url or skip/continue when neither doc_id nor reference_url is
present). Update references in the loop (chunk, md, cm, doc_id, reference_url,
offline) so you use the safe retrieved_chunks variable and add a check before
offline concatenation to avoid doc_id being None.
```python
# Extract source from chunk metadata based on offline flag
source = None
if chunk.metadata:
    if offline:
        parent_id = chunk.metadata.get("parent_id")
        if parent_id:
            source = urljoin("https://mimir.corp.redhat.com", parent_id)
    else:
        source = chunk.metadata.get("reference_url")

# Get score from retrieved_scores list if available
score = retrieved_scores[i] if i < len(retrieved_scores) else None

rag_chunks.append(
    RAGChunk(
        content=chunk.content,
        source=source,
        score=score,
    )
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate RAGChunk definition
rg --files-with-matches "class RAGChunk" -n .Repository: lightspeed-core/lightspeed-stack
Length of output: 96
Convert metadata and content fields to str

- Pass `str(parent_id)` to `urljoin`
- Wrap `reference_url` in `str(...)` (or use `None` if missing)
- Ensure `chunk.content` is a `str` (e.g. extract `.text` or `str(...)`) before creating `RAGChunk`

Diff

```diff
-            source = urljoin("https://mimir.corp.redhat.com", parent_id)
+            source = urljoin("https://mimir.corp.redhat.com", str(parent_id))
-        source = chunk.metadata.get("reference_url")
+        source = str(chunk.metadata.get("reference_url")) if chunk.metadata.get("reference_url") else None
-    rag_chunks.append(RAGChunk(content=chunk.content, source=source, score=score))
+    content = chunk.content
+    if not isinstance(content, str):
+        content = getattr(content, "text", str(content))
+    rag_chunks.append(RAGChunk(content=content, source=source, score=score))
```

🧰 Tools
🪛 GitHub Actions: Pyright
[error] 568-568: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin"
[error] 577-577: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "init"
[error] 578-578: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "init"
🪛 GitHub Actions: Type checks
[error] 568-568: Value of type variable "AnyStr" of "urljoin" cannot be "object" [type-var]
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 562 - 580, The code may pass
non-string metadata and content into urljoin and RAGChunk; ensure parent_id and
reference_url are converted to strings and chunk.content is a string before
creating RAGChunk: when building source, call
urljoin("https://mimir.corp.redhat.com", str(parent_id)) and set source =
str(chunk.metadata.get("reference_url")) if reference_url exists (or None if
missing); when constructing the RAGChunk ensure content is a string (e.g., use
chunk.content.text or str(chunk.content)) and keep the existing score logic
using retrieved_scores.
```python
if not doc_id and not reference_url:
    continue

# Build URL based on offline flag
if offline:
    # Use parent/doc path
    reference_doc = doc_id
    doc_url = "https://mimir.corp.redhat.com" + reference_doc
else:
    # Use reference_url if online
    reference_doc = reference_url or doc_id
    doc_url = (
        reference_doc
        if reference_doc.startswith("http")
        else ("https://mimir.corp.redhat.com" + reference_doc)
    )

if reference_doc and reference_doc not in metadata_doc_ids:
    metadata_doc_ids.add(reference_doc)
    doc_ids_from_chunks.append(
        ReferencedDocument(
            doc_title=title,
            doc_url=doc_url,
        )
    )
```
Fix type errors when building document URLs.
Pipeline reports multiple type errors because doc_id, reference_doc, and reference_url can be None. The string concatenation and startswith calls will fail at runtime.
🐛 Proposed fix for type safety
if not doc_id and not reference_url:
continue
# Build URL based on offline flag
+ doc_url: Optional[str] = None
if offline:
# Use parent/doc path
- reference_doc = doc_id
- doc_url = "https://mimir.corp.redhat.com" + reference_doc
+ if doc_id:
+ reference_doc = str(doc_id)
+ doc_url = "https://mimir.corp.redhat.com" + reference_doc
+ else:
+ continue
else:
# Use reference_url if online
- reference_doc = reference_url or doc_id
- doc_url = (
- reference_doc
- if reference_doc.startswith("http")
- else ("https://mimir.corp.redhat.com" + reference_doc)
- )
+ reference_doc = str(reference_url) if reference_url else (str(doc_id) if doc_id else None)
+ if reference_doc:
+ doc_url = (
+ reference_doc
+ if reference_doc.startswith("http")
+ else ("https://mimir.corp.redhat.com" + reference_doc)
+ )
+ else:
+ continue
- if reference_doc and reference_doc not in metadata_doc_ids:
+ if reference_doc and doc_url and reference_doc not in metadata_doc_ids:
          metadata_doc_ids.add(reference_doc)

🧰 Tools
🪛 GitHub Actions: Pyright
[error] 532-532: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"
[error] 538-538: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)
[error] 539-539: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None"
🪛 GitHub Actions: Type checks
[error] 532-532: Unsupported operand types for + ("str" and "None") [operator]
[error] 538-538: Item "None" of "Any | None" has no attribute "startswith" [union-attr]
[error] 539-539: Unsupported operand types for + ("str" and "None") [operator]
🤖 Prompt for AI Agents
In `@src/app/endpoints/streaming_query_v2.py` around lines 525 - 549, The code
building document URLs uses doc_id/reference_url/reference_doc without
guaranteeing they are strings, causing TypeErrors; update the block that
constructs doc_url (the variables doc_id, reference_url, reference_doc, doc_url,
metadata_doc_ids, and the ReferencedDocument creation) to first ensure
reference_doc is a non-None string (e.g., skip if None or cast to str only after
a non-None check), only call startswith on a confirmed string, and when
concatenating with "https://mimir.corp.redhat.com" ensure the left operand is a
string; in short, add explicit None guards and string coercion before using
startswith or '+' so ReferencedDocument is only created with valid string
doc_url/doc_title.
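Centralizing those guards in one helper would let both endpoints share the logic. A minimal sketch, assuming the base URL and offline flag from the review; `build_doc_url` itself is a hypothetical helper, not a function in the PR:

from typing import Optional

MIMIR_DOC_URL = "https://mimir.corp.redhat.com"


def build_doc_url(
    doc_id: Optional[str], reference_url: Optional[str], offline: bool
) -> Optional[str]:
    """Return a document URL, or None when no usable reference exists."""
    if offline:
        # Offline mode only has the parent/doc path to work with.
        return MIMIR_DOC_URL + doc_id if doc_id else None
    reference_doc = reference_url or doc_id
    if not reference_doc:
        return None
    # Absolute URLs pass through; relative paths get the base prepended.
    return (
        reference_doc
        if reference_doc.startswith("http")
        else MIMIR_DOC_URL + reference_doc
    )

Returning `None` instead of raising keeps the caller's existing "skip this chunk" flow intact.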
for i, chunk in enumerate(retrieved_chunks):
    # Extract source from chunk metadata based on offline flag
    source = None
    if chunk.metadata:
        if offline:
            parent_id = chunk.metadata.get("parent_id")
            if parent_id:
                source = urljoin("https://mimir.corp.redhat.com", parent_id)
        else:
            source = chunk.metadata.get("reference_url")

    # Get score from retrieved_scores list if available
    score = retrieved_scores[i] if i < len(retrieved_scores) else None

    rag_chunks.append(
        RAGChunk(
            content=chunk.content,
            source=source,
            score=score,
        )
    )
Fix type errors in RAGChunk construction.
Pipeline reports:
- Line 574: `urljoin` receives `object` type instead of `str`
- Line 583: `chunk.content` is `ChunkContent`, not `str`
- Line 584: `source` type mismatch
🐛 Proposed fix
for i, chunk in enumerate(retrieved_chunks):
# Extract source from chunk metadata based on offline flag
source = None
if chunk.metadata:
if offline:
parent_id = chunk.metadata.get("parent_id")
- if parent_id:
+ if parent_id and isinstance(parent_id, str):
source = urljoin("https://mimir.corp.redhat.com", parent_id)
else:
- source = chunk.metadata.get("reference_url")
+ ref_url = chunk.metadata.get("reference_url")
+ source = str(ref_url) if ref_url else None
# Get score from retrieved_scores list if available
score = retrieved_scores[i] if i < len(retrieved_scores) else None
rag_chunks.append(
RAGChunk(
- content=chunk.content,
- source=source,
+ content=str(chunk.content) if chunk.content else "",
+ source=source if isinstance(source, str) else None,
score=score,
)
    )

🧰 Tools
🪛 GitHub Actions: Pyright
[error] 574-574: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin"
[error] 583-583: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "init"
[error] 584-584: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "init"
🪛 GitHub Actions: Type checks
[error] 574-574: Value of type variable "AnyStr" of "urljoin" cannot be "object" [type-var]
🤖 Prompt for AI Agents
In `@src/app/endpoints/streaming_query_v2.py` around lines 567 - 587, The RAGChunk
constructor is getting wrong types: ensure source is a str or None and content
is a str before calling RAGChunk in the loop over retrieved_chunks. For the
offline branch, coerce parent_id to str (e.g., parent_id = str(parent_id))
before passing to urljoin; for the non-offline branch ensure reference_url is
cast to str; for content, replace chunk.content with a string representation
(e.g., use chunk.content.text if that attribute exists, otherwise
str(chunk.content)) so RAGChunk(content=...) receives a str; update any uses of
source and score to match the expected types in RAGChunk.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query_v2.py (1)
636-644: ⚠️ Potential issue | 🔴 Critical

Return statement missing third element - incompatible with function signature.
Pipeline reports an incompatible return type. The moderation-blocked path returns a 2-tuple, but the function signature at line 394 declares a 3-tuple return type including `list[ReferencedDocument]`.

🐛 Proposed fix
 if moderation_result.blocked:
     violation_message = moderation_result.message or ""
     await append_turn_to_conversation(
         client, llama_stack_conv_id, input_text, violation_message
     )
     return (
         create_violation_stream(violation_message, moderation_result.shield_model),
         normalize_conversation_id(conversation_id),
+        [],  # Empty list for doc_ids_from_chunks when blocked
     )
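As an aside on the contract itself, here is a runnable sketch of why every branch must return the same 3-tuple shape. The function and parameter names are hypothetical; the real signature lives at line 394 of `streaming_query_v2.py`:

from typing import AsyncIterator


async def _placeholder_stream() -> AsyncIterator[str]:
    yield ""  # stands in for the real SSE event stream


async def retrieve_response_sketch(
    blocked: bool, conversation_id: str
) -> tuple[AsyncIterator[str], str, list[str]]:
    if blocked:
        # The blocked path must still produce all three elements;
        # returning a 2-tuple here is what the type checker rejects.
        return _placeholder_stream(), conversation_id, []
    return _placeholder_stream(), conversation_id, ["doc-1"]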
🧹 Nitpick comments (2)
src/app/endpoints/query_v2.py (2)
1009-1017: Consider reducing function arguments.

Pipeline flags `too-many-arguments` (6/5). Consider grouping related parameters into a dataclass or config object.

♻️ Example refactor using a context object
@dataclass
class ToolPrepContext:
    client: AsyncLlamaStackClient
    query_request: QueryRequest
    token: str
    config: AppConfig
    mcp_headers: Optional[dict[str, dict[str, str]]] = None
    skip_rag_tools: bool = False


async def prepare_tools_for_responses_api(
    context: ToolPrepContext,
) -> Optional[list[dict[str, Any]]]:
    ...
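A hedged call-site sketch for the context object above; the local variable names are assumptions about the caller:

ctx = ToolPrepContext(
    client=client,
    query_request=query_request,
    token=token,
    config=configuration,
    skip_rag_tools=True,
)
tools = await prepare_tools_for_responses_api(ctx)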
1-3: Module has grown large (1087 lines).

Pipeline flags that the module exceeds 1000 lines. Consider extracting the RAG/vector querying logic (lines 431-597) into a separate utility module to improve maintainability and enable reuse with `streaming_query_v2.py`.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query_v2.py`:
- Around line 561-581: The RAGChunk constructor is receiving wrong types:
urljoin is being passed a non-str parent_id, chunk.content is a ChunkContent
object not a str, and source must be Optional[str]; fix by coercing values to
the expected types before creating RAGChunk: when deriving parent_id from
chunk.metadata call str(parent_id) (or skip urljoin if parent_id is None) so
urljoin(MIMIR_DOC_URL, str(parent_id)) receives strings; for the content field
extract the string payload from chunk.content (e.g., chunk.content.text or
similar) or fallback to str(chunk.content) so content is a str; ensure source is
either a str or None (set to None if not present) and keep score as the numeric
value from retrieved_scores if available. Apply these changes around the
RAGChunk(...) creation in the loop that references retrieved_chunks,
chunk.metadata, urljoin, MIMIR_DOC_URL, retrieved_scores, and RAGChunk.
- Around line 519-534: The offline/online URL construction must avoid using None
for doc_id/reference_url: change the offline branch so it only assigns
reference_doc = doc_id if doc_id is truthy; otherwise fall back to using
reference_url (and build doc_url from reference_url with MIMIR_DOC_URL if
needed). In the online branch compute reference_doc = reference_url or doc_id
and ensure you check that reference_doc is truthy before calling
reference_doc.startswith("http") (i.e., guard the .startswith call and treat a
None/empty reference_doc as needing MIMIR_DOC_URL + (reference_url or doc_id)).
Update the variables reference_doc and doc_url accordingly so neither branch
calls methods on None.
🧹 Nitpick comments (3)
src/app/endpoints/query_v2.py (3)
12-12: Consolidate imports from `llama_stack_client`.

Pipeline flags ungrouped imports. Lines 12 and 24 both import from the same package and should be combined.
♻️ Proposed fix
-from llama_stack_client import APIConnectionError, APIStatusError
+from llama_stack_client import APIConnectionError, APIStatusError, AsyncLlamaStackClient
 from llama_stack_api.openai_responses import (
     ...
 )
-from llama_stack_client import AsyncLlamaStackClient

Also applies to: 24-24
1009-1017: Consider reducing function arguments.

Pipeline warning: function has 6 arguments (exceeds limit of 5). Consider grouping related parameters into a config object or using `**kwargs` for optional parameters.
1-3: Module exceeds 1000 lines (currently 1080).

Pipeline warning `C0302`: Too many lines in module. Consider extracting the direct vector DB querying logic (lines 431-598) and related helpers into a separate module (e.g., `utils/vector_query.py`) to improve maintainability.
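If that extraction happens, the new module's surface could stay small. A hypothetical sketch of `utils/vector_query.py` — the function name, signature, and the exact `vector_io.query` call shape are assumptions to be checked against the llama-stack client:

"""Hypothetical utils/vector_query.py split out of query_v2.py."""

from typing import Any, Optional


async def retrieve_rag_chunks(
    client: Any,  # AsyncLlamaStackClient in the real code
    vector_db_id: str,
    query: str,
    k: int = 5,
    solr_params: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Query the vector DB directly and return chunks with their metadata."""
    response = await client.vector_io.query(
        vector_db_id=vector_db_id,
        query=query,
        params={"max_chunks": k, **(solr_params or {})},
    )
    return [{"content": c.content, "metadata": c.metadata} for c in response.chunks]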
if not doc_id and not reference_url:
    continue

# Build URL based on offline flag
if offline:
    # Use parent/doc path
    reference_doc = doc_id
    doc_url = MIMIR_DOC_URL + reference_doc
else:
    # Use reference_url if online
    reference_doc = reference_url or doc_id
    doc_url = (
        reference_doc
        if reference_doc.startswith("http")
        else (MIMIR_DOC_URL + reference_doc)
    )
Fix type errors in offline/online URL construction.
Pipeline reports multiple type errors. The guard at line 519 allows cases where `doc_id` is `None` but `reference_url` exists, causing the offline branch to fail. Additionally, the online branch calls `.startswith()` on a potentially `None` value.
🐛 Proposed fix
if not doc_id and not reference_url:
continue
# Build URL based on offline flag
if offline:
- # Use parent/doc path
- reference_doc = doc_id
- doc_url = MIMIR_DOC_URL + reference_doc
+ # Use parent/doc path - skip if no doc_id in offline mode
+ if not doc_id:
+ continue
+ reference_doc = str(doc_id)
+ doc_url = MIMIR_DOC_URL + reference_doc
else:
# Use reference_url if online
- reference_doc = reference_url or doc_id
+ reference_doc = str(reference_url) if reference_url else (str(doc_id) if doc_id else None)
+ if not reference_doc:
+ continue
doc_url = (
reference_doc
- if reference_doc.startswith("http")
+ if reference_doc.startswith("http")
else (MIMIR_DOC_URL + reference_doc)
    )

🧰 Tools
🪛 GitHub Actions: Pyright
[error] 526-526: pyright: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None" (reportOperatorIssue)
[error] 532-532: pyright: "startswith" is not a known attribute of "None" (reportOptionalMemberAccess)
[error] 533-533: pyright: Operator "+" not supported for types "Literal['https://mimir.corp.redhat.com']" and "Unknown | Any | None" (reportOperatorIssue)
🪛 GitHub Actions: Type checks
[error] 532-532: mypy error: Item "None" of "Any | None" has no attribute "startswith" [union-attr].
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 519 - 534, The offline/online URL
construction must avoid using None for doc_id/reference_url: change the offline
branch so it only assigns reference_doc = doc_id if doc_id is truthy; otherwise
fall back to using reference_url (and build doc_url from reference_url with
MIMIR_DOC_URL if needed). In the online branch compute reference_doc =
reference_url or doc_id and ensure you check that reference_doc is truthy before
calling reference_doc.startswith("http") (i.e., guard the .startswith call and
treat a None/empty reference_doc as needing MIMIR_DOC_URL + (reference_url or
doc_id)). Update the variables reference_doc and doc_url accordingly so neither
branch calls methods on None.
for i, chunk in enumerate(retrieved_chunks):
    # Extract source from chunk metadata based on offline flag
    source = None
    if chunk.metadata:
        if offline:
            parent_id = chunk.metadata.get("parent_id")
            if parent_id:
                source = urljoin(MIMIR_DOC_URL, parent_id)
        else:
            source = chunk.metadata.get("reference_url")

    # Get score from retrieved_scores list if available
    score = retrieved_scores[i] if i < len(retrieved_scores) else None

    rag_chunks.append(
        RAGChunk(
            content=chunk.content,
            source=source,
            score=score,
        )
    )
Fix type errors in RAGChunk creation.
Pipeline reports: `urljoin` receives `object` instead of `str`, `chunk.content` is `ChunkContent` not `str`, and the `source` type doesn't match. Convert values explicitly.
🐛 Proposed fix
# Convert retrieved chunks to RAGChunk format
for i, chunk in enumerate(retrieved_chunks):
# Extract source from chunk metadata based on offline flag
source = None
if chunk.metadata:
if offline:
parent_id = chunk.metadata.get("parent_id")
if parent_id:
- source = urljoin(MIMIR_DOC_URL, parent_id)
+ source = urljoin(MIMIR_DOC_URL, str(parent_id))
else:
- source = chunk.metadata.get("reference_url")
+ ref_url = chunk.metadata.get("reference_url")
+ source = str(ref_url) if ref_url else None
# Get score from retrieved_scores list if available
score = retrieved_scores[i] if i < len(retrieved_scores) else None
+ # Extract string content from chunk
+ content = chunk.content
+ if not isinstance(content, str):
+ content = getattr(content, "text", None) or str(content)
+
rag_chunks.append(
RAGChunk(
- content=chunk.content,
+ content=content,
source=source,
score=score,
)
)

🧰 Tools
🪛 GitHub Actions: Pyright
[error] 568-568: pyright: Argument of type "object" cannot be assigned to parameter "url" of type "AnyStr@urljoin | None" in function "urljoin" (reportArgumentType)
[error] 577-577: pyright: Argument of type "ChunkContent" cannot be assigned to parameter "content" of type "str" in function "__init__" (reportArgumentType)
[error] 578-578: pyright: Argument of type "str | object | None" cannot be assigned to parameter "source" of type "str | None" in function "__init__" (reportArgumentType)
🤖 Prompt for AI Agents
In `@src/app/endpoints/query_v2.py` around lines 561 - 581, The RAGChunk
constructor is receiving wrong types: urljoin is being passed a non-str
parent_id, chunk.content is a ChunkContent object not a str, and source must be
Optional[str]; fix by coercing values to the expected types before creating
RAGChunk: when deriving parent_id from chunk.metadata call str(parent_id) (or
skip urljoin if parent_id is None) so urljoin(MIMIR_DOC_URL, str(parent_id))
receives strings; for the content field extract the string payload from
chunk.content (e.g., chunk.content.text or similar) or fallback to
str(chunk.content) so content is a str; ensure source is either a str or None
(set to None if not present) and keep score as the numeric value from
retrieved_scores if available. Apply these changes around the RAGChunk(...)
creation in the loop that references retrieved_chunks, chunk.metadata, urljoin,
MIMIR_DOC_URL, retrieved_scores, and RAGChunk.
Description
This PR introduces several changes to the Lightspeed Stack query and streaming query pipeline with a focus on Solr vector I/O support.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Configuration
Improvements