fix(infra): fix some dependency hells and add some lazy loading to reduce celery worker RAM usage #3
base: eval-pr-5478-target-1758731970657
Conversation
10 issues found across 15 files
Prompt for AI agents (all 10 issues)
Understand the root cause of the following 10 issues and fix them.
<file name="backend/onyx/llm/utils.py">
<violation number="1" location="backend/onyx/llm/utils.py:60">
Eagerly importing litellm exceptions before an early-return path; move imports after the custom mapping check to avoid unnecessary module load.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Low-impact micro-optimization; imports already function-scoped. Not worth reporting.</violation>
<violation number="2" location="backend/onyx/llm/utils.py:134">
Import read_pdf_file only inside the UnicodeDecodeError fallback to avoid unnecessary import when not needed.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</violation>
<violation number="3" location="backend/onyx/llm/utils.py:358">
Defer tiktoken import to inside the encode_fn is None branch to avoid unnecessary import when a custom encoder is provided.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</violation>
</file>
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Unconditional import of `openai` in embed loads the heavy SDK for all providers, defeating lazy-loading and increasing worker memory usage. Handle authentication generically or confine OpenAI-specific error handling to OpenAI paths.</violation>
</file>
<file name="backend/onyx/background/indexing/run_docfetching.py">
<violation number="1" location="backend/onyx/background/indexing/run_docfetching.py:101">
Local import placed outside try in _get_connector_runner allows ImportError to escape and leaves index attempt stuck (not marked failed).</violation>
</file>
<file name="backend/onyx/file_processing/extract_file_text.py">
<violation number="1" location="backend/onyx/file_processing/extract_file_text.py:39">
Lazy import of `markitdown` lacks ImportError handling; docx/pptx processing will crash if the dependency is missing. Add a graceful fallback.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: markitdown is a declared dependency; ImportError is unlikely. Outer try/except in _extract_text_and_images prevents crashes; extract_file_text’s raising is intentional. Not high-impact.</violation>
<violation number="2" location="backend/onyx/file_processing/extract_file_text.py:363">
Runtime import of `markitdown` exceptions/classes lacks ImportError handling; docx extraction will error instead of falling back.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Lazy import in docx_to_text_and_images would be caught by outer handler; with required dependency, ImportError concerns aren’t actionable.</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:362">
Invalid type annotation: mixing a string literal with `| None` will raise a TypeError at import. Use a real `Sequence[...] | None` type or quote the entire annotation.</violation>
<violation number="2" location="backend/onyx/context/search/models.py:379">
Potential KeyError: `source_links` is a dict; accessing `[0]` assumes key 0 exists. Use `.get(0)` to safely retrieve the first link by that key or return None.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Access pattern `source_links[0]` appears to be a project-wide invariant when `source_links` is set; low likelihood of KeyError and widely used elsewhere. Not worth reporting.</violation>
</file>
<file name="backend/onyx/chat/prompt_builder/schemas.py">
<violation number="1" location="backend/onyx/chat/prompt_builder/schemas.py:7">
`PromptSnapshot` includes LangChain `BaseMessage` instances but does not enable pydantic's `arbitrary_types_allowed`, risking validation/runtime errors when constructing the model.
(Based on your team's feedback about LangChain using pydantic v1 and the need to allow arbitrary types when nesting LangChain models inside pydantic v2 models, as seen in existing patterns.) [FEEDBACK_USED]</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
Defer tiktoken import to inside the encode_fn is None branch to avoid unnecessary import when a custom encoder is provided.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 358:
<comment>Defer tiktoken import to inside the encode_fn is None branch to avoid unnecessary import when a custom encoder is provided.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</comment>
<file context>
@@ -355,6 +355,7 @@ def check_number_of_tokens(
     function. If none is provided, default to the tiktoken encoder used by GPT-3.5
     and GPT-4.
     """
+    import tiktoken
 
     if encode_fn is None:
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
Import read_pdf_file only inside the UnicodeDecodeError fallback to avoid unnecessary import when not needed.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 134:
<comment>Import read_pdf_file only inside the UnicodeDecodeError fallback to avoid unnecessary import when not needed.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</comment>
<file context>
@@ -133,6 +131,8 @@ def _build_content(
     files: list[InMemoryChatFile] | None = None,
 ) -> str:
     """Applies all non-image files."""
+    from onyx.file_processing.extract_file_text import read_pdf_file
+
     if not files:
</file context>
[internal] Confidence score: 7/10
[internal] Posted by: General AI Review Agent
Eagerly importing litellm exceptions before an early-return path; move imports after the custom mapping check to avoid unnecessary module load.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Low-impact micro-optimization; imports already function-scoped. Not worth reporting.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 60:
<comment>Eagerly importing litellm exceptions before an early-return path; move imports after the custom mapping check to avoid unnecessary module load.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Low-impact micro-optimization; imports already function-scoped. Not worth reporting.</comment>
<file context>
@@ -72,6 +57,19 @@ def litellm_exception_to_error_msg(
         dict[str, str] | None
     ) = LITELLM_CUSTOM_ERROR_MESSAGE_MAPPINGS,
 ) -> str:
+    from litellm.exceptions import BadRequestError
+    from litellm.exceptions import AuthenticationError
+    from litellm.exceptions import PermissionDeniedError
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
Unconditional import of openai in embed loads the heavy SDK for all providers, defeating lazy-loading and increasing worker memory usage. Handle authentication generically or confine OpenAI-specific error handling to OpenAI paths.
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Unconditional import of `openai` in embed loads the heavy SDK for all providers, defeating lazy-loading and increasing worker memory usage. Handle authentication generically or confine OpenAI-specific error handling to OpenAI paths.</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
Local import placed outside try in _get_connector_runner allows ImportError to escape and leaves index attempt stuck (not marked failed).
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>Local import placed outside try in _get_connector_runner allows ImportError to escape and leaves index attempt stuck (not marked failed).</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
Runtime import of markitdown exceptions/classes lacks ImportError handling; docx extraction will error instead of falling back.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Lazy import in docx_to_text_and_images would be caught by outer handler; with required dependency, ImportError concerns aren’t actionable.
Prompt for AI agents
Address the following comment on backend/onyx/file_processing/extract_file_text.py at line 363:
<comment>Runtime import of `markitdown` exceptions/classes lacks ImportError handling; docx extraction will error instead of falling back.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Lazy import in docx_to_text_and_images would be caught by outer handler; with required dependency, ImportError concerns aren’t actionable.</comment>
<file context>
@@ -358,6 +360,12 @@ def docx_to_text_and_images(
     The images list returned is empty in this case.
     """
     md = get_markitdown_converter()
+    from markitdown import (
+        StreamInfo,
+        FileConversionException,
</file context>
[internal] Confidence score: 7/10
[internal] Posted by: General AI Review Agent
Lazy import of markitdown lacks ImportError handling; docx/pptx processing will crash if the dependency is missing. Add a graceful fallback.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: markitdown is a declared dependency; ImportError is unlikely. Outer try/except in _extract_text_and_images prevents crashes; extract_file_text’s raising is intentional. Not high-impact.
Prompt for AI agents
Address the following comment on backend/onyx/file_processing/extract_file_text.py at line 39:
<comment>Lazy import of `markitdown` lacks ImportError handling; docx/pptx processing will crash if the dependency is missing. Add a graceful fallback.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: markitdown is a declared dependency; ImportError is unlikely. Outer try/except in _extract_text_and_images prevents crashes; extract_file_text’s raising is intentional. Not high-impact.</comment>
<file context>
@@ -37,6 +35,8 @@
 from onyx.utils.logger import setup_logger
 
+if TYPE_CHECKING:
+    from markitdown import MarkItDown
 logger = setup_logger()
 
</file context>
[internal] Confidence score: 7/10
[internal] Posted by: General AI Review Agent
Potential KeyError: source_links is a dict; accessing [0] assumes key 0 exists. Use .get(0) to safely retrieve the first link by that key or return None.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Access pattern source_links[0] appears to be a project-wide invariant when source_links is set; low likelihood of KeyError and widely used elsewhere. Not worth reporting.
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>Potential KeyError: `source_links` is a dict; accessing `[0]` assumes key 0 exists. Use `.get(0)` to safely retrieve the first link by that key or return None.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Access pattern `source_links[0]` appears to be a project-wide invariant when `source_links` is set; low likelihood of KeyError and widely used elsewhere. Not worth reporting.</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
Invalid type annotation: mixing a string literal with | None will raise a TypeError at import. Use a real Sequence[...] | None type or quote the entire annotation.
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 362:
<comment>Invalid type annotation: mixing a string literal with `| None` will raise a TypeError at import. Use a real `Sequence[...] | None` type or quote the entire annotation.</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
+    ) -> list["SearchDoc"]:
+        """Convert a sequence of InferenceChunk or InferenceSection objects to SearchDoc objects."""
</file context>
[internal] Confidence score: 10/10
[internal] Posted by: General AI Review Agent
Suggested change:
- items: "Sequence[InferenceChunk | InferenceSection] | None",
+ items: Sequence[InferenceChunk | InferenceSection] | None,
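To illustrate why quoting the entire annotation is safe: a fully quoted annotation is just a forward-reference string and is never evaluated at import time, whereas `"Sequence[...]" | None` combines a `str` with a type and raises `TypeError` as the module loads. A minimal sketch (the method body is a hypothetical stand-in):

```python
from collections.abc import Sequence  # referenced only inside quoted annotations


class SearchDoc:
    @classmethod
    def chunks_or_sections_to_search_docs(
        cls,
        # Entire annotation is one string, so nothing is evaluated at
        # import time; InferenceChunk/InferenceSection need not even exist.
        items: "Sequence[InferenceChunk | InferenceSection] | None",
    ) -> "list[SearchDoc]":
        return [] if items is None else [cls() for _ in items]
```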
Prompt for AI agents
Address the following comment on backend/onyx/chat/prompt_builder/schemas.py at line 7:
<comment>`PromptSnapshot` includes LangChain `BaseMessage` instances but does not enable pydantic's `arbitrary_types_allowed`, risking validation/runtime errors when constructing the model.
(Based on your team's feedback about LangChain using pydantic v1 and the need to allow arbitrary types when nesting LangChain models inside pydantic v2 models, as seen in existing patterns.) [FEEDBACK_USED]</comment>
<file context>
@@ -0,0 +1,10 @@
+from onyx.llm.models import PreviousMessage
+
+
+class PromptSnapshot(BaseModel):
+    raw_message_history: list[PreviousMessage]
+    raw_user_query: str
</file context>
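The `arbitrary_types_allowed` fix might look like this, assuming pydantic v2; `LegacyMessage` below is a hypothetical stand-in for a LangChain `BaseMessage`:

```python
from pydantic import BaseModel, ConfigDict


class LegacyMessage:
    """Stand-in for a LangChain BaseMessage (not a pydantic v2 model)."""

    def __init__(self, content: str):
        self.content = content


class PromptSnapshot(BaseModel):
    # Without this, pydantic v2 rejects fields whose types it cannot
    # introspect, such as LangChain's pydantic-v1-based message classes.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    raw_message_history: list[LegacyMessage]
    raw_user_query: str
```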
@cubic-dev-local review this

@sanxroz I've started the AI code review. It'll take a few minutes to complete.
8 issues found across 15 files
Prompt for AI agents (all 8 issues)
Understand the root cause of the following 8 issues and fix them.
<file name="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py">
<violation number="1" location="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py:13">
two-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic-only import grouping suggestion with no functional or maintainability impact; per criteria, filter out low-importance style issues.</violation>
<violation number="2" location="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py:13">
single-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Technically correct but purely stylistic (import grouping). No functional, performance, or maintainability impact in this context; not worth reporting per criteria.</violation>
</file>
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
two-agent-filter: Unconditionally importing `openai` in `embed()` loads the heavy SDK for all providers and creates an unnecessary hard dependency. If `openai` isn't installed, this can raise `ModuleNotFoundError` even when using non-OpenAI providers. Move exception handling into `_embed_openai` or import only when `self.provider == EmbeddingProvider.OPENAI`. 
• **Libraries consulted**:</violation>
<violation number="2" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
single-agent-filter: Unconditionally importing `openai` in `embed()` loads the heavy SDK for all providers and creates an unnecessary hard dependency. If `openai` isn't installed, this can raise `ModuleNotFoundError` even when using non-OpenAI providers. Move exception handling into `_embed_openai` or import only when `self.provider == EmbeddingProvider.OPENAI`. 
• **Libraries consulted**:</violation>
<violation number="3" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
two-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError. 
• **Libraries consulted**: Python retry decorator library 'retry' async support, Tenacity</violation>
<violation number="4" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
single-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError. 
• **Libraries consulted**: Python retry decorator library 'retry' async support, invl/retry Python</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:379">
two-agent-filter: Directly accessing chunk.source_links[0] assumes key 0 exists on a dict[int, str]; if source_links is present but lacks key 0, this raises KeyError. Use a safe first-value retrieval or get(0).
(Based on your team's feedback about consolidating conversion into SearchDoc to avoid brittle assumptions, this influenced checking for safer link extraction.) [FEEDBACK_USED] 
• **Libraries consulted**:</violation>
<violation number="2" location="backend/onyx/context/search/models.py:379">
single-agent-filter: Directly accessing chunk.source_links[0] assumes key 0 exists on a dict[int, str]; if source_links is present but lacks key 0, this raises KeyError. Use a safe first-value retrieval or get(0).
(Based on your team's feedback about consolidating conversion into SearchDoc to avoid brittle assumptions, this influenced checking for safer link extraction.) [FEEDBACK_USED] 
• **Libraries consulted**:</violation>
</file>
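The "async @retry is ineffective on coroutines" point from the summary can be demonstrated with a minimal synchronous retry decorator (hypothetical names, mimicking how sync `retry`-style libraries behave):

```python
import inspect


def naive_retry(fn):
    """A synchronous retry decorator, as sync retry libraries behave."""

    def wrapper(*args, **kwargs):
        for attempt in range(3):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == 2:
                    raise
    return wrapper


@naive_retry
async def flaky_embed() -> None:
    raise RuntimeError("transient failure")


# Calling fn() on an async function never raises inside the wrapper: it
# just returns an un-awaited coroutine on the first attempt, so the except
# branch is dead code and no retry ever happens. An async-aware decorator
# must await fn() inside an async wrapper instead.
```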
two-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Stylistic-only import grouping suggestion with no functional or maintainability impact; per criteria, filter out low-importance style issues.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py at line 13:
<comment>two-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic-only import grouping suggestion with no functional or maintainability impact; per criteria, filter out low-importance style issues.</comment>
<file context>
@@ -5,12 +5,12 @@
 )
 from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
 from onyx.context.search.models import SavedSearchDoc
+from onyx.context.search.models import SearchDoc
 from onyx.server.query_and_chat.streaming_models import SectionEnd
 from onyx.utils.logger import setup_logger
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
single-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Technically correct but purely stylistic (import grouping). No functional, performance, or maintainability impact in this context; not worth reporting per criteria.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py at line 13:
<comment>single-agent-filter: Group imports from the same module into a single statement to reduce duplication and improve readability.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Technically correct but purely stylistic (import grouping). No functional, performance, or maintainability impact in this context; not worth reporting per criteria.</comment>
<file context>
@@ -5,12 +5,12 @@
 )
 from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
 from onyx.context.search.models import SavedSearchDoc
+from onyx.context.search.models import SearchDoc
 from onyx.server.query_and_chat.streaming_models import SectionEnd
 from onyx.utils.logger import setup_logger
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
two-agent-filter: Unconditionally importing openai in embed() loads the heavy SDK for all providers and creates an unnecessary hard dependency. If openai isn't installed, this can raise ModuleNotFoundError even when using non-OpenAI providers. Move exception handling into _embed_openai or import only when self.provider == EmbeddingProvider.OPENAI.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>two-agent-filter: Unconditionally importing `openai` in `embed()` loads the heavy SDK for all providers and creates an unnecessary hard dependency. If `openai` isn't installed, this can raise `ModuleNotFoundError` even when using non-OpenAI providers. Move exception handling into `_embed_openai` or import only when `self.provider == EmbeddingProvider.OPENAI`. 
• **Libraries consulted**: </comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
single-agent-filter: Unconditionally importing openai in embed() loads the heavy SDK for all providers and creates an unnecessary hard dependency. If openai isn't installed, this can raise ModuleNotFoundError even when using non-OpenAI providers. Move exception handling into _embed_openai or import only when self.provider == EmbeddingProvider.OPENAI.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>single-agent-filter: Unconditionally importing `openai` in `embed()` loads the heavy SDK for all providers and creates an unnecessary hard dependency. If `openai` isn't installed, this can raise `ModuleNotFoundError` even when using non-OpenAI providers. Move exception handling into `_embed_openai` or import only when `self.provider == EmbeddingProvider.OPENAI`. 
• **Libraries consulted**: </comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>two-agent-filter: Directly accessing chunk.source_links[0] assumes key 0 exists on a dict[int, str]; if source_links is present but lacks key 0, this raises KeyError. Use a safe first-value retrieval or get(0).
(Based on your team's feedback about consolidating conversion into SearchDoc to avoid brittle assumptions, this influenced checking for safer link extraction.) [FEEDBACK_USED]</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>single-agent-filter: Directly accessing chunk.source_links[0] assumes key 0 exists on a dict[int, str]; if source_links is present but lacks key 0, this raises KeyError. Use a safe first-value retrieval or get(0). (Based on your team's feedback about consolidating conversion into SearchDoc to avoid brittle assumptions, this influenced checking for safer link extraction.)
• **Libraries consulted**: </comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
two-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError.
• Libraries consulted: Python retry decorator library 'retry' async support, Tenacity
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>two-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError. 
• **Libraries consulted**: Python retry decorator library 'retry' async support, Tenacity</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
single-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError.
• Libraries consulted: Python retry decorator library 'retry' async support, invl/retry Python
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>single-agent-filter: Non-OpenAI providers crash due to unconditional import of openai in embed; async @retry is ineffective on coroutines; Azure path may use 'azure/None'; API clients not closed; requests errors re-raised as httpx.HTTPError. 
• **Libraries consulted**: Python retry decorator library 'retry' async support, invl/retry Python</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
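The "async @retry is ineffective on coroutines" point refers to sync retry decorators that call the wrapped function without awaiting it: calling a coroutine function just returns a coroutine object, so the decorator "succeeds" immediately and never sees the exception. A minimal stdlib sketch of an async-aware retry (a hypothetical helper, not the project's code or the tenacity API):

```python
# Async retry must await the coroutine inside the retry loop; a sync
# decorator that merely calls fn() never observes the failure.
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_async(
    fn: Callable[[], Awaitable[T]],
    attempts: int = 3,
    delay: float = 0.0,
) -> T:
    last_exc: Exception | None = None
    for _ in range(attempts):
        try:
            return await fn()  # the await is the crucial part
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(delay)
    assert last_exc is not None
    raise last_exc


calls = 0


async def flaky() -> str:
    # Simulates a transient embedding-API failure on the first two calls.
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient")
    return "ok"


result = asyncio.run(retry_async(flaky))
```

Libraries such as tenacity handle coroutine functions natively and would be the usual production choice here.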
@cubic-dev-local review this

@sanxroz I've started the AI code review. It'll take a few minutes to complete.
18 issues found across 15 files
Prompt for AI agents (all 18 issues)
Understand the root cause of the following 18 issues and fix them.
<file name="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py">
<violation number="1" location="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py:13">
two-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Purely stylistic duplicate import from same module; no functional or maintainability impact. Filter out per selective criteria.</violation>
<violation number="2" location="backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py:13">
single-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Separate imports of different names from the same module are valid and common in Python. This is purely stylistic with no functional or maintainability impact; per guidelines, such low-impact style issues should be filtered out.</violation>
</file>
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
two-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch. 
• **Libraries consulted**:</violation>
<violation number="2" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
single-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch. 
• **Libraries consulted**:</violation>
<violation number="3" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
two-agent-filter: Unconditional import of openai in embed() causes ModuleNotFoundError for non-OpenAI providers, breaking embeddings and making auth handling unreachable. 
• **Libraries consulted**:</violation>
<violation number="4" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
single-agent-filter: Unconditional import of openai in embed() causes ModuleNotFoundError for non-OpenAI providers, breaking embeddings and making auth handling unreachable. 
• **Libraries consulted**:</violation>
</file>
<file name="backend/onyx/agents/agent_search/dr/utils.py">
<violation number="1" location="backend/onyx/agents/agent_search/dr/utils.py:269">
two-agent-filter: Conversion logic from InferenceSection to SavedSearchDoc with default db_doc_id=0 duplicates functionality in `backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py` and `backend/onyx/chat/chat_utils.py`. This logic should be centralized into a shared utility function.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Low-impact and partially inaccurate. Code already uses centralized classmethods (SearchDoc.chunks_or_sections_to_search_docs and SavedSearchDoc.from_search_doc). Remaining repetition is a trivial list comprehension with db_doc_id=0; not substantive duplication to justify a shared helper.</violation>
<violation number="2" location="backend/onyx/agents/agent_search/dr/utils.py:269">
single-agent-filter: Conversion logic from InferenceSection to SavedSearchDoc with default db_doc_id=0 duplicates functionality in `backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py` and `backend/onyx/chat/chat_utils.py`. This logic should be centralized into a shared utility function. 
• **Libraries consulted**:</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:379">
two-agent-filter: Directly indexing `source_links[0]` can raise KeyError when the dict exists but lacks key 0; use .get(0) for safety. 
• **Libraries consulted**:</violation>
<violation number="2" location="backend/onyx/context/search/models.py:379">
single-agent-filter: Directly indexing `source_links[0]` can raise KeyError when the dict exists but lacks key 0; use .get(0) for safety. 
• **Libraries consulted**:</violation>
<violation number="3" location="backend/onyx/context/search/models.py:379">
two-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0. 
• **Libraries consulted**:</violation>
<violation number="4" location="backend/onyx/context/search/models.py:379">
single-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0. 
• **Libraries consulted**:</violation>
</file>
<file name="backend/onyx/chat/prompt_builder/schemas.py">
<violation number="1" location="backend/onyx/chat/prompt_builder/schemas.py:10">
two-agent-filter: Pydantic v2 model includes `list[BaseMessage]` (external type) without enabling `arbitrary_types_allowed`; this can raise validation errors when instantiating `PromptSnapshot` with langchain messages.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: LangChain BaseMessage is a Pydantic model; list[BaseMessage] fields validate without arbitrary_types_allowed in Pydantic v2. No concrete evidence of instantiation errors, and similar config in other files likely targets non-Pydantic types. High risk of false positive.
• **Libraries consulted**: Pydantic v2 arbitrary_types_allowed, LangChain BaseMessage messages, Pydantic, Langchain, Python_langchain</violation>
<violation number="2" location="backend/onyx/chat/prompt_builder/schemas.py:10">
single-agent-filter: Pydantic v2 model includes `list[BaseMessage]` (external type) without enabling `arbitrary_types_allowed`; this can raise validation errors when instantiating `PromptSnapshot` with langchain messages.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Uncertain. While the repo pins Pydantic v2 and uses arbitrary_types_allowed in several models, v2 generally supports arbitrary external types without explicit config, and documentation evidence here doesn’t conclusively show that a BaseModel field of list[BaseMessage] will fail validation. Without a reproducible error or usage context showing failure, this is too speculative to report.
• **Libraries consulted**: Pydantic v2 arbitrary types allowed, Pydantic</violation>
</file>
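For context on the `arbitrary_types_allowed` debate, here is a hedged sketch of the general Pydantic v2 failure mode it refers to, using invented names (`PlainMessage` stands in for any non-Pydantic type; this is not the project's schema, and the thread disputes whether LangChain's `BaseMessage`, itself a Pydantic model, triggers it at all):

```python
# A Pydantic v2 model storing a non-Pydantic type needs
# arbitrary_types_allowed; without it, the class definition raises a
# schema-generation error.
from pydantic import BaseModel, ConfigDict


class PlainMessage:
    # Deliberately not a Pydantic model: no schema can be generated for it.
    def __init__(self, content: str) -> None:
        self.content = content


class PromptSnapshot(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    raw_message_history: list[PlainMessage]


snapshot = PromptSnapshot(raw_message_history=[PlainMessage("hello")])
```

Fields typed as Pydantic models (as `BaseMessage` is) validate without this config, which is the basis for filtering the violation.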
<file name="backend/onyx/background/celery/tasks/docprocessing/tasks.py">
<violation number="1" location="backend/onyx/background/celery/tasks/docprocessing/tasks.py:1270">
two-agent-filter: Heartbeat timeout is mutated per-loop and persists across attempts, delaying failure detection for others. 
• **Libraries consulted**:</violation>
<violation number="2" location="backend/onyx/background/celery/tasks/docprocessing/tasks.py:1270">
single-agent-filter: Heartbeat timeout is mutated per-loop and persists across attempts, delaying failure detection for others. 
• **Libraries consulted**:</violation>
</file>
<file name="backend/onyx/background/indexing/run_docfetching.py">
<violation number="1" location="backend/onyx/background/indexing/run_docfetching.py:101">
two-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Technically valid but redundant; subsumed by the broader issue that connector_document_extraction’s outer try/finally starts too late. Addressing that covers this and other early exceptions.</violation>
<violation number="2" location="backend/onyx/background/indexing/run_docfetching.py:101">
single-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Filter as overlapping with #1 and contains a minor inaccuracy: MemoryTracer starts after get_document_batch_storage, so a failure there wouldn’t leak the tracer. The core issue (early exceptions before the try/finally) is better captured by #1.</violation>
</file>
)
from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.context.search.models import SavedSearchDoc
from onyx.context.search.models import SearchDoc
two-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Purely stylistic duplicate import from same module; no functional or maintainability impact. Filter out per selective criteria.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py at line 13:
<comment>two-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Purely stylistic duplicate import from same module; no functional or maintainability impact. Filter out per selective criteria.</comment>
<file context>
@@ -5,12 +5,12 @@
 )
 from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
 from onyx.context.search.models import SavedSearchDoc
+from onyx.context.search.models import SearchDoc
 from onyx.server.query_and_chat.streaming_models import SectionEnd
 from onyx.utils.logger import setup_logger
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
)
from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
from onyx.context.search.models import SavedSearchDoc
from onyx.context.search.models import SearchDoc
single-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Separate imports of different names from the same module are valid and common in Python. This is purely stylistic with no functional or maintainability impact; per guidelines, such low-impact style issues should be filtered out.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py at line 13:
<comment>single-agent-filter: Duplicate imports from the same module on separate lines. Combine into a single import for clarity and style consistency.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact, objectivity.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Separate imports of different names from the same module are valid and common in Python. This is purely stylistic with no functional or maintainability impact; per guidelines, such low-impact style issues should be filtered out.</comment>
<file context>
@@ -5,12 +5,12 @@
 )
 from onyx.agents.agent_search.shared_graph_utils.utils import write_custom_event
 from onyx.context.search.models import SavedSearchDoc
+from onyx.context.search.models import SearchDoc
 from onyx.server.query_and_chat.streaming_models import SectionEnd
 from onyx.utils.logger import setup_logger
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
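The lazy-import pattern that motivates this PR's title can be sketched in miniature. Here stdlib `colorsys` is only a stand-in for a heavy dependency such as the indexing pipeline; the function name is invented for illustration.

```python
# The heavy module is imported inside the function body, so a worker
# process that never runs this task never pays the import's RAM cost.
import sys


def run_indexing_task() -> str:
    import colorsys  # deferred: loaded only when the task executes

    r, g, b = colorsys.hsv_to_rgb(0.0, 0.0, 1.0)
    return f"{r:.0f},{g:.0f},{b:.0f}"
```

The trade-off, as several comments in this thread note, is that import failures now surface at call time instead of process start, so they must land inside the caller's error handling.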
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
two-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>two-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch. 
• **Libraries consulted**: </comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
single-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>single-agent-filter: Unconditional import of openai inside embed() forces OpenAI dependency for all providers and undermines lazy-loading; it can raise ImportError in non-OpenAI environments. Move the import and exception handling inside the OPENAI branch. 
• **Libraries consulted**: </comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
) -> list[SavedSearchDoc]:
    # Convert InferenceSections to SavedSearchDocs
-    search_docs = chunks_or_sections_to_search_docs(inference_sections)
+    search_docs = SearchDoc.chunks_or_sections_to_search_docs(inference_sections)
two-agent-filter: Conversion logic from InferenceSection to SavedSearchDoc with default db_doc_id=0 duplicates functionality in backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py and backend/onyx/chat/chat_utils.py. This logic should be centralized into a shared utility function.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Low-impact and partially inaccurate. Code already uses centralized classmethods (SearchDoc.chunks_or_sections_to_search_docs and SavedSearchDoc.from_search_doc). Remaining repetition is a trivial list comprehension with db_doc_id=0; not substantive duplication to justify a shared helper.
Prompt for AI agents
Address the following comment on backend/onyx/agents/agent_search/dr/utils.py at line 269:
<comment>two-agent-filter: Conversion logic from InferenceSection to SavedSearchDoc with default db_doc_id=0 duplicates functionality in `backend/onyx/agents/agent_search/dr/sub_agents/basic_search/dr_basic_search_3_reduce.py` and `backend/onyx/chat/chat_utils.py`. This logic should be centralized into a shared utility function.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Low-impact and partially inaccurate. Code already uses centralized classmethods (SearchDoc.chunks_or_sections_to_search_docs and SavedSearchDoc.from_search_doc). Remaining repetition is a trivial list comprehension with db_doc_id=0; not substantive duplication to justify a shared helper.</comment>
<file context>
@@ -266,7 +266,7 @@ def convert_inference_sections_to_search_docs(
 ) -> list[SavedSearchDoc]:
     # Convert InferenceSections to SavedSearchDocs
-    search_docs = chunks_or_sections_to_search_docs(inference_sections)
+    search_docs = SearchDoc.chunks_or_sections_to_search_docs(inference_sections)
     for search_doc in search_docs:
         search_doc.is_internet = is_internet
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Unmapped Agent in getAgentNameFromViolationSource (undefined)
    tenant_id: str,
    batch_num: int,
) -> None:
    from onyx.indexing.indexing_pipeline import run_indexing_pipeline
single-agent-filter: Heartbeat timeout is mutated per-loop and persists across attempts, delaying failure detection for others.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/background/celery/tasks/docprocessing/tasks.py at line 1270:
<comment>single-agent-filter: Heartbeat timeout is mutated per-loop and persists across attempts, delaying failure detection for others. 
• **Libraries consulted**: </comment>
<file context>
@@ -1268,6 +1267,8 @@ def _docprocessing_task(
     tenant_id: str,
     batch_num: int,
 ) -> None:
+    from onyx.indexing.indexing_pipeline import run_indexing_pipeline
+
     start_time = time.monotonic()
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
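The heartbeat concern can be illustrated with a purely hypothetical sketch (the real task code differs): keep the base timeout immutable at module level and derive a fresh deadline inside each call, so one slow attempt cannot extend the grace period that later attempts inherit.

```python
# The base timeout is never mutated; each call works on a local copy,
# so per-loop adjustments cannot leak across attempts.
BASE_HEARTBEAT_TIMEOUT = 30.0


def wait_for_heartbeat(poll_results: list[bool]) -> bool:
    timeout = BASE_HEARTBEAT_TIMEOUT  # local, reset every attempt
    for alive in poll_results:
        if alive:
            timeout = BASE_HEARTBEAT_TIMEOUT  # reset on observed progress
        else:
            timeout -= 10.0
            if timeout <= 0:
                return False  # declare the worker dead
    return True
```

The bug pattern being flagged is the opposite shape: mutating a shared timeout in the loop so that the inflated value persists into the next attempt's failure detection.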
    are the complete list of existing documents of the connector. If the task
    of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
    """
    from onyx.connectors.factory import instantiate_connector
two-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Technically valid but redundant; subsumed by the broader issue that connector_document_extraction’s outer try/finally starts too late. Addressing that covers this and other early exceptions.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>two-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Technically valid but redundant; subsumed by the broader issue that connector_document_extraction’s outer try/finally starts too late. Addressing that covers this and other early exceptions.</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
    are the complete list of existing documents of the connector. If the task
    of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
    """
    from onyx.connectors.factory import instantiate_connector
single-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Filter as overlapping with #1 and contains a minor inaccuracy: MemoryTracer starts after get_document_batch_storage, so a failure there wouldn’t leak the tracer. The core issue (early exceptions before the try/finally) is better captured by #1.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>single-agent-filter: Early exceptions before try/except in connector_document_extraction cause stuck IN_PROGRESS attempts and MemoryTracer not stopped due to lazy import in _get_connector_runner.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Filter as overlapping with #1 and contains a minor inaccuracy: MemoryTracer starts after get_document_batch_storage, so a failure there wouldn’t leak the tracer. The core issue (early exceptions before the try/finally) is better captured by #1.</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
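The stuck-IN_PROGRESS concern reduces to a simple control-flow point: a lazy import placed before the function's try/except escapes the error handling that marks attempts as failed. A toy sketch of the fixed shape (invented names; the `import_fails` flag stands in for a real `from onyx.connectors.factory import instantiate_connector` failing):

```python
# Keeping the deferred import inside the guarded region puts ImportError
# on the same failure path as any other connector error.
def extraction_attempt(import_fails: bool) -> str:
    status = "IN_PROGRESS"
    try:
        if import_fails:
            # stands in for a lazy import raising at call time
            raise ImportError("connector factory unavailable")
        status = "SUCCESS"
    except Exception:
        status = "FAILED"  # attempt is never left stuck IN_PROGRESS
    return status
```

Placed before the `try`, the same ImportError would propagate out of the function and the attempt row would keep its IN_PROGRESS status.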
    ).document_id,
    chunk_ind=chunk.chunk_id,
    semantic_identifier=chunk.semantic_identifier or "Unknown",
    link=chunk.source_links[0] if chunk.source_links else None,
two-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>two-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0. 
• **Libraries consulted**: </comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
    ).document_id,
    chunk_ind=chunk.chunk_id,
    semantic_identifier=chunk.semantic_identifier or "Unknown",
    link=chunk.source_links[0] if chunk.source_links else None,
single-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0.
• Libraries consulted:
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>single-agent-filter: Dict index by fixed key 0 can raise KeyError when source_links lacks key 0. 
• **Libraries consulted**: </comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
9 issues found across 15 files
Prompt for AI agents (all 9 issues)
Understand the root cause of the following 9 issues and fix them.
<file name="backend/onyx/background/indexing/run_docfetching.py">
<violation number="1" location="backend/onyx/background/indexing/run_docfetching.py:101">
Lazy import placed outside error handling; ImportError would bypass the existing try/except, leading to unhandled failure instead of graceful connector pause/logging.</violation>
<violation number="2" location="backend/onyx/background/indexing/run_docfetching.py:286">
Import is outside the function’s exception handling; an ImportError would escape the try/except that manages indexing failures.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Although the import precedes the try/except in _run_indexing, this function is explicitly marked as legacy/for comparison and has no call sites in this file. Impact is likely negligible; no MemoryTracer is started before the import. Given medium sensitivity and desire to avoid false positives, this is too low-impact/uncertain to report.</violation>
</file>
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Unconditional OpenAI import in embed() loads a heavy dependency for non-OpenAI providers and can crash if openai isn’t installed. Move error handling away from requiring this import or scope the import to the OpenAI path.</violation>
</file>
<file name="backend/onyx/chat/prompt_builder/schemas.py">
<violation number="1" location="backend/onyx/chat/prompt_builder/schemas.py:10">
Pydantic v2 model lacks arbitrary_types_allowed while using non-Pydantic type BaseMessage, causing schema/instantiation errors.</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:362">
Stringified type annotation is unnecessary here; use a real annotation so the imported Sequence is actually used and annotations remain introspectable, preventing potential unused-import warnings.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: String annotations here are acceptable and consistent with the forward reference in the return type. No functional or maintainability issue is demonstrated; the unused-import warning concern is speculative. This is a stylistic nit, so it should be filtered out.</violation>
</file>
<file name="backend/onyx/llm/utils.py">
<violation number="1" location="backend/onyx/llm/utils.py:134">
Unnecessary unconditional import of read_pdf_file; import on-demand inside the UnicodeDecodeError path to further reduce memory and avoid needless dependency loading.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</violation>
<violation number="2" location="backend/onyx/llm/utils.py:358">
Unconditional import of tiktoken; import only when encode_fn is None to avoid unnecessary dependency loading.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</violation>
<violation number="3" location="backend/onyx/llm/utils.py:382">
Add a type-ignore comment to this litellm import to avoid type-stub issues and align with repository conventions.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Same as 2: # type: ignore unnecessary under current mypy config; low impact.</violation>
</file>
<file name="backend/onyx/file_processing/extract_file_text.py">
<violation number="1" location="backend/onyx/file_processing/extract_file_text.py:39">
Unhandled ImportError in get_markitdown_converter causes docx/pptx processing to raise RuntimeError via extract_file_text instead of graceful fallback.</violation>
</file>
React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.
    2. Embed and index these documents into the chosen datastore (vespa)
    3. Updates Postgres to record the indexed documents + the outcome of this run
    """
    from onyx.indexing.indexing_pipeline import run_indexing_pipeline
Import is outside the function’s exception handling; an ImportError would escape the try/except that manages indexing failures.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Although the import precedes the try/except in _run_indexing, this function is explicitly marked as legacy/for comparison and has no call sites in this file. Impact is likely negligible; no MemoryTracer is started before the import. Given medium sensitivity and desire to avoid false positives, this is too low-impact/uncertain to report.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 286:
<comment>Import is outside the function’s exception handling; an ImportError would escape the try/except that manages indexing failures.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Although the import precedes the try/except in _run_indexing, this function is explicitly marked as legacy/for comparison and has no call sites in this file. Impact is likely negligible; no MemoryTracer is started before the import. Given medium sensitivity and desire to avoid false positives, this is too low-impact/uncertain to report.</comment>
<file context>
@@ -283,6 +283,8 @@ def _run_indexing(
     2. Embed and index these documents into the chosen datastore (vespa)
     3. Updates Postgres to record the indexed documents + the outcome of this run
     """
+    from onyx.indexing.indexing_pipeline import run_indexing_pipeline
+
     start_time = time.monotonic()  # jsut used for logging
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
    are the complete list of existing documents of the connector. If the task
    of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
    """
    from onyx.connectors.factory import instantiate_connector
Lazy import placed outside error handling; ImportError would bypass the existing try/except, leading to unhandled failure instead of graceful connector pause/logging.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>Lazy import placed outside error handling; ImportError would bypass the existing try/except, leading to unhandled failure instead of graceful connector pause/logging.</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
Unconditional OpenAI import in embed() loads a heavy dependency for non-OpenAI providers and can crash if openai isn’t installed. Move error handling away from requiring this import or scope the import to the OpenAI path.
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Unconditional OpenAI import in embed() loads a heavy dependency for non-OpenAI providers and can crash if openai isn’t installed. Move error handling away from requiring this import or scope the import to the OpenAI path.</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
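One way to address this is to scope the heavy SDK import to the branch that actually needs it, so other providers neither pay the memory cost nor require the package. A hedged sketch (the dispatcher and its return shape are illustrative, not the repository's real embedding code):

```python
def embed_texts_sketch(provider: str, texts: list[str]) -> list[list[float]]:
    """Toy dispatcher: the heavy SDK import is scoped to its branch."""
    if provider == "openai":
        import openai  # noqa: F401  # only loaded on the OpenAI path

        raise NotImplementedError("real call would go through the openai client")
    # Non-OpenAI providers never touch the openai package at all.
    return [[float(len(t))] for t in texts]
```

Calling this with a non-OpenAI provider succeeds even in an environment where `openai` is not installed.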
class PromptSnapshot(BaseModel):
    raw_message_history: list[PreviousMessage]
    raw_user_query: str
    built_prompt: list[BaseMessage]
Pydantic v2 model lacks arbitrary_types_allowed while using non-Pydantic type BaseMessage, causing schema/instantiation errors.
Prompt for AI agents
Address the following comment on backend/onyx/chat/prompt_builder/schemas.py at line 10:
<comment>Pydantic v2 model lacks arbitrary_types_allowed while using non-Pydantic type BaseMessage, causing schema/instantiation errors.</comment>
<file context>
@@ -0,0 +1,10 @@
+class PromptSnapshot(BaseModel):
+    raw_message_history: list[PreviousMessage]
+    raw_user_query: str
+    built_prompt: list[BaseMessage]
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
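If `BaseMessage` really is treated as a non-Pydantic type in this codebase's version of langchain, the fix is a one-line `model_config`. A self-contained sketch of the mechanism, using a plain stand-in class (`FakeBaseMessage`) instead of the real `BaseMessage`:

```python
from pydantic import BaseModel, ConfigDict


class FakeBaseMessage:
    """Stand-in for a non-Pydantic external type like BaseMessage."""

    def __init__(self, content: str) -> None:
        self.content = content


class PromptSnapshotSketch(BaseModel):
    # Without arbitrary_types_allowed, Pydantic v2 raises a schema
    # generation error for FakeBaseMessage at class-creation time.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    raw_user_query: str
    built_prompt: list[FakeBaseMessage]
```

Note the GPT-5 filter reasoning below: recent `langchain_core` message types are themselves Pydantic models, in which case this config is unnecessary; the sketch only applies if the type is genuinely arbitrary.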
    @classmethod
    def chunks_or_sections_to_search_docs(
        cls,
        items: "Sequence[InferenceChunk | InferenceSection] | None",
Stringified type annotation is unnecessary here; use a real annotation so the imported Sequence is actually used and annotations remain introspectable, preventing potential unused-import warnings.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: String annotations here are acceptable and consistent with the forward reference in the return type. No functional or maintainability issue is demonstrated; the unused-import warning concern is speculative. This is a stylistic nit, so it should be filtered out.
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 362:
<comment>Stringified type annotation is unnecessary here; use a real annotation so the imported Sequence is actually used and annotations remain introspectable, preventing potential unused-import warnings.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: String annotations here are acceptable and consistent with the forward reference in the return type. No functional or maintainability issue is demonstrated; the unused-import warning concern is speculative. This is a stylistic nit, so it should be filtered out.</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
+    ) -> list["SearchDoc"]:
+        """Convert a sequence of InferenceChunk or InferenceSection objects to SearchDoc objects."""
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
Suggested change:
-    items: "Sequence[InferenceChunk | InferenceSection] | None",
+    items: Sequence[InferenceChunk | InferenceSection] | None,
@lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
def get_model_map() -> dict:
    import litellm
Add a type-ignore comment to this litellm import to avoid type-stub issues and align with repository conventions.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Same as 2: # type: ignore unnecessary under current mypy config; low impact.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 382:
<comment>Add a type-ignore comment to this litellm import to avoid type-stub issues and align with repository conventions.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: commentPurpose, functionalImpact.*
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Same as 2: # type: ignore unnecessary under current mypy config; low impact.</comment>
<file context>
@@ -378,6 +379,8 @@ def test_llm(llm: LLM) -> str | None:
 
 @lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
 def get_model_map() -> dict:
+    import litellm
+
     starting_map = copy.deepcopy(cast(dict, litellm.model_cost))
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
    files: list[InMemoryChatFile] | None = None,
) -> str:
    """Applies all non-image files."""
    from onyx.file_processing.extract_file_text import read_pdf_file
Unnecessary unconditional import of read_pdf_file; import on-demand inside the UnicodeDecodeError path to further reduce memory and avoid needless dependency loading.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 134:
<comment>Unnecessary unconditional import of read_pdf_file; import on-demand inside the UnicodeDecodeError path to further reduce memory and avoid needless dependency loading.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</comment>
<file context>
@@ -133,6 +131,8 @@ def _build_content(
     files: list[InMemoryChatFile] | None = None,
 ) -> str:
     """Applies all non-image files."""
+    from onyx.file_processing.extract_file_text import read_pdf_file
+
     if not files:
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: General AI Review Agent
    function. If none is provided, default to the tiktoken encoder used by GPT-3.5
    and GPT-4.
    """
    import tiktoken
Unconditional import of tiktoken; import only when encode_fn is None to avoid unnecessary dependency loading.
DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 358:
<comment>Unconditional import of tiktoken; import only when encode_fn is None to avoid unnecessary dependency loading.
*DEV MODE: This violation would have been filtered out by screening filters. Failing filters: functionalImpact.*</comment>
<file context>
@@ -355,6 +355,7 @@ def check_number_of_tokens(
     function. If none is provided, default to the tiktoken encoder used by GPT-3.5
     and GPT-4.
     """
+    import tiktoken
 
     if encode_fn is None:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
from onyx.utils.logger import setup_logger

if TYPE_CHECKING:
    from markitdown import MarkItDown
Unhandled ImportError in get_markitdown_converter causes docx/pptx processing to raise RuntimeError via extract_file_text instead of graceful fallback.
Prompt for AI agents
Address the following comment on backend/onyx/file_processing/extract_file_text.py at line 39:
<comment>Unhandled ImportError in get_markitdown_converter causes docx/pptx processing to raise RuntimeError via extract_file_text instead of graceful fallback.</comment>
<file context>
@@ -37,6 +35,8 @@
 from onyx.utils.logger import setup_logger
 
+if TYPE_CHECKING:
+    from markitdown import MarkItDown
 logger = setup_logger()
 
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
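A graceful-fallback shape for the converter getter might look like the following sketch. It assumes the caller treats `None` as "use the plain-text extractor"; the function name mirrors the one in the comment but the body is illustrative:

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_markitdown_converter_sketch():
    """Return a cached MarkItDown instance, or None if the optional
    dependency is missing, instead of letting ImportError escape."""
    try:
        from markitdown import MarkItDown  # optional heavy dependency
    except ImportError:
        return None  # caller falls back to the basic extractor
    return MarkItDown()
```

The `lru_cache` keeps the one-time import cost and the fallback decision stable across calls.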
5 issues found across 15 files
Prompt for AI agents (all 5 issues)
Understand the root cause of the following 5 issues and fix them.
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Unconditional import of OpenAI inside embed loads the SDK for all providers, defeating lazy-loading and increasing memory on non-OpenAI paths.</violation>
</file>
<file name="backend/onyx/chat/prompt_builder/schemas.py">
<violation number="1" location="backend/onyx/chat/prompt_builder/schemas.py:4">
Module-level import of onyx.llm.models triggers heavy LangChain/LLM dependencies at import time, undermining lazy-loading and increasing memory footprint. Use TYPE_CHECKING and forward references to avoid runtime import, or move this type to a lighter module.</violation>
<violation number="2" location="backend/onyx/chat/prompt_builder/schemas.py:10">
Missing arbitrary_types_allowed for field using external type BaseMessage; PromptSnapshot will fail validation on instantiation.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Likely false positive. LangChain’s BaseMessage and subclasses are Pydantic models in recent versions, so list[BaseMessage] validates without arbitrary_types_allowed. Other files enabling arbitrary types don’t prove it’s needed here, and no failing instantiation is shown. High risk of false positive; filter it out.
• **Libraries consulted**: pydantic arbitrary_types_allowed v2, langchain_core BaseMessage, Pydantic, Langchain</violation>
</file>
<file name="backend/onyx/background/indexing/run_docfetching.py">
<violation number="1" location="backend/onyx/background/indexing/run_docfetching.py:101">
Import of instantiate_connector outside try/except causes unhandled setup exceptions, leaving attempts stuck IN_PROGRESS and skipping CCPair pause.</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:379">
Direct indexing chunk.source_links[0] can raise KeyError when dict lacks key 0, causing runtime failure.</violation>
</file>
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
Unconditional import of OpenAI inside embed loads the SDK for all providers, defeating lazy-loading and increasing memory on non-OpenAI paths.
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Unconditional import of OpenAI inside embed loads the SDK for all providers, defeating lazy-loading and increasing memory on non-OpenAI paths.</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
from langchain_core.messages import BaseMessage
from pydantic import BaseModel

from onyx.llm.models import PreviousMessage
Module-level import of onyx.llm.models triggers heavy LangChain/LLM dependencies at import time, undermining lazy-loading and increasing memory footprint. Use TYPE_CHECKING and forward references to avoid runtime import, or move this type to a lighter module.
Prompt for AI agents
Address the following comment on backend/onyx/chat/prompt_builder/schemas.py at line 4:
<comment>Module-level import of onyx.llm.models triggers heavy LangChain/LLM dependencies at import time, undermining lazy-loading and increasing memory footprint. Use TYPE_CHECKING and forward references to avoid runtime import, or move this type to a lighter module.</comment>
<file context>
@@ -0,0 +1,10 @@
+from langchain_core.messages import BaseMessage
+from pydantic import BaseModel
+
+from onyx.llm.models import PreviousMessage
+
+
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: General AI Review Agent
class PromptSnapshot(BaseModel):
    raw_message_history: list[PreviousMessage]
    raw_user_query: str
    built_prompt: list[BaseMessage]
Missing arbitrary_types_allowed for field using external type BaseMessage; PromptSnapshot will fail validation on instantiation.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Likely false positive. LangChain’s BaseMessage and subclasses are Pydantic models in recent versions, so list[BaseMessage] validates without arbitrary_types_allowed. Other files enabling arbitrary types don’t prove it’s needed here, and no failing instantiation is shown. High risk of false positive; filter it out.
• Libraries consulted: pydantic arbitrary_types_allowed v2, langchain_core BaseMessage, Pydantic, Langchain
Prompt for AI agents
Address the following comment on backend/onyx/chat/prompt_builder/schemas.py at line 10:
<comment>Missing arbitrary_types_allowed for field using external type BaseMessage; PromptSnapshot will fail validation on instantiation.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Likely false positive. LangChain’s BaseMessage and subclasses are Pydantic models in recent versions, so list[BaseMessage] validates without arbitrary_types_allowed. Other files enabling arbitrary types don’t prove it’s needed here, and no failing instantiation is shown. High risk of false positive; filter it out.
• **Libraries consulted**: pydantic arbitrary_types_allowed v2, langchain_core BaseMessage, Pydantic, Langchain</comment>
<file context>
@@ -0,0 +1,10 @@
+class PromptSnapshot(BaseModel):
+    raw_message_history: list[PreviousMessage]
+    raw_user_query: str
+    built_prompt: list[BaseMessage]
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
    are the complete list of existing documents of the connector. If the task
    of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
    """
    from onyx.connectors.factory import instantiate_connector
Import of instantiate_connector outside try/except causes unhandled setup exceptions, leaving attempts stuck IN_PROGRESS and skipping CCPair pause.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>Import of instantiate_connector outside try/except causes unhandled setup exceptions, leaving attempts stuck IN_PROGRESS and skipping CCPair pause.</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
                ).document_id,
                chunk_ind=chunk.chunk_id,
                semantic_identifier=chunk.semantic_identifier or "Unknown",
                link=chunk.source_links[0] if chunk.source_links else None,
Direct indexing chunk.source_links[0] can raise KeyError when dict lacks key 0, causing runtime failure.
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>Direct indexing chunk.source_links[0] can raise KeyError when dict lacks key 0, causing runtime failure.</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
3 issues found across 15 files
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Non-OpenAI providers crash if openai is not installed due to unconditional import inside embed().</violation>
</file>
<file name="backend/onyx/llm/utils.py">
<violation number="1" location="backend/onyx/llm/utils.py:134">
Unprotected lazy import in _build_content causes ImportError at runtime, crashing any call if dependency is missing.</violation>
<violation number="2" location="backend/onyx/llm/utils.py:382">
Unguarded lazy import of litellm in get_model_map causes ImportError and breaks token/window queries when litellm isn’t installed.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: litellm is a required dependency in backend/requirements; the lazy import inside get_model_map is intentional and not a functional bug. ImportError concern is not applicable to supported deployments.</violation>
</file>
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
Non-OpenAI providers crash if openai is not installed due to unconditional import inside embed().
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Non-OpenAI providers crash if openai is not installed due to unconditional import inside embed().</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
@lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
def get_model_map() -> dict:
    import litellm
Unguarded lazy import of litellm in get_model_map causes ImportError and breaks token/window queries when litellm isn’t installed.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: litellm is a required dependency in backend/requirements; the lazy import inside get_model_map is intentional and not a functional bug. ImportError concern is not applicable to supported deployments.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 382:
<comment>Ungarded lazy import of litellm in get_model_map causes ImportError and breaks token/window queries when litellm isn’t installed.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: litellm is a required dependency in backend/requirements; the lazy import inside get_model_map is intentional and not a functional bug. ImportError concern is not applicable to supported deployments.</comment>
<file context>
@@ -378,6 +379,8 @@ def test_llm(llm: LLM) -> str | None:
 
 @lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
 def get_model_map() -> dict:
+    import litellm
+
     starting_map = copy.deepcopy(cast(dict, litellm.model_cost))
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
    files: list[InMemoryChatFile] | None = None,
) -> str:
    """Applies all non-image files."""
    from onyx.file_processing.extract_file_text import read_pdf_file
Unprotected lazy import in _build_content causes ImportError at runtime, crashing any call if dependency is missing.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 134:
<comment>Unprotected lazy import in _build_content causes ImportError at runtime, crashing any call if dependency is missing.</comment>
<file context>
@@ -133,6 +131,8 @@ def _build_content(
     files: list[InMemoryChatFile] | None = None,
 ) -> str:
     """Applies all non-image files."""
+    from onyx.file_processing.extract_file_text import read_pdf_file
+
     if not files:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
4 issues found across 15 files
Prompt for AI agents (all 4 issues)
Understand the root cause of the following 4 issues and fix them.
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Unconditional import of openai in embed() causes ImportError for non-OpenAI providers and makes OpenAI-only except unreachable when openai isn’t installed.</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:379">
KeyError risk: source_links is dict, code assumes key 0 exists and indexes like a list</violation>
</file>
<file name="backend/onyx/chat/prompt_builder/schemas.py">
<violation number="1" location="backend/onyx/chat/prompt_builder/schemas.py:10">
Pydantic model lacks arbitrary_types_allowed for list[BaseMessage], causing schema/validation error</violation>
</file>
<file name="backend/onyx/background/indexing/run_docfetching.py">
<violation number="1" location="backend/onyx/background/indexing/run_docfetching.py:101">
Import inside _get_connector_runner occurs outside its try/except, so import-time errors bypass pause logic and, since caller’s try is later, leave attempt IN_PROGRESS.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: The import sits outside the local try/except, but the stated impact is wrong. The watchdog (docfetching_proxy_task/process_job_result) marks attempts FAILED on early errors, so attempts won’t be stuck IN_PROGRESS. Pausing on import errors (dependency/code issues) is also undesirable. Low impact; filter.</violation>
</file>
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
Unconditional import of openai in embed() causes ImportError for non-OpenAI providers and makes OpenAI-only except unreachable when openai isn’t installed.
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Unconditional import of openai in embed() causes ImportError for non-OpenAI providers and makes OpenAI-only except unreachable when openai isn’t installed.</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
                ).document_id,
                chunk_ind=chunk.chunk_id,
                semantic_identifier=chunk.semantic_identifier or "Unknown",
                link=chunk.source_links[0] if chunk.source_links else None,
KeyError risk: source_links is dict, code assumes key 0 exists and indexes like a list
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>KeyError risk: source_links is dict, code assumes key 0 exists and indexes like a list</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
class PromptSnapshot(BaseModel):
    raw_message_history: list[PreviousMessage]
    raw_user_query: str
    built_prompt: list[BaseMessage]
Pydantic model lacks arbitrary_types_allowed for list[BaseMessage], causing schema/validation error
Prompt for AI agents
Address the following comment on backend/onyx/chat/prompt_builder/schemas.py at line 10:
<comment>Pydantic model lacks arbitrary_types_allowed for list[BaseMessage], causing schema/validation error</comment>
<file context>
@@ -0,0 +1,10 @@
+class PromptSnapshot(BaseModel):
+    raw_message_history: list[PreviousMessage]
+    raw_user_query: str
+    built_prompt: list[BaseMessage]
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
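The reviewer's point can be reproduced in isolation. Below is a minimal sketch, assuming Pydantic v2; `FakeBaseMessage` is a hypothetical stand-in for a non-Pydantic message class, not Onyx's real type:

```python
from pydantic import BaseModel, ConfigDict

class FakeBaseMessage:
    """Stand-in for an arbitrary (non-Pydantic) message class."""
    def __init__(self, content: str) -> None:
        self.content = content

class PromptSnapshot(BaseModel):
    # Without this, Pydantic v2 raises a schema-generation error at class
    # definition time: it cannot build a validator for FakeBaseMessage.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    raw_user_query: str
    built_prompt: list[FakeBaseMessage]

snapshot = PromptSnapshot(
    raw_user_query="hello",
    built_prompt=[FakeBaseMessage("system: be helpful")],
)
```

With `arbitrary_types_allowed=True`, Pydantic falls back to a plain `isinstance` check for such fields instead of trying to generate a schema.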
    are the complete list of existing documents of the connector. If the task
    of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
    """
    from onyx.connectors.factory import instantiate_connector
Import inside _get_connector_runner occurs outside its try/except, so import-time errors bypass pause logic and, since caller’s try is later, leave attempt IN_PROGRESS.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: The import sits outside the local try/except, but the stated impact is wrong. The watchdog (docfetching_proxy_task/process_job_result) marks attempts FAILED on early errors, so attempts won’t be stuck IN_PROGRESS. Pausing on import errors (dependency/code issues) is also undesirable. Low impact; filter.
Prompt for AI agents
Address the following comment on backend/onyx/background/indexing/run_docfetching.py at line 101:
<comment>Import inside _get_connector_runner occurs outside its try/except, so import-time errors bypass pause logic and, since caller’s try is later, leave attempt IN_PROGRESS.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: The import sits outside the local try/except, but the stated impact is wrong. The watchdog (docfetching_proxy_task/process_job_result) marks attempts FAILED on early errors, so attempts won’t be stuck IN_PROGRESS. Pausing on import errors (dependency/code issues) is also undesirable. Low impact; filter.</comment>
<file context>
@@ -100,6 +98,8 @@ def _get_connector_runner(
     are the complete list of existing documents of the connector. If the task
     of type LOAD_STATE, the list will be considered complete and otherwise incomplete.
     """
+    from onyx.connectors.factory import instantiate_connector
+
     task = attempt.connector_credential_pair.connector.input_type
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
3 issues found across 15 files
Prompt for AI agents (all 3 issues)
Understand the root cause of the following 3 issues and fix them.
<file name="backend/onyx/natural_language_processing/search_nlp_models.py">
<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:187">
Unconditional openai import in embed() causes ImportError for non-OpenAI providers when openai is not installed.</violation>
</file>
<file name="backend/onyx/llm/utils.py">
<violation number="1" location="backend/onyx/llm/utils.py:382">
Unconditional runtime import of litellm in get_llm_contextual_cost; default backend image lacks litellm, causing ModuleNotFoundError and 500s when cost is computed.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Filter out: The claim is incorrect. litellm is included in backend/requirements/default.txt and installed by backend/Dockerfile. The import in get_llm_contextual_cost is lazy, so it won’t raise ModuleNotFoundError in the default backend image.</violation>
</file>
<file name="backend/onyx/context/search/models.py">
<violation number="1" location="backend/onyx/context/search/models.py:379">
Indexing source_links by key 0 can raise KeyError; source_links is a dict[int, str], not guaranteed to contain 0.</violation>
</file>
    async def _embed_openai(
        self, texts: list[str], model: str | None, reduced_dimension: int | None
    ) -> list[Embedding]:
        import openai
Unconditional openai import in embed() causes ImportError for non-OpenAI providers when openai is not installed.
Prompt for AI agents
Address the following comment on backend/onyx/natural_language_processing/search_nlp_models.py at line 187:
<comment>Unconditional openai import in embed() causes ImportError for non-OpenAI providers when openai is not installed.</comment>
<file context>
@@ -186,6 +184,8 @@ def __init__(
     async def _embed_openai(
         self, texts: list[str], model: str | None, reduced_dimension: int | None
     ) -> list[Embedding]:
+        import openai
+
         if not model:
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
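One common way to make a provider-specific dependency both lazy and diagnosable is a guarded import helper. This is a sketch of the general pattern, not Onyx's actual code; the missing package name is made up for the demo:

```python
import importlib
from types import ModuleType

def require_module(name: str) -> ModuleType:
    """Import an optional dependency on first use, failing with a clear message.

    Keeps the heavy module out of worker memory until a request actually
    needs it, and turns a bare ImportError deep inside an embed() call
    into an actionable error.
    """
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise RuntimeError(
            f"this provider requires the '{name}' package, but it is not installed"
        ) from e

# Present module: imported and returned as usual.
math_mod = require_module("math")

# Missing module: a clear RuntimeError instead of a confusing ImportError.
try:
    require_module("nonexistent_provider_sdk")
    raised = False
except RuntimeError:
    raised = True
```

A call site like `_embed_openai` would then fetch the module via `require_module("openai")` only on the OpenAI code path, so non-OpenAI providers never touch it.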
@lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
def get_model_map() -> dict:
    import litellm
Unconditional runtime import of litellm in get_llm_contextual_cost; default backend image lacks litellm, causing ModuleNotFoundError and 500s when cost is computed.
    DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• GPT-5: Filter out: The claim is incorrect. litellm is included in backend/requirements/default.txt and installed by backend/Dockerfile. The import in get_llm_contextual_cost is lazy, so it won’t raise ModuleNotFoundError in the default backend image.
Prompt for AI agents
Address the following comment on backend/onyx/llm/utils.py at line 382:
<comment>Unconditional runtime import of litellm in get_llm_contextual_cost; default backend image lacks litellm, causing ModuleNotFoundError and 500s when cost is computed.
        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Filter out: The claim is incorrect. litellm is included in backend/requirements/default.txt and installed by backend/Dockerfile. The import in get_llm_contextual_cost is lazy, so it won’t raise ModuleNotFoundError in the default backend image.</comment>
<file context>
@@ -378,6 +379,8 @@ def test_llm(llm: LLM) -> str | None:
 
 @lru_cache(maxsize=1)  # the copy.deepcopy is expensive, so we cache the result
 def get_model_map() -> dict:
+    import litellm
+
     starting_map = copy.deepcopy(cast(dict, litellm.model_cost))
</file context>
[internal] Confidence score: 8/10
[internal] Posted by: Functional Bugs Agent
                ).document_id,
                chunk_ind=chunk.chunk_id,
                semantic_identifier=chunk.semantic_identifier or "Unknown",
                link=chunk.source_links[0] if chunk.source_links else None,
Indexing source_links by key 0 can raise KeyError; source_links is a dict[int, str], not guaranteed to contain 0.
Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:
<comment>Indexing source_links by key 0 can raise KeyError; source_links is a dict[int, str], not guaranteed to contain 0.</comment>
<file context>
@@ -355,6 +356,44 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>
[internal] Confidence score: 9/10
[internal] Posted by: Functional Bugs Agent
Suggested change:
-                link=chunk.source_links[0] if chunk.source_links else None,
+                link=(next(iter(chunk.source_links.values())) if chunk.source_links else None),
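The bug is easy to demonstrate in isolation: `source_links` is a `dict[int, str]` keyed by character offset, so `[0]` is a dict lookup, not list indexing. A standalone reproduction with illustrative data:

```python
# Offsets other than 0 are perfectly normal keys for this dict.
source_links: dict[int, str] = {57: "https://example.com/doc#section-2"}

# The original expression: the truthiness check passes, then the lookup raises.
try:
    link = source_links[0] if source_links else None
except KeyError:
    link = None

# The suggested fix: take the first value regardless of its key.
fixed_link = next(iter(source_links.values())) if source_links else None
```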
https://linear.app/danswer/issue/DAN-2573/move-more-imports-onto-lazy-loading
so I have a script that I run to check memory usage for celery workers:
before this PR it's ~600MB per worker
after, it's ~250MB for some workers
docker container went from 4.3GB -> 2.819GiB
Diagnosing why is not easy: it's not that all pip dependencies are loaded into memory at worker start (in which case I could just lazy-load any of them), it's specifically the ones that get imported at runtime due to an actual import statement.
This makes it very tricky to track down exactly what causes the 600MB. I literally had to trial-and-error suspicious imports, tracing from the celery worker main file.
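One way to make that trial and error less blind is to diff `sys.modules` around a single import, which shows exactly what one statement drags in transitively. A rough sketch; `ftplib` here is just a stdlib stand-in for heavyweight packages like litellm or langchain:

```python
import sys

def modules_pulled_in(module_name: str) -> set[str]:
    """Import module_name and return the set of modules it newly loaded."""
    before = set(sys.modules)
    __import__(module_name)
    return set(sys.modules) - before

# Evict any prior copy so the demo import is fresh and deterministic.
sys.modules.pop("ftplib", None)
pulled = modules_pulled_in("ftplib")
```

Running this against each suspicious import from the worker entrypoint quickly shows which ones fan out into the big dependency trees.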
TBH the existing repo dependency graph is a little scuffed. One example branch that caused the worker to import LLM stuff (there are about a dozen of these I had to sift through to get all the memory offenders down):
app base
-> /Users/edwinluo/onyx/backend/onyx/background/celery/tasks/docprocessing/utils.py
-> redis connector
-> /Users/edwinluo/onyx/backend/onyx/redis/redis_connector_delete.py
-> /Users/edwinluo/onyx/backend/onyx/db/document.py
-> /Users/edwinluo/onyx/backend/onyx/db/feedback.py
-> /Users/edwinluo/onyx/backend/onyx/db/chat.py
-> /Users/edwinluo/onyx/backend/onyx/context/search/utils.py
-> /Users/edwinluo/onyx/backend/onyx/db/search_settings.py
-> /Users/edwinluo/onyx/backend/onyx/db/llm.py OR /Users/edwinluo/onyx/backend/onyx/natural_language_processing/search_nlp_models.py
-> /Users/edwinluo/onyx/backend/onyx/llm/utils.py (langchain, litellm, etc.)
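The fix this PR applies repeatedly is the standard deferred-import pattern: move the heavy import into the function that needs it, optionally caching the result so repeat calls stay cheap. A generic sketch, using a stdlib module as a stand-in for the heavy dependency:

```python
import sys
from functools import lru_cache

@lru_cache(maxsize=1)
def _get_heavy_module():
    # Runs (and pays the memory cost) only on first call, not at worker start.
    import xml.dom.minidom  # stand-in for litellm / langchain-sized imports
    return xml.dom.minidom

# Simulate a fresh worker: the dependency is not resident...
sys.modules.pop("xml.dom.minidom", None)
loaded_before = "xml.dom.minidom" in sys.modules

_get_heavy_module()  # ...until something actually needs it.
loaded_after = "xml.dom.minidom" in sys.modules
```

The trade-off is a small latency hit on first use and imports that static analyzers see later, which is why the PR only applies it to the expensive offenders rather than everywhere.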
How Has This Been Tested?
[Describe the tests you ran to verify your changes]
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.
Summary by cubic
Moves heavy imports to lazy-loading across indexing, LLM, NLP, and file-processing code to reduce worker memory and speed up startup. Also consolidates search doc conversion into SearchDoc and extracts PromptSnapshot to a shared schema. Addresses Linear DAN-2573 (Reduce Memory usage in Onyx).
Refactors
Migration