Skip to content

hotfix(agents-api): Fix subquery cardinality issue #1298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 31, 2025
Merged

Conversation

creatorrr
Copy link
Contributor

@creatorrr creatorrr commented Mar 30, 2025

User description

Fix for this error on get_history when trying to talk to tira bot:

agents-api-multi-tenant-1  | asyncpg.exceptions.CardinalityViolationError: more than one row returned by a subquery used as an expression

Signed-off-by: Diwank Singh Tomer diwank.singh@gmail.com


PR Type

Bug fix


Description

  • Added LIMIT 1 to subqueries to ensure single-row results.

  • Fixed potential cardinality issues in multiple SQL queries.

  • Improved query reliability and consistency across the API.


Changes walkthrough 📝

Relevant files
Bug fix
get_history.py
Fix cardinality issue in `get_history` query                         

agents-api/agents_api/queries/entries/get_history.py

  • Added LIMIT 1 to subquery for session_created_at.
  • Ensured single-row result for created_at field.
  • +1/-1     
    list_entries.py
    Fix cardinality issue in `list_entries` query                       

    agents-api/agents_api/queries/entries/list_entries.py

  • Added LIMIT 1 to subquery for session created_at.
  • Ensured single-row result for session creation time.
  • +1/-1     
    count_executions.py
    Fix cardinality issue in `count_executions` query               

    agents-api/agents_api/queries/executions/count_executions.py

  • Added LIMIT 1 to subquery for developer created_at.
  • Ensured single-row result for developer creation time.
  • +1/-1     
    get_paused_execution_token.py
    Fix cardinality issue in `get_paused_execution_token` query

    agents-api/agents_api/queries/executions/get_paused_execution_token.py

  • Added LIMIT 1 to subquery for execution created_at.
  • Ensured single-row result for paused execution token query.
  • +1/-1     
    list_execution_transitions.py
    Fix cardinality issue in `list_execution_transitions` query

    agents-api/agents_api/queries/executions/list_execution_transitions.py

  • Added LIMIT 1 to subquery for execution created_at.
  • Ensured single-row result for execution transitions query.
  • +1/-1     
    prepare_execution_input.py
    Fix cardinality issue in `prepare_execution_input` query 

    agents-api/agents_api/queries/executions/prepare_execution_input.py

  • Added ORDER BY created_at DESC and LIMIT 1 to subquery for agent
    selection.
  • Ensured latest agent is selected for execution input.
  • +1/-0     

    Need help?
  • Type /help how to ... in the comments thread for any questions about Qodo Merge usage.
  • Check out the documentation for more information.

  • Important

    Fix subquery cardinality issues by adding LIMIT 1 to ensure single-row results in various SQL queries.

    • SQL Query Fixes:
      • Add LIMIT 1 to subqueries in get_history.py, list_entries.py, count_executions.py, get_paused_execution_token.py, and list_execution_transitions.py to ensure single-row results.
      • In prepare_execution_input.py, add ORDER BY created_at DESC and LIMIT 1 to subquery for selecting agent_id to ensure the most recent agent is selected.

    This description was created by Ellipsis for af0a860. It will automatically update as commits are pushed.

    Signed-off-by: Diwank Singh Tomer <diwank.singh@gmail.com>
    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Redundant Condition

    The query has two conditions checking the same thing: e.created_at >= $7 and e.created_at >= (select created_at from sessions where session_id = $1 LIMIT 1). If $7 is meant to be the session creation time, one condition is redundant. If they serve different purposes, this might lead to unexpected filtering.

    AND (er.relation IS NULL OR er.relation != ALL($6))
    AND e.created_at >= $7
    AND e.created_at >= (select created_at from sessions where session_id = $1 LIMIT 1)

    Copy link
    Contributor

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Handle empty result sets

    The subquery might return NULL if the collected_entries CTE is empty. Consider
    adding a fallback value or handling NULL results to prevent potential issues
    when no entries exist.

    agents-api/agents_api/queries/entries/get_history.py [45]

    -(SELECT session_created_at FROM collected_entries LIMIT 1) AS created_at,
    +(SELECT COALESCE((SELECT session_created_at FROM collected_entries LIMIT 1), NOW())) AS created_at,
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    __

    Why: The suggestion addresses a potential runtime error by adding COALESCE to handle NULL results when collected_entries is empty. This is an important defensive programming practice that prevents application errors in edge cases.

    Medium
    General
    Verify task-agent relationship consistency

    The query assumes that tasks with the same task_id can have different agent_ids.
    If this is intentional, ensure the application logic handles this correctly. If
    not, consider adding a unique constraint on task_id.

    agents-api/agents_api/queries/executions/prepare_execution_input.py [18-21]

    +SELECT agent_id FROM tasks
    +WHERE developer_id = $1 AND task_id = $2
    +ORDER BY created_at DESC
    +LIMIT 1
     
    -
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    __

    Why: The suggestion raises a valid concern about data consistency, pointing out a potential design issue where multiple agent_ids could exist for the same task_id. While the improved code doesn't change anything, the suggestion prompts important verification of the data model.

    Low
    • More

    Copy link
    Contributor

    @ellipsis-dev ellipsis-dev bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    👍 Looks good to me! Reviewed everything up to af0a860 in 1 minute and 27 seconds

    More details
    • Looked at 77 lines of code in 6 files
    • Skipped 0 files when reviewing.
    • Skipped posting 12 drafted comments based on config settings.
    1. agents-api/agents_api/queries/entries/get_history.py:45
    • Draft comment:
      LIMIT 1 ensures a single session_created_at value is returned. Confirm this is the intended row.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
      The comment violates the rule about not asking authors to confirm their intention. Additionally, since session_created_at comes from a 1:1 join with sessions table, all values would be identical anyway, so there's no ambiguity about which row we're getting. The LIMIT 1 is just an optimization.
      Maybe there could be edge cases where the sessions table has inconsistent data? Maybe the join conditions are insufficient?
      The join conditions are clear and include both session_id and developer_id, making it a precise 1:1 join. Any data consistency issues would be a database integrity problem, not a query logic issue.
      Delete this comment. It asks for unnecessary confirmation and the underlying assumption about row ambiguity is incorrect given the query structure.
    2. agents-api/agents_api/queries/entries/list_entries.py:45
    • Draft comment:
      Using LIMIT 1 in the subquery avoids multiple row issues. Verify that the session's created_at retrieved is correct for filtering.
    • Reason this comment was not posted:
      Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
      This comment is asking the PR author to verify the correctness of a specific behavior, which violates the rule against asking for confirmation or verification of intended behavior. It does not provide a specific suggestion or point out a clear issue.
    3. agents-api/agents_api/queries/executions/count_executions.py:14
    • Draft comment:
      LIMIT 1 in the subquery ensures a single value is returned. Consider performance if developer table grows.
    • Reason this comment was not posted:
      Comment looked like it was already resolved.
    4. agents-api/agents_api/queries/executions/get_paused_execution_token.py:28
    • Draft comment:
      Adding LIMIT 1 restricts subquery to a single created_at value, resolving cardinality issues.
    • Reason this comment was not posted:
      Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
      This comment is purely informative, explaining what the code change does without suggesting any improvements or asking for clarification. It doesn't align with the rules for useful comments.
    5. agents-api/agents_api/queries/executions/list_execution_transitions.py:21
    • Draft comment:
      Using LIMIT 1 here prevents multiple rows. Confirm this aligns with the intended filtering for created_at.
    • Reason this comment was not posted:
      Comment looked like it was already resolved.
    6. agents-api/agents_api/queries/executions/prepare_execution_input.py:20
    • Draft comment:
      Addition of 'ORDER BY created_at DESC' in tasks subquery ensures the latest task is selected. Verify this meets business requirements.
    • Reason this comment was not posted:
      Comment did not seem useful. Confidence is useful = 0% <= threshold 50%
      The comment is asking the PR author to verify if the change meets business requirements, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written.
    7. agents-api/agents_api/queries/entries/get_history.py:45
    • Draft comment:
      Adding 'LIMIT 1' prevents a cardinality error. If the entries might have different 'session_created_at' values, consider adding an ORDER BY clause to choose the desired one.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
      Since session_created_at comes from the sessions table via JOIN, and we're joining on session_id, all entries in collected_entries will have the same session_created_at value - it's the created_at from that one session. Adding ORDER BY wouldn't change anything since all rows have identical values. The LIMIT 1 is just an optimization to avoid returning multiple identical values.
      What if there are no entries in collected_entries? What if the session has multiple created_at values due to a bug or data inconsistency?
      The NULL case is handled in _transform() with the "or utcnow()" fallback. Data inconsistency is impossible here because session_created_at comes from the sessions table with a single session_id.
      The comment should be deleted. The ORDER BY suggestion is unnecessary since all session_created_at values will be identical within a session.
    8. agents-api/agents_api/queries/entries/list_entries.py:45
    • Draft comment:
      Using 'LIMIT 1' in the subquery for sessions prevents multiple row returns. If session_id is unique this may be redundant, but if not, consider whether an ORDER BY is needed for deterministic results.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
      The comment makes a valid point about deterministic results when using LIMIT 1. However, session_id is likely a primary key since it's used as a key identifier throughout the code (e.g., in session_exists_query). If session_id is unique, both LIMIT 1 and ORDER BY would be redundant. Without seeing the sessions table schema, we can't be 100% sure about uniqueness.
      I don't have definitive evidence that session_id is unique. The comment could be raising a valid edge case if session_id is not unique.
      However, the very existence of session_exists_query checking just session_id strongly suggests it's unique. Adding ORDER BY without clear evidence of a problem violates the principle of only keeping strongly evidenced comments.
      The comment is speculative without strong evidence of an actual issue. While technically correct about SQL best practices, it's making assumptions we can't verify.
    9. agents-api/agents_api/queries/executions/count_executions.py:14
    • Draft comment:
      The addition of 'LIMIT 1' ensures that only one row is returned from the developers table. Verify that the developer_id uniquely identifies a row or consider adding an ORDER BY if multiple rows are possible.
    • Reason this comment was not posted:
      Comment looked like it was already resolved.
    10. agents-api/agents_api/queries/executions/get_paused_execution_token.py:28
    • Draft comment:
      Adding 'LIMIT 1' prevents multi-row returns in the executions subquery. Ensure this aligns with the intended logic if multiple executions might match.
    • Reason this comment was not posted:
      Comment looked like it was already resolved.
    11. agents-api/agents_api/queries/executions/list_execution_transitions.py:21
    • Draft comment:
      The 'LIMIT 1' addition in the subquery resolves the cardinality issue. Consider whether an explicit ORDER BY is needed to guarantee which 'created_at' is returned in ambiguous cases.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
      The subquery is selecting created_at from executions table where execution_id matches. Since execution_id is likely a primary key or unique identifier, there should only ever be one row matching this condition. Even if there were multiple rows (which would indicate a data integrity issue), we're only using this value as a lower bound for filtering - we're not using it for ordering or displaying. The main query's ORDER BY clause handles the final sorting.
      I could be wrong about execution_id being unique - I don't have access to the table schema. Also, if there were multiple rows, the choice of which created_at we get could affect which transitions are included in the results.
      If execution_id wasn't unique, that would be a fundamental data model problem that should be fixed at the schema level, not worked around with ORDER BY clauses. The comment is suggesting a band-aid for what would be a deeper issue.
      The comment should be deleted. It's speculating about an edge case that would indicate a more serious data model problem, and the ORDER BY wouldn't be the right fix anyway.
    12. agents-api/agents_api/queries/executions/prepare_execution_input.py:20
    • Draft comment:
      Adding 'ORDER BY created_at DESC' ensures the most recent task is used to determine the agent_id. Confirm that selecting the latest task is the intended behavior.
    • Reason this comment was not posted:
      Comment looked like it was already resolved.

    Workflow ID: wflow_04vIGF5YkVIkLLbf


    You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

    @creatorrr creatorrr merged commit ad893c0 into dev Mar 31, 2025
    14 checks passed
    @creatorrr creatorrr deleted the x/fix-subquery branch March 31, 2025 13:58
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant