-
Notifications
You must be signed in to change notification settings - Fork 938
hotfix(agents-api): Fix subquery cardinality issue #1298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Diwank Singh Tomer <diwank.singh@gmail.com>
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to af0a860 in 1 minute and 27 seconds
More details
- Looked at
77
lines of code in6
files - Skipped
0
files when reviewing. - Skipped posting
12
drafted comments based on config settings.
1. agents-api/agents_api/queries/entries/get_history.py:45
- Draft comment:
LIMIT 1 ensures a single session_created_at value is returned. Confirm this is the intended row. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The comment violates the rule about not asking authors to confirm their intention. Additionally, since session_created_at comes from a 1:1 join with sessions table, all values would be identical anyway, so there's no ambiguity about which row we're getting. The LIMIT 1 is just an optimization.
Maybe there could be edge cases where the sessions table has inconsistent data? Maybe the join conditions are insufficient?
The join conditions are clear and include both session_id and developer_id, making it a precise 1:1 join. Any data consistency issues would be a database integrity problem, not a query logic issue.
Delete this comment. It asks for unnecessary confirmation and the underlying assumption about row ambiguity is incorrect given the query structure.
2. agents-api/agents_api/queries/entries/list_entries.py:45
- Draft comment:
Using LIMIT 1 in the subquery avoids multiple row issues. Verify that the session's created_at retrieved is correct for filtering. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is asking the PR author to verify the correctness of a specific behavior, which violates the rule against asking for confirmation or verification of intended behavior. It does not provide a specific suggestion or point out a clear issue.
3. agents-api/agents_api/queries/executions/count_executions.py:14
- Draft comment:
LIMIT 1 in the subquery ensures a single value is returned. Consider performance if developer table grows. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. agents-api/agents_api/queries/executions/get_paused_execution_token.py:28
- Draft comment:
Adding LIMIT 1 restricts subquery to a single created_at value, resolving cardinality issues. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
This comment is purely informative, explaining what the code change does without suggesting any improvements or asking for clarification. It doesn't align with the rules for useful comments.
5. agents-api/agents_api/queries/executions/list_execution_transitions.py:21
- Draft comment:
Using LIMIT 1 here prevents multiple rows. Confirm this aligns with the intended filtering for created_at. - Reason this comment was not posted:
Comment looked like it was already resolved.
6. agents-api/agents_api/queries/executions/prepare_execution_input.py:20
- Draft comment:
Addition of 'ORDER BY created_at DESC' in tasks subquery ensures the latest task is selected. Verify this meets business requirements. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%
<= threshold50%
The comment is asking the PR author to verify if the change meets business requirements, which is against the rules. It does not provide a specific code suggestion or ask for a specific test to be written.
7. agents-api/agents_api/queries/entries/get_history.py:45
- Draft comment:
Adding 'LIMIT 1' prevents a cardinality error. If the entries might have different 'session_created_at' values, consider adding an ORDER BY clause to choose the desired one. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
Since session_created_at comes from the sessions table via JOIN, and we're joining on session_id, all entries in collected_entries will have the same session_created_at value - it's the created_at from that one session. Adding ORDER BY wouldn't change anything since all rows have identical values. The LIMIT 1 is just an optimization to avoid returning multiple identical values.
What if there are no entries in collected_entries? What if the session has multiple created_at values due to a bug or data inconsistency?
The NULL case is handled in _transform() with the "or utcnow()" fallback. Data inconsistency is impossible here because session_created_at comes from the sessions table with a single session_id.
The comment should be deleted. The ORDER BY suggestion is unnecessary since all session_created_at values will be identical within a session.
8. agents-api/agents_api/queries/entries/list_entries.py:45
- Draft comment:
Using 'LIMIT 1' in the subquery for sessions prevents multiple row returns. If session_id is unique this may be redundant, but if not, consider whether an ORDER BY is needed for deterministic results. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
The comment makes a valid point about deterministic results when using LIMIT 1. However, session_id is likely a primary key since it's used as a key identifier throughout the code (e.g., in session_exists_query). If session_id is unique, both LIMIT 1 and ORDER BY would be redundant. Without seeing the sessions table schema, we can't be 100% sure about uniqueness.
I don't have definitive evidence that session_id is unique. The comment could be raising a valid edge case if session_id is not unique.
However, the very existence of session_exists_query checking just session_id strongly suggests it's unique. Adding ORDER BY without clear evidence of a problem violates the principle of only keeping strongly evidenced comments.
The comment is speculative without strong evidence of an actual issue. While technically correct about SQL best practices, it's making assumptions we can't verify.
9. agents-api/agents_api/queries/executions/count_executions.py:14
- Draft comment:
The addition of 'LIMIT 1' ensures that only one row is returned from the developers table. Verify that the developer_id uniquely identifies a row or consider adding an ORDER BY if multiple rows are possible. - Reason this comment was not posted:
Comment looked like it was already resolved.
10. agents-api/agents_api/queries/executions/get_paused_execution_token.py:28
- Draft comment:
Adding 'LIMIT 1' prevents multi-row returns in the executions subquery. Ensure this aligns with the intended logic if multiple executions might match. - Reason this comment was not posted:
Comment looked like it was already resolved.
11. agents-api/agents_api/queries/executions/list_execution_transitions.py:21
- Draft comment:
The 'LIMIT 1' addition in the subquery resolves the cardinality issue. Consider whether an explicit ORDER BY is needed to guarantee which 'created_at' is returned in ambiguous cases. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
The subquery is selecting created_at from executions table where execution_id matches. Since execution_id is likely a primary key or unique identifier, there should only ever be one row matching this condition. Even if there were multiple rows (which would indicate a data integrity issue), we're only using this value as a lower bound for filtering - we're not using it for ordering or displaying. The main query's ORDER BY clause handles the final sorting.
I could be wrong about execution_id being unique - I don't have access to the table schema. Also, if there were multiple rows, the choice of which created_at we get could affect which transitions are included in the results.
If execution_id wasn't unique, that would be a fundamental data model problem that should be fixed at the schema level, not worked around with ORDER BY clauses. The comment is suggesting a band-aid for what would be a deeper issue.
The comment should be deleted. It's speculating about an edge case that would indicate a more serious data model problem, and the ORDER BY wouldn't be the right fix anyway.
12. agents-api/agents_api/queries/executions/prepare_execution_input.py:20
- Draft comment:
Adding 'ORDER BY created_at DESC' ensures the most recent task is used to determine the agent_id. Confirm that selecting the latest task is the intended behavior. - Reason this comment was not posted:
Comment looked like it was already resolved.
Workflow ID: wflow_04vIGF5YkVIkLLbf
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
User description
Fix for this error on get_history when trying to talk to tira bot:
Signed-off-by: Diwank Singh Tomer diwank.singh@gmail.com
PR Type
Bug fix
Description
Added
LIMIT 1
to subqueries to ensure single-row results.Fixed potential cardinality issues in multiple SQL queries.
Improved query reliability and consistency across the API.
Changes walkthrough 📝
get_history.py
Fix cardinality issue in `get_history` query
agents-api/agents_api/queries/entries/get_history.py
LIMIT 1
to subquery forsession_created_at
.created_at
field.list_entries.py
Fix cardinality issue in `list_entries` query
agents-api/agents_api/queries/entries/list_entries.py
LIMIT 1
to subquery for sessioncreated_at
.count_executions.py
Fix cardinality issue in `count_executions` query
agents-api/agents_api/queries/executions/count_executions.py
LIMIT 1
to subquery for developercreated_at
.get_paused_execution_token.py
Fix cardinality issue in `get_paused_execution_token` query
agents-api/agents_api/queries/executions/get_paused_execution_token.py
LIMIT 1
to subquery for executioncreated_at
.list_execution_transitions.py
Fix cardinality issue in `list_execution_transitions` query
agents-api/agents_api/queries/executions/list_execution_transitions.py
LIMIT 1
to subquery for executioncreated_at
.prepare_execution_input.py
Fix cardinality issue in `prepare_execution_input` query
agents-api/agents_api/queries/executions/prepare_execution_input.py
ORDER BY created_at DESC
andLIMIT 1
to subquery for agentselection.
Important
Fix subquery cardinality issues by adding
LIMIT 1
to ensure single-row results in various SQL queries.LIMIT 1
to subqueries inget_history.py
,list_entries.py
,count_executions.py
,get_paused_execution_token.py
, andlist_execution_transitions.py
to ensure single-row results.prepare_execution_input.py
, addORDER BY created_at DESC
andLIMIT 1
to subquery for selectingagent_id
to ensure the most recent agent is selected.This description was created by
for af0a860. It will automatically update as commits are pushed.