-
-
Notifications
You must be signed in to change notification settings - Fork 725
Fix: Prevent failed resumes for very large batches #1598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 eslint
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.tsOops! Something went wrong! :( ESLint: 8.45.0 ESLint couldn't find the config "custom" to extend from. Please check that the name of the config is correct. The config "custom" was referenced from the config file in "/.eslintrc.js". If you still have problems, please stop by https://eslint.org/chat/help to chat with the team. WalkthroughThe changes enhance the task run attempt processing in the shared queue consumer and resume attempt service. New types for improved type safety are introduced, along with refactored method signatures and enhanced error handling. Key additions include the Changes
Possibly related PRs
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
apps/webapp/app/v3/services/resumeAttempt.server.ts (1)
Line range hint
149-197
: Handle potential concurrency issues with asynchronous operationsWithin the
#handleDependencyResume
method, multiple asynchronous database calls are made inside a loop. Consider the impact of concurrent executions and ensure that the async operations are properly awaited and that errors are handled correctly.Consider using
Promise.all
if appropriate or ensuring that the database calls are sequential if required by the logic.
🧹 Nitpick comments (6)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (4)
1002-1048
: Consider potential performance impact of deep includes in type definitionsThe
AttemptForCompletion
andAttemptForExecution
types include multiple nested relations, which could lead to performance issues when retrieving data from the database. Ensure that only necessary fields are included to optimize query performance.You might consider selecting only the fields required for your methods:
type AttemptForCompletion = Prisma.TaskRunAttemptGetPayload<{ include: { backgroundWorker: true; backgroundWorkerTask: true; taskRun: { include: { runtimeEnvironment: { include: { - organization: true; - project: true; + organization: { select: { id: true, slug: true, title: true } }, + project: { select: { id: true, slug: true, name: true, externalRef: true } }, }; }; tags: true; }; }; queue: true; }; }>; // Similarly for AttemptForExecution
1050-1071
: Ensure consistent handling ofoutput
andoutputType
In the
_completionPayloadFromAttempt
method, when constructing the success result, you setoutput: attempt.output ?? undefined
. Ifattempt.output
isnull
, this will setoutput
toundefined
. Ensure the consuming code handlesundefined
appropriately.Consider explicitly handling
null
values:- output: attempt.output ?? undefined, + output: attempt.output,
1139-1141
: Simplify conditional expression using optional chainingYou can simplify the conditional expression when accessing
taskRun.batchItems
by using optional chaining. This makes the code more concise and improves readability.Apply this diff to use optional chaining:
- batch: - taskRun.batchItems[0] && taskRun.batchItems[0].batchTaskRun - ? { id: taskRun.batchItems[0].batchTaskRun.friendlyId } - : undefined, + batch: taskRun.batchItems[0]?.batchTaskRun + ? { id: taskRun.batchItems[0].batchTaskRun.friendlyId } + : undefined,🧰 Tools
🪛 Biome (1.9.4)
[error] 1139-1139: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
1297-1300
: Avoid reassigning parametersIn the
_executionFromAttempt
method, you check ifmachinePreset
is not defined and then assign a value to it. Reassigning function parameters can lead to unexpected behavior and make the code harder to understand.Consider using a local variable instead:
- if (!machinePreset) { - machinePreset = machinePresetFromConfig(backgroundWorkerTask.machineConfig ?? {}); - } + const preset = machinePreset ?? machinePresetFromConfig(backgroundWorkerTask.machineConfig ?? {});And use
preset
in the rest of the method.apps/webapp/app/v3/services/resumeAttempt.server.ts (2)
12-13
: Organize imports and remove unused onesThe imports for
Prisma
andTaskRunAttempt
seem to have been added. Ensure that all imported modules are necessary. Also, consider organizing imports for better readability.If any imports are unused, consider removing them to keep the code clean.
77-86
: Improve error message specificityThe error message "Run is not resumable" could be more specific by including the run status. This would aid in debugging.
Modify the error log to include the run status:
- this._logger.error("Run is not resumable"); + this._logger.error(`Run is not resumable; current status: ${attempt.taskRun.status}`);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
(6 hunks)apps/webapp/app/v3/services/resumeAttempt.server.ts
(5 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts
[error] 1139-1139: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (buildjet-8vcpu-ubuntu-2204 - pnpm)
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: e2e / 🧪 CLI v3 tests (buildjet-8vcpu-ubuntu-2204 - npm)
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts (3)
20-20
: Imports look goodThe added imports from Prisma and other modules are appropriate for the new code additions.
1073-1151
: Validate metadata parsing and error handlingIn
_executionFromAttempt
, theparsePacket
function is used to parsetaskRun.metadata
. Ensure thatparsePacket
can handle potential parsing errors, and that any exceptions are properly caught and handled to prevent runtime errors.Since the parsing of metadata is crucial, verify that all possible formats of
taskRun.metadata
are accounted for, and consider adding error handling or validation as needed.🧰 Tools
🪛 Biome (1.9.4)
[error] 1139-1139: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
1319-1375
:⚠️ Potential issueEnsure proper error handling in
getResumePayload
In the
getResumePayload
method, if the attempt is not found, an error is logged but the method proceeds without returning. This might lead to undefined behavior whenattempt
isundefined
.Add a return statement after logging the error to prevent further execution:
if (!attempt) { logger.error("getExecutionPayloadFromAttempt: No attempt found", { id: attemptId }); + return; }
Likely invalid or redundant comment.
apps/webapp/app/v3/services/resumeAttempt.server.ts (3)
23-31
: ReuselatestAttemptSelect
across the codebaseThe
latestAttemptSelect
constant is defined to encapsulate the selection logic for the latest attempt. Ensure this constant is reused wherever the latest attempt selection is required to maintain consistency.Good use of a constant to avoid repetition.
33-70
: Check for potential null values in nested includesWhen performing
findFirst
with nested includes, ensure that the data you expect is always present. There might be cases where included relations arenull
, which could lead to runtime errors if not handled properly.Verify that all necessary related data is present and handle cases where relations might be missing.
88-122
: Ensure all dependency cases are handledIn the switch statement handling different resume types, double-check that all possible cases are accounted for, and that appropriate actions are taken for each type.
Review the logic to ensure that no dependency types are missed and that the correct methods are called for each case.
@trigger.dev/build
trigger.dev
@trigger.dev/react-hooks
@trigger.dev/rsc
@trigger.dev/sdk
@trigger.dev/core
commit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
apps/webapp/app/v3/services/resumeAttempt.server.ts (1)
33-70
: Consider pagination for large result setsThe query includes nested relations with
include
statements. For very large batches, this could still lead to performance issues. Consider implementing pagination or limiting the depth of included relations.references/v3-catalog/src/trigger/batch.ts (1)
Line range hint
19-49
: Implementation aligns well with PR objectives.The changes effectively address the issue of failed resumes for large batches by:
- Adding dynamic batch size control
- Providing flexibility in execution mode with the wait option
- Maintaining backward compatibility with default values
Consider adding telemetry to monitor batch processing times and success rates to help identify optimal batch sizes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/webapp/app/v3/services/resumeAttempt.server.ts
(5 hunks)references/v3-catalog/src/trigger/batch.ts
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (buildjet-8vcpu-ubuntu-2204 - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (buildjet-8vcpu-ubuntu-2204 - npm)
- GitHub Check: units / 🧪 Unit Tests
- GitHub Check: typecheck / typecheck
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
apps/webapp/app/v3/services/resumeAttempt.server.ts (1)
211-235
: Reconsider transaction removalThe removal of transactions might introduce race conditions. As mentioned in a previous review, consider using transactions to ensure atomicity of status updates.
#!/bin/bash # Check for other status update operations that might conflict rg -A 5 "taskRunAttempt.update.*status"
This could sometimes fail due to transaction timeouts.
Summary by CodeRabbit
Release Notes
New Features
Improvements
Technical Updates