Errors from the providers aren't handled #1479

Closed as not planned
@ghostdevv

Description

The Problem

Providers mostly don't report the errors they encounter back up to the core system. This means that if something fails, it can cause an indefinite "hang". In my case the hang causes the trigger to be marked as started when it hasn't actually started. In the docker provider, for example, errors are caught and simply ignored (example).

I'm not familiar with the codebase, so please let me know if there are any mistakes. I spent some time following the providers around and found a few examples like the following, where a request to deploy the trigger is effectively just sent but never followed up on.

https://github.com/triggerdotdev/trigger.dev/blob/main/apps/webapp/app/v3/marqs/sharedQueueConsumer.server.ts#L544-L560

I'm guessing that the provider needs some way to return an error code, which core can then "bubble up" by changing the deployment status. I didn't want to attempt any changes without creating an issue first, as I'm missing a lot of context. I'm also not sure how PRs such as #1470 interact with this issue.
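To make the idea concrete, here is a minimal sketch of what "return an error and bubble it up" could look like. Every name in it (`StartResult`, `Provider`, `Run`, `applyStartResult`, `startRun`, and the status and error-code strings) is hypothetical and invented for illustration; none of it is taken from the trigger.dev codebase, and the real fix may look quite different.

```typescript
// Hypothetical types: none of these names come from the trigger.dev codebase.
type StartResult =
  | { ok: true }
  | { ok: false; code: string; message: string };

type RunStatus = "PENDING" | "EXECUTING" | "FAILED";

interface Run {
  id: string;
  status: RunStatus;
  error?: { code: string; message: string };
}

interface Provider {
  // Resolves with an explicit result instead of swallowing failures.
  start(runId: string, image: string): Promise<StartResult>;
}

// Convert anything thrown inside a provider into a structured failure.
function toFailure(err: unknown, code = "PROVIDER_START_FAILED"): StartResult {
  const message = err instanceof Error ? err.message : String(err);
  return { ok: false, code, message };
}

// Core side: only mark the run as started once the provider confirms it.
function applyStartResult(run: Run, result: StartResult): Run {
  if (result.ok) {
    return { ...run, status: "EXECUTING" };
  }
  return {
    ...run,
    status: "FAILED",
    error: { code: result.code, message: result.message },
  };
}

// Ask the provider to start the run and record the outcome either way.
async function startRun(provider: Provider, run: Run, image: string): Promise<Run> {
  try {
    return applyStartResult(run, await provider.start(run.id, image));
  } catch (err) {
    // A provider that throws is treated the same as one that reports failure.
    return applyStartResult(run, toFailure(err));
  }
}
```

With something along these lines, a failed image pull would leave the run in a failed state with an attached error rather than in a "started" state that never finishes.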

Example Reproduction

I originally reported this in #1476, but moved it here when I realised my issue was a symptom of the wider problem described above. I encountered it while setting up authentication for my self-hosted docker registry. Trigger would try to deploy a task, and the docker provider would fail to run it because it couldn't download the image. This would cause trigger to hang indefinitely, as it was unaware that the docker provider had failed to run the task.

During my testing last night I added a scheduled task that runs every 20 minutes. I then forgot about it and was messing around with some other things in trigger. After some sleep, I came back to it and noticed a long list of "running" scheduled tasks. Upon further investigation, I found that before going to sleep I had made an incomplete deployment, which resulted in a docker image missing from the registry. This led to the same place: trigger tries to deploy the image and fails, but trigger has no idea that it failed. The result was the long list of running tasks, the longest of which had been hanging for ~14 hours.

Screenshot of the runs list

For reference, when the task is working correctly it takes ~2 seconds from start to finish.
