For the second time, we've hit an issue where the data we're trying to send through an Azure Queue is too large. The first time (#2742, #2788) the fix worked OK, but we probably don't want to be implementing ITruncatable for everything.
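For reference, the truncation approach looks roughly like this. This is a hypothetical sketch, not the actual code from #2788; the interface shape, type names, and fields are illustrative:

```csharp
using System;

// Hypothetical shape of the truncation approach; the real interface may differ.
public interface ITruncatable<T> {
    // Return a copy of this object small enough to fit in the queue.
    T Truncate(int maxLength);
}

public record CrashReportEvent(Guid TaskId, string? CallStack) : ITruncatable<CrashReportEvent> {
    public CrashReportEvent Truncate(int maxLength) =>
        // Drop the tail of the call stack. Every event type needs its own
        // hand-written rule like this, which is the part that doesn't scale.
        this with { CallStack = CallStack is null ? null : CallStack[..Math.Min(CallStack.Length, maxLength)] };
}
```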
## The problem
Many of the events we publish contain a TaskConfig, a JobConfig, or a rendered crash report, all of which can contain lots of data. We also include the job id/task id.
If we keep only the job id/task id and expect recipients to query the state when they receive the webhook, information may be lost.
For example:
1. Task starts -> in the db: `task id: 123abc, state: started, other info: ...`
2. Send the TaskStarted webhook (`task id: 123abc`)
3. User receives the webhook
4. The task crashes and we update its state in the db to Failed
5. User queries the task id and sees status Failed
In that example, the db state for the task at step 1 is lost forever. The user can't retrieve any of the state the task had when it started.
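To make the tradeoff concrete, here's a sketch of the two payload shapes; the type names are hypothetical stand-ins, not our actual event types:

```csharp
using System;

// Stand-in for the real config type; in practice it can be arbitrarily large.
public record TaskConfig(string[] TargetOptions, string? SetupScript);

// What we send today: bounded ids plus an unbounded config.
public record TaskStartedFat(Guid TaskId, Guid JobId, TaskConfig Config);

// The id-only alternative: bounded size, but the receiver can only query
// the *current* db state, which may no longer be the state at send time.
public record TaskStartedMinimal(Guid TaskId, Guid JobId);
```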
## The solution
I created this issue so we can brainstorm solutions.
The only hard requirement for a solution is that, if we choose to keep using Azure Queue, we have some reasonable expectation that events are bounded in size. For example, we know GUIDs and task states have a limited length when serialized.
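As a sketch of what "bounded" could mean in practice: Azure Queue Storage caps messages at 64 KiB, so a runtime guard could look like the following. The helper, the budget constant, and the headroom choice are made up for illustration, not values from our codebase:

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class EventSizeGuard {
    // Azure Queue Storage caps messages at 64 KiB; leave headroom for the
    // envelope and for Base64 encoding, which older SDK defaults apply.
    // 48 KiB is an illustrative budget, not a value from the codebase.
    private const int MaxSerializedBytes = 48 * 1024;

    public static void AssertBounded<T>(T @event) {
        var size = Encoding.UTF8.GetByteCount(JsonSerializer.Serialize(@event));
        if (size > MaxSerializedBytes)
            throw new InvalidOperationException(
                $"{typeof(T).Name} serialized to {size} bytes; events must have a bounded size");
    }
}
```

A check like this only catches violations at runtime; the point of the requirement is to design event types so that it can never fire.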
AB#45326