Scenario and Current Deployment
We have a service built on Durable Functions, with a Service Bus trigger and an orchestrator that calls two activity functions. We packaged it with a Dockerfile and deployed it to Azure Container Apps. The service works as follows: the Controller service communicates with the PlanGenerator service, which in turn calls the Azure OpenAI LLM model and tools via APIs.
Explicit Settings in host.json
```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "copilotplangeneratorhub",
      "maxConcurrentActivityFunctions": 50,
      "maxConcurrentOrchestratorFunctions": 50
    },
    "serviceBus": {
      "messageHandlerOptions": {
        "maxConcurrentCalls": 10,
        "autoComplete": false
      },
      "prefetchCount": 10
    }
  },
  "functionTimeout": "09:59:59",
  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 3,
    "delayInterval": "00:10:00" // 10 minutes delay after a failure.
  }
}
```
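To make the interaction between functionTimeout and the retry block concrete, here is a small stdlib-only Python sketch that computes the worst-case wall-clock time for one execution that keeps timing out. It assumes maxRetryCount counts retries after the first attempt (so 3 means 4 attempts in total) and that delayInterval is waited once between consecutive attempts. The helper names are hypothetical and not part of any Azure SDK.

```python
from datetime import timedelta


def parse_timespan(value: str) -> timedelta:
    """Parse an 'HH:MM:SS' timespan as written in host.json.

    Only the plain HH:MM:SS form is handled; a days prefix is not.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def worst_case_wall_clock(function_timeout: str, max_retry_count: int, delay_interval: str) -> timedelta:
    """Upper bound on elapsed time if every attempt runs until it times out.

    Assumption: attempts = 1 + max_retry_count, with one fixed delay
    between each pair of consecutive attempts.
    """
    attempts = 1 + max_retry_count
    return attempts * parse_timespan(function_timeout) + max_retry_count * parse_timespan(delay_interval)


if __name__ == "__main__":
    # Reproduction settings: 19:59 timeout, 3 retries, 10-minute delay.
    print(worst_case_wall_clock("00:19:59", 3, "00:10:00"))  # 1:49:56
    # Production settings from host.json.
    print(worst_case_wall_clock("09:59:59", 3, "00:10:00"))  # 1 day, 16:29:56
```

Under these assumptions, a job that can never finish inside one 19:59 attempt keeps the activity busy for almost two hours before retries are exhausted.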
Function Timeout Behaviour
We have set the function timeout to approximately 10 hours. Despite this, long-running jobs do not complete even though the backend jobs have finished: the activity function stops and is retried, and the logs show a function timeout followed by retries. Our backend jobs typically run for 1-2 hours at most.
To reproduce the issue, we reduced the function timeout to 00:19:59 and observed logs reporting a function timeout roughly every 19 minutes. If a backend job was running at that point, it was lost, and a retry occurred after 10 minutes (as configured), restarting the activity task. Below is a snapshot of the logs for reference:
Snapshot from the task hub History table for the instance highlighted above
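The retry behaviour described above can be modelled as a timeline. This is a minimal sketch, assuming each retry restarts the activity's work from zero (as the logs suggest) and that the fixed 10-minute delay sits between attempts; attempt_timeline is a hypothetical helper, not a Durable Functions API, and all times are in seconds.

```python
def attempt_timeline(job_s: int, timeout_s: int, max_retry_count: int, delay_s: int) -> list[tuple[int, str]]:
    """Return (seconds since first start, event) pairs for one activity.

    Assumption: the work restarts from zero on every attempt, so an
    attempt only succeeds if the whole job fits inside one timeout.
    """
    events: list[tuple[int, str]] = []
    t = 0
    for attempt in range(1, max_retry_count + 2):
        events.append((t, f"attempt {attempt} started"))
        if job_s <= timeout_s:
            events.append((t + job_s, f"attempt {attempt} completed"))
            return events
        t += timeout_s  # the attempt is cut off at the function timeout
        events.append((t, f"attempt {attempt} timed out"))
        if attempt <= max_retry_count:
            t += delay_s  # fixedDelay before the next retry
    events.append((t, "retries exhausted, work lost"))
    return events


if __name__ == "__main__":
    # A 60-minute backend job against the 00:19:59 reproduction timeout.
    for when, event in attempt_timeline(3600, 1199, 3, 600):
        print(f"{when:>5}s  {event}")
```

Because the work restarts from zero, retrying cannot help unless a single attempt fits inside the timeout; with a 35999-second (09:59:59) timeout the same 60-minute job finishes on the first attempt.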
Behaviour for the First Message
Whenever a user sends a request to the service after a new revision is deployed, there is a wait of about 5 minutes. During this period the message is dropped or answered late, but the orchestrator still schedules the task, which never completes and remains stuck in the Running state. After we restart the service, the response for that same message is received. However, if a new message is sent to the new revision, the response is either significantly delayed or stays in the Running state indefinitely.
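To list the activities stuck in the Running state, one can scan an orchestration's rows from the task hub History table for TaskScheduled events that have no matching TaskCompleted or TaskFailed event. This sketch assumes the rows have already been exported as dictionaries (for example from Azure Storage Explorer) with EventType, EventId, TaskScheduledId and Name columns; check those column names against your own table, and note that the activity names in the example rows are hypothetical.

```python
def pending_activities(history_rows: list[dict]) -> list[tuple[int, str]]:
    """Return (EventId, Name) for activities scheduled but never finished.

    Assumption: TaskCompleted/TaskFailed rows point back to their
    TaskScheduled row through the TaskScheduledId column.
    """
    scheduled: dict[int, str] = {}
    finished: set[int] = set()
    for row in history_rows:
        event_type = row.get("EventType")
        if event_type == "TaskScheduled":
            scheduled[row["EventId"]] = row.get("Name")
        elif event_type in ("TaskCompleted", "TaskFailed"):
            finished.add(row["TaskScheduledId"])
    return sorted((event_id, name) for event_id, name in scheduled.items() if event_id not in finished)


if __name__ == "__main__":
    # Hypothetical rows for one instance; "GeneratePlan"/"CallTools" are made-up names.
    rows = [
        {"EventType": "ExecutionStarted", "EventId": -1},
        {"EventType": "TaskScheduled", "EventId": 0, "Name": "GeneratePlan"},
        {"EventType": "TaskCompleted", "EventId": -1, "TaskScheduledId": 0},
        {"EventType": "TaskScheduled", "EventId": 1, "Name": "CallTools"},
    ]
    print(pending_activities(rows))  # [(1, 'CallTools')]
```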
Resource Allocation and Performance
We would like to understand how requests are served in our setup, which runs a single replica with 16 vCPUs and 32 GB of memory. Does each activity function run in its own process, sharing the compute resources of the Azure Container Apps replica? Or does a single function app serve all requests, so that when the function app dies due to a timeout, all activity functions stop with it? How does this work?
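To make the second option in this question concrete, here is a simplified asyncio model in which one host process on the replica runs every activity invocation, at most a fixed number at a time (standing in for maxConcurrentActivityFunctions), and a host recycle cancels everything still in flight. It encodes only that assumption and is not how the Functions runtime is implemented; whether a timeout actually recycles the whole host is exactly what is being asked. activity and run_host are hypothetical names.

```python
import asyncio


async def activity(name: str, seconds: float, gate: asyncio.Semaphore, log: list[str]) -> str:
    """One activity invocation; sleep stands in for the backend job."""
    async with gate:  # wait for a free concurrency slot
        log.append(f"start {name}")
        await asyncio.sleep(seconds)
        log.append(f"done {name}")
        return name


async def run_host(durations: dict[str, float], limit: int, host_lifetime: float):
    """Run all activities in one 'host'; cancel the unfinished ones when it is recycled."""
    gate = asyncio.Semaphore(limit)
    log: list[str] = []
    tasks = {name: asyncio.create_task(activity(name, secs, gate, log)) for name, secs in durations.items()}
    done, pending = await asyncio.wait(tasks.values(), timeout=host_lifetime)
    for task in pending:
        task.cancel()  # host shutdown: every in-flight activity dies together
    await asyncio.gather(*pending, return_exceptions=True)
    completed = sorted(name for name, task in tasks.items() if task in done)
    lost = sorted(name for name, task in tasks.items() if task in pending)
    return completed, lost, log


if __name__ == "__main__":
    completed, lost, _ = asyncio.run(run_host({"a": 0.01, "b": 0.01, "c": 0.5}, limit=2, host_lifetime=0.2))
    print(completed, lost)  # ['a', 'b'] ['c']
```

In this model the replica's 16 vCPUs are shared by whatever runs inside the one process, and a single recycle loses every unfinished activity at once, which would match the "all activity functions stop" scenario if the real host behaves this way.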
@cgillum Would you be able to help here? Thanks