[tests] Increase efcore timeout for cosmos in CosmosEndToEnd #5464
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
.. playground app. Fixes issue: dotnet#5415
00d2912 to 18b5d40 (compare)
@@ -9,7 +9,10 @@
 builder.AddServiceDefaults();
 builder.AddAzureCosmosClient("cosmos");
-builder.AddCosmosDbContext<TestCosmosContext>("cosmos", "ef");
+builder.AddCosmosDbContext<TestCosmosContext>("cosmos", "ef", configureDbContextOptions =>
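The diff above is truncated, so the body of the configureDbContextOptions callback isn't visible here. For reference, a minimal sketch of the knob being adjusted, assuming it is EF Core's CosmosDbContextOptionsBuilder.RequestTimeout; the connection string and the exact call shape below are placeholders, not the actual test code:

```csharp
using Microsoft.EntityFrameworkCore;

// Minimal sketch, not the actual test change: EF Core's Cosmos provider exposes the
// underlying SDK request timeout via CosmosDbContextOptionsBuilder.RequestTimeout.
var options = new DbContextOptionsBuilder<TestCosmosContext>()
    .UseCosmos(
        "AccountEndpoint=https://localhost:8081/;AccountKey=<placeholder>", // hypothetical emulator connection string
        databaseName: "ef",
        cosmosOptions => cosmosOptions.RequestTimeout(TimeSpan.FromSeconds(120))) // raise from the ~65s seen in the logs
    .Options;
```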
Should we do this, or should we wait longer in the tests? Can't we wait until the Cosmos container says "ready"?
We do wait for Cosmos to do that, and we invoke other endpoints that work:
Calling endpoint 'http://localhost:36563/alive for resource 'api' in app 'CosmosEndToEnd.AppHost'
Calling endpoint 'http://localhost:36563/health for resource 'api' in app 'CosmosEndToEnd.AppHost'
Calling endpoint 'http://localhost:36563/ for resource 'api' in app 'CosmosEndToEnd.AppHost'
That last one is aspire/playground/CosmosEndToEnd/CosmosEndToEnd.ApiService/Program.cs, lines 17 to 46 in 0587e48:
app.MapGet("/", async (CosmosClient cosmosClient) => | |
{ | |
var db = (await cosmosClient.CreateDatabaseIfNotExistsAsync("db")).Database; | |
var container = (await db.CreateContainerIfNotExistsAsync("entries", "/Id")).Container; | |
// Add an entry to the database on each request. | |
var newEntry = new Entry() { Id = Guid.NewGuid().ToString() }; | |
await container.CreateItemAsync(newEntry); | |
var entries = new List<Entry>(); | |
var iterator = container.GetItemQueryIterator<Entry>(requestOptions: new QueryRequestOptions() { MaxItemCount = 5 }); | |
var batchCount = 0; | |
while (iterator.HasMoreResults) | |
{ | |
batchCount++; | |
var batch = await iterator.ReadNextAsync(); | |
foreach (var entry in batch) | |
{ | |
entries.Add(entry); | |
} | |
} | |
return new | |
{ | |
batchCount = batchCount, | |
totalEntries = entries.Count, | |
entries = entries | |
}; | |
}); |
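The /ef endpoint itself isn't shown in this excerpt. As a rough, hypothetical sketch (endpoint body, DbSet, and entity names assumed rather than copied from the repo), an EF Core variant of the same flow would look something like this; the EnsureCreatedAsync step is what surfaces as CreateContainerIfNotExistsOnceAsync in the stack trace further down:

```csharp
// Hypothetical sketch of an EF Core-based endpoint; the real /ef endpoint lives in the playground app.
app.MapGet("/ef", async (TestCosmosContext context) =>
{
    // Creates the database/container through the EF Core Cosmos provider
    // (the CreateContainerIfNotExistsOnceAsync frame in the trace below).
    await context.Database.EnsureCreatedAsync();

    // Add an entry on each request, then read everything back.
    context.Entries.Add(new EntityFrameworkEntry { Id = Guid.NewGuid().ToString() });
    await context.SaveChangesAsync();

    return await context.Entries.ToListAsync();
});
```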
So, I would expect /ef to work fine too, but it often times out, as in this log:
Calling endpoint 'http://localhost:36563/ef for resource 'api' in app 'CosmosEndToEnd.AppHost'
And the exceptions before the retries look like this:
A transient exception occurred during execution. The operation will be retried after 1081ms.
Microsoft.Azure.Cosmos.CosmosException : Response status code does not indicate success: RequestTimeout (408); Substatus: 0; ActivityId: 9e083133-9535-45ce-851a-713e40630608; Reason: (GatewayStoreClient Request Timeout. Start Time UTC:08/28/2024 22:25:38; Total Duration:65000.8637 Ms; Request Timeout 65000 Ms; Http Client Timeout:65000 Ms; Activity id: 9e083133-9535-45ce-851a-713e40630608;);
---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
---> System.Net.Sockets.SocketException (125): Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken, Int32 estimatedSize)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.Metrics.MetricsHandler.SendAsyncWithMetrics(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
at Microsoft.Azure.Cosmos.CosmosHttpClientCore.ExecuteHttpHelperAsync(HttpRequestMessage requestMessage, ResourceType resourceType, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.CosmosHttpClientCore.SendHttpHelperAsync(Func`1 createRequestMessageAsync, ResourceType resourceType, HttpTimeoutPolicy timeoutPolicy, IClientSideRequestStatistics clientSideRequestStatistics, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.GatewayStoreClient.InvokeAsync(DocumentServiceRequest request, ResourceType resourceType, Uri physicalAddress, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.GatewayStoreModel.ProcessMessageAsync(DocumentServiceRequest request, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Handlers.TransportHandler.ProcessMessageAsync(RequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Cosmos.Handlers.TransportHandler.SendAsync(RequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at Microsoft.Azure.Cosmos.ResponseMessage.EnsureSuccessStatusCode()
at Microsoft.EntityFrameworkCore.Cosmos.Storage.Internal.CosmosClientWrapper.CreateContainerIfNotExistsOnceAsync(DbContext _, ValueTuple`2 parametersTuple, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass30_0`2.<<ExecuteAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken)
--- Cosmos Diagnostics ---{"Summary":{"GatewayCalls":{"(0, 0)":1}},"name":"CreateContainerStreamAsync","start datetime":"2024-08-28T22:25:38.876Z","duration in milliseconds":65001.6493,"data":{"Client Configuration":{"Client Created Time Utc":"2024-08-28T22:24:33.6750368Z","MachineId":"vmId:d0263488-b5f7-4868-9f1d-8b403582d46c","VM Region":"westus","NumberOfClientsCreated":2,"NumberOfActiveClients":2,"ConnectionMode":"Gateway","User Agent":"cosmos-netstandard-sdk/3.42.0|2|X64|Ubuntu 22.04.4 LTS|.NET 8.0.7|DN| Microsoft.EntityFrameworkCore.Cosmos/8.0.8","ConnectionConfig":{"gw":"(cps:50, urto:6, p:False, httpf: False)","rntbd":"(cto: 5, icto: -1, mrpc: 30, mcpe: 65535, erd: True, pr: ReuseUnicastPort)","other":"(ed:True, be:False)"},"ConsistencyConfig":"(consistency: NotSet, prgns:[], apprgn: )","ProcessorCount":2}},"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RequestInvokerHandler","duration in milliseconds":65001.5201,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.DiagnosticsHandler","duration in milliseconds":65001.4947,"data":{"System Info":{"systemHistory":[{"dateUtc":"2024-08-28T22:25:50.7661661Z","cpu":6.727,"memory":3280836.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.292,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0},{"dateUtc":"2024-08-28T22:26:00.7664554Z","cpu":6.593,"memory":3280888.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0488,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0},{"dateUtc":"2024-08-28T22:26:10.7667479Z","cpu":15.760,"memory":3279748.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.232,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0},{"dateUtc":"2024-08-28T22:26:20.7673058Z","cpu":7.456,"memory":3280048.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.2036,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0},{"dateUtc":"2024-08-28T22:26:30.7676397Z","cpu":7.280,"memory":3280100.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.0502,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0},{"dateUtc":"2024-08-28T22:26:40.7679509Z","cpu":6.448,"memory":3279920.000,"threadInfo":{"isThreadStarving":"False","threadWaitIntervalInMs":0.2664,"availableThreads":32766,"minThreads":2,"maxThreads":32767},"numberOfOpenTcpConnection":0}]}},"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.TelemetryHandler","duration in milliseconds":65001.4701,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RetryHandler","duration in milliseconds":65001.4639,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.RouterHandler","duration in milliseconds":65001.4264,"children":[{"name":"Microsoft.Azure.Cosmos.Handlers.TransportHandler","duration in milliseconds":65001.4223,"children":[{"name":"Microsoft.Azure.Cosmos.GatewayStoreModel Transport Request","duration in milliseconds":65001.2505,"data":{"Client Side Request 
Stats":{"Id":"AggregatedClientSideRequestStatistics","ContactedReplicas":[],"RegionsContacted":[],"FailedReplicas":[],"AddressResolutionStatistics":[],"StoreResponseStatistics":[],"HttpResponseStats":[{"StartTimeUTC":"2024-08-28T22:25:38.8768352Z","DurationInMs":65000.7969,"RequestUri":"https://127.0.0.1:38213/dbs/ef/colls","ResourceType":"Collection","HttpMethod":"POST","ActivityId":"9e083133-9535-45ce-851a-713e40630608","ExceptionType":"System.Threading.Tasks.TaskCanceledException","ExceptionMessage":"The operation was canceled."}]}}}]}]}]}]}]}]}]}
Note 'Request Timeout 65000 Ms' in the CosmosException. Looking at the trace, my understanding is that this is the timeout from EF Core, which does seem to use a 65-second default timeout for Cosmos.
I'm not sure why it takes longer for the /ef case.
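For context, EF Core's Cosmos RequestTimeout option presumably maps onto the SDK's CosmosClientOptions.RequestTimeout, which lines up with the 'Request Timeout 65000 Ms' / 'Http Client Timeout: 65000 Ms' figures in the exception. A minimal SDK-level sketch of that setting, assuming that is the knob behind the number in the log (placeholder connection string, not the test's configuration):

```csharp
using Microsoft.Azure.Cosmos;

// Minimal sketch: the SDK-level request timeout that the log's 65000 ms figure appears to reflect.
var client = new CosmosClient(
    "AccountEndpoint=https://localhost:8081/;AccountKey=<placeholder>", // hypothetical emulator connection string
    new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Gateway,     // matches "ConnectionMode":"Gateway" in the diagnostics
        RequestTimeout = TimeSpan.FromSeconds(120),  // bump from the ~65s seen in the exception
    });
```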
If the emulator is running, waiting over 1 minute to execute a query seems crazy. I'm not sure waiting for another minute is going to help.
@roji @AndriySvyryd @ajcvickers - Is there anything special about the EF cosmos client here?
I'm not sure why, but it does seem to help. I tried running the same test 6 times on CI in one run, and doing that twice it didn't fail. I can test more iterations too.
It would be great to find out whether that is reliable, and why this takes longer.
> Is there anything special about the EF cosmos client here?

There shouldn't be, see https://github.com/dotnet/efcore/blob/main/src/EFCore.Cosmos/Storage/Internal/SingletonCosmosClientWrapper.cs#L32
- Is there high CPU utilization?
- Is there a high degree of concurrency?
- Is it a large request or response?
/azp run
Let's make this change and see if that fixes the stability of these tests. I hit this one myself earlier today.
Azure Pipelines successfully started running 1 pipeline(s).
.. playground app.
The /ef request times out after 65 seconds, which is the default timeout for EF Core Cosmos requests. Bumping this to 120 seconds makes the test run more stable.
Fixes issue: #5415