
Adjust Test_wait_interrupted_user_apc test timeout to handle deviation due to lowres timers. #116066

Open
wants to merge 3 commits into main

Conversation

lateralusX
Member

@lateralusX lateralusX commented May 28, 2025

Looks like outer loop x64 and arm64 Windows lanes hit issues with one of the tests added in #116001.

The test validates that waits are not broken too early by queued APCs, by measuring the time spent waiting compared to the requested timeout. The test uses a high-res timer for the measurement, but it appears that CoreCLR uses low-res timers when calculating the wait timeout, so the test probably needs to include some error margin to handle timer resolution differences.

The PR adds logging of the amount of time waited in case of error and increases the accepted deviation to 500 ms, which should still be enough to have multiple APCs trigger retries of the internal wait with a recalculated timeout.
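
A minimal Win32 sketch of the mismatch described above (illustrative only, not the actual managed test): the elapsed time is measured with the high-res performance counter while the runtime honors the timeout against a low-res tick counter, so a strict elapsed >= requested check can come up slightly short even when nothing is wrong.

// Sketch only; not the actual test code.
#include <windows.h>
#include <cstdio>

int main()
{
    const DWORD requestedMs = 2000;

    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);                // high-res measurement

    // Alertable wait standing in for the managed WaitHandle wait; the
    // runtime recomputes the remaining timeout from low-res ticks.
    SleepEx(requestedMs, /*bAlertable*/ TRUE);

    QueryPerformanceCounter(&end);
    double elapsedMs = (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;

    // Without an error margin, this can report an "early" wakeup of up to
    // roughly one low-res tick (~15.6 ms) even though the wait was honored.
    printf("requested=%lu ms, elapsed=%.1f ms\n", requestedMs, elapsedMs);
    return 0;
}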

Fixes #116060

@github-actions github-actions bot added the needs-area-label label May 28, 2025
@lateralusX lateralusX marked this pull request as ready for review May 28, 2025 12:57
@Copilot Copilot AI review requested due to automatic review settings May 28, 2025 12:57
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adjusts the wait timeout test to better accommodate timing deviations observed on Windows lanes using low-res timers. Key changes include lowering the minimum expected wait time to 1500 ms, adding a local variable for elapsed milliseconds, and enhancing the log output in case of a timeout error.
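
As an illustration of the logging change summarized here (a sketch only; the actual test code and names differ), the idea is to capture the elapsed time in a local and include it in the failure output so an early wakeup can be diagnosed from the log:

// Sketch only; names are illustrative, not the actual test code.
#include <cstdio>

static void CheckWait(long long elapsedMilliseconds)
{
    const long long minimumExpectedMs = 1500;   // 2000 ms wait minus the 500 ms margin
    if (elapsedMilliseconds < minimumExpectedMs)
    {
        printf("Wait returned too early: waited %lld ms, expected at least %lld ms\n",
               elapsedMilliseconds, minimumExpectedMs);
    }
}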

@lateralusX
Member Author

lateralusX commented May 28, 2025

/azp run runtime-coreclr outerloop

@dotnet dotnet deleted a comment from azure-pipelines bot May 28, 2025
@dotnet dotnet deleted a comment from azure-pipelines bot May 28, 2025
@dotnet dotnet deleted a comment from azure-pipelines bot May 28, 2025

Azure Pipelines successfully started running 1 pipeline(s).

@teo-tsirpanis teo-tsirpanis added the area-System.Threading label and removed the needs-area-label label May 31, 2025
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@lateralusX lateralusX force-pushed the lateralusX/adjust-115178-timeout branch from 63bea4a to 0d86cac Compare June 3, 2025 16:54
@lateralusX
Member Author

/azp run runtime-coreclr outerloop


Azure Pipelines successfully started running 1 pipeline(s).

@lateralusX
Member Author

/azp run runtime-coreclr jitstress


Azure Pipelines successfully started running 1 pipeline(s).

@lateralusX
Member Author

/azp run runtime-coreclr r2r


Azure Pipelines successfully started running 1 pipeline(s).

@lateralusX
Member Author

lateralusX commented Jun 9, 2025

Have run the following additional test suites:

/azp run runtime-coreclr outerloop
/azp run runtime-coreclr jitstress
/azp run runtime-coreclr r2r

Failures seem unrelated to this test since it only runs on Windows and the failures are on non-Windows platforms.

Are there any more test suites that would need to be executed before we can merge this PR and re-enable the test, @mangod9, @jkotas?

@jkotas
Member

jkotas commented Jun 11, 2025

increases the accepted deviation to 500 ms

This sounds like too generous a tolerance. The precision of the low-res timer on Windows is no worse than 20 ms, so the tolerance for waiting less should not be more than that. (The tolerance for waiting more can be large, to handle overloaded machines.)

I am wondering whether there is a subtle bug in the wait implementation that causes the error to accumulate:

retry:
    if (millis != INFINITE)
    {
        dwStart = minipal_lowres_ticks();
    }
    if (tryNonblockingWaitFirst)
    {
        // We have a final wait result from the nonblocking wait above
        tryNonblockingWaitFirst = false;
    }
    else
    {
        ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, millis, mode);
    }
    if (ret == WAIT_IO_COMPLETION)
    {
        _ASSERTE (alertable);
        if (m_State & TS_Interrupted)
        {
            HandleThreadInterrupt();
        }
        // We could be woken by some spurious APC or an EE APC queued to
        // interrupt us. In the latter case the TS_Interrupted bit will be set
        // in the thread state bits. Otherwise we just go back to sleep again.
        if (millis != INFINITE)
        {
            dwEnd = minipal_lowres_ticks();
            if (dwEnd - dwStart >= millis)
            {
                ret = WAIT_TIMEOUT;
                goto WaitCompleted;
            }
            else
            {
                millis -= (DWORD)(dwEnd - dwStart);
            }
        }
        goto retry;
    }
Should the retry label be right in front of if (tryNonblockingWaitFirst), and should we update the start time with the value that we read after the wait instead?
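
A sketch of what that restructuring might look like, reusing the names from the snippet above; this is only an illustration of the question, not the actual change:

// Sketch only: start tick read once, end tick carried forward as the new
// start on each APC retry so tick-rounding error does not accumulate.
if (millis != INFINITE)
{
    dwStart = minipal_lowres_ticks();
}
retry:
    if (tryNonblockingWaitFirst)
    {
        // We have a final wait result from the nonblocking wait above
        tryNonblockingWaitFirst = false;
    }
    else
    {
        ret = DoAppropriateAptStateWait(countHandles, handles, waitAll, millis, mode);
    }
    if (ret == WAIT_IO_COMPLETION)
    {
        _ASSERTE (alertable);
        if (m_State & TS_Interrupted)
        {
            HandleThreadInterrupt();
        }
        if (millis != INFINITE)
        {
            dwEnd = minipal_lowres_ticks();
            if (dwEnd - dwStart >= millis)
            {
                ret = WAIT_TIMEOUT;
                goto WaitCompleted;
            }
            millis -= (DWORD)(dwEnd - dwStart);
            dwStart = dwEnd;   // reuse the value read after the wait
        }
        goto retry;
    }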

@lateralusX
Member Author

lateralusX commented Jun 16, 2025

I didn't look too deeply into the CoreCLR wait implementation, but if we don't expect too much deviation, then we can harden the test more and potentially reduce the deviation in the CoreCLR wait implementation. The question is what an acceptable tolerance would be without introducing flakiness in case there is high load on the machines running the test.

Also, the purpose of the test was not to measure the exact differences in wait time, but to make sure custom APCs don't prematurely break waits, and from that perspective the current tolerance is enough: the test queues an APC every 100 ms, so if APCs incorrectly break the wait, we will notice it with the current tolerance as well.
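
A rough Win32 sketch of the pattern described above (illustrative only; the real test is managed code): one thread performs a ~2000 ms alertable wait and retries with the remaining time on every APC wakeup, mirroring the runtime's internal loop, while another thread queues an APC at it every ~100 ms. The only property being checked is that the wait is not cut short.

// Illustrative sketch only; the real test is a managed regression test.
#include <windows.h>
#include <cstdio>

static void CALLBACK EmptyApc(ULONG_PTR) { /* spurious wakeup source */ }

static DWORD WINAPI Waiter(LPVOID)
{
    ULONGLONG start = GetTickCount64();
    DWORD remaining = 2000;

    // Raw SleepEx returns on every APC delivery, so retry with the
    // remaining time, mirroring what the runtime's wait loop does.
    for (;;)
    {
        DWORD ret = SleepEx(remaining, /*bAlertable*/ TRUE);
        if (ret != WAIT_IO_COMPLETION)
            break;
        ULONGLONG elapsed = GetTickCount64() - start;
        if (elapsed >= 2000)
            break;
        remaining = (DWORD)(2000 - elapsed);
    }

    printf("waited %llu ms\n", GetTickCount64() - start);
    return 0;
}

int main()
{
    HANDLE waiter = CreateThread(nullptr, 0, Waiter, nullptr, 0, nullptr);
    for (int i = 0; i < 20; i++)
    {
        Sleep(100);
        QueueUserAPC(EmptyApc, waiter, 0);    // one APC every ~100 ms
    }
    WaitForSingleObject(waiter, INFINITE);
    CloseHandle(waiter);
    return 0;
}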

@jkotas
Member

jkotas commented Jun 17, 2025

High load on the machine can make us wait a significantly longer time, but it should never make us wait a significantly shorter time. If we wait a significantly shorter time, it is a bug that should be fixed.

I think we should:

  • Set the tolerance for waiting less than specified to 20 ms (it is more than the 15.6 ms GetTickCount resolution)
  • Improve the precision of the wait time calculation that I pointed out above (something like jkotas@f2efd2f)

@lateralusX
Member Author

lateralusX commented Jun 17, 2025

OK, I will fix up the wait implementation in this PR, accept any wait greater than 1980 ms as acceptable (the wait is set to 2000 ms in the test), and then re-run all the test suites to make sure they still pass. As pointed out, regardless of machine load we should never observe early wakeups; waits might end up longer, but that is fine since this test is only interested in early wakeups.
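
A tiny sketch of that acceptance condition (an assumption of how it might be written, not the actual test code): with a requested 2000 ms wait, only a wakeup earlier than 1980 ms fails; longer waits on loaded machines still pass.

// Sketch of the acceptance check described above (not the actual test).
const long long requestedMs = 2000;
const long long earlyToleranceMs = 20;   // above the ~15.6 ms GetTickCount resolution

bool WaitWasNotBrokenEarly(long long measuredElapsedMs)
{
    // Only early wakeups fail; waiting longer than requested is acceptable.
    return measuredElapsedMs >= requestedMs - earlyToleranceMs;   // i.e. >= 1980 ms
}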

Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure: baseservices/threading/regressions/115178/115178/115178.cmd
4 participants