Fix to get the correct value for Environment.TickCount in OSX #1269
Conversation
On Linux the monotonic clock API clock_gettime is available, but on OSX the PAL implementation falls back to gettimeofday, which always returned the time elapsed since the Unix Epoch (1970/01/01). I fixed it by using the mach API clock_get_time on OSX to get the time elapsed since system start. Fixes #2441 https://github.com/dotnet/corefx/issues/2441
CC: @stephentoub
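For context, the pre-fix fallback path boils down to something like the sketch below (illustrative only, not the literal PAL code): it yields wall-clock milliseconds since the Unix epoch rather than time since boot, and it also jumps whenever the system clock is adjusted.

#include <sys/time.h>

// Simplified sketch of the old fallback: wall-clock milliseconds since the
// Unix epoch (1970/01/01), not a monotonic tick count since system start.
unsigned long long OldTickCountFallbackSketch()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (unsigned long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
}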
Hi @Sridhar-MS, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution! TTYL, DNFBOT.
@Sridhar-MS, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.
clock_serv_t clock;
mach_timespec_t mts;

host_get_clock_service(mach_host_self(), SYSTEM_CLOCK, &clock);
This leaks the host port, no? Also, why are you not checking the return value here?
Yes, I missed checking the return value here. I am deallocating the host port using 'mach_port_deallocate' a few lines below, but I need to make sure the port is also deallocated when a call fails. I am working on the fix.
- Add return value check for "host_get_clock_service".
- Fix host port leaks in GetTickCount64 and GlobalMemoryStatusEx.
I did a reference-counting experiment on host ports and discovered one more place in the PAL layer (src/pal/src/misc/sysinfo.cpp) where the host port was leaking. I have fixed that too.
mach_timespec_t mts;
host_t host = mach_host_self();

if ((machRet = host_get_clock_service(host, SYSTEM_CLOCK, &clock)) != KERN_SUCCESS)
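To make the shape of the error-checked version easier to follow, here is an illustrative reconstruction (not the exact committed PAL code; the function name and the millisecond conversion are assumptions) in which the host and clock ports are released on every path:

#include <mach/mach.h>
#include <mach/clock.h>

// Illustrative sketch: milliseconds since system start, or 0 on failure.
unsigned long long GetTickCount64Sketch()
{
    unsigned long long retval = 0;
    kern_return_t machRet;
    clock_serv_t clock;
    mach_timespec_t mts;
    host_t host = mach_host_self();

    if ((machRet = host_get_clock_service(host, SYSTEM_CLOCK, &clock)) != KERN_SUCCESS)
    {
        goto EXIT;
    }

    machRet = clock_get_time(clock, &mts);
    // The clock service port must be released whether or not the call succeeded.
    mach_port_deallocate(mach_task_self(), clock);
    if (machRet != KERN_SUCCESS)
    {
        goto EXIT;
    }

    retval = (unsigned long long)mts.tv_sec * 1000 + mts.tv_nsec / 1000000;

EXIT:
    // Release the host port obtained from mach_host_self() on every path.
    mach_port_deallocate(mach_task_self(), host);
    return retval;
}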
GetTickCount is expected to be a very fast API. Acquiring/releasing the clock service for each call looks pretty expensive. Could you please do some quick performance measurements, and maybe consider caching the clock service?
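One possible shape for the caching being suggested here (an illustrative sketch, assuming a one-time pthread_once initialization is acceptable; the real PAL code may structure this differently):

#include <mach/mach.h>
#include <mach/clock.h>
#include <pthread.h>

static clock_serv_t g_systemClock;
static kern_return_t g_systemClockStatus = KERN_FAILURE;
static pthread_once_t g_systemClockOnce = PTHREAD_ONCE_INIT;

static void InitSystemClock()
{
    // Acquire the clock service once and release the host port right away.
    host_t host = mach_host_self();
    g_systemClockStatus = host_get_clock_service(host, SYSTEM_CLOCK, &g_systemClock);
    mach_port_deallocate(mach_task_self(), host);
}

unsigned long long GetTickCount64CachedSketch()
{
    pthread_once(&g_systemClockOnce, InitSystemClock);
    if (g_systemClockStatus != KERN_SUCCESS)
        return 0;

    mach_timespec_t mts;
    if (clock_get_time(g_systemClock, &mts) != KERN_SUCCESS)
        return 0;

    return (unsigned long long)mts.tv_sec * 1000 + mts.tv_nsec / 1000000;
}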
@jkotas I did some crude performance measurements around Environment.TickCount.
With my changes (no caching) - Average time 3.0 microseconds.
Without my changes (using gettimeofday) - Average time 0.13 microseconds.
I am not sure if the perf cost is within an acceptable level of tolerance. Meanwhile I am also trying to get perf measurements with caching the clock service.
I did my testing on release builds and took the average over 10 million iterations.
const long tccSecondsToMicroSeconds = 1000000;
[Fact]
public static void TickCountPerf()
{
int iterations = 10000000;
long totalTime = 0;
for (int i = 0; i < iterations; i++)
{
long startTime = GetTimeofDay();
int tick = Environment.TickCount;
totalTime += GetTimeofDay() - startTime;
}
float average = (float)totalTime / iterations;
Console.WriteLine("totalTime - {0} usecs", totalTime);
Console.WriteLine("Average - {0} usecs", average);
}
public static long GetTimeofDay()
{
timeval tv = new timeval();
int result = gettimeofday(ref tv, IntPtr.Zero);
return (tv.tv_sec * tccSecondsToMicroSeconds) + tv.tv_usec;
}
// gettimeofday provides microsecond-level resolution
[DllImport("libc")]
private static extern int gettimeofday(ref timeval tv, IntPtr tz);
@jkotas I also got the results with caching the clock service.
With my changes (no caching) - Average time 3.0 microseconds.
With my changes (with caching) - Average time 0.9 microseconds.
Without my changes (using gettimeofday) - Average time 0.13 microseconds.
Even with caching enabled there is still a noticeable perf cost compared to the gettimeofday implementation.
The microbenchmark is not right. The GetTimeofDay() method should not be called on every iteration because it is about as expensive as the Environment.TickCount call you are trying to measure. It should be called once before the loop and once after it:
public static void TickCountPerf()
{
int iterations = 10000000;
long startTime = GetTimeofDay();
for (int i = 0; i < iterations; i++)
{
int tick = Environment.TickCount;
}
long totalTime = GetTimeofDay() - startTime;
...
}
Once you fix the microbenchmark, I expect that you will get numbers of ~0.03 microseconds without your change, and ~3 microseconds with your change. Making Environment.TickCount 100x slower is certainly not within an acceptable level of tolerance.
For reference, Environment.TickCount takes less than 0.01 microseconds on Windows.
Thanks for the feedback @jkotas! I updated the benchmark and got results close to the numbers you predicted.
With my changes (no caching) - Average time 2.86 microseconds.
With my changes (with caching) - Average time 0.82 microseconds.
Without my changes (using gettimeofday) - Average time 0.05 microseconds.
So with caching enabled it is around 16.4x slower.
Here is the updated microbenchmark:
const long tccSecondsToMicroSeconds = 1000000;
[Fact]
public static void TickCountPerf()
{
int iterations = 10000000;
long startTime = GetTimeofDay();
for (int i = 0; i < iterations; i++)
{
int tick = Environment.TickCount;
}
long totalTime = GetTimeofDay() - startTime;
float average = (float)totalTime / iterations;
Console.WriteLine("totalTime -- {0} usecs", totalTime);
Console.WriteLine("Average -- {0} usecs", average);
}
public static long GetTimeofDay()
{
timeval tv = new timeval();
int result = gettimeofday(ref tv, IntPtr.Zero);
return (tv.tv_sec * tccSecondsToMicroSeconds) + tv.tv_usec;
}
[DllImport("libc")]
private static extern int gettimeofday(ref timeval tv, IntPtr tz);
}
struct timeval
{
public long tv_sec; /* seconds since Jan. 1, 1970 */
public long tv_usec; /* and microseconds */
}
caching enabled it is around 16.4x slower
It is better than before, but still ~100x slower compared to Windows...
I have looked around for what is available ... have you considered using mach_absolute_time()? It seems to be the fastest way to get what you want.
Also, I am wondering whether we need to make a similar change in other places like https://github.com/dotnet/coreclr/blob/master/src/pal/src/synchmgr/synchmanager.cpp#L4881 to be robust around clock time adjustments (e.g. the daylight saving time switch). This can be tracked as a separate issue.
I am also looking into mach_absolute_time(). I did some initial perf tests and got the average time down to 0.04 microseconds, vs 0.05 microseconds without my changes. I will keep you updated on that.
mach_absolute_time uses rdtsc
…ach_absolute_time' in GetTickCount64. The monotonic clock function 'clock_get_time' was causing a perf regression in 'Environment.TickCount', so it is replaced with 'mach_absolute_time', which gives considerable performance gains. Here are the perf numbers:
Using 'clock_get_time' (without caching host ports) - Average time 2.86 microseconds.
Using 'clock_get_time' (with caching host ports) - Average time 0.82 microseconds.
Using 'gettimeofday' (original implementation) - Average time 0.05 microseconds.
Using 'mach_absolute_time' - Average time 0.04 microseconds.
@jkotas Here are the final perf numbers. I was initially not sure if 'mach_absolute_time' was monotonic, so I went ahead with 'clock_get_time'. I have now verified that both 'clock_get_time' and 'mach_absolute_time' return the same elapsed time. With that said, I replaced the implementation with 'mach_absolute_time' because of its better performance numbers.
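A minimal sketch of the mach_absolute_time approach described in the commit and comment above, assuming the standard mach_timebase_info conversion from ticks to nanoseconds (the function name and the lazy timebase caching are illustrative, not the committed PAL code):

#include <mach/mach_time.h>
#include <stdint.h>

// Illustrative sketch: monotonic milliseconds since boot via mach_absolute_time.
unsigned long long GetTickCount64MachAbsoluteSketch()
{
    static mach_timebase_info_data_t s_timebase = { 0, 0 };
    if (s_timebase.denom == 0)
    {
        // numer/denom is the ratio that converts mach ticks to nanoseconds.
        // (Benign race: every thread computes the same values.)
        mach_timebase_info(&s_timebase);
    }

    uint64_t nanoseconds = mach_absolute_time() * s_timebase.numer / s_timebase.denom;
    return nanoseconds / 1000000;
}

On typical Intel Macs the timebase ratio is 1/1, so the multiplication does not overflow in practice; a production implementation would still want to guard against overflow for large uptimes and non-trivial timebases.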
@@ -298,4 +320,4 @@ GetTickCount64()
#endif // HAVE_CLOCK_MONOTONIC
EXIT:
    return retval;
}
Could you please add new line at the end of the file?
…nd resolution support for StopWatch in OSX. OSX provides nanosecond timer capabilities, but the PAL was not using them and was falling back to microsecond-level resolution.
SET_DEFAULT_DEBUG_CHANNEL(MISC);

static BOOL s_TimeInitialized = FALSE;
This variable should not be needed.
I am new to the PAL source code, so I am trying to follow existing conventions. Example - https://github.com/dotnet/coreclr/blob/master/src/pal/src/exception/machexception.cpp#L74.
So I believe that if you are not freeing any resources in cleanup you don't need an 'init' variable.
LGTM modulo comments
I mentioned this in a now outdated comment, but mach_absolute_time uses rdtsc. My understanding is TickCount is supposed to be monotonic?
@kangaroo I had the same concern initially. But in the code we round the nanosecond measurement down to milliseconds, which keeps the TickCount value non-decreasing, i.e. any small back-and-forth jitter in mach_absolute_time is absorbed by the rounding. As an experiment, I ran both mach_absolute_time and the monotonic clock_get_time in a loop 10 million times, rounding both values to milliseconds. The value returned by mach_absolute_time was always non-decreasing and identical to the value returned by clock_get_time.
@Sridhar-MS Did you force core migrations, cpu frequency scaling, and S3 power switching?
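For reference, the monotonicity experiment described two comments above could be reconstructed roughly as follows (an illustrative sketch, not the exact test that was run; the iteration count and the one-millisecond tolerance are assumptions):

#include <mach/mach.h>
#include <mach/clock.h>
#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

int main()
{
    clock_serv_t clock;
    host_t host = mach_host_self();
    host_get_clock_service(host, SYSTEM_CLOCK, &clock);  // error handling omitted
    mach_port_deallocate(mach_task_self(), host);

    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);

    uint64_t prevMachMs = 0;
    for (int i = 0; i < 10000000; i++)
    {
        mach_timespec_t mts;
        clock_get_time(clock, &mts);
        uint64_t clockMs = (uint64_t)mts.tv_sec * 1000 + mts.tv_nsec / 1000000;
        uint64_t machMs = mach_absolute_time() * tb.numer / tb.denom / 1000000;

        // mach_absolute_time, rounded to milliseconds, should never go backwards.
        if (machMs < prevMachMs)
            printf("went backwards at iteration %d\n", i);
        prevMachMs = machMs;

        // Sampled back to back, the two sources should agree to within ~1 ms.
        if (machMs + 1 < clockMs || clockMs + 1 < machMs)
            printf("diverged at iteration %d: %llu vs %llu\n", i,
                   (unsigned long long)machMs, (unsigned long long)clockMs);
    }

    mach_port_deallocate(mach_task_self(), clock);
    return 0;
}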
…in terms of ticks) in OSX. Reference: https://developer.apple.com/library/ios/qa/qa1643/_index.html
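Following the QA1643 note referenced above, the performance-counter frequency "in terms of ticks" can be derived from the timebase ratio. A simplified sketch (names and return types are illustrative; the actual PAL entry points use the Windows-style BOOL/LARGE_INTEGER signatures):

#include <mach/mach_time.h>
#include <stdint.h>

// QueryPerformanceCounter analogue: the raw mach tick counter.
uint64_t PerfCounterSketch()
{
    return mach_absolute_time();
}

// QueryPerformanceFrequency analogue: ticks per second. Since
// nanoseconds = ticks * numer / denom, ticks per second = 1e9 * denom / numer.
uint64_t PerfFrequencySketch()
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);
    return (uint64_t)1000000000 * tb.denom / tb.numer;
}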