-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Port C# key hardware intrinsics APIs for SSE from SIMD native algorithms #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Need to multi-target CpuMath for netstandard and netcoreapp2.1. Also, since we are going to move CpuMath into its own NuGet package, remove the dependency from CpuMath to the ML.Core project.
…tting. Note: It will not compile until Microsoft.ML.CpuMath.csproj is changed to adapt to multi-targetting.
…cs to simplify CpuMathUtils.DotNetCoreApp.cs
…o regardless of target framework
- due to working in Mac OSX
@@ -0,0 +1,451 @@ | |||
using System; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All code files need a copyright header on them. See other .cs
files for what the header should look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Some new files include the header, while some others don't. Will fix it right away.
|
||
namespace Microsoft.ML.CpuMath.UnitTests.DotNetCoreApp | ||
{ | ||
public class UnitTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should give this class a more descriptive name of what it is testing. Probably something similar to CpuMathUtilsTests
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited thanks
Issues at the end of today:
|
build/AfterCommonTargets.targets
Outdated
<Project> | ||
<PropertyGroup> | ||
<MSBuildAllProjects>$(MSBuildAllProjects);$(MSBuildThisFileFullPath)</MSBuildAllProjects> | ||
<VSTestTargets Condition="'$(UseIntrinsics)' != 'true' and '$(TargetFramework)' == 'netcoreapp3.0'">ignore.targets</VSTestTargets> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do? I'm not familiar. Also, since you are also overriding VSTest
- isn't that enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this property setting, we should factor out '$(UseIntrinsics)' != 'true' and '$(TargetFramework)' == 'netcoreapp3.0'
into an MSBuild property so it is only declared once.
In reply to: 206735143 [](ancestors = 206735143)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the place where we try to blank VSTest
(https://github.com/dotnet/machinelearning/pull/562/files/c6f5f8569ea31bc30e0b7ae54870b9cbf51e2bdb#diff-77ebf330684fd059d7aecec3ce7e6218R19) is processed before the program builds the actual VSTest
, so without this line, even though we would have blanked VSTest
, the entire VSTest
would be built all over again. Because of this order of the build, one way to solve this issue is to ignore the built targets of VSTest
to ignore the test projects. Not sure if it is the best way to do it so feel free to suggest any optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I helped him with this yesterday and by doing a pp log I found out that Microsoft.TestPlatform.targets
where imported after the actual Empty.targets
file even though they are specified to be imported after the common targets, I don't know if that is the way it is designed or if it is a bug.
So I found out that there was a VSTestTargets
overridable property declared in: 15.0\Microsoft.Common.targets\ImportAfter\Microsoft.TestPlatform.ImportAfter.targets:17
that points to Microsoft.TestPlatform.targets
and this ImportAfter.targets
file will import that project if the specified path exists. So in order to override the VSTest
target I suggested to just set that property to a non-existent targets file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm reading this correctly, that means this line in Empty.targets:
<Target Name="VSTest"/>
Isn't doing anything and can be removed, right?
Also - it would probably make sense to move setting this property into Empty.targets
. That would resolve my "we should factor this condition out" comment above as well. build/AfterCommonTargets.targets
is responsible for deciding whether Empty.targets should be imported. Empty.targets decides how to ensure no targets are run (which includes setting this property).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I'm reading this correctly, that means this line in Empty.targets:
<Target Name="VSTest"/>
Isn't doing anything and can be removed, right?
It is actually needed because in run-tests.proj we're calling the VSTest target on every test project: https://github.com/dotnet/machinelearning/blob/master/test/run-tests.proj#L10
And if we don't declare it, it will complain saying that the VSTest target doesn't exist in both test projects targeting netcoreapp3.0.
Also - it would probably make sense to move setting this property into Empty.targets. That would resolve my "we should factor this condition out" comment above as well. build/AfterCommonTargets.targets is responsible for deciding whether Empty.targets should be imported. Empty.targets decides how to ensure no targets are run (which includes setting this property).
That is right, no sense having the same condition twice, just declare it when Empty.targets
is imported, which is conditioned already.
I also thought of just simply excluding both projects from: https://github.com/dotnet/machinelearning/blob/master/test/run-tests.proj#L5
But it makes more sense to me to do everything inside Empty.targets
since whenever the repo is ready to build and target netcoreapp3.0 we can just remove this files and we're good to go.
<TargetFramework>netcoreapp3.0</TargetFramework> | ||
<IsPackable>false</IsPackable> | ||
<!-- The SDK we currently restore and use has 2.0 as the maximum supported version --> | ||
<NETCoreAppMaximumVersion>3.0</NETCoreAppMaximumVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be able to be removed.
<TargetFramework>netcoreapp3.0</TargetFramework> | ||
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
<!-- The SDK we currently restore and use has 2.0 as the maximum supported version --> | ||
<NETCoreAppMaximumVersion>3.0</NETCoreAppMaximumVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed.
Let's not merge this until after we branch for |
vector = Sse3.HorizontalAdd(vector, vector); | ||
vector = Sse3.HorizontalAdd(vector, vector); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding: (Responding to your previous PR comment) Factoring the VectorSum
function out of the intrinsics has worked quite well, but my old implementation of VectorSum
(which outputs a Vector128<float>
) seemed to cause some perf issues (doubled the running time of 4 intrinsics that call VectorSum
). I changed it to ToVectorSum
, which changes vector
in place, but the same perf issue persisted. Then, I (thanks to Santi's help) tried the AggressiveInlining
attribute above to make the function inline, which has improved the perf results back to normal. Curious about how it sounds to you, and if there is any better way to do it.
Side question: Should I also make the other private functions - Load1
, Load4
, Rotate
, RotateReverse
, and Store4
- inline with the AggressiveInlining
attribute to improve perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll probably want to change private static void ToVectorSum(ref Vector128<float> vector)
to private static Vector128<T> ToVectorSum(in Vector128<float> vector)
, as Vector128<T>
was made into a readonly struct
, but it otherwise looks good.
Should I also make the other private functions - Load1, Load4, Rotate, RotateReverse, and Store4 - inline with the AggressiveInlining attribute to improve perf
I would think they are already being inlined (except for Store4), if that isn't the case, then the methods are probably small enough that inlining would help. I would recommend benchmarking and looking at the codegen for both having them explicitly inlined and not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not hurt to put AggressiveInlining attribute on these short and internal functions.
For your old code, even though VectorSum is inlined, the codegen is not optimized due to register spill and reload. It seems JIT has optimization opportunity over there. Do you mind opening an issue to discuss about it on CoreCLR github repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input! Adding AggressiveInlining
attributes for those short functions to mimic the behavior of the existing native code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll pick up logging a bug for CoreCLR. We should get a minimal repro and ensure it isn't already tracked by one of the "first class struct" issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. 2 last things (plus resolving the merge conflict on the .sln file), and I think this can be merged.
Nice work, @briancylui
src/Native/build.cmd
Outdated
@@ -86,6 +88,9 @@ if %__IntermediatesDir% == "" ( | |||
set "__CMakeBinDir=%__CMakeBinDir:\=/%" | |||
set "__IntermediatesDir=%__IntermediatesDir:\=/%" | |||
|
|||
if [%CMAKE_BUILD_TYPE:~-11%] == [-Intrinsics] ( | |||
set CMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE:~0,-11% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you stick a comment here saying what is happening (it isn't obvious from the code).
"Strip the -Intrinsics
from the build type"
<TargetFramework Condition="'$(UseIntrinsics)' != 'true'">netstandard2.0</TargetFramework> | ||
<TargetFrameworks Condition="'$(UseIntrinsics)' == 'true'">netstandard2.0;netcoreapp3.0</TargetFrameworks> | ||
<IncludeInPackage>Microsoft.ML.CpuMath</IncludeInPackage> | ||
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
<DefineConstants>$(DefineConstants);CORECLR;PRIVATE_CONTRACTS</DefineConstants> | ||
<LangVersion>7.3</LangVersion> | ||
<!-- The SDK we currently restore and use has 2.0 as the maximum supported version --> | ||
<NETCoreAppMaximumVersion>3.0</NETCoreAppMaximumVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with removing the NETCoreAppMaximumVersion
tag.
In a recent fix of the VectorSum
function in SseIntrinsics.cs
, since we need to rely on the read-only feature of Vector128
introduced in C# 7.2 to change the vector in-place, the Lang Version has to be at least 7.2. However, since we use the feature of extensible fixed statements
in SseIntrinsics.cs
, which is a feature in C# 7.3, the LangVersion
restriction has been set to 7.3
. Reference: dotnet/roslyn#24469
Please let me know how I should proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I was specifically talking about <NETCoreAppMaximumVersion>3.0</NETCoreAppMaximumVersion>
. The LangVersion
change is needed.
Responded to the second round of review, and now this PR is ready for a third round if any. Gathering thoughts on this PR before fixing the merge conflicts in the |
Merge conflicts have been fixed. Green builds for all 5 checks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Nice work, Brian.
.Run(null, CreateClrVsCoreConfig()); | ||
} | ||
|
||
private static IConfig CreateClrVsCoreConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. We can leave it like this for now.
private const int DEFAULT_SEED = 253421; | ||
private const float DEFAULT_SCALE = 1.11f; | ||
|
||
private float[] src, dst, original, src1, src2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This all non constant privates should be prefixed with _
but it can be updated in your next PR to not block this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the issue page (briancylui#2) to include this action item for future follow-up. Thank you for your input!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an expert in the intrinsic operations but looks good to me. Nice work Brian!
Thank you all reviewers for your time reviewing this PR and making great suggestions to fix issues or improve performance. I have learnt a lot from all of you. Since I don't have write access to the After the merging, I can then start writing more SSE intrinsics in a new branch. |
Structure:
Changes to users:
Future follow-up for
CpuMath
enhancementCpuMath
enhancement in this PR scheduled for future follow-ups have been compiled into an issue page here.