-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Port SymSGD trainer #624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port SymSGD trainer #624
Conversation
@@ -98,3 +98,7 @@ set -x # turn on trace | |||
cmake "$DIR" -G "Unix Makefiles" $__cmake_defines | |||
set +x # turn off trace | |||
make install | |||
if [[ "$OSTYPE" == "darwin"* ]]; then | |||
echo "Changing libMklImports.dylib's executable path within libSymSgdNative.dylib so that loader can find it." | |||
install_name_tool "-change" @loader_path/libMklImports.dylib @rpath/libMklImports.dylib "$RootRepo"/bin/x64."$__configuration"/Native/libSymSgdNative.dylib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work on an end-user's machine, when they use our NuGet packages? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: you can do this stuff inside of CMake too. It may make more sense there, since you won't have to do things like "$RootRepo"/bin/x64."$__configuration"/Native
. Plus then it can also be localized in the SymSgdNative\CmakeLists.txt
, and this high-level build script doesn't need to know about individual assemblies we are building.
Here's an example: https://github.com/dotnet/corefx/blob/20bcc468b12119e12428cfd9cce50ef5d5b9cc13/src/Native/System.Security.Cryptography.Native/CMakeLists.txt#L56-L70
In reply to: 206888376 [](ancestors = 206888376)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We spoke about it offline and the insall_name_tool step can moved out of the script and be done when nuget is created. I'll test the nuget as end user during bug-bash.
In reply to: 206964367 [](ancestors = 206964367,206888376)
[TestCategory("Binary")] | ||
public void BinaryClassifierSymSgdTest() | ||
{ | ||
//Results sometimes go out of error tolerance on OS X. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we open an issue to enable this test on macOS? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can but linear learners tend to differ on results. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of the day, we still need test coverage even if the results differ. Just because running on a different OS produces different results/behavior doesn't mean we should test on that OS. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other test of SymSGD is enabled on macOS too. I can enable this as well just that roughly half the time it will fail because of error tolerance. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just open an issue saying that we should enable this test on macOS. Then when we work on that issue, we can make it stable as well. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -11,6 +11,7 @@ | |||
<ProjectReference Include="..\..\src\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.ResultProcessor\Microsoft.ML.ResultProcessor.csproj" /> | |||
<ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change necessary now that SymSGD isn't in the StandardLearners project? #Resolved
src/Microsoft.ML/CSharpApi.cs
Outdated
/// <summary> | ||
/// Tolerance for difference in average loss in consecutive passes. | ||
/// </summary> | ||
public float Tol { get; set; } = 0.0001f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a complete word: Tolerance
. #Resolved
public int NumberOfIterations = 50; | ||
|
||
[Argument(ArgumentType.AtMostOnce, HelpText = "Tolerance for difference in average loss in consecutive passes.", ShortName = "tol")] | ||
public float Tol = 1e-4f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tol
=> Tolerance
? #Resolved
|
||
static Native() | ||
{ | ||
if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be nice to comment why this is only needed on Windows, and what makes it work on the other platforms. #Resolved
/// This is the state of a SymSGD learner that is shared between the managed and native code. | ||
/// </summary> | ||
[StructLayout(LayoutKind.Explicit)] | ||
public unsafe struct State |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public? I think it should be able to be made internal. #Resolved
public unsafe struct State | ||
{ | ||
#pragma warning disable 649 // never assigned | ||
[FieldOffset(0x00)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the need to explicitly list the offsets here? This forces us to never work on non-64-bit platforms.
Instead, we should be able to use [StructLayout(Sequential)]
and then just list the fields as necessary.
For example: https://github.com/dotnet/corefx/blob/ba496ae6a9ae055708b630b3f73395a76c2ce6ce/src/System.Drawing.Common/src/System/Drawing/Internal/GpPathData.cs#L9-L15 #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this is a port it is probably out of scope to make this kind of change without also involving the authors of sysmsgd, instead I will open an issue and assign it to them. Does that sound good? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me. #Resolved
|
||
// This method learns for a single instance | ||
inline void LearnInstance(int instSize, int* instIndices, float* instValues, | ||
float label, float alpha, float l2Const, float piw, float& weightScaling, float* weightVector, float& bias) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) It would be nice to follow the coding style already established in the repo and put the opening {
on a new line:
inline void LearnInstance(int instSize, int* instIndices, float* instValues,
float label, float alpha, float l2Const, float piw, float& weightScaling, float* weightVector, float& bias)
{
//code
}
``` #Resolved
state->PassIteration++; | ||
} | ||
} else { | ||
#if defined(USE_OMP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens in this case if someone passes in numThreads > 1
? Does it just no op? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. We have defined as part of arguments that multi-threaded is not supported.
In reply to: 206914794 [](ancestors = 206914794)
ch.CheckUserArg(numThreads > 0, nameof(_args.NumberOfThreads), | ||
"The number of threads must be either null or a positive integer."); | ||
|
||
ch.Assert(numThreads > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant with the line just before it? #Resolved
|
||
<PropertyGroup> | ||
<TargetFramework>netstandard2.0</TargetFramework> | ||
<IncludeInPackage>Microsoft.ML.HalLearners</IncludeInPackage> | ||
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true [](start = 3, length = 44)
think i removed this #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (int featIndex = 0; featIndex < numFeat; featIndex++) | ||
weightVector[featIndex] *= weightScaling; | ||
if (threadId == 0) | ||
weightScaling = 1.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 0, length = 6)
tabs in this file. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
int numFeatures = data.Schema.Feature.Type.VectorSize; | ||
var cursorFactory = new FloatLabelCursor.Factory(data, CursOpt.Label | CursOpt.Features | CursOpt.Weight); | ||
int numThreads = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue tracking turning NumberOfThreads
back on? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I will open it. We will need to enable this once openmp support comes in.
In reply to: 206964865 [](ancestors = 206964865)
find_library(MKL_LIBRARY libMklImports.so HINTS ${CMAKE_SOURCE_DIR}/../../packages/mlnetmkldeps/0.0.0.5/runtimes/linux-x64/native) | ||
SET(CMAKE_SKIP_BUILD_RPATH FALSE) | ||
SET(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE) | ||
SET(CMAKE_INSTALL_RPATH "${CMAKE_SOURCE_DIR}/../../packages/mlnetmkldeps/0.0.0.5/runtimes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that this may work for unit tests, but it may not work on an end-user's machine. We will have to test this as an end user to verify. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. I will make sure to do this as part of bug bash.
In reply to: 206967492 [](ancestors = 206967492)
src/Native/build.proj
Outdated
@@ -90,6 +90,8 @@ | |||
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" /> | |||
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)FactorizationMachineNative$(NativeLibExtension)" | |||
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" /> | |||
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)SymSgdNative$(NativeLibExtension)" | |||
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This native binary shouldn't go in the Microsoft.ML
nuget package. Instead, it needs to go in the Microsoft.ML.HalLearners
nuget package. #Resolved
src/Native/build.proj
Outdated
@@ -90,6 +90,8 @@ | |||
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" /> | |||
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)FactorizationMachineNative$(NativeLibExtension)" | |||
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" /> | |||
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)SymSgdNative$(NativeLibExtension)" | |||
RelativePath="Microsoft.Hal\runtimes\$(PackageRid)\native" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real name here is: Microsoft.ML.HalLearners
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. After this gets merged, let's make sure this will work for our end users on all 3 platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change adds parallel SGD trainer but disables its multi-threading capabilities because of the lack of OpenMP support by the Clang compiler on linux and macOS build systems and MKL library. It also Supersedes PR #556 by squashing all the commits in one and rebasing with master.
fixes #623