Skip to content

Port SymSGD trainer #624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 1, 2018
Merged

Port SymSGD trainer #624

merged 7 commits into from
Aug 1, 2018

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Aug 1, 2018

This change adds parallel SGD trainer but disables its multi-threading capabilities because of the lack of OpenMP support by the Clang compiler on linux and macOS build systems and MKL library. It also Supersedes PR #556 by squashing all the commits in one and rebasing with master.

fixes #623

@codemzs codemzs requested review from eerhardt and sfilipi August 1, 2018 12:07
@@ -98,3 +98,7 @@ set -x # turn on trace
cmake "$DIR" -G "Unix Makefiles" $__cmake_defines
set +x # turn off trace
make install
if [[ "$OSTYPE" == "darwin"* ]]; then
echo "Changing libMklImports.dylib's executable path within libSymSgdNative.dylib so that loader can find it."
install_name_tool "-change" @loader_path/libMklImports.dylib @rpath/libMklImports.dylib "$RootRepo"/bin/x64."$__configuration"/Native/libSymSgdNative.dylib
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work on an end-user's machine, when they use our NuGet packages? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you can do this stuff inside of CMake too. It may make more sense there, since you won't have to do things like "$RootRepo"/bin/x64."$__configuration"/Native. Plus then it can also be localized in the SymSgdNative\CmakeLists.txt, and this high-level build script doesn't need to know about individual assemblies we are building.

Here's an example: https://github.com/dotnet/corefx/blob/20bcc468b12119e12428cfd9cce50ef5d5b9cc13/src/Native/System.Security.Cryptography.Native/CMakeLists.txt#L56-L70


In reply to: 206888376 [](ancestors = 206888376)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double-checking, would the NuGet work?


In reply to: 206888376 [](ancestors = 206888376)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We spoke about it offline and the insall_name_tool step can moved out of the script and be done when nuget is created. I'll test the nuget as end user during bug-bash.


In reply to: 206964367 [](ancestors = 206964367,206888376)

[TestCategory("Binary")]
public void BinaryClassifierSymSgdTest()
{
//Results sometimes go out of error tolerance on OS X.
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we open an issue to enable this test on macOS? #Resolved

Copy link
Member Author

@codemzs codemzs Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can but linear learners tend to differ on results. #Resolved

Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of the day, we still need test coverage even if the results differ. Just because running on a different OS produces different results/behavior doesn't mean we should test on that OS. #Resolved

Copy link
Member Author

@codemzs codemzs Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other test of SymSGD is enabled on macOS too. I can enable this as well just that roughly half the time it will fail because of error tolerance. #Resolved

Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just open an issue saying that we should enable this test on macOS. Then when we work on that issue, we can make it stable as well. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.


In reply to: 206948457 [](ancestors = 206948457)

@@ -11,6 +11,7 @@
<ProjectReference Include="..\..\src\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.ResultProcessor\Microsoft.ML.ResultProcessor.csproj" />
<ProjectReference Include="..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary now that SymSGD isn't in the StandardLearners project? #Resolved

/// <summary>
/// Tolerance for difference in average loss in consecutive passes.
/// </summary>
public float Tol { get; set; } = 0.0001f;
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a complete word: Tolerance. #Resolved

public int NumberOfIterations = 50;

[Argument(ArgumentType.AtMostOnce, HelpText = "Tolerance for difference in average loss in consecutive passes.", ShortName = "tol")]
public float Tol = 1e-4f;
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tol => Tolerance? #Resolved


static Native()
{
if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be nice to comment why this is only needed on Windows, and what makes it work on the other platforms. #Resolved

/// This is the state of a SymSGD learner that is shared between the managed and native code.
/// </summary>
[StructLayout(LayoutKind.Explicit)]
public unsafe struct State
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? I think it should be able to be made internal. #Resolved

public unsafe struct State
{
#pragma warning disable 649 // never assigned
[FieldOffset(0x00)]
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the need to explicitly list the offsets here? This forces us to never work on non-64-bit platforms.

Instead, we should be able to use [StructLayout(Sequential)] and then just list the fields as necessary.

For example: https://github.com/dotnet/corefx/blob/ba496ae6a9ae055708b630b3f73395a76c2ce6ce/src/System.Drawing.Common/src/System/Drawing/Internal/GpPathData.cs#L9-L15 #Resolved

Copy link
Member Author

@codemzs codemzs Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is a port it is probably out of scope to make this kind of change without also involving the authors of sysmsgd, instead I will open an issue and assign it to them. Does that sound good? #Resolved

Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me. #Resolved


// This method learns for a single instance
inline void LearnInstance(int instSize, int* instIndices, float* instValues,
float label, float alpha, float l2Const, float piw, float& weightScaling, float* weightVector, float& bias) {
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) It would be nice to follow the coding style already established in the repo and put the opening { on a new line:

inline void LearnInstance(int instSize, int* instIndices, float* instValues,
 float label, float alpha, float l2Const, float piw, float& weightScaling, float* weightVector, float& bias)
{
    //code
}
``` #Resolved

state->PassIteration++;
}
} else {
#if defined(USE_OMP)
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this case if someone passes in numThreads > 1? Does it just no op? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. We have defined as part of arguments that multi-threaded is not supported.


In reply to: 206914794 [](ancestors = 206914794)

ch.CheckUserArg(numThreads > 0, nameof(_args.NumberOfThreads),
"The number of threads must be either null or a positive integer.");

ch.Assert(numThreads > 0);
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this redundant with the line just before it? #Resolved


<PropertyGroup>
<TargetFramework>netstandard2.0</TargetFramework>
<IncludeInPackage>Microsoft.ML.HalLearners</IncludeInPackage>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Copy link
Member

@sfilipi sfilipi Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true [](start = 3, length = 44)

think i removed this #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need it for symsgd.


In reply to: 206926582 [](ancestors = 206926582)

for (int featIndex = 0; featIndex < numFeat; featIndex++)
weightVector[featIndex] *= weightScaling;
if (threadId == 0)
weightScaling = 1.0f;
Copy link
Member

@sfilipi sfilipi Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  				 [](start = 0, length = 6)

tabs in this file. #Resolved

sfilipi
sfilipi previously approved these changes Aug 1, 2018
Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi sfilipi dismissed their stale review August 1, 2018 15:49

revoking review

{
int numFeatures = data.Schema.Feature.Type.VectorSize;
var cursorFactory = new FloatLabelCursor.Factory(data, CursOpt.Label | CursOpt.Features | CursOpt.Weight);
int numThreads = 1;
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an issue tracking turning NumberOfThreads back on? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I will open it. We will need to enable this once openmp support comes in.


In reply to: 206964865 [](ancestors = 206964865)

find_library(MKL_LIBRARY libMklImports.so HINTS ${CMAKE_SOURCE_DIR}/../../packages/mlnetmkldeps/0.0.0.5/runtimes/linux-x64/native)
SET(CMAKE_SKIP_BUILD_RPATH FALSE)
SET(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE)
SET(CMAKE_INSTALL_RPATH "${CMAKE_SOURCE_DIR}/../../packages/mlnetmkldeps/0.0.0.5/runtimes")
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this may work for unit tests, but it may not work on an end-user's machine. We will have to test this as an end user to verify. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. I will make sure to do this as part of bug bash.


In reply to: 206967492 [](ancestors = 206967492)

@@ -90,6 +90,8 @@
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" />
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)FactorizationMachineNative$(NativeLibExtension)"
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" />
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)SymSgdNative$(NativeLibExtension)"
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" />
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This native binary shouldn't go in the Microsoft.ML nuget package. Instead, it needs to go in the Microsoft.ML.HalLearners nuget package. #Resolved

@@ -90,6 +90,8 @@
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" />
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)FactorizationMachineNative$(NativeLibExtension)"
RelativePath="Microsoft.ML\runtimes\$(PackageRid)\native" />
<NativePackageAsset Include="$(NativeAssetsBuiltPath)\$(NativeLibPrefix)SymSgdNative$(NativeLibExtension)"
RelativePath="Microsoft.Hal\runtimes\$(PackageRid)\native" />
Copy link
Member

@eerhardt eerhardt Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real name here is: Microsoft.ML.HalLearners #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was on accident.


In reply to: 206970539 [](ancestors = 206970539)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. After this gets merged, let's make sure this will work for our end users on all 3 platforms.

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit a7a15ff into dotnet:master Aug 1, 2018
@eerhardt eerhardt mentioned this pull request Aug 1, 2018
@codemzs codemzs deleted the symsgd-clean branch August 3, 2018 06:03
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port SymSGD
3 participants