-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Seed property to MLContext and use as default for data splits #4775
Conversation
Should this just use Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:394 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False) |
Should Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:136 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI builds are giving the following errors:
2020-02-04T02:22:45.6483640Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]
2020-02-04T02:22:45.6484602Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed.get()' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]
Perhaps you are forgetting to declare int Seed
and Seed's get()
function somewhere?
This is because you are adding a member to an interface, which is a breaking change. Interfaces, once shipped, can't be changed. #Resolved |
@@ -518,7 +517,7 @@ internal static void EnsureGroupPreservationColumn(IHostEnvironment env, ref IDa | |||
// instead of having two hash transformations. | |||
var origStratCol = samplingKeyColumn; | |||
samplingKeyColumn = data.Schema.GetTempColumnName(samplingKeyColumn); | |||
var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? host.Rand.Next())); | |||
var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? env.Seed)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env.Seed [](start = 135, length = 8)
This can also be null, can't it? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if env.Seed
is also null, it will fall back to the HashingEstimator
default seed:
machinelearning/src/Microsoft.ML.Data/Transforms/Hashing.cs
Lines 1129 to 1132 in 46e7dc6
internal static class Defaults | |
{ | |
public const int NumberOfBits = NumBitsLim - 1; | |
public const uint Seed = 314489979; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you are casting it to uint
here. If it's null it will throw an exception.
In reply to: 375521058 [](ancestors = 375521058)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, you're right. Thanks for catching that @yaeldekel. I've changed the handling back to how it was before I added (uint)(seed ?? host.Rand.Next())
and added a case to handle seed from the environment. Order of precedence for seed now is (1) user specified to TrainTestSplit/CrossValidata, (2) MLContext seed, (3) default seed for HashingEstimator
#Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #4752
Addresses leftover feedback from #4764