Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Seed property to MLContext and use as default for data splits #4775

Merged
merged 8 commits into from
Feb 11, 2020

Conversation

najeeb-kazmi
Copy link
Member

Fixes #4752

Addresses leftover feedback from #4764

@najeeb-kazmi
Copy link
Member Author

            Random rand = (seed.HasValue) ? RandomUtils.Create(seed.Value) : RandomUtils.Create(_rand);

Should this just use _rand here rather than creating another TauswortheHybrid from _rand?


Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:394 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False)

@najeeb-kazmi
Copy link
Member Author

                Random rand = (seed.HasValue) ? RandomUtils.Create(seed.Value) : RandomUtils.Create(_rand);

Should HostBase use the same TauswortheHybrid as the parent environment or create another one using the parent _rand as seed?


Refers to: src/Microsoft.ML.Core/Environment/HostEnvironmentBase.cs:136 in 9bbccaf. [](commit_id = 9bbccaf, deletion_comment = False)

Copy link
Contributor

@mstfbl mstfbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI builds are giving the following errors:

2020-02-04T02:22:45.6483640Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]
2020-02-04T02:22:45.6484602Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed.get()' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]

Perhaps you are forgetting to declare int Seed and Seed's get() function somewhere?

@eerhardt
Copy link
Member

eerhardt commented Feb 4, 2020

The CI builds are giving the following errors:

2020-02-04T02:22:45.6483640Z /__w/1/s/packages/microsoft.dotnet.apicompat/1.0.0-beta.19225.5/build/Microsoft.DotNet.ApiCompat.targets(72,5): error : InterfacesShouldHaveSameMembers : Interface member 'Microsoft.ML.Runtime.IHostEnvironment.Seed' is present in the implementation but not in the contract. [/__w/1/s/src/Microsoft.ML.Core/Microsoft.ML.Core.csproj]

This is because you are adding a member to an interface, which is a breaking change. Interfaces, once shipped, can't be changed. #Resolved

@najeeb-kazmi najeeb-kazmi requested a review from a team as a code owner February 4, 2020 23:15
@@ -518,7 +517,7 @@ internal static void EnsureGroupPreservationColumn(IHostEnvironment env, ref IDa
// instead of having two hash transformations.
var origStratCol = samplingKeyColumn;
samplingKeyColumn = data.Schema.GetTempColumnName(samplingKeyColumn);
var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? host.Rand.Next()));
var columnOptions = new HashingEstimator.ColumnOptionsInternal(samplingKeyColumn, origStratCol, 30, (uint)(seed ?? env.Seed));
Copy link

@yaeldekel yaeldekel Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env.Seed [](start = 135, length = 8)

This can also be null, can't it? #Resolved

Copy link
Member Author

@najeeb-kazmi najeeb-kazmi Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if env.Seed is also null, it will fall back to the HashingEstimator default seed:

internal static class Defaults
{
public const int NumberOfBits = NumBitsLim - 1;
public const uint Seed = 314489979;
#Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you are casting it to uint here. If it's null it will throw an exception.


In reply to: 375521058 [](ancestors = 375521058)

Copy link
Member Author

@najeeb-kazmi najeeb-kazmi Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you're right. Thanks for catching that @yaeldekel. I've changed the handling back to how it was before I added (uint)(seed ?? host.Rand.Next()) and added a case to handle seed from the environment. Order of precedence for seed now is (1) user specified to TrainTestSplit/CrossValidata, (2) MLContext seed, (3) default seed for HashingEstimator #Resolved

Copy link
Contributor

@justinormont justinormont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@harishsk harishsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@najeeb-kazmi najeeb-kazmi merged commit 6b2d1a5 into dotnet:master Feb 11, 2020
@najeeb-kazmi najeeb-kazmi deleted the 4752_1 branch February 13, 2020 22:39
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default seed is not propagated from MLContext
6 participants