Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename Microsoft.ML.StandardLearners #2792

Merged
merged 2 commits into from
Mar 11, 2019
Merged

Rename Microsoft.ML.StandardLearners #2792

merged 2 commits into from
Mar 11, 2019

Conversation

Potapy4
Copy link
Contributor

@Potapy4 Potapy4 commented Feb 28, 2019

Summary

I renamed StandardLearners to StandardTrainers. If I missed something, please let me know 👌
Fixes #2786


We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There's associated issues. All PR's should have issue(s) associated - unless a trivial self-evident change such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

@codemzs
Copy link
Member

codemzs commented Feb 28, 2019

@Potapy4 Thank you for your contribution. Our policy for working on a task is to make sure it isn’t assigned to anyone already. In this case it was assigned to me and that means I could already be working on it. Please feel free to assign this task to yourself but moving forward make sure to check no one is already working on a task before assigning it to yourself and only then start the work.

@TomFinley
Copy link
Contributor

TomFinley commented Feb 28, 2019

Hi @Potapy4 , thank you for working on this. Though, as @codemzs explains, we typically use the "assigned" field on issues to indicate that someone is actually actively working on it, which might lead to duplication of effort if someone else likewise takes on the work. But let's call that past praying for now. 😄

This test failure LoadEntryPointModel is an interesting one.

As we see of point 4 on this PR #970, we wrote the assembly name into the model file, so that we can find the loader signature later. This is on a whole a valuable change, and I think it will serve us well going forward. In this case it presents a difficulty, because in this case we are testing that our model loading is backwards compatible, as we do in several places. However those older models were saved with the old assembly name. It is here:

_ectx.AssertValue(env, "env");
_ectx.Assert(Reader.BaseStream.Position == FpMin + Header.FpModel);
var args = ConcatArgsRev(extra, this);
EnsureLoaderAssemblyIsRegistered(env.ComponentCatalog);
object tmp;
string sig = ModelHeader.GetLoaderSig(ref Header);
if (!string.IsNullOrWhiteSpace(sig) &&
ComponentCatalog.TryCreateInstance<object, TSig>(env, out tmp, sig, "", args))

I see several alternatives, of which I can identify two as possibly best. But between these I am not certain which is best. Note that this same difficulty @Potapy4 has faced here will crop up again when we rename the so-called HalLearner's assembly, as well as fold Microsoft.ML.Transforms into Microsoft.ML.Data.

  1. We take advantage of the fact that we are in preview mode (which is of course why we are undertaking all these breaking changes now, since we cannot do them later!), and say, "backwards compatibility with models built in preview are not a goal," and change the models in the test appropriately.

  2. Internal to our own code, we have code that detects one of the "obsolete" Microsoft.ML.* assemblies of interest (in this case so far there is only one, .StandardLearners) we replace it with the desirable target assembly. Having magic strings in our codebase is a bit undesirable, but might be more desirable than just flat out failing. It would be relatively straightforward to add to the code, and when the time comes and we rename HalLearners in the coming days it may be desirable.

I myself believe 2 is correct, but I am not certain. It would be relatively easy to do I think. The code that fails is here, when we ensure the stored assembly actually has its DI components properly detected and registered.

private void EnsureLoaderAssemblyIsRegistered(ComponentCatalog catalog)
{
if (!string.IsNullOrEmpty(LoaderAssemblyName))
{
var assembly = Assembly.Load(LoaderAssemblyName);
catalog.RegisterAssembly(assembly);
}
}

We could imagine an auxiliary property ForwardedLoaderAssemblyName of roughly this form (pseudocode, don't take literally)...

private string ForwardedLoaderAssemblyName
{ get {
switch (LoaderAssemblyName) {
case "Microsoft.ML.StandardLearners": return "Microsoft.ML.StandardTrainers";
default: return LoaderAssemblyName;
}
} }

Then we change the ensure loaded to work over this private ForwardedLoaderAssemblyName property, and when the time comes to rename the others, we have a mechanism in place to do so.

@TomFinley
Copy link
Contributor

In addition to people already involved on thread I'd welcome the thoughts of @eerhardt. He wrote this assembly storing code and might have anticipated this problem and already have a solution in mind. Of course anyone can participate.

@eerhardt
Copy link
Member

My initial reaction is to lean towards solution (1) above - we've been in preview mode for the past year, I don't think we've strictly guaranteed that we aren't going to break things from preview release to preview release. The API for sure has gone through vast breaking changes. Requiring people to re-build their models one more time might not be horrible.

That being said - this is the last time we get to say these things. Once v1.0 is shipped, there is absolutely no questions here - we need to support model compatibility.

However, solution (2) is so simple that it may be worth doing, the cost of adding that code one time, and keeping it forever, is probably way less than the effort it will take to triage bugs and questions about why their old model no longer works. If it was more complex code, I would rethink that position.

So, in the end, it is probably "worth" doing solution (2).

@TomFinley
Copy link
Contributor

TomFinley commented Feb 28, 2019

So, in the end, it is probably "worth" doing solution (2).

I think so too. So maybe we say, "OK, let's just have this assembly forwarding." Are you comfortable doing that @Potapy4 ?

A natural followup question (which we do not have to answer now) is what .NET team considers a breaking change in the API, whether if we ship this capability in v1.0 to read pre-release versions, we are under obligation to keep it, provided the version v1.0 remains available for whoever might want to convert models (by simply reading and writing them)? You mentioned "keeping it forever" which suggests you would consider it a breaking change. Yet, sometimes I observe some libraries and tools say, "look, we dropped backward compatibility models for version X in version Z, but version Y reads version X format and writes in a format Z understands." (Where X < Y < Z.) This is quite common, but like many common things may be wrong. (This is surely a sliding scale. By some very strict standards, even a change in ToString overload formatting could be considered a breaking change in an API. 😄 My expectation is that .NET team is very strict, but I wonder how much wiggle room a little library like ours might have.)

But that followup question may not be one we want to answer right now. I would be interested though in your perspective.

@eerhardt
Copy link
Member

eerhardt commented Feb 28, 2019

@Potapy4
Copy link
Contributor Author

Potapy4 commented Feb 28, 2019

Alright, first of all @codemzs I would like to apologize for taking this task from your plate - my bad 😔 next time I check the issues before I start working on them.

@TomFinley thanks for the good and detailed explanation about the tests. ❤️ I think we can use the second approach and make this "hack" right now. I also liked your pseudo-code, but I prefer if instead of switch - and I think that in this case if - this is more than enough. But if we gonna fix it later, do we have to somehow mark this place right now (I mean leave a link to issue or comment with TODO in code)?

@TomFinley
Copy link
Contributor

@TomFinley thanks for the good and detailed explanation about the tests. ❤️ I think we can use the second approach and make this "hack" right now. I also liked your pseudo-code, but I prefer if instead of switch - and I think that in this case if - this is more than enough. But if we gonna fix it later, do we have to somehow mark this place right now (I mean leave a link to issue or comment with TODO in code)?

Sounds fine thanks @Potapy4 . My reasoning for preferring the switch is I know for a fact that there will be two others in the immediate future that can use the same mechanism, but of course this can be changed at any time. So that sounds fine.

I wouldn't worry too much about a TODO, since I would consider this work to be part of the other assembly renaming work, which I will be reviewing no doubt.

@codemzs
Copy link
Member

codemzs commented Mar 1, 2019

@Potapy4 I'm seeing you still haven't assigned the associated issue to yourself.

@TomFinley
Copy link
Contributor

TomFinley commented Mar 1, 2019

@Potapy4 I'm seeing you still haven't assigned the associated issue to yourself.

I don't think people can assign issues to themselves that do not have write access but I could be wrong. I cannot assign him and I do have write access. @codemzs , can you do it? Because, I cannot, I am not sure he can, so since you seem to think it necessary maybe you can figure out what's up. Thanks!

@eerhardt
Copy link
Member

eerhardt commented Mar 1, 2019

https://help.github.com/en/articles/assigning-issues-and-pull-requests-to-other-github-users

If you have write access to a repository, you can assign issues and pull requests to yourself, collaborators on personal projects, or members of your organization with read permissions on the repository.

members of your organization is the key there. Non-members need to accept an invitation first before they can be assigned issues.

@codemzs
Copy link
Member

codemzs commented Mar 11, 2019

@Potapy4 I just forced pushed the changes into your branch so that we can close this PR soon as its been open for a while. @TomFinley Lets review and close this stuff ....

@codemzs codemzs self-requested a review March 11, 2019 06:53
Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @codemzs and @Potapy4 ! LGTM.

@codecov
Copy link

codecov bot commented Mar 11, 2019

Codecov Report

Merging #2792 into master will increase coverage by 0.01%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master    #2792      +/-   ##
==========================================
+ Coverage    71.8%   71.82%   +0.01%     
==========================================
  Files         812      812              
  Lines      142644   142649       +5     
  Branches    16090    16090              
==========================================
+ Hits       102432   102460      +28     
+ Misses      35828    35803      -25     
- Partials     4384     4386       +2
Flag Coverage Δ
#Debug 71.82% <50%> (+0.01%) ⬆️
#production 67.97% <50%> (+0.02%) ⬆️
#test 86.24% <ø> (ø) ⬆️
Impacted Files Coverage Δ
...tionMachine/FieldAwareFactorizationMachineUtils.cs 98.03% <ø> (ø)
...oft.ML.StandardTrainers/Standard/SdcaMultiClass.cs 92.22% <ø> (ø)
....StandardTrainers/Standard/LinearPredictorUtils.cs 26.71% <ø> (ø)
...L.StandardTrainers/Standard/Online/OnlineLinear.cs 79.43% <ø> (ø)
...ers/Standard/MultiClass/PairwiseCouplingTrainer.cs 88.6% <ø> (ø)
...LogisticRegression/MulticlassLogisticRegression.cs 67.46% <ø> (ø)
...rosoft.ML.StandardTrainers/Optimizer/LineSearch.cs 0% <ø> (ø)
.../Standard/LogisticRegression/LbfgsPredictorBase.cs 71.26% <ø> (ø)
test/Microsoft.ML.FSharp.Tests/SmokeTests.fs 96.07% <ø> (ø) ⬆️
...crosoft.ML.StandardTrainers/Optimizer/Optimizer.cs 73.33% <ø> (ø)
... and 29 more

docs/code/EntryPoints.md Outdated Show resolved Hide resolved
@codemzs codemzs self-assigned this Mar 11, 2019
@codemzs codemzs merged commit 005fe05 into dotnet:master Mar 11, 2019
@Potapy4 Potapy4 deleted the renaming branch March 12, 2019 14:42
@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename Microsoft.ML.StandardLearners to Microsoft.ML.StandardTrainers.
4 participants