Skip to content

Moving IModelCombiner to Ensemble and related changes #1563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 7, 2018

Conversation

TomFinley
Copy link
Contributor

This is an elaborate series of changes that are, incredibly, actually related and strongly dependent on each other. The end result is positive, but how we got there was kind of a wild ride. Hearken to my tale.

  • Move IModelCombiner out of Core to Ensemble since it clearly belongs there,
    not in Core.

  • Remove dependency of Ensemble on FastTree.

  • Remove learners in Ensemble having defaults of FastTree or indeed any
    learner. (Incidentally: fixes Consider defaulting Ensemble Stacking to a trainer in StandardLearners #682.)

  • Rename FastTree Ensemble to TreeEnsemble, so as to avoid namespace/type
    collisions with that type and Ensemble namespace.

  • Add dependency of FastTree to Ensemble project so something there can
    implement TreeEnsembleCombiner.

  • Resolve circular dependency of FastTree -> Ensemble -> StandardLearners ->
    Legacy -> FastTree by removing Legacy as dependency of StandardLearners,
    since no project we intend to keep should depend on Legacy.

  • Move Legacy specific infrastructure that somehow was in StandardLearners
    over to Legacy.

  • Fix documentation in StandardLearners that was incorrectly referring to the
    Legacy pipelines and types directly, since in reality they have nothing to
    do with the types in Legacy.

* Move IModelCombiner out of Core to Ensemble since it clearly belongs there,
  not in Core.

* Remove dependency of Ensemble on FastTree.

* Remove learners in Ensemble having defaults of FastTree or indeed any
  learner. (Incidentally: fixes dotnet#682.)

* Rename *FastTree* Ensemble to TreeEnsemble, so as to avoid namespace/type
  collisions with that type and Ensemble namespace.

* Add dependency of FastTree to Ensemble project so something there can
  implement TreeEnsembleCombiner.

* Resolve circular dependency of FastTree -> Ensemble -> StandardLearners ->
  Legacy -> FastTree by removing Legacy as dependency of StandardLearners,
  since no project we intend to keep should depend on Legacy.

* Move Legacy specific infrastructure that somehow was in StandardLearners
  over to Legacy.

* Fix documentation in StandardLearners that was incorrectly referring to the
  Legacy pipelines and types directly, since in reality they have nothing to
  do with the types in Legacy.
@@ -38,7 +38,6 @@ namespace Microsoft.ML.Runtime.Learners
using TDistPredictor = IDistPredictorProducing<float, float>;
using CR = RoleMappedSchema.ColumnRole;

/// <include file='doc.xml' path='doc/members/member[@name="OVA"]' />
Copy link
Contributor Author

@TomFinley TomFinley Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// [](start = 4, length = 69)

Hi @sfilipi, I think you own this file? Despite its path, this documentation is very specific to ML.NET 0.1 pieplines (it is also linked in Legacy), so maybe we need something new here, or should update and re-link once we deprecate/delete legacy?

@TomFinley TomFinley changed the title Moving IModelCombiner to Ensemble and subsequent adventures Moving IModelCombiner to Ensemble and related changes Nov 7, 2018
@@ -123,5 +125,38 @@ public static ApplyTransformModelOutput Apply(IHostEnvironment env, ApplyTransfo
{
return new ApplyTransformModelOutput() { OutputData = input.TransformModel.Apply(env, input.Data) };
}

[TlcModule.EntryPoint(Name = "Models.OvaModelCombiner", Desc = "Combines a sequence of PredictorModels into a single model")]
public static PredictorModelOutput CombineOvaModels(IHostEnvironment env, CombineOvaPredictorModelsInput input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my knowledge - what is the plan for these entry points that we plan on keeping? Do we remove all the "real legacy" stuff from ML.Legacy, and then rename the assembly?

Would this entry point be better served if it was in the Microsoft.ML.Ensemble assembly?

Copy link
Contributor Author

@TomFinley TomFinley Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We've added some entry-points to legacy, but it's unclear to me whether these were intended to be legacy API entry-points only, or whether they were intended to be generally useful.

For example, we see the "model combiner" directly above this appears in NimbusML, and same for this one I just moved you're commenting on.

So both were published (because they're entry-points after all.) While the one I moved is not used there, the other one that's already here was, which is interesting.

So it is not going to be a problem for this specific code that I just moved, but it will definitely be a problem for these other things in this file. I've opened an issue #1565. It's not clear to me whether the usage of these entry-points in API was deliberate or a good idea, maybe someone that actually worked on NimbusML can comment more on this.

using System.Collections.Generic;
using Microsoft.ML.Runtime;

namespace Microsoft.ML.Runtime.Ensemble
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I'd prefer if we started matching folders and namespaces as much as possible. It makes finding files easier (just like if the file name and the class name match).

This file is in the Trainer folder, but not in a Trainer namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed. I am making it consistent with the files here (so in a limited, myopic sense my action here was correct), but that does not change the fact that the system, such as it is, is slapdash and haphazard to the point where it's mostly futile to try to find anything without just a broad search. 😛 Let us open an issue on this, I will try to do so before I have to get the kids ready for school.

Copy link
Contributor Author

@TomFinley TomFinley Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not such a simple matter -- we first need to decide what those namespaces will be. Obviously it won't be Microsoft.ML.Runtime.Ensemble, but what? Microsoft.ML.Ensemble? Maybe there's already a proposal open for all I know. We have namespaces outlined for specific components I believe, but not a general principle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very reasonable to me that the assembly name and the root namespace match (that's the default in .csproj files). So Microsoft.ML.Ensemble sounds like a good proposal to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either this, or Microsoft.ML.Borborygmization, I accept nothing else


In reply to: 231567175 [](ancestors = 231567175)

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@TomFinley
Copy link
Contributor Author

For some reason the build did not even start. Going to close and reopen to tickle it.

@TomFinley TomFinley closed this Nov 7, 2018
@TomFinley TomFinley reopened this Nov 7, 2018
@TomFinley TomFinley merged commit d3b70b5 into dotnet:master Nov 7, 2018
@TomFinley TomFinley deleted the tfinley/MoveModelCombiner branch November 7, 2018 22:26
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider defaulting Ensemble Stacking to a trainer in StandardLearners
3 participants