-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Moving IModelCombiner to Ensemble and related changes #1563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Move IModelCombiner out of Core to Ensemble since it clearly belongs there, not in Core. * Remove dependency of Ensemble on FastTree. * Remove learners in Ensemble having defaults of FastTree or indeed any learner. (Incidentally: fixes dotnet#682.) * Rename *FastTree* Ensemble to TreeEnsemble, so as to avoid namespace/type collisions with that type and Ensemble namespace. * Add dependency of FastTree to Ensemble project so something there can implement TreeEnsembleCombiner. * Resolve circular dependency of FastTree -> Ensemble -> StandardLearners -> Legacy -> FastTree by removing Legacy as dependency of StandardLearners, since no project we intend to keep should depend on Legacy. * Move Legacy specific infrastructure that somehow was in StandardLearners over to Legacy. * Fix documentation in StandardLearners that was incorrectly referring to the Legacy pipelines and types directly, since in reality they have nothing to do with the types in Legacy.
@@ -38,7 +38,6 @@ namespace Microsoft.ML.Runtime.Learners | |||
using TDistPredictor = IDistPredictorProducing<float, float>; | |||
using CR = RoleMappedSchema.ColumnRole; | |||
|
|||
/// <include file='doc.xml' path='doc/members/member[@name="OVA"]' /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// [](start = 4, length = 69)
Hi @sfilipi, I think you own this file? Despite its path, this documentation is very specific to ML.NET 0.1 pieplines (it is also linked in Legacy), so maybe we need something new here, or should update and re-link once we deprecate/delete legacy?
@@ -123,5 +125,38 @@ public static ApplyTransformModelOutput Apply(IHostEnvironment env, ApplyTransfo | |||
{ | |||
return new ApplyTransformModelOutput() { OutputData = input.TransformModel.Apply(env, input.Data) }; | |||
} | |||
|
|||
[TlcModule.EntryPoint(Name = "Models.OvaModelCombiner", Desc = "Combines a sequence of PredictorModels into a single model")] | |||
public static PredictorModelOutput CombineOvaModels(IHostEnvironment env, CombineOvaPredictorModelsInput input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my knowledge - what is the plan for these entry points that we plan on keeping? Do we remove all the "real legacy" stuff from ML.Legacy, and then rename the assembly?
Would this entry point be better served if it was in the Microsoft.ML.Ensemble
assembly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We've added some entry-points to legacy, but it's unclear to me whether these were intended to be legacy API entry-points only, or whether they were intended to be generally useful.
For example, we see the "model combiner" directly above this appears in NimbusML, and same for this one I just moved you're commenting on.
So both were published (because they're entry-points after all.) While the one I moved is not used there, the other one that's already here was, which is interesting.
So it is not going to be a problem for this specific code that I just moved, but it will definitely be a problem for these other things in this file. I've opened an issue #1565. It's not clear to me whether the usage of these entry-points in API was deliberate or a good idea, maybe someone that actually worked on NimbusML can comment more on this.
using System.Collections.Generic; | ||
using Microsoft.ML.Runtime; | ||
|
||
namespace Microsoft.ML.Runtime.Ensemble |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) I'd prefer if we started matching folders and namespaces as much as possible. It makes finding files easier (just like if the file name and the class name match).
This file is in the Trainer
folder, but not in a Trainer
namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed. I am making it consistent with the files here (so in a limited, myopic sense my action here was correct), but that does not change the fact that the system, such as it is, is slapdash and haphazard to the point where it's mostly futile to try to find anything without just a broad search. 😛 Let us open an issue on this, I will try to do so before I have to get the kids ready for school.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is not such a simple matter -- we first need to decide what those namespaces will be. Obviously it won't be Microsoft.ML.Runtime.Ensemble
, but what? Microsoft.ML.Ensemble
? Maybe there's already a proposal open for all I know. We have namespaces outlined for specific components I believe, but not a general principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems very reasonable to me that the assembly name and the root namespace match (that's the default in .csproj files). So Microsoft.ML.Ensemble
sounds like a good proposal to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this, or Microsoft.ML.Borborygmization
, I accept nothing else
In reply to: 231567175 [](ancestors = 231567175)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason the build did not even start. Going to close and reopen to tickle it. |
This is an elaborate series of changes that are, incredibly, actually related and strongly dependent on each other. The end result is positive, but how we got there was kind of a wild ride. Hearken to my tale.
Move IModelCombiner out of Core to Ensemble since it clearly belongs there,
not in Core.
Remove dependency of Ensemble on FastTree.
Remove learners in Ensemble having defaults of FastTree or indeed any
learner. (Incidentally: fixes Consider defaulting Ensemble Stacking to a trainer in StandardLearners #682.)
Rename FastTree Ensemble to TreeEnsemble, so as to avoid namespace/type
collisions with that type and Ensemble namespace.
Add dependency of FastTree to Ensemble project so something there can
implement TreeEnsembleCombiner.
Resolve circular dependency of FastTree -> Ensemble -> StandardLearners ->
Legacy -> FastTree by removing Legacy as dependency of StandardLearners,
since no project we intend to keep should depend on Legacy.
Move Legacy specific infrastructure that somehow was in StandardLearners
over to Legacy.
Fix documentation in StandardLearners that was incorrectly referring to the
Legacy pipelines and types directly, since in reality they have nothing to
do with the types in Legacy.