Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create methods not being called when loading models from disk #4385

Closed
antoniovs1029 opened this issue Oct 25, 2019 · 12 comments · Fixed by #4485
Closed

Create methods not being called when loading models from disk #4385

antoniovs1029 opened this issue Oct 25, 2019 · 12 comments · Fixed by #4485
Assignees

Comments

@antoniovs1029
Copy link
Member

There are some classes that define a Create method which is supposed to be called when loading a model from disk, but the problem is that the method is not being called at all.

For example, I've noticed this unexpected behavior in the following classes:

  1. LinearBinaryModelParameters
  2. GamBinaryModelParameters
  3. FastTreeBinaryModelParameters
  4. FastForestBinaryModelParameters
  5. PlattCalibrator

when loading a model that uses any of these classes, their Create method is expected to be called, but, as stated, the method is not being called.

I also noticed this behavior in 3 other classes of the Calibrator.cs file (namely, the ParameterMixingCalibratedModelParameters, ValueMapperCalibratedModelParameters, and FeatureWeightsCalibratedModelParameters classes), but I've fixed this problem for those specific classes in my recent PR #4306 (which, as of the moment of writing this, is still waiting to get approved). It was while working on that PR that I noticed this problem on these classes, and I commented about it there... but it is appropriate to open this separate issue to better document this, since it is a problem that affects different classes across different files.

In fact, as I will describe below, there's a certain code pattern that is related to this problem, and I've seen this pattern in other classes of the Calibrator.cs file as well. So, the problem I describe here might be affecting even more classes than the ones I've mentioned.

Cause of the problem

The CreateInstanceCore method in the ComponentCatalog.cs file is responsible to try to call the Create method of any class when loading a model from disk.

The current implementation of the method actually checks first if the class has a constructor with parameters (IHostEnvironment env, ModelLoadContext ctx) and invokes that constructor through reflection. If the class doesn't have such a constructor, then it checks if it has a Create((IHostEnvironment env, ModelLoadContext ctx) method, and it gets invoked through reflection.

This behavior is not desired for the classes I've mentioned (and potentially other classes), since they define both a constructor and a Create method with those parameters, but in these cases it's actually expected that the Create method gets called instead of the constructor. Thus, if a class follows the pattern of having a private or internal constructor (with the (env, ctx) parameters) and also has a Create method, then this problem might also be affecting that class.

Since the Create method typically only runs some security checks before calling the constructor, it turns out that the overall process of loading models doesn't seem affected by this issue. But the problem remains that these security checks are being missed along with whatever behavior the Create method adds to the process.

As explained by @yaeldekel in her comment on my recent PR #4306 (under "Answer 1"), this problem might had been introduced before the official release of ML.net, when the ComponentCatalog method was modified in a way that permitted the CreateInstanceCore method to use private and internal constructors, which didn't use to happen... so before those changes were made, classes could have private or internal constructors and a Create method, and the latter would appropriately be called. But now the constructor gets called, and this is the case of all the classes mentioned in this issue.

Since these changes were made while trying to internalize as many APIs as possible before the ML.net official release, many constructors where also made private or internal, and thus the changes in ComponentCatalog that permit using those constructors are also necessary.

Because of these, further investigation is needed to know for sure which classes are being affected by this problem, so to better find a way to fix this problem without affecting all of the other classes that doesn't present this situation.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 4, 2019

So I've done a couple of experiments regarding this issue and these are my findings.

Experiment 1: Swaping the create and constructor order in ComponentCatalog.cs

When registering an assembly, the current implementation in ComponentCatalog.cs goes through every loadable class in the assembly and calls TryGetIniters to find the way in which objects of the given class should be loaded. It tries to find an "InstanceGetter", and if it doesn't find it, then goes on to try to find a Constructor of a given parameter list parmTypes or parmTypesWithEnv, and if it doesn't find it, then goes on to try to find a Create method with parameter list parmTypes or parmTypesWithEnv. Notice that if it finds a Constructor, then it doesn't look for a create method. Later on, when loading an specefic model from disk, the CreateInstanceCore method tries to call the appropriate method for loading each specific class used inside the model, it first tries to invoke the InstanceGetter, then a Constructor, and then a Create method, depending on what was found during the TryGetIniters step.

So a simple way to solve this whole issue is to swap the order in which things are done. So in this git branch I modified TryGetIniters so it looks first for a Create method instead of a constructor. Then I ran all the existing tests in the repo and all of them passed. And I also looked into specific cases where the 5 classes I've mentioned in the first post in this issue are loaded from disk, and now they called their Create method as intended.

Notice that, although it was unnecesary, I also modified the CreateInstanceCore method to check first for a Create method instead of a Constructor. The reason it's unnecesary to swap it here, is because in TryGetIniters, it first looks for a Create method, and if there's one, then it doesn't look for a constructor, and the constructor is set to be null. Nonetheless, I also modified the CreateInstanceCore method for consistency throughout the codebase. I also modified the LoadableClassInfo constructor, which also followed this pattern of checking first for a getter, then for a constructor, and then for a create method, although, again, it wouldn't be necessary to change the order in here.

[Question 1.1] If there are other places where this pattern is used, please let me know so I can be aware of them.

My conclusion is that this simple change seems to solve the problem. But I would still want to ask if there was a reason to first look for constructors, and then create methods, or if this order didn't mean anything.

[Question 1.2] Is it okay, then, if I simply solve this problem by swaping the order as I've just described?

Experiment 2: Finding how many classes were affected by this issue

In another experiment in this other git branch I modified the TryGetIniters method so that it would try to find both a constructor and a create method, and log the name of the class if it found both. I then ran all the existing tests and I found that 75 different classes had both a constructor and a create method that could be used by CreateInstanceCore.

Although I haven't manually checked if the issue here described affected all of the 75 classes, I guess it's safe to think that it did, and that the create method of these 75 classes was being ignored. I also haven't looked at each class individually, but after selecting some at random, I could confirm that they had both constructor and create method. The ones I checked all followed the same pattern of having a create method that did some security checks and that then called the private constructor.

If requested I can personally look at each of the classes, to see if any of them present particular behaviors in their create methods that don't follow the pattern I've mentioned. In fact, 4 of the 5 classes I mentioned in the first post of this issue, do have some extra code in their create methods that doesn't follow this pattern, but, as described in issue #4381, it might be the case that this code simply needs to be erased because it was not updated properly when releasing the first version of ML.net. Perhaps other classes also have the same problem of having code that needs to be erased or updated in their create method.

[Question 2] Should I check individually all of these 75 classes' create methods? For what purpose?

At the end of this post I list the 75 classes that I found on this experiment.

Experiment 3: Removing the Instance Getters

Notice that in the TryGetIniters method of experiment 2, I also added code to throw an exception if a class had an "InstanceGetter". After running all the existing tests, none of them threw the exception.

So then I ran another experiment completly removing the code in TryGetIniters that set the 'getter' of a loadable class, and then I ran all the tests, confirming that none of them failed.

So my conclusion is that none of the tests actually loads a model from disk that uses an "instance getter" to load one of its classes. And perhaps it might be the case that no class actually implements an "InstanceGetter".

[Question 3.1] Why do "instance getters" exist? Are there classes that actually use them when loading?
[Queston 3.2] Should they be removed as it seems they're not used?

75 classes thought to be affected by this bug

Microsoft.ML.Data.BinaryPerInstanceEvaluator
Microsoft.ML.Data.ClusteringPerInstanceEvaluator
Microsoft.ML.Data.MulticlassPerInstanceEvaluator
Microsoft.ML.Data.MultiOutputRegressionPerInstanceEvaluator
Microsoft.ML.Data.QuantileRegressionPerInstanceEvaluator
Microsoft.ML.Data.RankingPerInstanceTransform
Microsoft.ML.Data.RegressionPerInstanceEvaluator
Microsoft.ML.Calibrators.PlattCalibrator
Microsoft.ML.Calibrators.IsotonicCalibrator
Microsoft.ML.Calibrators.NaiveCalibrator
Microsoft.ML.Calibrators.ValueMapperCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.FeatureWeightsCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.ParameterMixingCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.SchemaBindableCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Data.SchemaBindablePredictorWrapper
Microsoft.ML.Data.SchemaBindableQuantileRegressionPredictor
Microsoft.ML.Data.SchemaBindableBinaryPredictorWrapper
Microsoft.ML.Transforms.FeatureContributionCalculatingTransformer
Microsoft.ML.Transforms.LabelIndicatorTransform
Microsoft.ML.Transforms.SkipTakeFilter
Microsoft.ML.Transforms.SlotsDroppingTransformer
Microsoft.ML.Transforms.GaussianKernel+RandomNumberGenerator
Microsoft.ML.Transforms.LaplacianKernel+RandomNumberGenerator
Microsoft.ML.Transforms.MissingValueDroppingTransformer
Microsoft.ML.Transforms.MissingValueIndicatorTransformer
Microsoft.ML.Trainers.FieldAwareFactorizationMachineModelParameters
Microsoft.ML.Trainers.FieldAwareFactorizationMachinePredictionTransformer
Microsoft.ML.Trainers.LinearBinaryModelParameters
Microsoft.ML.Trainers.LinearRegressionModelParameters
Microsoft.ML.Trainers.PoissonRegressionModelParameters
Microsoft.ML.Trainers.LinearMulticlassModelParameters
Microsoft.ML.Trainers.NaiveBayesMulticlassModelParameters
Microsoft.ML.Trainers.OneVersusAllModelParameters
Microsoft.ML.Trainers.PairwiseCouplingModelParameters
Microsoft.ML.Trainers.RandomModelParameters
Microsoft.ML.Trainers.PriorModelParameters
Microsoft.ML.Trainers.Recommender.MatrixFactorizationModelParameters
Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer
Microsoft.ML.Dnn.ImageClassificationModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeBinaryModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeRankingModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeRegressionModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeTweedieModelParameters
Microsoft.ML.Trainers.FastTree.GamBinaryModelParameters
Microsoft.ML.Trainers.FastTree.GamRegressionModelParameters
Microsoft.ML.Trainers.FastTree.FastForestBinaryModelParameters
Microsoft.ML.Trainers.FastTree.FastForestRegressionModelParameters
Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationTransformer
Microsoft.ML.Trainers.Ensemble.Average
Microsoft.ML.Trainers.Ensemble.Median
Microsoft.ML.Trainers.Ensemble.MultiAverage
Microsoft.ML.Trainers.Ensemble.MultiMedian
Microsoft.ML.Trainers.Ensemble.MultiStacking
Microsoft.ML.Trainers.Ensemble.MultiVoting
Microsoft.ML.Trainers.Ensemble.MultiWeightedAverage
Microsoft.ML.Trainers.Ensemble.RegressionStacking
Microsoft.ML.Trainers.Ensemble.Stacking
Microsoft.ML.Trainers.Ensemble.Voting
Microsoft.ML.Trainers.Ensemble.WeightedAverage
Microsoft.ML.Trainers.Ensemble.EnsembleDistributionModelParameters
Microsoft.ML.Trainers.Ensemble.EnsembleModelParameters
Microsoft.ML.Trainers.Ensemble.EnsembleMulticlassModelParameters
Microsoft.ML.Trainers.KMeansModelParameters
Microsoft.ML.Trainers.PcaModelParameters
Microsoft.ML.Transforms.TimeSeries.IidChangePointDetector
Microsoft.ML.Transforms.TimeSeries.IidSpikeDetector
Microsoft.ML.Transforms.TimeSeries.SrCnnAnomalyDetector
Microsoft.ML.Transforms.TimeSeries.SsaChangePointDetector
Microsoft.ML.Transforms.TimeSeries.SsaForecastingTransformer
Microsoft.ML.Transforms.TimeSeries.SsaSpikeDetector
Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters
Microsoft.ML.Trainers.LightGbm.LightGbmRankingModelParameters
Microsoft.ML.Trainers.LightGbm.LightGbmRegressionModelParameters
Microsoft.ML.Trainers.OlsModelParameters
Microsoft.ML.Transforms.VectorWhiteningTransformer

@yaeldekel
Copy link

Hi @antoniovs1029, thanks for the thorough investigation!
In one of the previous discussions, @eerhardt mentioned that it might be a good idea to make ComponentCatalog throw an exception if there are both a Create method and a constructor with the same signature, since theoretically there is no way to know which one of these method is intended to be called. Currently, in all the components I believe the intention is for the Create method to be called, but that may not always be the case.
Adding this exception would be a pretty big change though, since you would have to change the constructor to have a different signature in each of these 75 classes above.

Regarding the Instance getters, I think it would be safe to remove, it's legacy code, and I'm not even sure that we ever had components that used that (it may have been there 'just in case').

@eerhardt
Copy link
Member

eerhardt commented Nov 6, 2019

I found that 75 different classes had both a constructor and a create method that could be used by CreateInstanceCore.

See #922 and then subsequently #963. These two changes adversely affected this situation. It used to be the case that only public constructors and Create methods were located and invoked. Thus a common pattern was to have a public Create method and a private constructor, and the public Create method was the only one found by the ComponentCatalog.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 6, 2019

@yaeldekel Thanks for your reply.

I am still interested to know your opinion on Experiment 1. Would you think it's okay to simply swap the order in which create methods and constructors get invoked?

About @eerhardt suggestion about the exception, I think what he meant is that I should add an exception to see what classes were being affected by this problem: i.e. that I should add it only for experimental purposes and then remove it. I believe he didn't mean that I should add the exception and leave it to then be merged into the master branch, precisely because it would be a big change. I did try to add such an exception, but it wasn't very informative, because the tests that loaded from disk would crash as soon as they tried to load the first class that had both create and constructor... and so I found the approach of Experiment 2 to give me a more comprehensive and practical list.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 6, 2019

@eerhardt

See #922 and then subsequently #963. These two changes adversely affected this situation. It used to be the case that only public constructors and Create methods were located and invoked. Thus a common pattern was to have a public Create method and a private constructor, and the public Create method was the only one found by the ComponentCatalog.

Yes, I understand this is why this have become an issue. I only pointed at the 75 classes I found, just to have a general idea of how many classes were affected by it, since it seems to me that in all of them the create method would have been ignored given the changes made in the PRs you mentioned.

Still my question remains if it would be ok to simply change the order in which ComponentCatalog does things, as I did in my Experiment 1.

Thanks for pointing directly to those PRs, though. I see it would be useful to have a reference to those PRs in here.

@eerhardt
Copy link
Member

eerhardt commented Nov 6, 2019

Still my question remains if it would be ok to simply change the order in which ComponentCatalog does things, as I did in my Experiment 1.

My assumption would be to match the previous behavior - if both are found, use the one that is public. If they are both of the same visibility, then throw an exception.

About @eerhardt suggestion about the exception, I think what he meant is that I should add an exception to see what classes were being affected by this problem: i.e. that I should add it only for experimental purposes and then remove it. I believe he didn't mean that I should add the exception and leave it to then be merged into the master branch, precisely because it would be a big change.

I did intend for the exception check to get merged into the master branch once we have fixed everything. That will ensure we don't get into a similar situation.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 6, 2019

Oh, sorry for the misunderstanding. I will follow your suggestions asap, and see what happens.

About Experiment 3, on removing the instance getters, do you ( @eerhardt ) have any opinions? Since yael's answer was that:

I think it would be safe to remove, it's legacy code, and I'm not even sure that we ever had components that used that (it may have been there 'just in case').

I am not sure if I should remove them since they seem to be legacy code, or to leave them "just in case". It's my understanding that not all of TLC code has been ported to ML.net, so perhaps in the future they might become needed?

@eerhardt
Copy link
Member

eerhardt commented Nov 6, 2019

On the instance getters, I'm not sure. No matter what you do, it should be separate from the Create method changes.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 6, 2019

My assumption would be to match the previous behavior - if both are found, use the one that is public. If they are both of the same visibility, then throw an exception.

Well, I did that and then I got several exceptions when running the tests. So I then decided to make the following experiment.

Experiment 4: Finding the classes whose create and constructor methods share the same visibility

I made another experiment similar to Experiment 2, where I would log whenever a loadable class has both its create and constructor method with the same visibility (i.e. Public vs. Non-Public visibility). The code of the experiment can be found here.

The result is that there are 2 classes whose create method and constructor are public, and 52 classes whose create method and constructor are non-public. So further work will need to be done to know which method to call.

Unless someone suggests something better, the only way I can see of solving this is to manually go through those 54 classes, and see if they follow the pattern where the create method only does security checks and then calls the constructor... Having a quick look at them, it seems this is the case, and then swapping the order in which ComponentCatalog calls the Create or the creator method might be enough to fix the bug for these classes. If any class diverges from this pattern, then something else should be done in those cases.

[Question 4.1] Should I follow this plan of manually looking into these classes to decide if it is better to simply swap the order in which ComponentCatalog does things as I did in Experiment 1?

[Question 4.2] So far I've been assuming that Create methods are only used by the CreateInstanceCore method... are there other places (or tools, such as AutoML or ModelBuilder) that use them? If this is the case I would like to know more about how this affect this issue.

2 classes that have both Public create and constructor methods

Microsoft.ML.Transforms.LabelIndicatorTransform
Microsoft.ML.Transforms.SkipTakeFilter

52 classes that have both Non-Public create and constructor methods.

Microsoft.ML.Calibrators.PlattCalibrator
Microsoft.ML.Calibrators.IsotonicCalibrator
Microsoft.ML.Calibrators.NaiveCalibrator
Microsoft.ML.Calibrators.ValueMapperCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.FeatureWeightsCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.ParameterMixingCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Calibrators.SchemaBindableCalibratedModelParameters`2[Microsoft.ML.IPredictorProducing`1[System.Single],Microsoft.ML.Calibrators.ICalibrator]
Microsoft.ML.Transforms.FeatureContributionCalculatingTransformer
Microsoft.ML.Transforms.SlotsDroppingTransformer
Microsoft.ML.Transforms.GaussianKernel+RandomNumberGenerator
Microsoft.ML.Transforms.LaplacianKernel+RandomNumberGenerator
Microsoft.ML.Transforms.MissingValueDroppingTransformer
Microsoft.ML.Transforms.MissingValueIndicatorTransformer
Microsoft.ML.Trainers.FieldAwareFactorizationMachineModelParameters
Microsoft.ML.Trainers.FieldAwareFactorizationMachinePredictionTransformer
Microsoft.ML.Trainers.LinearBinaryModelParameters
Microsoft.ML.Trainers.LinearRegressionModelParameters
Microsoft.ML.Trainers.PoissonRegressionModelParameters
Microsoft.ML.Trainers.LinearMulticlassModelParameters
Microsoft.ML.Trainers.NaiveBayesMulticlassModelParameters
Microsoft.ML.Trainers.OneVersusAllModelParameters
Microsoft.ML.Trainers.PairwiseCouplingModelParameters
Microsoft.ML.Trainers.RandomModelParameters
Microsoft.ML.Trainers.PriorModelParameters
Microsoft.ML.Trainers.Recommender.MatrixFactorizationModelParameters
Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer
Microsoft.ML.Vision.ImageClassificationModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeBinaryModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeRankingModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeRegressionModelParameters
Microsoft.ML.Trainers.FastTree.FastTreeTweedieModelParameters
Microsoft.ML.Trainers.FastTree.GamBinaryModelParameters
Microsoft.ML.Trainers.FastTree.GamRegressionModelParameters
Microsoft.ML.Trainers.FastTree.FastForestBinaryModelParameters
Microsoft.ML.Trainers.FastTree.FastForestRegressionModelParameters
Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationTransformer
Microsoft.ML.Trainers.Ensemble.EnsembleDistributionModelParameters
Microsoft.ML.Trainers.Ensemble.EnsembleModelParameters
Microsoft.ML.Trainers.Ensemble.EnsembleMulticlassModelParameters
Microsoft.ML.Trainers.KMeansModelParameters
Microsoft.ML.Trainers.PcaModelParameters
Microsoft.ML.Transforms.TimeSeries.IidChangePointDetector
Microsoft.ML.Transforms.TimeSeries.IidSpikeDetector
Microsoft.ML.Transforms.TimeSeries.SrCnnAnomalyDetector
Microsoft.ML.Transforms.TimeSeries.SsaChangePointDetector
Microsoft.ML.Transforms.TimeSeries.SsaForecastingTransformer
Microsoft.ML.Transforms.TimeSeries.SsaSpikeDetector
Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters
Microsoft.ML.Trainers.LightGbm.LightGbmRankingModelParameters
Microsoft.ML.Trainers.LightGbm.LightGbmRegressionModelParameters
Microsoft.ML.Trainers.OlsModelParameters
Microsoft.ML.Transforms.VectorWhiteningTransformer

@eerhardt
Copy link
Member

eerhardt commented Nov 7, 2019

2 classes that have both Public create and constructor methods
Microsoft.ML.Transforms.LabelIndicatorTransform
Microsoft.ML.Transforms.SkipTakeFilter

You have to look at the whole signature of the method, not just the visibility and the name.

  • SkipTakeFilter doesn't have a constructor that takes a ModelLoadContext, so those constructors don't match what ComponentCatalog is looking for.
  • LabelIndicatorTransform's constructor that takes a ModelLoadContext also takes an IHost, not an IHostEnvironment. So that won't match either. See:

var parmTypesWithEnv = Utils.Concat(new Type[1] { typeof(IHostEnvironment) }, parmTypes);

52 classes that have both Non-Public create and constructor methods.

I spot checked a few of these, and the ones I did do look like a problem. IMO to fix it, the Create method should be marked internal and the constructor method marked private. And ComponentCatalog should pick the internal Create method over the private constructor method.

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 8, 2019

2 classes that have both Public create and constructor methods
Microsoft.ML.Transforms.LabelIndicatorTransform
Microsoft.ML.Transforms.SkipTakeFilter

You have to look at the whole signature of the method, not just the visibility and the name.

SkipTakeFilter doesn't have a constructor that takes a ModelLoadContext, so those constructors don't match what ComponentCatalog is looking for.
LabelIndicatorTransform's constructor that takes a ModelLoadContext also takes an IHost, not an IHostEnvironment. So that won't match either. See:

I agree that I should pay attention to the whole signature, so I've done it in the experiment I will post below. But I just want to make sure that we are on the same page about "what ComponentCatalog is looking for". By reading your comments on the SkipTakeFilter and LabelIndicatorTransform classes, I am under the impression that you are assuming that only constructors that have ModelLoadContext (and potentially IHostEnvironment) as a parameter will be used by the ComponentCatalog. I believe this assumption would be wrong, since the TryGetIniters method looks for whatever constructor that has a signature that matches with the parmTypes variable that is passed to the TryGetIniters, if it fails then it tries finding one that matches with the parmTypesWithEnv; and if it fails then it does the same but looking for create methods. But notice that there is no reason why parmTypes needs to include a ModelLoadContext type (although as I will show below, it typically does, but not always); particularly its my understanding that parmTypes is always going to be the CtorTypes property from the LoadableClassAttribute being registered, because that's what is passed to the TryGetIniters method in here. So I am not sure if I am missing your point, and maybe you meant something different. Anyway, I will elaborate on how I think it should work for these classes down below, just to make sure that I am not missing something.

Experiment 5: Finding the classes whose create and constructor methods share the same visibility and signature.

Experiment 4 already looked for classes whose create and constructor methods shared the same visibility and signature (with an optional IHostEnvironment extra parameter), and it actually only looked at those that could be used according to the LoadableClass attribute that was used when registering the assemblies. So now I reran exactly the same experiment, but now logging the parmTypes that defines the signature that TryGetInitters is trying to find. This way I can have a more precise idea of what changes need to be made. The code for this experiment can be found here.

Public classes

Microsoft.ML.Transforms.LabelIndicatorTransform         - parmTypes: Microsoft.ML.Transforms.LabelIndicatorTransform+Options, Microsoft.ML.IDataView            - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.SkipTakeFilter                  - parmTypes: Microsoft.ML.Transforms.SkipTakeFilter+SkipOptions, Microsoft.ML.IDataView                 - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.SkipTakeFilter                  - parmTypes: Microsoft.ML.Transforms.SkipTakeFilter+TakeOptions, Microsoft.ML.IDataView                 - reqEnvCtor: True - reqEnvCreate: True

The above list tell us that the only create and constructor method that causes a conflict in the LabelIndicatorTransform are the ones with overload (IHostEnvironment, LabelIndicatorTransform.Options, IDataView). This happened because the LabelIndicatorTransform has the following LoadableClassAttribute in here:

[assembly: LoadableClass(typeof(LabelIndicatorTransform), typeof(LabelIndicatorTransform.Options), typeof(SignatureDataTransform), ...

where the argType is LabelIndicatorTransform.Options, the sigType SignatureDataTransform has only one parameter IDataView, and thus the CtorTypes (aka parmTypes) of this attribute is [LabelIndicatorTransform.Options, IDataView] (since the CtorType is the combination of both argType and the parameters of the sigType, as mentioned here). Then the log also tell us that the TryGetIniters only found a constructor and create method with such a parmTypes, when requiring also the IHostEnvironment… thus the constructor and create method with parmTypesWithEnv (IHostEnvironment, LabelIndicatorTransform.Options, IDataView) are found here and here respectively in the LabelIndicatorTransform class, causing a conflict because it is unclear which one to use. Notice that in this case the parmTypes doesn't include any ModelLoadContext type.

A similar analysis to this other LoadableClassAttribute for the LabelIndicatorClass, tell us that it will try to load a constructor or create method with parmTypes [ModelLoadContext, IDataView] but this time there is no conflict because only this create method would match what the ComponentCatalog is looking for, and no constructor would match. The third LoadableClassAttribute of LabelIndicatorTransform doesn't matter because its instType is typeof(void), and so when registering the assembly the TryGetIniters method wouldn't be called for that attribute.

Applying the same analysis to the SkipTakeFilter class shows me that the only conflictive overloads for the constructor / creator methods would be the ones shown in the log.

Looking at these 3 cases, it's my impression that they should call the create method instead of the constructor. But I wouldn't be sure.

[Question 5.1] Should these 3 cases call its create method? How should I modify the classes to accomplish this?

Non-Public classes

As shown in the log at the end of this post, all of the non-public classes that were found problematic on Experiment 4, have the problem specifically only in their create and constructor methods with overload of (IHostEnvironment, LoadModelContext).

So following this suggestion by @eerhardt:

IMO to fix it, the Create method should be marked internal and the constructor method marked private. And ComponentCatalog should pick the internal Create method over the private constructor method

[Question 5.2] Should I then simply change the visibility of the Create methods with overload (IHostEnvironment, LoadModelContext)?? No other Create methods should be modified, right?

Here's the log:

Microsoft.ML.Calibrators.PlattCalibrator                                     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.IsotonicCalibrator                                  - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.NaiveCalibrator                                     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.ValueMapperCalibratedModelParameters`2[...]         - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.FeatureWeightsCalibratedModelParameters`2[...]      - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.ParameterMixingCalibratedModelParameters`2[...]     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Calibrators.SchemaBindableCalibratedModelParameters`2[...]      - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.FeatureContributionCalculatingTransformer            - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.SlotsDroppingTransformer                             - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.GaussianKernel+RandomNumberGenerator                 - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.LaplacianKernel+RandomNumberGenerator                - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.MissingValueDroppingTransformer                      - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.MissingValueIndicatorTransformer                     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FieldAwareFactorizationMachineModelParameters          - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FieldAwareFactorizationMachinePredictionTransformer    - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LinearBinaryModelParameters                            - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LinearRegressionModelParameters                        - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.PoissonRegressionModelParameters                       - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LinearMulticlassModelParameters                        - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.NaiveBayesMulticlassModelParameters                    - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.OneVersusAllModelParameters                            - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.PairwiseCouplingModelParameters                        - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.RandomModelParameters                                  - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.PriorModelParameters                                   - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.Recommender.MatrixFactorizationModelParameters         - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.Recommender.MatrixFactorizationPredictionTransformer   - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Vision.ImageClassificationModelParameters                       - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastTreeBinaryModelParameters                 - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastTreeRankingModelParameters                - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastTreeRegressionModelParameters             - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastTreeTweedieModelParameters                - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.GamBinaryModelParameters                      - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.GamRegressionModelParameters                  - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastForestBinaryModelParameters               - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.FastForestRegressionModelParameters           - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.FastTree.TreeEnsembleFeaturizationTransformer          - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.Ensemble.EnsembleDistributionModelParameters           - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.Ensemble.EnsembleModelParameters                       - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.Ensemble.EnsembleMulticlassModelParameters             - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.KMeansModelParameters                                  - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.PcaModelParameters                                     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.IidChangePointDetector                    - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.IidSpikeDetector                          - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.SrCnnAnomalyDetector                      - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.SsaChangePointDetector                    - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.SsaForecastingTransformer                 - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.TimeSeries.SsaSpikeDetector                          - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters                 - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LightGbm.LightGbmRankingModelParameters                - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.LightGbm.LightGbmRegressionModelParameters             - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Trainers.OlsModelParameters                                     - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True
Microsoft.ML.Transforms.VectorWhiteningTransformer                           - parmTypes: Microsoft.ML.ModelLoadContext              - reqEnvCtor: True - reqEnvCreate: True

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Nov 18, 2019

I have discussed this offline with @yaeldekel , and the conclusion is to follow Eric's suggestions, and that this shouldn't be a problem for other tools such as NimbusML, AutoML, etc...

So if a class has both a create and constructor method that matches what the ComponentCatalog is looking for, then it should use the one that is public. If both are non-public, then it should use the one that is not private (i.e. the internal). And if both have the same visibility, then throw an exception.

I would also need to change the methods and constructors access modifiers as appropriate, only changing the ones that are affected by this issue. By suggestion of @yaeldekel I changed this in a manner that the create method would be called instead of the constructor.

antoniovs1029 added a commit that referenced this issue Nov 27, 2019
… disk (#4485)

* Changed ComponentCatalog so it would use the most public "initter"
* Changed the visibility of several constructors and create methods so to choose the correct initter when loading models from disk
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants