-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Adding training statistics for LR in the HAL learners package. #1392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…mpute the std errors
Double priorPosRate = PosWeight / WeightSum; | ||
Contracts.Assert(0 <= priorPosRate && priorPosRate <= 1); | ||
float nullDeviance = (priorPosRate <= 1e-15 || 1 - priorPosRate <= 1e-15) ? | ||
0f : (float)(2 * WeightSum * MathUtils.Entropy(priorPosRate, true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we go down this route, this will need to get refactored to avoid repetition. Doing it the fast way to see if it even passes the smell test :)
) #Resolved
|
||
/// <include file='doc.xml' path='doc/members/member[@name="LBFGS"]/*' /> | ||
/// <include file='doc.xml' path='docs/members/example[@name="LogisticRegressionBinaryClassifier"]/*' /> | ||
public sealed partial class LogisticRegressionWithStats : LogisticRegression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) In general I think the file name should match the class name. Here we have LogisticRegressionWithStats
vs LRWithTrainingStatistics.cs
#Resolved
My initial thought is that it might be confusing to add a completely separate trainer that is exactly the same but just has one additional capability of providing the stats. Maybe I just don't have a good idea of what is the alternative solution. Would a utility method be called after training? Or do I have to specify it before training? Conceptually, a user first trains the model, then wants to look up the stats. What do others think? #Resolved |
If we don't worry about the cmd usage, the utility method is fine. #Resolved |
…., rather than a separate trainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sfilipi thanks for working on this. I am not a fan of this. Everywhere else we have been an advocation that the place where this sort of "side information" would be published would be gotten by the transformer that is returned by this Fit
method -- this indeed is the reason why we bothered to make the IEstimator
strongly typed.
And this at least conceptually appears to work pretty well, and I'm not sure I see too many flaws in it. When you fit a term estimator, you could get the dictionary from the transform. When you fit a normalizer, you could get the normalizer scaling/offsets from the normalizer transform. When you train a GBDT, you get the tree from the predictor in the transform. When you train a linear model, you can get the linear terms from the predictor in the transform. And even with training stats, we already have OLS, and that again publishes its statistics in the predictor in the transform.
I would really like it if you could follow the same pattern here. Making IEstimator
s contain information about the model is completely against the spirit of IEstimator
and ITransformer
. ITransformer
is the model, an IEstimator
is a way to make models. They cannot contain model information.
On the whole by doing this I think you will also find the odd architectural decisions you thought you were forced into (the so-called "heresies," which should have been a clue that something seriously wrong was going on) will go away.
From a practical point of view, my impression is that in order to get training statistics from logistic regression, you would have to use MKL. This would then require you to add a rather large assembly to your project. At least in my use case, this may be prohibitive. #Pending |
hi @klausmh, what is the bar for the size addition? We don't use all of MKL, we have a custom build of it that contains only functions that are needed by ML.Net. The whole nuget is about 320MB with libraries for all the platforms, win64. win 86, linux and mac. Windows debug is only 70MB. In reply to: 434880120 [](ancestors = 434880120) |
In our case, this would be added to a desktop installer that is currently 262 MB, so it would be a 25% increase in installer size. I would think that something exceeding 10 MB would be problematic (in this specific case). #Resolved |
|
||
LinearModelStatistics.TryGetBiasStatistics(stats, 2, out stdError, out zScore, out pValue); | ||
|
||
Assert.True(stdError > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May we put a more strict range for each statistic? #Resolved
public static class LogisticRegressionTrainingStats | ||
{ | ||
|
||
public static void ComputeExtendedTrainingStatistics(this LinearBinaryPredictor model, IChannel ch, float l2Weight = LogisticRegression.Arguments.Defaults.L2Weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name can be just compute(...)
because it's inside LogisticRegressionTrainingStats
. #Resolved
// Iterate through all entries of inverse Hessian to make adjustment to variance. | ||
// A discussion on ridge regularized LR coefficient covariance matrix can be found here: | ||
// http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228544/ | ||
// http://www.inf.unibz.it/dis/teaching/DWDM/project2010/LogisticRegression.pdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// http://www.inf.unibz.it/dis/teaching/DWDM/project2010/LogisticRegression.pdf | |
// http://www.inf.unibz.it/dis/teaching/DWDM/project2010/LogisticRegression.pdf (Section "Significance testing in ridge logistic regression") | |
``` #Resolved |
for (int iCol = 0; iCol <= iRow; iCol++) | ||
{ | ||
var entry = (Single)invHessian[ioffset]; | ||
var adjustment = -l2Weight * entry * entry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line doesn't make a lot of sense to me. The code first compute \sigma=sqrt(x+\lambda)
and them do something like \sigma + \lambda * \sigma^2
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
// Iterate through all entries of inverse Hessian to make adjustment to variance. | ||
// A discussion on ridge regularized LR coefficient covariance matrix can be found here: | ||
// http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228544/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like putting a very old doc inline and asking someone to read the whole article.
// http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228544/ | |
// http://www.aloki.hu/pdf/0402_171179.pdf (Equations 11 and 25) | |
``` #Resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks better @sfilipi thanks for addressing. I'm not sure how to clear my "request changes" review status so approving. An improvement for sure, though not perhaps quite what is needed by the people.
…ions, one the old MKl way in the HAL Learners package, and the other making use of Math.Numerics
@@ -38,5 +39,55 @@ public void TestEstimatorPoissonRegression() | |||
TestEstimatorCore(pipe, dataView); | |||
Done(); | |||
} | |||
|
|||
[Fact] | |||
public void TestLogisticRegressionStats() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestLogisticRegressionStats [](start = 20, length = 27)
combine both tests together #WontFix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently running into issues with this. Will investigate and log a bug.
In reply to: 231416451 [](ancestors = 231416451)
|
||
<PropertyGroup> | ||
<TargetFramework>netstandard2.0</TargetFramework> | ||
<IncludeInPackage>Microsoft.ML</IncludeInPackage> | ||
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="MathNet.Numerics.Signed" Version="$(MathNumericPackageVersion)" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: you'll need to add this PackageReference to our NuGet package as well:
machinelearning/pkg/Microsoft.ML/Microsoft.ML.nupkgproj
Lines 11 to 17 in f222025
<PackageReference Include="Newtonsoft.Json" Version="$(NewtonsoftJsonPackageVersion)" /> | |
<PackageReference Include="System.Reflection.Emit.Lightweight" Version="$(SystemReflectionEmitLightweightPackageVersion)" /> | |
<PackageReference Include="System.Threading.Tasks.Dataflow" Version="$(SystemThreadingTasksDataflowPackageVersion)" /> | |
<PackageReference Include="System.CodeDom" Version="$(SystemCodeDomPackageVersion)" /> | |
<PackageReference Include="System.Memory" Version="$(SystemMemoryVersion)" /> | |
<PackageReference Include="System.Collections.Immutable" Version="$(SystemCollectionsImmutableVersion)" /> | |
<PackageReference Include="System.ComponentModel.Composition" Version="$(SystemComponentModelCompositionVersion)" /> |
@TomFinley most things are changed from your approval, if you wanted to take another look. |
/// More than 1000 weights might take a few minutes. For those cases consider using the instance of <see cref="IComputeLRTrainingStd"/> | ||
/// present in the Microsoft.ML.HalLearners package. That computes the statistics using hardware acceleration. | ||
/// </summary> | ||
public IComputeLRTrainingStd StdComputer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StdComputer; [](start = 40, length = 13)
@tfinley@gmail.com the analyzer complained if i initialize here, and suggest to use the ctor instead. Is that legit? Asking since most of our Arguments are initialized here.
…rning-1 into trainingStats
build/Dependencies.props
Outdated
@@ -22,6 +22,7 @@ | |||
<SystemIOFileSystemAccessControl>4.5.0</SystemIOFileSystemAccessControl> | |||
<SystemSecurityPrincipalWindows>4.5.0</SystemSecurityPrincipalWindows> | |||
<TensorFlowVersion>1.10.0</TensorFlowVersion> | |||
<MathNumericPackageVersion>4.6.0</MathNumericPackageVersion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) since this is a dependency in the MIcrosoft.ML
nuget package, it belongs under the Core Product Dependencies
section above. #Resolved
using System.ComponentModel; | ||
using System.IO; | ||
using System.Linq; | ||
using MathNet.Numerics.LinearAlgebra; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this using necessary? I don't see any Math.NET usages below. #Resolved
@@ -397,4 +426,113 @@ public static CommonOutputs.BinaryClassificationOutput TrainBinary(IHostEnvironm | |||
() => LearnerEntryPointsUtils.FindColumn(host, input.TrainingData.Schema, input.WeightColumn)); | |||
} | |||
} | |||
|
|||
/// <summary> | |||
/// Computes the standart deviation matrix of each of the non-zero training weights, needed to calculate further the standart deviation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standart deviation [](start = 21, length = 18)
standard deviation
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double[,] matrixHessian = new double[numSelectedParams, numSelectedParams]; | ||
|
||
int hessianLength = 0; | ||
int dimention = numSelectedParams - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dimention
=> dimension
. #Resolved
} | ||
} | ||
|
||
var h = Matrix<double>.Build.DenseOfArray(matrixHessian); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing something here, so please forgive my ignorance.
What exactly are we using Math.NET for in this code? Are we just using it for the Matrix type, and for Inversing a Matrix? Or is there something else we are using that I'm not seeing?
If it is just for the Matrix type and Inverse, I don't think a full dependency on Math.NET is necessary. #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not missing anything. Math.Net numeric is used just to calculate the inverse. We discussed the option to implement it ourselves, and it was deemed error prone to round-of errors and unnecessary to own that code.
There is also a parallel pending discussion on weather we need to have an alternative method to MKL, which seems like it is leaning towards just keeping the MKL version.
In reply to: 232706971 [](ancestors = 232706971)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for the information.
{ | ||
using Mkl = OlsLinearRegressionTrainer.Mkl; | ||
|
||
public sealed class ComputeLRTrainingStdThroughHal : IComputeLRTrainingStd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) the file name and the class name should match, so it is easier to find the class on the file system. #Resolved
{ | ||
// Iterate through all entries of inverse Hessian to make adjustment to variance. | ||
// A discussion on ridge regularized LR coefficient covariance matrix can be found here: | ||
// http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228544/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this link is different than the link in the link in the ComputeLRTrainingStdThroughHal
class? #Resolved
|
||
if (l2Weight > 0) | ||
{ | ||
// Iterate through all entries of inverse Hessian to make adjustment to variance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extract this whole chunk of code into an internal utility method so it doesn't need to be duplicated between here and in ComputeLRTrainingStdThroughHal
. #Resolved
/// p-value and z-Score. | ||
/// If you need fast calculations, use the <see cref="IComputeLRTrainingStd"/> implementation in the Microsoft.ML.HALLearners package, | ||
/// which makes use of hardware acceleration. | ||
/// Due to the existence of regularization, an approximation is used to compute the variances of the trained linear coefficients. | ||
/// </summary> | ||
public interface IComputeLRTrainingStd | ||
public abstract class IComputeLRTrainingStd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be naming a class ISomething
. I
is the prefix for interfaces. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wschin did you have any more comments for this PR? Does it look good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not excited about making our core NuGet package dependent on MathNet.Numerics.Signed
, especially for 1 function (inversing a matrix). But I won't stand in your way of merging this PR.
Thanks @eerhardt and @TomFinley. Based on customer needs, and the evaluation of benchmarks, it is likely that i turn around and remove the dependency. I am not thrilled either. |
One option we could do (now that I'm thinking a bit more clearly) is now that we have the extensibility in place, any consumer can now derive their own Then we can remove the direct dependency from Thoughts? |
To address #612 maybe we can introduce another LR trainer in the HAL learners package.
Heresy #1 - it derives from LR making this last one not sealed, and exposing more of its fields.
Heresy #2 - the only thing this learner does differently from LR is to compute the std in the training stats are requested.
I think afa usability this is more accessible than just having a utility method that takes the LR trainer/predictor and recomputes the stats with std.
Thought?
@GalOshri @TomFinley @eerhardt