MSE is negative when returned by cross_val_score #2439
You're referring to
in http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html

I agree that it could be made clearer in the cross_val_score docs. Thanks for reporting. |
Indeed we overlooked that issue when doing the Scorer refactoring. The following is very counter-intuitive:
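For reference, a minimal sketch of the surprising behaviour (using the current scoring name "neg_mean_squared_error"; earlier releases used other names such as "mean_squared_error"):

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(noise=10.0, random_state=0)

# every value is negative, even though a mean of squares can never be
scores = cross_val_score(LinearRegression(), X, y, scoring="neg_mean_squared_error")
print(scores)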
/cc @larsmans |
BTW I don't agree that it's a documentation issue. It's |
I also agree that a change to return the actual MSE without the sign switched would be the better option. The scorer object could just store the |
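A minimal sketch of where the negation currently happens: the sign flip is applied inside the scorer built by make_scorer when greater_is_better=False.

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error

X, y = make_regression(noise=10.0, random_state=0)
est = LinearRegression().fit(X, y)

mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)
print(mean_squared_error(y, est.predict(X)))  # the plain metric: positive
print(mse_scorer(est, X, y))                  # the scorer: same magnitude, negated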
I agree that we have a usability issue here, but I don't fully agree with @ogrisel's solution that we should
because that's an unreliable hack in the long run. What if someone defines a custom scorer with a name such as
This is what scorers originally did during development between the 0.13 and 0.14 releases, and it made their definition a lot harder. It also made the code hard to follow. I believe that if we want to optimize scores, then they should be maximized. For the sake of user-friendliness, I think we might introduce a parameter such as score_is_loss |
That was a hurried response because I had to get off the train. What I meant by "display" is really the return value from cross_val_score. This does introduce an asymmetry between built-in and custom scorers. Ping @GaelVaroquaux. |
I like the score_is_loss solution, or something to that effect. The sign change to match the scoring name seems hard to maintain and could cause problems, as @larsmans mentioned. |
what's the conclusion, which solution should we go for? :) |
@tdomhan @jaquesgrobler @larsmans Do you know if this applies to R² as well? |
R² can be either positive or negative, and a negative value simply means your model is performing very poorly (worse than always predicting the mean). |
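For example, a quick illustration that R² itself can go below zero:

from sklearn.metrics import r2_score

# predictions worse than always predicting the mean give a negative R²
print(r2_score([1, 2, 3], [3, 1, 2]))  # -2.0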
IIRC, @GaelVaroquaux was a proponent of returning a negative number when greater_is_better is False. |
|
What is the consensus on this issue? I think I can fix it in my PR #2759, since the changes I made there make it really easy to fix. The trick is not to flip the sign upfront but, instead, to access the |
Special cases and varying behaviors are a source of problems in software. I simply think that we should rename "mse" to "negated_mse" in the list of acceptable scoring strings. |
I don't think that @ogrisel was suggesting to use name matching, just to be consistent with the original metric. Correct me if I'm wrong @ogrisel. |
That's completely unintuitive if you don't know the internals of scikit-learn. If you have to bend the system like that, I think it's a sign that there's a design problem. |
I disagree. Humans understand things with a lot of prior knowledge and |
What special case do you have in mind? To be clear, I think that the cross-validation scores stored in the grid-search results should be directly interpretable. AFAIK, flipping the sign was introduced so as to make the grid search implementation a little simpler, but it was not supposed to affect usability. |
Well, the fact that for some metrics bigger is better, whereas for others smaller is better.
It's not about grid search, it's about separation of concerns: scores |
But that's somewhat postponing the problem to user code. Nobody wants to plot "negated MSE" so users will have to flip signs back in their code. This is inconvenient, especially for multiple-metric cross-validation reports (PR #2759), as you need to handle each metric individually. I wonder if we can have the best of both worlds: generic code and intuitive results. |
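A sketch of that per-metric bookkeeping with today's cross_validate (which postdates this discussion), where each negated loss has to be un-flipped individually before reporting:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

X, y = make_regression(noise=10.0, random_state=0)
metrics = ["r2", "neg_mean_squared_error", "neg_mean_absolute_error"]
res = cross_validate(Ridge(), X, y, scoring=metrics)

# flip the sign back only for the loss-like metrics
report = {m: (-1 if m.startswith("neg_") else 1) * res["test_" + m].mean() for m in metrics}
print(report)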
Certainly not the end of the world. Note that when reading papers or
Why? If you just accept that it's always "bigger is better", it makes |
The risk is having very complex code that slows down maintenance |
That's what she said :) More seriously, I think one reason this is confusing people is because the output of |
Nice one!
Agreed. That's why I like the idea of changing the name: it would pop up |
And this in turn makes |
Got bitten by this today in 0.16.1 when trying to do linear regression. While the sign of the score is apparently not flipped anymore for classifiers, it is still flipped for linear regression. To add to the confusion, LinearRegression.score() returns a non-flipped version of the score. I'd suggest making it all consistent and returning the non-sign-flipped score for linear models as well. Example:

from sklearn import linear_model
from sklearn.naive_bayes import GaussianNB
from sklearn import cross_validation
from sklearn import datasets
iris = datasets.load_iris()
nb = GaussianNB()
scores = cross_validation.cross_val_score(nb, iris.data, iris.target)
print("NB score:\t %0.3f" % scores.mean() )
iris_reg_data = iris.data[:,:3]
iris_reg_target = iris.data[:,3]
lr = linear_model.LinearRegression()
scores = cross_validation.cross_val_score(lr, iris_reg_data, iris_reg_target)
print("LR score:\t %0.3f" % scores.mean() )
lrf = lr.fit(iris_reg_data, iris_reg_target)
score = lrf.score(iris_reg_data, iris_reg_target)
print("LR.score():\t %0.3f" % score ) This gives:
|
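One note on the example above: with no scoring argument, cross_val_score falls back to the estimator's own score method, which for LinearRegression is R² and is not sign-flipped; a negative mean there can simply come from poor out-of-fold predictions, since the iris rows are ordered and the default folds are not shuffled. A sketch with shuffled folds (using the current model_selection API):

from sklearn import datasets, linear_model
from sklearn.model_selection import KFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data[:, :3], iris.data[:, 3]

cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(linear_model.LinearRegression(), X, y, cv=cv)
print(scores.mean())  # per-fold R², no sign flip involved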
Cross-validation flips the sign of the score for metrics where greater is not better, so that the returned value is always to be maximized. I still disagree with this decision. I think the main proponents of it were @GaelVaroquaux and maybe @mblondel [I remember you refactoring the scorer code]. |
Oh never mind, all the discussion is above. |
And |
An idea would be to return the original scores (without the sign flip) but, instead of returning an ndarray, return a class which extends ndarray with methods like |
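A rough sketch of that idea (a hypothetical class, not anything that exists in scikit-learn): an ndarray subclass that remembers whether greater is better and exposes convenience methods.

import numpy as np

class CVScores(np.ndarray):
    """Hypothetical: cross-validation scores that know their orientation."""

    def __new__(cls, values, greater_is_better=True):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.greater_is_better = greater_is_better
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.greater_is_better = getattr(obj, "greater_is_better", True)

    def best(self):
        return self.max() if self.greater_is_better else self.min()

    def argbest(self):
        return int(self.argmax() if self.greater_is_better else self.argmin())

scores = CVScores([2.5, 1.8, 3.1], greater_is_better=False)  # e.g. raw MSE per fold
print(scores.best(), scores.argbest())  # 1.8 1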
There's no scorer for hinge loss (and I've never seen it being used for evaluation). |
The scorer doesn't return a numpy array, it returns a float, right? |
|
Actually, the scores returned by

Another idea is to add a sorted method to the scorer object:

my_scorer = make_scorer(my_metric, greater_is_better=False)
scores = my_scorer.sorted(scores) # takes into account my_scorer._sign
best = scores[0] |
You'd also need an argsort method, because in GridSearchCV you want the best score and the best index. |
How can I implement "estimate the means and variances of the workers' errors from the control questions, then compute the weighted average after removing the estimated bias for the predictions" with scikit-learn? |
IIRC we discussed this at the sprint (last summer?!) and decided to go with neg_mse |
yes we agreed on neg_mse AFAIK
|
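For reference, what eventually shipped is the neg_ prefix (neg_mean_squared_error and friends in recent releases); a short sketch with GridSearchCV:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(noise=10.0, random_state=0)
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, scoring="neg_mean_squared_error")
grid.fit(X, y)

print(grid.best_score_)   # negated MSE: higher (closer to zero) is better
print(-grid.best_score_)  # the actual mean squared error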
It was |
We also need:
|
model = Sequential()

How can I cross-validate the above code? I want to use the leave-one-out cross-validation method for this.
@shreyassks this isn't the correct place for your question, but I would check this out: https://keras.io/scikit-learn-api . Wrap your network in a KerasClassifier (or KerasRegressor) and use it like any other scikit-learn estimator. |
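A minimal sketch of that suggestion, assuming the (since-deprecated) keras.wrappers.scikit_learn wrapper and purely made-up data and layer sizes:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def build_model():
    model = Sequential()
    model.add(Dense(8, activation="relu", input_dim=4))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

X = np.random.rand(40, 4)
y = np.random.randint(0, 2, size=40)

clf = KerasClassifier(build_fn=build_model, epochs=10, batch_size=5, verbose=0)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())  # one fold per sample
print(scores.mean())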
Yes, I totally agree! This also happened with brier_score_loss: it works perfectly fine when used directly, but it gets confusing when the negative brier_score_loss comes back from GridSearchCV. It would at least be better to output something like: because brier_score_loss is a loss (the lower the better), the scoring function here flips the sign to make it negative. |
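A small illustration of that, assuming a scikit-learn version recent enough to include the "neg_brier_score" scorer:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(brier_score_loss(y, clf.predict_proba(X)[:, 1]))        # small positive loss
print(cross_val_score(clf, X, y, scoring="neg_brier_score"))  # same idea, negated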
The idea is that cross_val_score should focus entirely on the absolute value of the result. To my knowledge, the significance of the negative sign (-) obtained for MSE (mean squared error) in cross_val_score is not predefined. Let's wait for an updated version of sklearn where this issue is taken care of. |
For a regression use case:

SVR:
Linear Regression:
Lasso:
Ridge:

So which one is best? |
For a regression use case: which of these regression models is better?
|
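One way to answer that kind of question (a sketch; the data set and models here are placeholders): cross-validate every candidate with the same scorer and compare the un-negated errors.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(noise=10.0, random_state=0)
models = {"SVR": SVR(), "Linear Regression": LinearRegression(),
          "Lasso": Lasso(), "Ridge": Ridge()}

for name, model in models.items():
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error")
    print(name, -neg_mse.mean())  # report plain MSE; the lowest value wins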
@pritishban |
The Mean Squared Error returned by sklearn.cross_validation.cross_val_score is always negative. While this is a deliberate design decision so that the output of this function can be maximized when tuning hyperparameters, it's extremely confusing when using cross_val_score directly. At least, I asked myself how the mean of a square can possibly be negative and thought that cross_val_score was not working correctly or did not use the supplied metric. Only after digging into the sklearn source code did I realize that the sign was flipped.
This behavior is mentioned for make_scorer in scorer.py; however, it's not mentioned in cross_val_score, and I think it should be, because otherwise it makes people think that cross_val_score is not working correctly.