
change logloss to auc
Anson Chu committed Apr 1, 2019
1 parent 002849a commit 311c60b
Showing 2 changed files with 12 additions and 12 deletions.
14 changes: 7 additions & 7 deletions example_model.py
@@ -6,7 +6,7 @@

 import pandas as pd
 import numpy as np
-from sklearn import metrics, preprocessing, linear_model
+from sklearn import metrics, linear_model


def main():
@@ -78,11 +78,11 @@ def main():
     print("- elizabeth using bernie:",
           sum(correct) / float(validation.shape[0]))

-    # Numerai measures models on logloss instead of accuracy. The lower the logloss the better.
-    # Numerai only pays models with logloss < 0.693 on the live portion of the tournament data.
-    # Our validation logloss isn't very good.
-    print("- validation logloss:",
-          metrics.log_loss(validation['target_bernie'], probabilities))
+    # Numerai measures models on AUC. The higher the AUC the better.
+    # Numerai only pays models whose AUC beats the benchmark on the live portion of the tournament data.
+    # Our validation AUC isn't very good.
+    print("- validation AUC:",
+          metrics.roc_auc_score(validation['target_bernie'], probabilities))

     # To submit predictions from your model to Numerai, predict on the entire tournament data.
     x_prediction = tournament[features]
@@ -121,7 +121,7 @@ def main():
 3. Use all the targets
 As we saw above, a model trained on one target like target_bernie might be good at predicting another target
-like target_elizabeth. Blending models built on each target could also improve your logloss and consistency.
+like target_elizabeth. Blending models built on each target could also improve your AUC.
"""

if __name__ == '__main__':
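In sklearn terms, the commit above swaps `metrics.log_loss` for `metrics.roc_auc_score`. A minimal sketch of the two metrics on toy labels and probabilities (the values here are illustrative, not tournament data):

```python
import numpy as np
from sklearn import metrics

# Toy binary labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

# Logloss: lower is better; always predicting 0.5 scores ln(2) ~ 0.693.
print("logloss:", metrics.log_loss(y_true, y_prob))

# AUC: higher is better; a random ranking scores 0.5.
print("AUC:", metrics.roc_auc_score(y_true, y_prob))  # → 0.75
```

Note that AUC scores the *ranking* of predictions, so any monotonic rescaling of the probabilities leaves it unchanged, while logloss is sensitive to calibration.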
10 changes: 5 additions & 5 deletions example_model.r
@@ -37,10 +37,10 @@ cor(validation$target_bernie, validation$target_elizabeth)
 #you can see that target_elizabeth is accurate using the bernie model as well
 sum(round(probabilities)==validation$target_elizabeth)/nrow(validation)

-#Numerai measures models on logloss instead of accuracy. The lower the logloss the better.
-#Numerai only pays models with logloss < 0.693 on the live portion of the tournament data.
-#Our validation logloss isn't very good.
-logLoss(validation$target_bernie, probabilities)
+#Numerai measures models on AUC. The higher the AUC the better.
+#Numerai only pays models whose AUC beats the benchmark on the live portion of the tournament data.
+#Our validation AUC isn't very good.
+auc(validation$target_bernie, probabilities)

 #to submit predictions from your model to Numerai, predict on the entire tournament data
 tournament$probability_bernie<-predict.gbm(model, tournament, n.trees=10, type="response")
@@ -72,4 +72,4 @@ write.csv(submission, file="bernie_submission2.csv", row.names=F)

 #3. Use all the targets
 #As we saw above, a model trained on one target like target_bernie might be good at predicting another target
-#like target_elizabeth. Blending models built on each target could also improve your logloss and consistency.
+#like target_elizabeth. Blending models built on each target could also improve your AUC.
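The "use all the targets" tip can be sketched as simple probability averaging. This is a hypothetical stand-in, not the repository's code: the synthetic `X`, `y_bernie`, and `y_elizabeth` play the role of the features and two correlated tournament targets, and the blend is just the mean of the two per-target models' probabilities:

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the features and two correlated targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_bernie = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
y_elizabeth = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Train one model per target.
model_bernie = LogisticRegression().fit(X, y_bernie)
model_elizabeth = LogisticRegression().fit(X, y_elizabeth)

# Blend: average the predicted probabilities from both per-target models.
blend = (model_bernie.predict_proba(X)[:, 1] +
         model_elizabeth.predict_proba(X)[:, 1]) / 2

print("blended AUC on target_bernie:",
      metrics.roc_auc_score(y_bernie, blend))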
