
MLeap BinaryLogisticRegressionModel prediction result differs from the Spark model #339

Open
ihainan opened this issue Mar 1, 2018 · 2 comments

ihainan commented Mar 1, 2018

Hi there. I trained a Spark BinaryLogisticRegressionModel on a dataset in which every row has the same label value, then used the model to make predictions.

// imports (assuming a spark-shell / notebook session where sc and spark are already in scope)
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{VectorAssembler, VectorIndexer}

// data: every row has LABEL = 1
val rddData = sc.parallelize(Seq[(Integer, Double, Double, Double, Double, Double, Double, Double, Double, Double, Double, Double, Double, Double, Double)](
    (1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4),
    (1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.8, 0.4, 0.2, 0.1, 1.2, 1.1, 1.1, 1.0, 0.33),
    (1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.8, 0.4, 0.2, 0.1, 1.2, 1.1, 1.1, 1.0, 0.33)))
val data = spark.createDataFrame(rddData).toDF("LABEL", "C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", 
"C9", "C10", "C11", "C12", "C13", "C14")

// transformers & estimators
val assembler = new VectorAssembler().setInputCols(Array("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10", "C11", "C12", "C13", "C14")).setOutputCol("features")
val featureIndexer = new VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").setMaxCategories(2)
val lr = new LogisticRegression().setLabelCol("LABEL")
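
The fit-and-score step isn't shown above; it was presumably something like the sketch below (the explicit Pipeline wiring and the name pipelineModel are assumptions, not the original code):

import org.apache.spark.ml.Pipeline

// Chain the assembler, indexer and estimator, fit on the toy data,
// and score the same rows with the resulting Spark model.
val pipeline = new Pipeline().setStages(Array(assembler, featureIndexer, lr))
val pipelineModel = pipeline.fit(data)
pipelineModel.transform(data).select("probability", "prediction").show(false)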

The result from the Spark model looks fine:

{
  "probability":[0.0,1.0], 
  "prediction":1.0
}

After converting the pipeline to an MLeap model, the probabilities are all null and the prediction is incorrect as well:

{
  "probability": [null, null],
  "prediction": 0.0
}
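
For reference, the conversion to an MLeap bundle presumably followed the standard export pattern from the MLeap documentation, roughly as sketched below; the bundle path and variable names are illustrative, and the exact API in 0.7.0 may differ:

import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// Serialize the fitted Spark pipeline to an MLeap bundle zip on disk.
val sbc = SparkBundleContext()
for (bf <- managed(BundleFile("jar:file:/tmp/lr-pipeline.zip"))) {
  pipelineModel.writeBundle.save(bf)(sbc).get
}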

Spark Version: 2.1.1
MLeap Version: 0.7.0

It seems that Spark sets the intercept to Double.PositiveInfinity for this constant-label case, but MLeap can't handle that value.

// org.apache.spark.ml.classification.LogisticRegression
val interceptVec = if (isMultinomial) {
  Vectors.sparse(numClasses, Seq((constantLabelIndex, Double.PositiveInfinity)))
} else {
  Vectors.dense(if (numClasses == 2) Double.PositiveInfinity else Double.NegativeInfinity)
}

// ml.combust.mleap.core.classification.BinaryLogisticRegressionModel
def margin(features: Vector): Double = {
  BLAS.dot(features, coefficients) + intercept
}
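
For what it's worth, the infinite intercept itself isn't a numerical problem; the logistic function maps an infinite margin to a well-defined probability. A minimal sketch of the arithmetic:

// margin = BLAS.dot(features, coefficients) + intercept, with intercept = +Infinity
val margin = 0.0 + Double.PositiveInfinity
val probability = 1.0 / (1.0 + math.exp(-margin)) // math.exp(-Infinity) == 0.0, so probability == 1.0

So the loss presumably happens in the bundle round-trip: plain JSON has no literal for Infinity, which would explain both the null probabilities and the wrong prediction.
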
hollinwilkins (Member) commented:

@ihainan Would you be able to have a look at a fix for this and submit a PR? I think we would need to support serializing doubles as positive and negative infinity in the JsonSupport file in the bundle-ml submodule of MLeap.
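
A rough sketch of what that support could look like, assuming the JsonSupport in bundle-ml is built on spray-json; the object name and the string tokens used here are hypothetical, not MLeap's actual code:

import spray.json._

// Hypothetical custom format: encode non-finite doubles as JSON strings so
// they can round-trip through the bundle; finite doubles stay plain numbers.
object InfinityAwareDoubleFormat extends JsonFormat[Double] {
  def write(d: Double): JsValue =
    if (d == Double.PositiveInfinity) JsString("Infinity")
    else if (d == Double.NegativeInfinity) JsString("-Infinity")
    else if (d.isNaN) JsString("NaN")
    else JsNumber(d)

  def read(json: JsValue): Double = json match {
    case JsString("Infinity")  => Double.PositiveInfinity
    case JsString("-Infinity") => Double.NegativeInfinity
    case JsString("NaN")       => Double.NaN
    case JsNumber(n)           => n.toDouble
    case other                 => deserializationError(s"expected a double, got $other")
  }
}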

marvinxu-free commented:

It seems this is still not resolved?
