
java.lang.NoClassDefFoundError: scalapb/Message when serializing PySpark Model #675

Closed
its-felix opened this issue May 7, 2020 · 3 comments

@its-felix

I get the following exception in my local PySpark pipeline when I try to serialize the model using MLeap:

Traceback (most recent call last):
  File ".\examples\src\main\python\ml\random_forest_classifier_example.py", line 88, in <module>
    model.serializeToBundle("jar:file:/Users/fwollsch/Downloads/test.zip", model.transform(trainingData))
  File "C:\Program Files\Python37\lib\site-packages\mleap\pyspark\spark_support.py", line 25, in serializeToBundle
    serializer.serializeToBundle(self, path, dataset=dataset)
  File "C:\Program Files\Python37\lib\site-packages\mleap\pyspark\spark_support.py", line 42, in serializeToBundle
    self._java_obj.serializeToBundle(transformer._to_java(), path, dataset._jdf)
  File "C:\Program Files\Python37\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Program Files\Python37\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Program Files\Python37\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o408.serializeToBundle.
: java.lang.NoClassDefFoundError: scalapb/Message
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at ml.combust.bundle.dsl.Value$.stringList(Value.scala:207)
        at org.apache.spark.ml.bundle.ops.feature.StringIndexerOp$$anon$1.store(StringIndexerOp.scala:20)
        at org.apache.spark.ml.bundle.ops.feature.StringIndexerOp$$anon$1.store(StringIndexerOp.scala:13)
        at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:87)
        at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:83)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.ModelSerializer.write(ModelSerializer.scala:83)
        at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:85)
        at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:81)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.NodeSerializer.write(NodeSerializer.scala:81)
        at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:34)
        at ml.combust.bundle.serializer.GraphSerializer$$anonfun$writeNode$1.apply(GraphSerializer.scala:30)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.GraphSerializer.writeNode(GraphSerializer.scala:30)
        at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
        at ml.combust.bundle.serializer.GraphSerializer$$anonfun$write$2.apply(GraphSerializer.scala:21)
        at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
        at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
        at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
        at ml.combust.bundle.serializer.GraphSerializer.write(GraphSerializer.scala:20)
        at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:21)
        at org.apache.spark.ml.bundle.ops.PipelineOp$$anon$1.store(PipelineOp.scala:14)
        at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:87)
        at ml.combust.bundle.serializer.ModelSerializer$$anonfun$write$1.apply(ModelSerializer.scala:83)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.ModelSerializer.write(ModelSerializer.scala:83)
        at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:85)
        at ml.combust.bundle.serializer.NodeSerializer$$anonfun$write$1.apply(NodeSerializer.scala:81)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.NodeSerializer.write(NodeSerializer.scala:81)
        at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:34)
        at ml.combust.bundle.serializer.BundleSerializer$$anonfun$write$1.apply(BundleSerializer.scala:29)
        at scala.util.Try$.apply(Try.scala:192)
        at ml.combust.bundle.serializer.BundleSerializer.write(BundleSerializer.scala:29)
        at ml.combust.bundle.BundleWriter.save(BundleWriter.scala:31)
        at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$serializeToBundleWithFormat$2.apply(SimpleSparkSerializer.scala:26)
        at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$serializeToBundleWithFormat$2.apply(SimpleSparkSerializer.scala:25)
        at resource.AbstractManagedResource$$anonfun$5.apply(AbstractManagedResource.scala:88)
        at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
        at scala.util.control.Exception$Catch$$anonfun$either$1.apply(Exception.scala:125)
        at scala.util.control.Exception$Catch.apply(Exception.scala:103)
        at scala.util.control.Exception$Catch.either(Exception.scala:125)
        at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
        at resource.ManagedResourceOperations$class.apply(ManagedResourceOperations.scala:26)
        at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
        at resource.DeferredExtractableManagedResource$$anonfun$tried$1.apply(AbstractManagedResource.scala:33)
        at scala.util.Try$.apply(Try.scala:192)
        at resource.DeferredExtractableManagedResource.tried(AbstractManagedResource.scala:33)
        at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:27)
        at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        ... 74 more

I'm using the random_forest_classifier_example.py from the pyspark-examples, with the addition of MLeap:

    [...]

    # serialize using mleap ( https://mleap-docs.combust.ml/py-spark/ )
    # Imports MLeap serialization functionality for PySpark
    import mleap.pyspark
    from mleap.pyspark.spark_support import SimpleSparkSerializer

    # SimpleSparkSerializer().serializeToBundle(model, "jar:file:/Users/fwollsch/Downloads/test.zip", dataset = trainingData)
    model.serializeToBundle("jar:file:/Users/fwollsch/Downloads/test.zip", model.transform(trainingData))

    spark.stop()
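As an aside, instead of hand-copying jars into `$SPARK_HOME/jars`, Spark can resolve MLeap and its transitive dependencies (including a consistent ScalaPB) from Maven via `spark.jars.packages`. A minimal sketch, assuming the MLeap 0.15.0 / Scala 2.11 coordinate:

```python
# Sketch: build the Maven coordinate for mleap-spark and let Spark resolve it,
# rather than managing the jar set by hand. The coordinate format is
# groupId:artifactId:version; Scala version and MLeap version are assumptions
# matching the setup described above.
def mleap_coordinate(scala="2.11", version="0.15.0"):
    return f"ml.combust.mleap:mleap-spark_{scala}:{version}"

# Pass it when building the session, e.g.:
#   SparkSession.builder \
#       .config("spark.jars.packages", mleap_coordinate()) \
#       .getOrCreate()
print(mleap_coordinate())
```

Letting Spark's Ivy resolution pull the dependency tree avoids exactly the kind of mixed-version jar directory that can cause `NoClassDefFoundError`.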

OS: Windows 10
MLeap (installed using pip): 0.15.0
PySpark: 2.4.5
Python: 3.7.2

I have added the missing jars to the jars directory of my PySpark installation.
The following jars are currently in my /jars directory:

activation-1.1.1.jar
aircompressor-0.10.jar
antlr-2.7.7.jar
antlr4-runtime-4.7.jar
antlr-runtime-3.4.jar
aopalliance-1.0.jar
aopalliance-repackaged-2.4.0-b34.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
apache-log4j-extras-1.2.17.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
arpack_combined_all-0.1.jar
arrow-format-0.10.0.jar
arrow-memory-0.10.0.jar
arrow-vector-0.10.0.jar
automaton-1.11-8.jar
avro-1.8.2.jar
avro-ipc-1.8.2.jar
avro-mapred-1.8.2-hadoop2.jar
bonecp-0.8.0.RELEASE.jar
breeze_2.11-0.13.2.jar
breeze-macros_2.11-0.13.2.jar
bundle-hdfs_2.11-0.15.0.jar
bundle-ml_2.11-0.15.0.jar
calcite-avatica-1.2.0-incubating.jar
calcite-core-1.2.0-incubating.jar
calcite-linq4j-1.2.0-incubating.jar
chill_2.11-0.9.3.jar
chill-java-0.9.3.jar
commons-beanutils-1.9.4.jar
commons-cli-1.2.jar
commons-codec-1.10.jar
commons-collections-3.2.2.jar
commons-compiler-3.0.9.jar
commons-compress-1.8.1.jar
commons-configuration-1.6.jar
commons-crypto-1.0.0.jar
commons-dbcp-1.4.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-lang3-3.5.jar
commons-logging-1.1.3.jar
commons-math3-3.4.1.jar
commons-net-3.1.jar
commons-pool-1.5.4.jar
compilerplugin_2.11-0.10.0-M4.jar
compilerplugin-shaded_2.11-0.10.0-M4.jar
compress-lzf-1.0.3.jar
config-1.4.0.jar
core-1.1.2.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
derby-10.12.1.1.jar
eigenbase-properties-1.1.5.jar
flatbuffers-1.2.0-3f79e055.jar
generex-1.0.2.jar
gson-2.2.4.jar
guava-14.0.1.jar
guice-3.0.jar
guice-servlet-3.0.jar
hadoop-annotations-2.7.3.jar
hadoop-auth-2.7.3.jar
hadoop-client-2.7.3.jar
hadoop-common-2.7.3.jar
hadoop-hdfs-2.7.3.jar
hadoop-mapreduce-client-app-2.7.3.jar
hadoop-mapreduce-client-common-2.7.3.jar
hadoop-mapreduce-client-core-2.7.3.jar
hadoop-mapreduce-client-jobclient-2.7.3.jar
hadoop-mapreduce-client-shuffle-2.7.3.jar
hadoop-yarn-api-2.7.3.jar
hadoop-yarn-client-2.7.3.jar
hadoop-yarn-common-2.7.3.jar
hadoop-yarn-server-common-2.7.3.jar
hadoop-yarn-server-web-proxy-2.7.3.jar
hive-beeline-1.2.1.spark2.jar
hive-cli-1.2.1.spark2.jar
hive-exec-1.2.1.spark2.jar
hive-jdbc-1.2.1.spark2.jar
hive-metastore-1.2.1.spark2.jar
hk2-api-2.4.0-b34.jar
hk2-locator-2.4.0-b34.jar
hk2-utils-2.4.0-b34.jar
hppc-0.7.2.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.5.6.jar
httpcore-4.4.10.jar
ivy-2.4.0.jar
jackson-annotations-2.6.7.jar
jackson-core-2.6.7.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.6.7.3.jar
jackson-dataformat-yaml-2.6.7.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.6.7.jar
jackson-module-paranamer-2.7.9.jar
jackson-module-scala_2.11-2.6.7.1.jar
jackson-xc-1.9.13.jar
janino-3.0.9.jar
JavaEWAH-0.3.2.jar
javassist-3.18.1-GA.jar
javax.annotation-api-1.2.jar
javax.inject-1.jar
javax.inject-2.4.0-b34.jar
javax.servlet-api-3.1.0.jar
javax.ws.rs-api-2.0.1.jar
javolution-5.5.1.jar
jaxb-api-2.2.2.jar
jcl-over-slf4j-1.7.16.jar
jdo-api-3.0.1.jar
jersey-client-2.22.2.jar
jersey-common-2.22.2.jar
jersey-container-servlet-2.22.2.jar
jersey-container-servlet-core-2.22.2.jar
jersey-guava-2.22.2.jar
jersey-media-jaxb-2.22.2.jar
jersey-server-2.22.2.jar
jetty-6.1.26.jar
jetty-util-6.1.26.jar
jline-2.14.6.jar
joda-time-2.9.3.jar
jodd-core-3.5.2.jar
jpam-1.1.jar
json4s-ast_2.11-3.5.3.jar
json4s-core_2.11-3.5.3.jar
json4s-jackson_2.11-3.5.3.jar
json4s-scalap_2.11-3.5.3.jar
jsp-api-2.1.jar
jsr305-1.3.9.jar
jta-1.1.jar
jtransforms-2.4.0.jar
jul-to-slf4j-1.7.16.jar
kryo-shaded-4.0.2.jar
kubernetes-client-4.6.1.jar
kubernetes-model-4.6.1.jar
kubernetes-model-common-4.6.1.jar
lenses_2.11-0.10.0-M4.jar
leveldbjni-all-1.8.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
log4j-1.2.17.jar
logging-interceptor-3.12.0.jar
lz4-java-1.4.0.jar
machinist_2.11-0.6.1.jar
macro-compat_2.11-1.1.1.jar
mesos-1.4.0-shaded-protobuf.jar
metrics-core-3.1.5.jar
metrics-graphite-3.1.5.jar
metrics-json-3.1.5.jar
metrics-jvm-3.1.5.jar
minlog-1.3.0.jar
mleap-base_2.11-0.15.0.jar
mleap-core_2.11-0.15.0.jar
mleap-executor_2.11-0.15.0.jar
mleap-runtime_2.11-0.15.0.jar
mleap-spark_2.11-0.15.0.jar
mleap-spark-base_2.11-0.15.0.jar
mleap-spark-extension_2.11-0.15.0.jar
mleap-tensor_2.11-0.15.0.jar
netty-3.9.9.Final.jar
netty-all-4.1.42.Final.jar
objenesis-2.5.1.jar
okhttp-3.12.0.jar
okio-1.15.0.jar
opencsv-2.3.jar
orc-core-1.5.5-nohive.jar
orc-mapreduce-1.5.5-nohive.jar
orc-shims-1.5.5.jar
oro-2.0.8.jar
osgi-resource-locator-1.0.1.jar
paranamer-2.8.jar
parquet-column-1.10.1.jar
parquet-common-1.10.1.jar
parquet-encoding-1.10.1.jar
parquet-format-2.4.0.jar
parquet-hadoop-1.10.1.jar
parquet-hadoop-bundle-1.6.0.jar
parquet-jackson-1.10.1.jar
protobuf-java-2.5.0.jar
protobuf-runtime-scala_2.11-0.8.3.jar
protoc-bridge_2.11-0.7.14.jar
py4j-0.10.7.jar
pyrolite-4.13.jar
RoaringBitmap-0.7.45.jar
scala-arm_2.11-2.0.jar
scala-compiler-2.11.12.jar
scala-library-2.11.12.jar
scala-parser-combinators_2.11-1.1.0.jar
scalapbc_2.11-0.10.0-M4.jar
scalapb-json4s_2.11-0.10.1-M1.jar
scalapb-runtime_2.11-0.10.0-M4.jar
scalapb-runtime-grpc_2.11-0.10.0-M4.jar
scala-reflect-2.11.12.jar
scala-xml_2.11-1.0.5.jar
shapeless_2.11-2.3.2.jar
shims-0.7.45.jar
slf4j-api-1.7.16.jar
slf4j-log4j12-1.7.16.jar
snakeyaml-1.15.jar
snappy-0.2.jar
snappy-java-1.1.7.3.jar
spark-catalyst_2.11-2.4.5.jar
spark-core_2.11-2.4.5.jar
spark-graphx_2.11-2.4.5.jar
spark-hive_2.11-2.4.5.jar
spark-hive-thriftserver_2.11-2.4.5.jar
spark-kubernetes_2.11-2.4.5.jar
spark-kvstore_2.11-2.4.5.jar
spark-launcher_2.11-2.4.5.jar
spark-mesos_2.11-2.4.5.jar
spark-mllib_2.11-2.4.5.jar
spark-mllib-local_2.11-2.4.5.jar
spark-network-common_2.11-2.4.5.jar
spark-network-shuffle_2.11-2.4.5.jar
spark-repl_2.11-2.4.5.jar
spark-sketch_2.11-2.4.5.jar
spark-sql_2.11-2.4.5.jar
sparksql-scalapb_2.11-0.9.2.jar
spark-streaming_2.11-2.4.5.jar
spark-tags_2.11-2.4.5.jar
spark-tags_2.11-2.4.5-tests.jar
spark-unsafe_2.11-2.4.5.jar
spark-yarn_2.11-2.4.5.jar
spire_2.11-0.13.0.jar
spire-macros_2.11-0.13.0.jar
ST4-4.0.4.jar
stax-api-1.0.1.jar
stax-api-1.0-2.jar
stream-2.7.0.jar
stringtemplate-3.2.1.jar
super-csv-2.2.0.jar
univocity-parsers-2.7.3.jar
validation-api-1.1.0.Final.jar
xbean-asm6-shaded-4.8.jar
xercesImpl-2.9.1.jar
xmlenc-0.52.jar
xz-1.5.jar
zjsonpatch-0.3.0.jar
zookeeper-3.4.6.jar
zstd-jni-1.3.2-2.jar
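One way to check which of the jars above actually contains (or lacks) the missing `scalapb/Message` class is to scan their entries directly. A small diagnostic sketch; the `jar_dir` path is whatever your Spark `jars` directory is:

```python
import glob
import os
import zipfile

def jars_containing(class_path, jar_dir):
    """Return the jar file names under jar_dir whose entries include class_path.

    Jars are just zip archives, so zipfile can list their contents without
    any JVM tooling.
    """
    hits = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        with zipfile.ZipFile(jar) as zf:
            if class_path in zf.namelist():
                hits.append(os.path.basename(jar))
    return hits

# e.g. jars_containing("scalapb/Message.class", "/path/to/spark/jars")
```

If no jar in the directory reports `scalapb/Message.class`, the `NoClassDefFoundError` follows directly: the ScalaPB runtime on the classpath is from a line that no longer ships that class.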
@ancasarb
Member

ancasarb commented May 8, 2020

@codeflush-dev This looks like it could be a scalapb versioning issue, perhaps?

    println(scalapb.compiler.Version.scalapbVersion)

Inside mleap it seems this prints 0.7.1, perhaps try with that instead of 0.10.1-M1/0.10.0-M4?
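Following that suggestion would mean fetching the matching runtime jar from Maven Central. A sketch of building the download URL; the 0.7.1 version comes from the `println` above, and the `com.thesamet.scalapb` group id is an assumption about where ScalaPB runtimes are published:

```python
def maven_central_jar_url(group, artifact, version):
    """Build a Maven Central download URL for a plain jar artifact.

    Maven repository layout: the group id's dots become path segments,
    followed by artifact/version/artifact-version.jar.
    """
    return ("https://repo1.maven.org/maven2/"
            f"{group.replace('.', '/')}/{artifact}/{version}/"
            f"{artifact}-{version}.jar")

print(maven_central_jar_url("com.thesamet.scalapb", "scalapb-runtime_2.11", "0.7.1"))
```

The downloaded jar would then replace the 0.10.x `scalapb-runtime` jar in the Spark `jars` directory.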

@its-felix
Author

> @codeflush-dev This looks like it could be a scalapb versioning issue, perhaps?
>
>     println(scalapb.compiler.Version.scalapbVersion)
>
> Inside mleap it seems this prints 0.7.1, perhaps try with that instead of 0.10.1-M1/0.10.0-M4?

I'll try that and come back to you. Thanks :)

@its-felix
Author

Works as expected now.
