serializeToBundle object issue #8

Open
drkmd8 opened this issue Oct 16, 2017 · 38 comments

Comments

@drkmd8

drkmd8 commented Oct 16, 2017

When I run the code above, featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip") fails with AttributeError: 'Pipeline' object has no attribute 'serializeToBundle'.

If I fit the pipeline first and use the following code instead:
featurePipeline2 = featurePipeline.fit(df2)
featurePipeline2.serializeToBundle("jar:file:/tmp/pyspark.example.zip")
it fails at self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer() with "TypeError: 'JavaPackage' object is not callable".

How can I solve this?
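
For reference, a minimal sketch of the call order, assuming the MLeap Python package is installed and the matching MLeap JARs are already visible to the Spark JVM. As far as I can tell, importing mleap.pyspark.spark_support is what attaches serializeToBundle to fitted transformers, so an unfitted Pipeline would not have it. The helper names bundle_uri and export_fitted_pipeline are hypothetical and only for illustration.

```python
def bundle_uri(zip_path):
    """Build the jar:file: URI form used for local zip bundles in this thread."""
    return "jar:file:" + zip_path


def export_fitted_pipeline(pipeline, df, zip_path):
    """Fit the pipeline, then serialize the fitted model (hypothetical helper)."""
    # Imported lazily so this file can be loaded without Spark or MLeap installed.
    # The import is assumed to register serializeToBundle on fitted transformers.
    import mleap.pyspark  # noqa: F401
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401

    model = pipeline.fit(df)  # Pipeline -> PipelineModel
    # The second argument is a transformed DataFrame, as in the calls quoted here.
    model.serializeToBundle(bundle_uri(zip_path), model.transform(df))
    return model
```

If the second step still fails with the JavaPackage error, the Python side is fine and the JVM simply cannot see the MLeap classes.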

@hollinwilkins
Member

@drkmd8 Can you give us version information for both the Python and MLeap JVM packages as well as Spark that you are using?

@drkmd8
Author

drkmd8 commented Jan 13, 2018

I used Python 3.6.1, MLeap 0.8.1 (pip install mleap), and pyspark 2.1.1+hadoop2.7. I think the problem comes from the Python MLeap library: Scala seems to work fine, but the Python package has to be run with external JAR files that contain the MLeap classes.

@alexkayal

I have the same issue at the moment as well.

@tianhongjie

Yes, I have the same issue. My solution was to add the jar files to the pyspark jars directory inside the Python package path: site-packages/pyspark/jars/.
I added the following jars:
mleap-base_2.11-0.10.0.jar
mleap-core_2.11-0.10.0.jar
mleap-runtime_2.11-0.10.0.jar
mleap-spark_2.11-0.10.0.jar
mleap-spark-base_2.11-0.10.0.jar
mleap-tensor_2.11-0.10.0.jar

I hope it is helpful for you.

@alexkayal

alexkayal commented Jun 11, 2018

I also solved it by adding the jars manually to /usr/lib/spark/jars.
But I guess there is a better way to do it: run sudo pip install jip, then install MLeap. Jip is supposed to take care of your Java dependencies, if I understand correctly.

@Khiem-Tran

Hi @alexkayal & @tianhongjie, I tried your solution. It fixed the JavaPackage issue, but then I got another one:

Py4JJavaError: An error occurred while calling o261.serializeToBundle.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedEnum

I am not sure how it can happen, since my dataframe only has primitive types {int, double}. Do you have any ideas on it?

@samant2008

@alexkayal @tianhongjie @Khiem-Tran,
I am also getting the same error. Any idea how to resolve this issue?

Py4JJavaError: An error occurred while calling o414.serializeToBundle.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedEnum
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.trueaccord.scalapb.GeneratedEnum
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more

@elgalu

elgalu commented Oct 6, 2018

I fixed the com/trueaccord related errors by adding lenses_2.11-0.4.12.jar

@samant2008

@elgalu,

Thank you so much for your prompt response.
I have included lenses_2.11-0.4.12.jar but I'm still getting the same error as above. Do you have any other suggestions for resolving this issue?

@elgalu

elgalu commented Oct 7, 2018

Make sure it is on the CLASSPATH. Note that Py4J has its own jars/ folder, and if you install pyspark separately it also comes with its own jars/ folder. What I do is remove all those jars directories and symlink them to a single /jars directory where I put together the whole set of working versions.

You can find all my working jars at:
https://github.com/elgalu/jupyter-spark-117/tree/master/spark/jars

Pending: to build an sbt or pom.xml project (instead of a bunch of jars)

@siyouhe666

> @elgalu,
>
> Thank you so much for your prompt response.
> I have included lenses_2.11-0.4.12.jar but I'm still getting the same error as above. Do you have any other suggestions for resolving this issue?

I have the same problem. Have you found an answer?

@Khiem-Tran

@elgalu @siyouhe666, I have been using Spark 2.2.1 with the config --packages ml.combust.mleap:mleap-spark_2.11:0.11.0. It seems to work for me.

Btw, I also followed this [blog](https://medium.com/@bogdan.cojocar/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb) to fix the xgboost dependency, because somehow my mleap-xgboost does not work properly.

@siyouhe666

> @elgalu @siyouhe666, I have been using Spark 2.2.1 with the config --packages ml.combust.mleap:mleap-spark_2.11:0.11.0. It seems to work for me.
>
> Btw, I also followed this [blog](https://medium.com/@bogdan.cojocar/pyspark-and-xgboost-integration-tested-on-the-kaggle-titanic-dataset-4e75a568bdb) to fix the xgboost dependency, because somehow my mleap-xgboost does not work properly.

Thanks, I solved this problem by changing my Spark version to 2.4.0.
Btw, although the official MLeap docs say 2.4.0 is not supported, I found that it works well.

@yairdata

I am using Python 3.6 and pyspark 2.3.1, with these jars:
mleap-core_2.11-0.11.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
I tried two approaches to make pyspark aware of MLeap. The first was to git clone the repository and use:
sys.path.append(r'C:\my-mleap\mleap-master\python')
together with:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars ....'
The second was to pip install mleap (which installs version 0.8.1).

When calling
model.serializeToBundle(model_file_path, sparkTransformed)
I get:
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM

@SoloBean

SoloBean commented Mar 6, 2019

@yairdata I have the same problem. Have you found an answer? Thanks a lot.

@SoloBean

SoloBean commented Mar 6, 2019

@yairdata I solved this problem by changing the MLeap version. Originally I used 0.13.0; now I use 0.11.0, but that raises another problem:
Py4JJavaError: An error occurred while calling o126.serializeToBundle.
: java.lang.NoClassDefFoundError: com/typesafe/config/ConfigFactory
at org.apache.spark.ml.bundle.SparkBundleContext$.apply(SparkBundleContext.scala:37)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext$lzycompute(SparkBundleContext.scala:31)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext(SparkBundleContext.scala:31)
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
at scala.Option.map(Option.scala:146)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:22)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.typesafe.config.ConfigFactory

@yairdata

yairdata commented Mar 6, 2019

@SoloBean - I solved this problem with 0.13.0 by setting spark.jars.packages to ml.combust.mleap:mleap-spark-base_2.11:0.13.0,ml.combust.mleap:mleap-spark_2.11:0.13.0.
Now I have another issue with missing jars, but that happens because I am behind a firewall. Without the firewall everything works more or less as expected (I was able to export the model, but to a directory rather than to a jar file as described in the documentation).
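
The spark.jars.packages setting mentioned above can also be supplied from Python when the session is created. A minimal sketch, assuming pyspark is installed and that no Spark JVM has been started yet in the process (as far as I know, the setting only takes effect when the JVM is launched). join_packages and build_session are hypothetical helper names; the coordinates are the two quoted in this comment.

```python
def join_packages(coordinates):
    """Join Maven coordinates into the comma-separated form Spark expects."""
    return ",".join(c.strip() for c in coordinates)


# The two coordinates reported to work with MLeap 0.13.0 in the comment above.
MLEAP_0_13_0 = [
    "ml.combust.mleap:mleap-spark-base_2.11:0.13.0",
    "ml.combust.mleap:mleap-spark_2.11:0.13.0",
]


def build_session(coordinates, app_name="mleap-export"):
    """Create a SparkSession that asks Spark to resolve the given packages."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", join_packages(coordinates))
        .getOrCreate()
    )
```

Resolution still needs network access to the package repository, so behind a firewall the download step can fail exactly as described above.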

@SoloBean

SoloBean commented Mar 6, 2019

@yairdata - I also solved that problem by adding jars to /jars, but after adding every jar I know of, another problem came up that I don't know how to solve by adding a jar:

Py4JJavaError: An error occurred while calling o126.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: resource.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more

@yairdata

yairdata commented Mar 6, 2019

@SoloBean - I think there is an open issue about a dependency conflict for that, but I'm not sure.

@yairdata

yairdata commented Mar 6, 2019

Weird jar dependency issue:
I have com.trueaccord.scalapb:scalapb-runtime_2.11:0.6.7 in spark.jars.packages, and I can see that it contains the GeneratedEnum class, but I still get the error below.
I also tried putting the jar in the pyspark jars directory, and I put the lenses jar mentioned above on the classpath.
Is there some other hidden jar dependency that is not listed in the Maven dependency tree?
The error:
Py4JJavaError: An error occurred while calling o438.serializeToBundle.
: java.lang.NoClassDefFoundError: scalapb/GeneratedEnum
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scalapb.GeneratedEnum
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
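
One thing worth checking here: the error names scalapb/GeneratedEnum, while the jar that was added is published under com.trueaccord.scalapb, so the class it carries may live at com/trueaccord/scalapb/GeneratedEnum instead. A rough sketch for verifying which jars in a directory really contain a given fully qualified class, using only the standard library; jars_containing is a hypothetical helper name and the directory is whatever jars folder you are using.

```python
import os
import zipfile


def jars_containing(class_name, jar_dir):
    """Return the names of the jars in jar_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for name in sorted(os.listdir(jar_dir)):
        if not name.endswith(".jar"):
            continue
        with zipfile.ZipFile(os.path.join(jar_dir, name)) as jar:
            if entry in jar.namelist():
                hits.append(name)
    return hits
```

Calling it once with scalapb.GeneratedEnum and once with com.trueaccord.scalapb.GeneratedEnum shows quickly whether the jar on the classpath matches the package the JVM is asking for.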

@nikhilshekhar

It took some time to figure this out, so I'm putting the steps to resolve it below.

  1. The issue reported by @drkmd8 appears when the Java class in question cannot be accessed, which happens when the relevant JARs are not all on the classpath. It can be resolved by passing the JARs via the --jars argument or by placing them on the classpath.
  2. Once the above is resolved, one can still hit the issue pointed out by @yairdata. This happens because the JVM is unable to initialise the class: the location it searches when instantiating the class is wrong.

The most straightforward way to circumvent both issues is to invoke pyspark as follows:
pyspark --packages ml.combust.mleap:mleap-spark_2.11:0.11.0
The MLeap version above can be chosen according to the compatibility matrix: https://github.com/combust/mleap#mleapspark-version
If the package download fails on a particular JAR, that JAR can be downloaded manually and placed in the corresponding .m2 directory, after which the command should be re-run. All should be good then.

This does not seem to be an MLeap bug and can be closed by the admins. I do wonder, though, why newer versions of MLeap are not being published to PyPI.
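
To make the symptoms in this thread easier to tell apart, here is a small lookup sketch. The substrings are copied from error messages quoted in the thread; the short cause labels are my own reading of what commenters reported, not anything produced by MLeap, and diagnose is a hypothetical helper.

```python
# (substring seen in the error, likely cause as reported in this thread)
KNOWN_SYMPTOMS = [
    ("has no attribute 'serializeToBundle'",
     "python-side: MLeap pyspark support not imported, or called on an unfitted Pipeline"),
    ("'JavaPackage' object is not callable",
     "jvm-side: MLeap jars are not on the Spark classpath"),
    ("does not exist in the JVM",
     "jvm-side: MLeap jars are not on the Spark classpath"),
    ("unsupported Spark version",
     "version: this MLeap release rejects the running Spark version"),
    ("NoClassDefFoundError",
     "jvm-side: a transitive dependency jar is missing or has the wrong version"),
]


def diagnose(message):
    """Return the first matching cause label, or None if nothing matches."""
    for needle, cause in KNOWN_SYMPTOMS:
        if needle in message:
            return cause
    return None
```

For example, diagnose("TypeError: 'JavaPackage' object is not callable") points at the classpath rather than at the Python code.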

@xiangninglyu

@SoloBean I'm hitting the same resource/package error as you. Did you find a solution? I got:

py4j.protocol.Py4JJavaError: An error occurred while calling o103.serializeToBundle.
: java.lang.NoClassDefFoundError: resource/package$
	at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:25)
	at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: resource.package$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 13 more

@sealzjh

sealzjh commented Apr 17, 2019

I have the same issue. After adding the jars I get:
File "/Users/alan/local/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1598, in __getattr__
py4j.protocol.Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM

My version:
Python 2.7.10
pyspark-2.4.0
spark-2.4.0-bin-hadoop2.7
jar:
mleap-base_2.11-0.13.0.jar
mleap-core_2.11-0.1.5.jar
mleap-executor_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-spark-testkit_2.11-0.13.0.jar
mleap-spark_2.11-0.13.0.jar
mleap-tensor_2.11-0.13.0.jar
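
Note that the list above mixes versions: mleap-core is at 0.1.5 while everything else is at 0.13.0. Whether that is the cause of this particular error is not established here, but it is cheap to rule out. A small sketch that groups jar file names by version, assuming the usual artifact_scalaVersion-version.jar naming; group_by_version is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches names such as mleap-spark-base_2.11-0.13.0.jar
JAR_NAME = re.compile(
    r"^(?P<artifact>.+)_(?P<scala>\d+\.\d+)-(?P<version>[\w.\-]+)\.jar$"
)


def group_by_version(jar_names):
    """Map each version string to the artifacts that carry it."""
    groups = defaultdict(list)
    for name in jar_names:
        match = JAR_NAME.match(name)
        if match:  # names that do not follow the pattern are ignored
            groups[match.group("version")].append(match.group("artifact"))
    return dict(groups)
```

More than one key in the result for the mleap-* jars means the set is inconsistent.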

@itsmesrds

Hello @hollinwilkins.

I have the same issue. I added the jars and I am still facing it:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM

Python 3
pyspark-2.4.0
spark-2.4.0-bin-hadoop2.7
jar:
mleap-base_2.11-0.13.0.jar
mleap-core_2.11-0.1.5.jar
mleap-executor_2.11-0.13.0.jar
mleap-runtime_2.11-0.13.0.jar
mleap-spark-base_2.11-0.13.0.jar
mleap-spark-testkit_2.11-0.13.0.jar
mleap-spark_2.11-0.13.0.jar
mleap-tensor_2.11-0.13.0.jar

Please help me out with this.

Thanks

@yairdata

It works with MLeap version 0.13.0.
I verified that by passing all of the following jars when submitting the job:
spark-submit --master yarn --jars {jar list (each jar must be given with its full path!)} my_python.py

com.github.rwl#jtransforms;2.4.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.jsuereth#scala-arm_2.11;2.0 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from central in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from central in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from central in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from central in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from central in [default]
com.typesafe#config;1.3.0 from central in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
ml.combust.bundle#bundle-hdfs_2.11;0.13.0 from central in [default]
ml.combust.bundle#bundle-ml_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-base_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-core_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-runtime_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-spark-base_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-spark_2.11;0.13.0 from central in [default]
ml.combust.mleap#mleap-tensor_2.11;0.13.0 from central in [default]
org.scala-lang#scala-reflect;2.11.8 from central in [default]
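
The lines above are in Ivy's org#name;revision notation. A small sketch for turning such a line into a group:artifact:version Maven coordinate, which is the form that --packages and spark.jars.packages take; ivy_to_maven is a hypothetical helper and the regular expression only covers the layout shown in this list.

```python
import re

# Captures org, module name and revision from "org#name;rev from central in [default]"
IVY_LINE = re.compile(r"^\s*(?P<org>[^#\s]+)#(?P<name>[^;\s]+);(?P<rev>\S+)")


def ivy_to_maven(line):
    """Convert one Ivy report line to a Maven coordinate, or None if it does not match."""
    match = IVY_LINE.match(line)
    if match is None:
        return None
    return "{org}:{name}:{rev}".format(**match.groupdict())
```

Running it over the whole list and joining the results with commas gives a single value that can be pasted into spark.jars.packages.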

@y-tee

y-tee commented Nov 25, 2019

Hi @yairdata, how did you manage to find out which versions of the jar files are compatible?
Has anyone used MLeap 0.15.0 yet?

@yairdata

@y-tee - a lot of trial & error... I wish it were documented somewhere. Since it wasn't, I pasted it here to help others.

@y-tee

y-tee commented Nov 25, 2019

@yairdata did you try all the versions 😱
Then I should probably downgrade my MLeap to 0.13.0. Does it work if I just change version.py in the GitHub source to 0.13.0 instead of the default (0.15.0), since pip currently gives you a very old version?

@yairdata

@y-tee not all versions. There are compatible jar versions, but not all of them are listed as dependencies, so it comes down to trial & error.
Regarding newer MLeap versions: I didn't try them because I am using an older Spark version (2.3.1) that is compatible with MLeap v0.13.0.

@ancasarb
Member

I've released the Python MLeap version 0.15.0 just today, FYI: https://pypi.org/project/mleap/#history. Please let me know if you see any issues.

@RuxuePeng

My mleap is 0.15.0, and Spark is 2.4.4, I'm having this issue again.
Code:

pipeline = pipeline.fit(feature_df)
predictions = pipeline.transform(feature_df)
model_local_path = "something"
model_path = "jar:file:" + model_local_path + "/model.zip"
pipeline.serializeToBundle(model_path, predictions)

Error:
Encoutnered error: 'PipelineModel' object has no attribute 'serializeToBundle'"

"Encoutnered error: An error occurred while calling o1480.serializeToBundle.
: java.lang.ExceptionInInitializerError
	at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
	at ml.combust.mleap.spark.SimpleSparkSerializer$$anonfun$1.apply(SimpleSparkSerializer.scala:22)
	at scala.Option.map(Option.scala:146)
	at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundleWithFormat(SimpleSparkSerializer.scala:22)
	at ml.combust.mleap.spark.SimpleSparkSerializer.serializeToBundle(SimpleSparkSerializer.scala:17)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: unsupported Spark version: 2.4.4
	at org.apache.spark.ml.bundle.SparkBundleContext$.<init>(SparkBundleContext.scala:27)
	at org.apache.spark.ml.bundle.SparkBundleContext$.<clinit>(SparkBundleContext.scala)

@felixgao

I am also having a problem with MLeap 0.15.0 and Spark 2.4.4, basically when running the code in https://github.com/combust/mleap-demo/blob/master/notebooks/PySpark%20-%20AirBnb.ipynb

pyspark
[I 17:34:36.883 NotebookApp] Loading IPython parallel extension
...
[W 17:34:53.685 NotebookApp] 404 GET /nbextensions/nbextensions_configurator/config_menu/main.js?v=20200212173436 (::1) 7.00ms referer=http://localhost:8888/notebooks/MLeap.ipynb
[I 17:34:54.021 NotebookApp] Kernel started: 56fef487-0dea-47ba-8ad3-8c19241c1193
[W 17:34:54.175 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20200212173436 (::1) 2.67ms referer=http://localhost:8888/notebooks/MLeap.ipynb
Ivy Default Cache set to: /Users/ggao/.ivy2/cache
The jars for the packages stored in: /Users/ggao/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/Cellar/apache-spark/2.4.4/libexec/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-avro_2.11 added as a dependency
ml.combust.mleap#mleap-spark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-dbeefc3f-8e12-443d-8629-8adf19670d42;1.0
	confs: [default]
	found org.apache.spark#spark-avro_2.11;2.4.4 in central
	found org.spark-project.spark#unused;1.0.0 in local-m2-cache
	found ml.combust.mleap#mleap-spark_2.11;0.15.0 in central
	found ml.combust.mleap#mleap-spark-base_2.11;0.15.0 in central
	found ml.combust.mleap#mleap-runtime_2.11;0.15.0 in central
	found ml.combust.mleap#mleap-core_2.11;0.15.0 in central
	found ml.combust.mleap#mleap-base_2.11;0.15.0 in central
	found ml.combust.mleap#mleap-tensor_2.11;0.15.0 in central
	found io.spray#spray-json_2.11;1.3.2 in central
	found com.github.rwl#jtransforms;2.4.0 in central
	found ml.combust.bundle#bundle-ml_2.11;0.15.0 in central
	found com.google.protobuf#protobuf-java;3.5.1 in central
	found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in local-m2-cache
	found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in local-m2-cache
	found com.lihaoyi#fastparse_2.11;1.0.0 in local-m2-cache
	found com.lihaoyi#fastparse-utils_2.11;1.0.0 in local-m2-cache
	found com.lihaoyi#sourcecode_2.11;0.1.4 in local-m2-cache
	found com.jsuereth#scala-arm_2.11;2.0 in central
	found com.typesafe#config;1.3.0 in local-m2-cache
	found commons-io#commons-io;2.5 in local-m2-cache
	found org.scala-lang#scala-reflect;2.11.8 in local-m2-cache
	found ml.combust.bundle#bundle-hdfs_2.11;0.15.0 in central
:: resolution report :: resolve 547ms :: artifacts dl 16ms
	:: modules in use:
	com.github.rwl#jtransforms;2.4.0 from central in [default]
	com.google.protobuf#protobuf-java;3.5.1 from central in [default]
	com.jsuereth#scala-arm_2.11;2.0 from central in [default]
	com.lihaoyi#fastparse-utils_2.11;1.0.0 from local-m2-cache in [default]
	com.lihaoyi#fastparse_2.11;1.0.0 from local-m2-cache in [default]
	com.lihaoyi#sourcecode_2.11;0.1.4 from local-m2-cache in [default]
	com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from local-m2-cache in [default]
	com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from local-m2-cache in [default]
	com.typesafe#config;1.3.0 from local-m2-cache in [default]
	commons-io#commons-io;2.5 from local-m2-cache in [default]
	io.spray#spray-json_2.11;1.3.2 from central in [default]
	ml.combust.bundle#bundle-hdfs_2.11;0.15.0 from central in [default]
	ml.combust.bundle#bundle-ml_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-base_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-core_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-runtime_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-spark-base_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-spark_2.11;0.15.0 from central in [default]
	ml.combust.mleap#mleap-tensor_2.11;0.15.0 from central in [default]
	org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
	org.scala-lang#scala-reflect;2.11.8 from local-m2-cache in [default]
	org.spark-project.spark#unused;1.0.0 from local-m2-cache in [default]
	:: evicted modules:
	com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   23  |   0   |   0   |   1   ||   22  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-dbeefc3f-8e12-443d-8629-8adf19670d42
	confs: [default]
	0 artifacts copied, 22 already retrieved (0kB/15ms)
20/02/12 17:34:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[I 17:34:59.073 NotebookApp] Adapting from protocol version 5.1 (kernel 56fef487-0dea-47ba-8ad3-8c19241c1193) to 5.3 (client).
20/02/12 17:35:36 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
20/02/12 17:36:22 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
20/02/12 17:36:23 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
20/02/12 17:36:23 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "Thread-4" java.lang.NoClassDefFoundError: ml/combust/bundle/serializer/SerializationFormat
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
	at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
	at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
	at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
	at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: ml.combust.bundle.serializer.SerializationFormat
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 9 more

The error from the notebook

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-18-e6e5bbbb80b2> in <module>()
----> 1 sparkPipelineLr.serializeToBundle(f"jar:file:{root_dir}/out/pyspark.lr.zip", sparkPipelineLr.transform(dataset_imputed))
      2 sparkPipelineLogr.serializeToBundle(f"jar:file:{root_dir}/out/pyspark.logr.zip", dataset=sparkPipelineLogr.transform(dataset_imputed))

/usr/local/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, path, dataset)
     22 
     23 def serializeToBundle(self, path, dataset=None):
---> 24     serializer = SimpleSparkSerializer()
     25     serializer.serializeToBundle(self, path, dataset=dataset)
     26 

/usr/local/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in __init__(self)
     37     def __init__(self):
     38         super(SimpleSparkSerializer, self).__init__()
---> 39         self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
     40 
     41     def serializeToBundle(self, transformer, path, dataset):

/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __getattr__(self, name)
   1596                 answer[proto.CLASS_FQN_START:], self._gateway_client)
   1597         else:
-> 1598             raise Py4JError("{0} does not exist in the JVM".format(new_fqn))
   1599 
   1600 

Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM

@MauricioLins

@felixgao have you fixed this problem? I am using the same versions and facing the same problem.

@peterfig

I agree with others that this is a tricky dependency problem, not a problem with MLeap per se. Here is how I solved it on my MacBook:

spark-submit --packages ml.combust.mleap:mleap-spark_2.11:0.16.0 my_program.py

My PySpark version is 2.4.5 (see the MLeap GitHub page for which version of MLeap works with which version of Spark).

When I first ran spark-submit, I got a further error saying that Spark could not download some additional dependencies. These can be installed with Maven.

First, run brew install maven from the command line.

Then use Maven from the command line to download the dependencies. Here are the three I needed:

mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=org.scala-lang:scala-reflect:2.11.12

mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=com.google.protobuf:protobuf-java:3.5.1

mvn org.apache.maven.plugins:maven-dependency-plugin:3.1.2:get -Dartifact=com.typesafe:config:1.3.0

If you need different jars, you can find the coordinates by searching mvnrepository.com in your browser.
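
If several artifacts are missing, the Maven commands above can be generated rather than typed by hand. The following is a minimal sketch, not part of MLeap or Spark: the helper names `mvn_get_command` and `render_commands` are made up for illustration, and the plugin version and artifact list are simply the ones quoted in this comment.

```python
# Plugin coordinates copied from the commands quoted in this comment.
DEPENDENCY_PLUGIN = "org.apache.maven.plugins:maven-dependency-plugin:3.1.2"

# The three artifacts that were missing in this particular setup.
MISSING_ARTIFACTS = [
    "org.scala-lang:scala-reflect:2.11.12",
    "com.google.protobuf:protobuf-java:3.5.1",
    "com.typesafe:config:1.3.0",
]


def mvn_get_command(artifact):
    """Return the argv list that fetches one artifact into the local Maven repository."""
    return ["mvn", f"{DEPENDENCY_PLUGIN}:get", f"-Dartifact={artifact}"]


def render_commands(artifacts):
    """Return one printable command line per artifact (hypothetical helper)."""
    return [" ".join(mvn_get_command(a)) for a in artifacts]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for line in render_commands(MISSING_ARTIFACTS):
        print(line)
```

Each list returned by `mvn_get_command` can also be passed to `subprocess.run(..., check=True)` if you prefer to run Maven from Python, assuming `mvn` is on your PATH.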

@prasadpande1990

Hi,

I am trying to build an AWS SageMaker model that includes a Spark pipeline model for feature transformation.

When I use MLeap inside my Docker container to serialize the PipelineModel, I get a similar exception.

I am not sure how to make all these MLeap jars available inside my Docker container.

Can anyone help me get around this?

@gs-alt

gs-alt commented Jun 18, 2021

Same issue here running pyspark 2.4.3 and mleap 0.17.0. I tried two things:

Adding all jar files manually to the jars folder in pyspark:

  • mleap-core_2.12-0.17.0.jar
  • mleap-executor_2.12-0.17.0.jar
  • mleap-runtime_2.12-0.17.0.jar
  • mleap-base_2.12-0.17.0.jar
  • mleap-spark-base_2.12-0.17.0.jar
  • mleap-serving_2.12-0.17.0.jar
  • mleap-spark-extension_2.12-0.17.0.jar
  • mleap-spark_2.12-0.17.0.jar
  • mleap-tensor_2.12-0.17.0.jar

Running with spark-submit:

spark-submit --packages ml.combust.mleap:mleap-spark_2.12:0.17.0 main.py

Neither method worked.
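
One thing that may be worth ruling out here is a Scala binary-version mismatch between the MLeap jars and the Spark build. The sketch below only parses file names: it assumes that the jars folder contains a `spark-core_<scala>-<version>.jar` file and that the `_2.xx` suffix reflects the Scala version each jar was built for. The helper names are hypothetical, and the thread does not confirm that a mismatch is the cause of this particular failure.

```python
import re

# Matches the Scala binary-version suffix in names like "mleap-core_2.12-0.17.0.jar".
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)-")


def scala_suffix(jar_name):
    """Return the Scala binary version embedded in a jar name, or None."""
    match = _SCALA_SUFFIX.search(jar_name)
    return match.group(1) if match else None


def spark_scala_version(jar_names):
    """Infer Spark's Scala version from the spark-core jar, if one is listed."""
    for name in jar_names:
        if name.startswith("spark-core_"):
            return scala_suffix(name)
    return None


def mismatched_jars(jar_names):
    """Return jars whose Scala suffix differs from the spark-core jar's suffix."""
    expected = spark_scala_version(jar_names)
    if expected is None:
        return []  # Nothing to compare against.
    return [
        name for name in jar_names
        if scala_suffix(name) not in (None, expected)
    ]
```

To try it, pass the directory listing of the pyspark jars folder, for example `mismatched_jars(os.listdir(jars_dir))`, where `jars_dir` is the `site-packages/pyspark/jars` path mentioned earlier in the thread.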

@z7ye

z7ye commented Nov 15, 2021

I got the same issue.

When running the code from the tutorial,
fittedPipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", fittedPipeline.transform(df2))
I got the following error.

> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /tmp/ipykernel_5527/4288136627.py in <module>
> ----> 1 fittedPipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip", fittedPipeline.transform(df2))
> 
> ~/conda/pyspark30_p37_cpu_v2/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in serializeToBundle(self, path, dataset)
>      22 
>      23 def serializeToBundle(self, path, dataset=None):
> ---> 24     serializer = SimpleSparkSerializer()
>      25     serializer.serializeToBundle(self, path, dataset=dataset)
>      26 
> 
> ~/conda/pyspark30_p37_cpu_v2/lib/python3.7/site-packages/mleap/pyspark/spark_support.py in __init__(self)
>      37     def __init__(self):
>      38         super(SimpleSparkSerializer, self).__init__()
> ---> 39         self._java_obj = _jvm().ml.combust.mleap.spark.SimpleSparkSerializer()
>      40 
>      41     def serializeToBundle(self, transformer, path, dataset):
> 
> TypeError: 'JavaPackage' object is not callable

Any suggestions pls?
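
The `'JavaPackage' object is not callable` message in the traceback above typically means that py4j resolved `ml.combust.mleap.spark.SimpleSparkSerializer` to a package rather than a class, which is what happens when the class is not on the JVM classpath. A small diagnostic sketch follows; `describe_jvm_lookup` is a hypothetical helper, and it checks the py4j type by name so that it does not need to import py4j itself.

```python
def describe_jvm_lookup(obj):
    """Explain what a py4j attribute lookup resolved to.

    `obj` is the result of an expression such as
    jvm.ml.combust.mleap.spark.SimpleSparkSerializer (without calling it).
    """
    kind = type(obj).__name__
    if kind == "JavaPackage":
        # py4j falls back to a package object when no class of that name is found.
        return "not found: the name resolved to a package, so the class is not on the JVM classpath"
    if kind == "JavaClass":
        return "found: the class is loadable from the JVM"
    return f"unexpected py4j type: {kind}"
```

With a running session, this could be called as `describe_jvm_lookup(spark.sparkContext._jvm.ml.combust.mleap.spark.SimpleSparkSerializer)`. Note that `_jvm` is a private PySpark attribute, and that the lookup itself can raise a `Py4JError` instead of returning, as in the earlier traceback in this thread.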

I also tried to install MLeap from source following the instructions, but got this error.

[error] (mleap-core/compile:compileIncremental) javac returned nonzero exit code
[error] Total time: 117 s, completed Nov 14, 2021 11:36:52 PM

@drei34

drei34 commented Feb 10, 2023

If you are using mleap 0.21.1, should serializeToBundle work? I am getting the error below. Is downgrading the only option? My pyspark version is 3.1.3. This is after resolving several other issues.

Py4JError: ml.combust.mleap.spark.SimpleSparkSerializer does not exist in the JVM

I create a Spark session like this:

`def gen_spark_session():
    return SparkSession.builder.appName("happy").config(
        "hive.exec.dynamic.partition", "True").config(
        "hive.exec.dynamic.partition.mode", "nonstrict").config(
        "spark.jars.packages",
        "ml.combust.mleap:mleap-spark_2.12:0.20.0,"
        "ml.combust.mleap:mleap-spark-base_2.12:0.20.0"
    ).enableHiveSupport().getOrCreate()

spark = gen_spark_session()`

UPDATE: I was on Java 8, and apparently 0.21.1 does not work there because it needs Java 11. I moved to 0.20.0, but I still get this issue. I'm on Scala 2.12.
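
When several versions are in play, as here with 0.21.1 and 0.20.0, it can help to check the `spark.jars.packages` string against the installed Python package before starting Spark. The sketch below assumes that the Python `mleap` package and the MLeap jars are meant to share one version and one Scala suffix. The thread suggests this but does not confirm it, and `parse_coordinate` and `check_packages` are hypothetical helpers, not MLeap APIs.

```python
def parse_coordinate(coordinate):
    """Split 'group:artifact_scala:version' into its parts."""
    group, artifact, version = coordinate.split(":")
    name, sep, scala = artifact.rpartition("_")
    if not sep:
        # No Scala suffix on the artifact id.
        name, scala = artifact, None
    return {"group": group, "name": name, "scala": scala, "version": version}


def check_packages(packages, python_mleap_version, scala_version):
    """Return a list of problems; an empty list means the versions line up."""
    problems = []
    for coordinate in packages.split(","):
        parts = parse_coordinate(coordinate.strip())
        if parts["version"] != python_mleap_version:
            problems.append(
                f"{parts['name']}: jar version {parts['version']} "
                f"!= Python mleap {python_mleap_version}"
            )
        if parts["scala"] != scala_version:
            problems.append(
                f"{parts['name']}: Scala suffix {parts['scala']} "
                f"!= expected {scala_version}"
            )
    return problems
```

For the session above, `check_packages(packages, "0.21.1", "2.12")` would flag both jars as version mismatches, while the same call with `"0.20.0"` would return an empty list. An empty list only rules out this one class of problem.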
