
cannot save pipeline model #13813

@cometta

Description

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

Following on from discussion thread #13812:

Spark config used:

        "spark.sql.warehouse.dir" : "s3a://bucket-example/nlp",
        "spark.hadoop.fs.s3a.access.key":"<masked>",
        "spark.hadoop.fs.s3a.secret.key": "<masked>",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer", 
        "spark.kryoserializer.buffer.max": "2000M",
        "spark.driver.maxResultSize": "0",
        "spark.kubernetes.container.image": "tested on pyspaark 3.3 and 3.4",
        "spark.kubernetes.container.image.pullPolicy" : "Always",
        "spark.jsl.settings.pretrained.cache_folder": "/opt/spark/work-dir",
        "spark.kubernetes.driver.volumes.persistentVolumeClaim.lighter-sparknlptest-pvc.options.claimName": "lighter-sparknlptest-pvc",
        "spark.kubernetes.driver.volumes.persistentVolumeClaim.lighter-sparknlptest-pvc.mount.path": "/opt/spark/work-dir",
        "spark.kubernetes.executor.volumes.persistentVolumeClaim.lighter-sparknlptest-pvc.options.claimName": "lighter-sparknlptest-pvc",
        "spark.kubernetes.executor.volumes.persistentVolumeClaim.lighter-sparknlptest-pvc.mount.path": "/opt/spark/work-dir",
        "spark.jsl.settings.annotator.log_folder": "/opt/spark/work-dir/logs"

When I save the model to the PVC, there is no issue:

model.write().overwrite().save('/path_to_pvc/test_model_greview_bert')

but when I save to s3a:

model.write().overwrite().save("s3a://bucket-example/nlp/models/greview_bert")

I get the error below.

Please note: if I use PySpark without Spark NLP, there is no issue saving or loading a DataFrame to s3a.

An error was encountered:
Py4JJavaError
Traceback (most recent call last):
  File "/tmp/spark-5ad2d697-515d-43d8-82da-cbc35328adcb/shell_wrapper.py", line 113, in exec
    self._exec_then_eval(code)
  File "/tmp/spark-5ad2d697-515d-43d8-82da-cbc35328adcb/shell_wrapper.py", line 106, in _exec_then_eval
    exec(compile(last, '<string>', 'single'), self.globals)
  File "<string>", line 2, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 197, in save
    self._jwrite.save(path)
  File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/opt/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o427.save.
: org.apache.hadoop.fs.PathIOException: `Cannot get relative path for URI:file:///tmp/1ceaa5db4f81_bert_sentence4029999211103636104/bert_sentence_tensorflow': Input/output error
	at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.getFinalPath(CopyFromLocalOperation.java:360)
	at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.uploadSourceFromFS(CopyFromLocalOperation.java:222)
	at org.apache.hadoop.fs.s3a.impl.CopyFromLocalOperation.execute(CopyFromLocalOperation.java:169)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$copyFromLocalFile$25(S3AFileSystem.java:3920)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.copyFromLocalFile(S3AFileSystem.java:3913)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2448)
	at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2411)
	at com.johnsnowlabs.ml.tensorflow.WriteTensorflowModel.writeTensorflowModelV2(TensorflowSerializeModel.scala:85)
	at com.johnsnowlabs.ml.tensorflow.WriteTensorflowModel.writeTensorflowModelV2$(TensorflowSerializeModel.scala:61)
	at com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings.writeTensorflowModelV2(BertSentenceEmbeddings.scala:151)
	at com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings.onWrite(BertSentenceEmbeddings.scala:399)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesWritable.$anonfun$write$1(ParamsAndFeaturesWritable.scala:51)
	at com.johnsnowlabs.nlp.ParamsAndFeaturesWritable.$anonfun$write$1$adapted(ParamsAndFeaturesWritable.scala:51)
	at com.johnsnowlabs.nlp.FeaturesWriter.saveImpl(ParamsAndFeaturesWritable.scala:38)
	at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Unknown Source)

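The failing frame is Hadoop's `CopyFromLocalOperation.getFinalPath`, which computes a relative path for each local file while Spark NLP uploads its locally exported TensorFlow directory to S3. As a purely illustrative, hypothetical sketch of relative-path computation across URIs (this is not Hadoop's actual code), two URIs can generally only be relativized when their scheme and authority match, which the `file://` temp export and the `s3a://` target do not:

```python
from urllib.parse import urlparse

def can_relativize(src_uri: str, dst_uri: str) -> bool:
    """Hypothetical helper: two URIs can only be expressed relative to
    each other when they share a scheme and authority (netloc)."""
    src, dst = urlparse(src_uri), urlparse(dst_uri)
    return src.scheme == dst.scheme and src.netloc == dst.netloc

# Local TensorFlow export (path taken from the traceback) vs. the
# s3a save target from the issue.
src = "file:///tmp/1ceaa5db4f81_bert_sentence4029999211103636104/bert_sentence_tensorflow"
dst = "s3a://bucket-example/nlp/models/greview_bert"
print(can_relativize(src, dst))  # → False
```

This is only a way to see the shape of the mismatch; the exact condition Hadoop checks inside `getFinalPath` may differ.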

What are you working on?

Current Behavior

Expected Behavior

Steps To Reproduce

Spark NLP version and Apache Spark

spark-nlp==4.3.0
https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.3.0.jar

Type of Spark Application

Python Application

Java Version

No response

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

No response

Link to your project (if available)

No response

Additional Information

No response
