
When Attempting to loadSavedModel, I Encountered 'java.lang.Exception: Could Not Retrieve the SavedModelBundle + ()' #14215

Open
@TerryLaw535

Description

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

I fine-tuned a T5 model from Hugging Face and wanted to import it into Spark NLP.

Current Behavior

I followed the instructions from https://github.com/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/HuggingFace%20in%20Spark%20NLP%20-%20BERT.ipynb, and when I tried to loadSavedModel in Spark NLP it reported "Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.annotators.seq2seq.BartTransformer.loadSavedModel: java.lang.Exception: Could not retrieve the SavedModelBundle + ()". Afterward, I attempted to import several other models from Hugging Face, such as t5-base and BART, and ran into the same problem.
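For reference, a minimal sanity check along the following lines (a sketch only; the export path is the one produced by the steps below) should confirm whether the exported SavedModel can be opened by plain TensorFlow at all and list its serving signatures, since the error suggests Spark NLP cannot retrieve the bundle itself:

import tensorflow as tf

# Hypothetical check, not part of the original run: open the exported SavedModel
# directly with TensorFlow and print the available serving signatures.
export_dir = "google-t5/t5-small/saved_model/1"  # assumed path from the steps below
loaded = tf.saved_model.load(export_dir)
print(list(loaded.signatures.keys()))  # expected to contain 'serving_default'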

Expected Behavior

I hope that my fine-tuned T5 model can be successfully loaded into Spark NLP. Alternatively, could anyone provide instructions, similar to the BERT notebook above, for importing seq2seq models such as T5 and BART from Hugging Face into Spark NLP?

Steps To Reproduce

import tensorflow as tf
from transformers import T5Tokenizer, TFT5ForConditionalGeneration

MODEL_NAME = 'google-t5/t5-small'

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
tokenizer.save_pretrained('./{}_tokenizer/'.format(MODEL_NAME))

model = TFT5ForConditionalGeneration.from_pretrained(MODEL_NAME)

@tf.function(
    input_signature=(
        tf.TensorSpec(name="input_ids", shape=(None, None), dtype=tf.int32),
        tf.TensorSpec(name="attention_mask", shape=(None, None), dtype=tf.int32),
    ),
    jit_compile=False,
)
def serving_fn(input_ids, attention_mask):
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_length=512,
        min_length=10,
        # length_penalty=0.9,
        # repetition_penalty=2.0,
        # num_beams=4,
        # early_stopping=True,
        return_dict_in_generate=True,
    )
    return {"sequences": outputs["sequences"]}

model.save_pretrained("./{}".format(MODEL_NAME), saved_model=True,
                      signatures={'serving_default': serving_fn})

asset_path = '{}/saved_model/1/assets'.format(MODEL_NAME)
!cp {MODEL_NAME}_tokenizer/spiece.model {asset_path}

labels = model.config.id2label
labels = [value for key, value in sorted(labels.items(), reverse=False)]
with open(asset_path + '/labels.txt', 'w') as f:
    f.write('\n'.join(labels))

import sparknlp
from sparknlp.annotator import T5Transformer

spark = sparknlp.start()
T5 = T5Transformer.loadSavedModel('{}/saved_model/1'.format(MODEL_NAME), spark)
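
Once loadSavedModel works, the intended follow-up is roughly the sketch below (a sketch only; the task prefix, column names, and save path are placeholders based on the public T5Transformer and DocumentAssembler APIs, not verified output):

from sparknlp.base import DocumentAssembler
from pyspark.ml import Pipeline

# Configure the imported model; task prefix and column names are placeholders.
T5.setTask("summarize:") \
  .setInputCols(["documents"]) \
  .setOutputCol("summaries")

# Persist the imported model so it can later be reloaded with T5Transformer.load(...).
T5.write().overwrite().save("./t5_small_spark_nlp")

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

pipeline = Pipeline(stages=[document_assembler, T5])
data = spark.createDataFrame([["Spark NLP is an open-source text processing library."]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("summaries.result").show(truncate=False)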

Spark NLP version and Apache Spark

Spark NLP version: 5.3.2
Apache Spark version: 3.5.1
TensorFlow version: 2.15.0
Transformers version: 4.39.1

Type of Spark Application

Python Application

Java Version

No response

Java Home Directory

No response

Setup and installation

pip

Operating System and Version

Linux

Link to your project (if available)

No response

Additional Information

No response
