Getting error RuntimeError: unexpected EOF, expected 5253807 more bytes. The file might be corrupted #74
This string might be too long and not formatted properly when copy-pasted. Add a
Thank you for the response. I tried to shorten the context to:
However, the error is still the same, like this: Traceback (most recent call last): Is there a problem on my end?
Is this the actual snippet? If so, the last 2 lines (with the triple dots and the EOF) shouldn't be there!
No, this is not the actual snippet. Here is the actual snippet:
I have also tried using the real dev-v1.1.json dataset from SQuAD 1.1 for training. However, it results in the same error. What happened?
Can you try one of the other examples, like the classification example? If you get a similar error, I think your PyTorch or Python installation is corrupted.
Even if I comment out all of the code except:

```python
from simpletransformers.question_answering import QuestionAnsweringModel

# Create the QuestionAnsweringModel
model = QuestionAnsweringModel('distilbert', 'distilbert-base-uncased-distilled-squad', args={'reprocess_input_data': True, 'overwrite_output_dir': True})
```

it results in the same error. Any suggestions for this issue?
I have tried the Minimal Start for Binary Classification, and it ran well. FYI, I used the PyTorch build for cudatoolkit=9.0, installed with:

```shell
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
```

Do you know why this occurs for the minimal QuestionAnswering example?
I don't think it's a CUDA issue if classification works and only QA is failing, but it's hard to completely rule it out. The error trace points to an issue in a torch file. Can you try reinstalling torch in a new environment? Use the latest version of torch from their website.
I have reinstalled PyTorch in a new conda environment using:

```shell
conda install pytorch cudatoolkit=10.1 -c pytorch
```

However, it results in a similar error: Traceback (most recent call last): Aborted Any other suggestions?
Could you tell me what the size of the distilbert-base-uncased-distilled-squad model is?
You can find all the model details here. Try it with another model. I think something went wrong with the model download.
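The truncated-download hypothesis can be sanity-checked directly. Below is a small sketch, with a made-up file and byte count for illustration: the `unexpected EOF, expected N more bytes` error means the file on disk is shorter than the serialized archive claims, so comparing the cached file's size against the expected size flags a partial download. The commented-out `force_download=True` call is the standard `transformers` way to fetch a fresh copy of a suspect model.

```python
import os
import tempfile

def looks_truncated(path, expected_bytes):
    """A download that died mid-way leaves a file smaller than the size the
    archive claims -- the symptom behind torch's 'unexpected EOF' error."""
    return os.path.getsize(path) < expected_bytes

# Demo on a throwaway file standing in for a cached model archive:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'\x00' * 100)      # pretend only 100 bytes of the archive arrived
    partial = f.name

truncated = looks_truncated(partial, 5_000_000)  # illustrative expected size
print(truncated)  # True: the partial download is flagged
os.remove(partial)

# If the check flags the cached weights, force a fresh download:
# from transformers import DistilBertForQuestionAnswering
# model = DistilBertForQuestionAnswering.from_pretrained(
#     'distilbert-base-uncased-distilled-squad', force_download=True)
```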
I have tried using another model:

```python
model = QuestionAnsweringModel('bert', 'bert-base-uncased', args={'reprocess_input_data': True, 'overwrite_output_dir': True})
```

However, it still results in the same error.
Try that.
Done. What's next?
Did it download the model successfully? If so, try this now.
Done. I didn't see any error messages. That's okay, isn't it? If so, what's next? By the way, where is the downloaded model stored when using this script?

```python
DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', force_download=True)
```
If you don't see any error messages now, that means the issue was with the models not being downloaded properly. The downloaded models should be in You'll have to run the script with the
Okay. However, when I re-tried running the complete code for the minimal question answering example, it still results in the same error. Should I modify any of the code?
The error looks like this:

```shell
(simpletransformers) yani@riset-3x-1080-2:~/projects/latihan/bert/sources/simpletransformers$ python example_qa.py
Aborted
```
Yeah, you should try with
Okay. Now I get this error when I re-tried the step above and then ran the complete code: Traceback (most recent call last):
Try this. It's the same minimal example, except with the slash added.
Yeaaah.... It works. Thank you very much.
Sorry, I have one more question. Is it possible to use a real dataset like train-v1.1.json from SQuAD 1.1? If yes, why does the code above produce an error like this when I try it: Traceback (most recent call last):
Okay. I tried this code:

```python
import json
import os

from simpletransformers.question_answering import QuestionAnsweringModel

with open('/home/yani/projects/latihan/bert/data/squad1.1/train-v1.1.json', 'r') as f:
    train_data = json.load(f)

train_data = [item for topic in train_data['data'] for item in topic['paragraphs']]

os.makedirs('data', exist_ok=True)

model = QuestionAnsweringModel('distilbert', 'distilbert-base-uncased-distilled-squad',
                               args={'reprocess_input_data': True, 'overwrite_output_dir': True})

model.train_model('data/train.json')

print(result)
print('-------------------')

to_predict = [{'context': 'If bidirectionality is so powerful, why hasn’t it been done before? To understand why, consider that unidirectional models are efficiently trained by predicting each word conditioned on the previous words in the sentence', 'qas': [{'question': 'What are bidirectional usage of BERT?', 'id': '0'}]}]
print(model.predict(to_predict))
```

And it results in this: What happened at `is_impossible = qa["is_impossible"]`?
SQuAD 2.0 has an additional attribute, `is_impossible`.
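A minimal illustration of the difference, using made-up records rather than the real SQuAD files: indexing `qa["is_impossible"]` directly raises `KeyError` on v1.1 data, while `dict.get` with a `False` default handles both versions.

```python
# Made-up records illustrating the two formats (not the real SQuAD files):
squad1_qa = {'question': 'What is BERT?', 'id': '0',
             'answers': [{'text': 'a language model', 'answer_start': 8}]}
squad2_qa = {'question': 'What is XYZNet?', 'id': '1',
             'answers': [], 'is_impossible': True}

# Direct indexing, as in `is_impossible = qa["is_impossible"]`, raises
# KeyError on v1.1 records because the key only exists in v2.0:
try:
    flag = squad1_qa['is_impossible']
except KeyError:
    flag = None

# A tolerant reader treats any record without the key as answerable:
flags = [qa.get('is_impossible', False) for qa in (squad1_qa, squad2_qa)]
print(flag, flags)  # None [False, True]
```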
I have changed the dataset to SQuAD 2.0. It results in: Traceback (most recent call last): What should I modify?
It happened when performing:

```python
result, text = model.eval_model('data/dev.json')
```
I actually have not understood what you explained in https://towardsdatascience.com/question-answering-with-bert-xlnet-xlm-and-distilbert-using-simple-transformers-4d8785ee762a, especially in your explanation below: " Does it mean SQuAD 2.0 could be evaluated by manually splitting train-v2.0.json into two parts (train and dev data)? If so, why does evaluating with the same file as the train data (train-v2.0.json) result in the error above? I hope I can get a response about this. Thank you.
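The manual split asked about above can be sketched like this (with placeholder records standing in for the paragraphs parsed out of train-v2.0.json): shuffle the flattened paragraph list, then slice it 80/20 into train and dev portions, so evaluation never runs on the same records used for training.

```python
import random

# Placeholder paragraph records; the real ones come from train-v2.0.json.
paragraphs = [{'context': f'context {i}', 'qas': []} for i in range(10)]

random.seed(0)              # reproducible shuffle for the example
random.shuffle(paragraphs)

# 80/20 split: evaluating on the same file you trained on defeats the
# purpose of eval_model, so hold out a separate dev slice.
cut = int(0.8 * len(paragraphs))
train_split, dev_split = paragraphs[:cut], paragraphs[cut:]

print(len(train_split), len(dev_split))  # 8 2
```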
Yes, that is what it means. However, the guide also says that the SQuAD data needs to be converted into a format that is compatible with Simple Transformers. I see you closed the issue, so I hope you fixed it. If not, try the following.
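As a rough sketch of that conversion (using a tiny inline dict shaped like train-v2.0.json rather than the real file): Simple Transformers' `QuestionAnsweringModel` takes a flat list of paragraph dicts, so the outer `data`/`title` layers of the SQuAD file need to be stripped before dumping to `train.json`.

```python
import json

# Tiny stand-in with the same nesting as train-v2.0.json (not real data):
squad = {
    'data': [
        {'title': 'BERT',
         'paragraphs': [
             {'context': 'BERT is a bidirectional transformer.',
              'qas': [{'question': 'What is BERT?', 'id': '0',
                       'is_impossible': False,
                       'answers': [{'text': 'a bidirectional transformer',
                                    'answer_start': 10}]}]}]}
    ]
}

# Flatten: keep only the paragraph dicts ({'context': ..., 'qas': ...}).
train_data = [para for topic in squad['data'] for para in topic['paragraphs']]

with open('train.json', 'w') as f:
    json.dump(train_data, f)

print(len(train_data), sorted(train_data[0]))  # 1 ['context', 'qas']
```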
Thank you for the response. I have tried the code above, but it still results in this error: Traceback (most recent call last):
When I tried to run the minimal QuestionAnswering example, I got this error:

```
Traceback (most recent call last):
  File "example.py", line 60, in <module>
    model = QuestionAnsweringModel('distilbert', 'distilbert-base-uncased-distilled-squad', args={'reprocess_input_data': True, 'overwrite_output_dir': True})
  File "/home/yani/projects/latihan/bert/sources/simpletransformers/simpletransformers/question_answering/question_answering_model.py", line 73, in __init__
    self.model = model_class.from_pretrained(model_name)
  File "/home/yani/anaconda3/envs/simpletransformers/lib/python3.7/site-packages/transformers/modeling_utils.py", line 395, in from_pretrained
    state_dict = torch.load(resolved_archive_file, map_location='cpu')
  File "/home/yani/anaconda3/envs/simpletransformers/lib/python3.7/site-packages/torch/serialization.py", line 387, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/yani/anaconda3/envs/simpletransformers/lib/python3.7/site-packages/torch/serialization.py", line 581, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: unexpected EOF, expected 5253807 more bytes. The file might be corrupted.
```
I use an 8GB GPU for this. Does anyone know why this happened?