Fix serialization bug #1188

Merged (10 commits, Jan 21, 2019)

Conversation

@flauted (Contributor) commented Jan 19, 2019

This should solve the same problem as #991. Tested against Python 2.7 and Python 3.7.

@flauted (Contributor, Author) commented Jan 20, 2019

@vince62s could you review please? The diff should be straightforward.

To go into some detail: I don't know a lot about pickle, but forwarding __reduce_ex__(proto) to super().__reduce_ex__(proto) doesn't seem to work (which is counterintuitive to me, since the Python 2.7, 3.6, and 3.7 docs indicate that's the "correct" signature), and neither does having no __reduce* method at all. Noting that the superclass doesn't even define __reduce_ex__, I thought that just removing ours altogether would work. Again, nope. By trial and error - partly inspired by the fact that nothing seems to depend on the protocol version - I found that forwarding to __reduce__() works in both Python 2.7 and 3.7, and the CI tests indicate that it works in the other versions.
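
For reference, the workaround described above amounts to something like the following sketch (assuming DatasetBase subclasses torchtext.data.Dataset; this is illustrative, not necessarily the exact diff):

```python
from torchtext import data


class DatasetBase(data.Dataset):
    # Sketch of the trial-and-error workaround: ignore the protocol version
    # and forward to __reduce__(), which pickled correctly on the Python
    # versions tested here. (Later superseded by the __getattr__ override
    # discussed further down in this thread.)
    def __reduce_ex__(self, proto):
        return super(DatasetBase, self).__reduce__()
```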

I didn't mean to include the change to preprocess, but I still think it's valuable. If there's some non-obvious reason to change it back to sys.stderr and sys.exit, I can revert. Originally I tried to raise FileExistsError, but that broke Python 2, which is why I went with a custom exception.
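
Something along these lines, for illustration (the exception and function names here are hypothetical, not the actual code in preprocess.py):

```python
import os


class VocabExistsError(Exception):
    """Hypothetical stand-in for FileExistsError, which Python 2 lacks."""


def check_existing_file(path):
    # Raise instead of printing to sys.stderr and calling sys.exit, so the
    # caller (or a test) can catch the condition.
    if os.path.exists(path):
        raise VocabExistsError("%s already exists; remove it first." % path)
```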

@vince62s (Member)

Well, as you say, this def __reduce_ex__(self, proto) should not be there at all.
For Python 3.5 and 3.6 it works without it => what about 3.7? (I don't have it installed.)
But it crashes on Python 2.7; see pytorch/pytorch#14893.

If your fix makes it work in all versions, we could go with it, but I need full confirmation, not only Travis.

For the second point, it would be better in another PR, but what is the point exactly?

@vince62s (Member) commented Jan 21, 2019

As a matter of fact, I just tested your fix; it does not work on Python 2.7.12:

[2019-01-21 10:36:02,901 INFO]  * tgt vocab size: 20929.
[2019-01-21 10:36:02,925 INFO]  * src vocab size: 15824.
[2019-01-21 10:36:02,925 INFO]  * merging src and tgt vocab...
[2019-01-21 10:36:03,042 INFO]  * merged vocab size: 29413.
Traceback (most recent call last):
  File "preprocess.py", line 177, in <module>
    main()
  File "preprocess.py", line 173, in main
    build_save_vocab(train_dataset_files, fields, opt)
  File "preprocess.py", line 118, in build_save_vocab
    torch.save(fields, vocab_path)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 218, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 143, in _with_file_like
    return body(f)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 218, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 291, in _save
    pickler.dump(obj)
cPickle.PicklingError: Can't pickle torch.int64: attribute lookup __main__.torch.int64 failed

@flauted (Contributor, Author) commented Jan 21, 2019

Oh that's not good. Let me see if I can reproduce. Are you seeing that error when you run pull_request_chk.sh?

On the second point, I felt it was cleaner but you're right that it doesn't belong here. I reverted it.

Commit message: ``torchtext.Dataset.__getattr__`` is a generator. That doesn't play well with pickle. Returning a generator (when appropriate) seems to fix the issue without changing the API.

@flauted (Contributor, Author) commented Jan 21, 2019

I wasn't able to reproduce your error, but...

> Well, as you say, this def __reduce_ex__(self, proto) should not be there at all.

Agreed; it should be removed. But simply commenting it out breaks Python 2 (see the end).

To debug, I altered DatasetBase.save to torch.save(self, path, pickle_module=pickle) (the default pickle module on Python 2 is cPickle, whose traceback is not helpful - again, see the end). Then I got the following traceback:

Traceback (most recent call last):
  File "/home/dylan/code/OpenNMT-py/./preprocess.py", line 177, in <module>
    main()
  File "/home/dylan/code/OpenNMT-py/./preprocess.py", line 167, in main
    train_dataset_files = build_save_dataset('train', fields, opt)
  File "/home/dylan/code/OpenNMT-py/./preprocess.py", line 100, in build_save_dataset
    dataset.save(data_path)
  File "/home/dylan/code/OpenNMT-py/onmt/inputters/dataset_base.py", line 107, in save
    torch.save(self, path, pickle_module=pickle)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 218, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 143, in _with_file_like
    return body(f)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 218, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 291, in _save
    pickler.dump(obj)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
TypeError: 'generator' object is not callable

And drilling down into pickle, we find this: reduce = getattr(obj, "__reduce_ex__", None). That reduce ends up being a generator, because torchtext.Dataset.__getattr__ is a generator function that yields getattr over all the examples.

So the short of it is that torchtext.Dataset.__getattr__ isn't written with pickling in mind.
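
Here is a minimal, self-contained illustration of why that pattern is a problem (a toy class, not the torchtext source): because __getattr__ is a generator function, it returns a generator object for any attribute name and never raises AttributeError, so pickle's fallback attribute lookups get handed something that is not callable.

```python
class GenGetattr(object):
    """Toy class with a generator-style __getattr__, like the one described."""
    fields = {'src': None}
    examples = []

    def __getattr__(self, attr):
        if attr in self.fields:
            for ex in self.examples:
                yield getattr(ex, attr)


obj = GenGetattr()
# Unknown attributes come back as (empty) generators instead of raising
# AttributeError, so hasattr() is always True and lookups such as
# __getnewargs__ hand pickle something it cannot call.
print(getattr(obj, '__getnewargs__'))            # <generator object ...>
print(callable(getattr(obj, '__getnewargs__')))  # False
```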

My solution is to override __getattr__ and have it return a generator when appropriate (rather than be a generator).

Now __getstate__ and __setstate__ are unnecessary, since pickle's default behavior works (I don't think they were doing anything before, actually). And __reduce_ex__ is unnecessary (originally it was there so that pickle's getattr for __reduce_ex__ would find it instead of falling back to torchtext's __getattr__).
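
Concretely, the override described above looks roughly like this (a sketch assuming the dataset exposes fields and examples, as torchtext's Dataset does; see the actual diff for the merged version):

```python
from torchtext import data


class DatasetBase(data.Dataset):
    def __getattr__(self, attr):
        # __getattr__ can be hit during unpickling before __dict__ is
        # restored, so guard against recursing on self.fields.
        if 'fields' not in vars(self):
            raise AttributeError(attr)
        if attr in self.fields:
            # Return a generator when appropriate...
            return (getattr(ex, attr) for ex in self.examples)
        # ...and raise AttributeError otherwise, so pickle's lookups for
        # special names fall back to object's defaults.
        raise AttributeError(attr)
```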

I pushed the change. Travis is passing. The pull_request_chk.sh is passing on Python 2.7.12, 2.7.15, 3.6.8, and 3.7.2. The demo preprocesses and trains on 2.7.12. @vince62s could you please review and check if you still get that error? If you do, would you please post the command so that I can try it myself?


Appendix ("the end" referred to above): with __reduce_ex__ simply commented out,

# def __reduce_ex__(self, proto):
#     return super(DatasetBase, self).__reduce_ex__()

pull_request_chk.sh reports:

Python 2.7.12 FAILED

Python 2.7.15 FAILED

Python 3.6.8 Succeeded

Python 3.7.2 Succeeded

...
======================================================================
ERROR: test_dataset_build_standard (onmt.tests.test_preprocess.TestData)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dylan/code/OpenNMT-py/onmt/tests/test_preprocess.py", line 81, in test_method
    getattr(self, methodname)(opt)
  File "/home/dylan/code/OpenNMT-py/onmt/tests/test_preprocess.py", line 50, in dataset_build
    train_data_files = preprocess.build_save_dataset('train', fields, opt)
  File "/home/dylan/code/OpenNMT-py/preprocess.py", line 100, in build_save_dataset
    dataset.save(data_path)
  File "/home/dylan/code/OpenNMT-py/onmt/inputters/dataset_base.py", line 95, in save
    torch.save(self, path)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 218, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 143, in _with_file_like
    return body(f)
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 218, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/dylan/miniconda3/envs/opennmt2712/lib/python2.7/site-packages/torch/serialization.py", line 291, in _save
    pickler.dump(obj)
TypeError: 'generator' object is not callable

----------------------------------------------------------------------
Ran 96 tests in 8.441s

FAILED (errors=46)

@vince62s (Member)

It looks like a much better fix, thanks. Double-checking and merging then.

@AndyELiu

This bug still exists!

@vince62s (Member)

Are you on master? PyTorch 1.0 and torchtext 0.4?
Please give your command line and Python version.

@vince62s (Member)

@flauted, can you please check pytorch/pytorch#14893 and answer there?
I might be mistaken; maybe this is another error, and our case was more torchtext-related than PyTorch itself.
Thanks for your insight.

@flauted mentioned this pull request on Jan 26, 2019

@YinongLong

Traceback (most recent call last):
  File "preprocess.py", line 221, in <module>
    main(opt)
  File "preprocess.py", line 202, in main
    'train', fields, src_reader, tgt_reader, opt)
  File "preprocess.py", line 145, in build_save_dataset
    torch.save(fields, vocab_path)
  File "/home/aladdin/mms-hpc-devel.dynet2/lib/python2.7/site-packages/torch/serialization.py", line 219, in save
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/aladdin/mms-hpc-devel.dynet2/lib/python2.7/site-packages/torch/serialization.py", line 144, in _with_file_like
    return body(f)
  File "/home/aladdin/mms-hpc-devel.dynet2/lib/python2.7/site-packages/torch/serialization.py", line 219, in <lambda>
    return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
  File "/home/aladdin/mms-hpc-devel.dynet2/lib/python2.7/site-packages/torch/serialization.py", line 292, in _save
    pickler.dump(obj)
cPickle.PicklingError: Can't pickle torch.int64: attribute lookup __main__.torch.int64 failed

Dude, this bug still exists on the master branch!
Python and torch versions are 2.7.11 and 1.0.1, respectively.

@vince62s (Member) commented Sep 6, 2019

Upgrade PyTorch.
