Update TensorFlow to 1.9 on Travis #10674
Conversation
The time-out error with Python 3.6 still occurs...
What can we do to debug it? It would be important for us to upgrade.
Probably the best thing to do is to attempt to bisect the test codebase until we find which tests are linked to the timeout.
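For illustration, a bisection like that could be scripted roughly as follows (a sketch only; the helper names, timeout value, and test file lists are hypothetical, not the actual Keras CI setup):

```python
import subprocess

def run_subset(test_files, timeout=1800):
    """Run a subset of test files and report whether it finished in time."""
    try:
        result = subprocess.run(["pytest", "-x"] + list(test_files),
                                timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def bisect_tests(test_files):
    """Recursively narrow down the files linked to the timeout."""
    if len(test_files) <= 1:
        return test_files
    mid = len(test_files) // 2
    left, right = test_files[:mid], test_files[mid:]
    if not run_subset(left):
        return bisect_tests(left)
    if not run_subset(right):
        return bisect_tests(right)
    return []  # each half passes alone; the hang needs the combination
```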
The issue you started, for convenience: #10100. We could try to run the Docker image locally. As you said, it may be related to the RAM. I'll try to play around with the Travis jobs. Info: https://docs.travis-ci.com/user/reference/overview/ NOTE: SUCCESS HERE DOESN'T MEAN IT WORKS. DON'T MERGE THIS.
This seems related to multiprocessing.Manager not being started. All the tests that were blocked used fit_generator with a plain generator (which uses a Manager). @fchollet could we get someone from TF on this as well? This may be GIL related. EDIT: RAM could cause this as well, and it is probably one of the reasons. Still digging!
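For reference, the blocked tests roughly share this call pattern (a minimal sketch, not the actual test code): a plain Python generator passed to fit_generator with use_multiprocessing=True, which goes through the Manager-backed worker machinery.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def data_gen(batch_size=16):
    # Plain Python generator (not a Sequence), which triggers the
    # Manager-based path in fit_generator when multiprocessing is on.
    while True:
        x = np.random.random((batch_size, 10))
        y = np.random.randint(0, 2, (batch_size, 1))
        yield x, y

model = Sequential([Dense(1, activation='sigmoid', input_shape=(10,))])
model.compile(optimizer='sgd', loss='binary_crossentropy')

# The call pattern shared by the hanging tests: a generator plus
# use_multiprocessing=True, which starts extra worker processes.
model.fit_generator(data_gen(), steps_per_epoch=5, epochs=1,
                    workers=2, use_multiprocessing=True)
```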
Thanks for the analysis @Dref360. I will see if anyone on the TF team can understand what's going on.
Okay, so after a lot of work, multiprocessing seems to be the cause. Quick fix so that we can update: create a new job that will only run those 2 tests? I'm not sure why, but it requires a lot of memory to start processes for generators. We may want to update this code to a Pool (similar to Sequence). Note:
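A sketch of the Pool-based idea mentioned above (purely illustrative; load_batch and iter_batches are hypothetical names, not the actual enqueuer code in Keras):

```python
import multiprocessing

def load_batch(index):
    # Hypothetical worker function: build or fetch one batch by index.
    return {"index": index}

def iter_batches(num_batches, num_workers=2):
    # A Pool keeps a fixed set of worker processes alive instead of
    # spinning up Manager/process machinery per generator, which is
    # cheaper on memory-constrained CI workers.
    pool = multiprocessing.Pool(num_workers)
    try:
        for batch in pool.imap(load_batch, range(num_batches)):
            yield batch
    finally:
        pool.close()
        pool.join()
```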
@Dref360, I agree with you. The multiprocessing tests seem to be the main reason for the insufficient-memory errors. The additional commit, which disables those tests, makes the CI succeed on TF 1.9. I have a question. I tested different backends on Python 2 and 3 with your script in #10756. Both Theano and CNTK showed stable memory usage, but TensorFlow showed quite random patterns. For example, TF 1.3 often seems to require more memory than TF 1.9. I want to understand clearly why there was no problem before and why the problem starts to occur with TF 1.8.
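For context, a per-backend memory comparison of this kind can be done roughly as follows (a hedged sketch assuming psutil is installed; this is not the script from #10756):

```python
import os
import psutil

def rss_mb():
    """Resident memory of the current process, in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)

before = rss_mb()

# Select the backend before importing keras, e.g. via
# os.environ['KERAS_BACKEND'] = 'tensorflow'  # or 'theano', 'cntk'
import keras  # noqa: E402

after = rss_mb()
print("backend: %s, RSS before: %.1f MB, after import: %.1f MB"
      % (keras.backend.backend(), before, after))
```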
@taehoonlee should we merge this PR, or is there a risk in not testing the multiprocessing code paths?
@fchollet, @Dref360, I did some more testing and found a simple trick. The excessive memory usage on TF still exists, but it is out of the scope of Keras. If we focus on only Keras, there is no risk in not testing the multiprocessing stuff.
That's great, thanks for finding a simple fix!
I did some experiments and it seems that using 'spawn' instead of 'fork' solves our issue.
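For illustration, here is a minimal sketch of using the 'spawn' start method via an explicit multiprocessing context (not necessarily how or where Keras would set it):

```python
import multiprocessing as mp

def worker():
    print('hello from a spawned worker')

if __name__ == '__main__':
    # 'spawn' starts each worker from a fresh interpreter instead of
    # forking (and copying) the parent's TF-laden address space.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```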
Summary
This PR updates TensorFlow to 1.9 on Travis.
Related Issues
PR Overview