"KeyboardInterrupt" detected when submitting pilot #28

Closed
ggg121 opened this issue Aug 3, 2020 · 5 comments

@ggg121
Contributor

ggg121 commented Aug 3, 2020

I'm trying to run a workflow through the current version of FACTS (devel branch; the workflow configuration is included there). The code runs as expected until it tries to submit the pilot. At that point it reports a "KeyboardInterrupt" and shuts everything down. I am not pressing anything on the keyboard while the code is running.

Below is my call to FACTS and the resulting STDOUT:

(slr_venv3) ggg46@ggg46-vb:~/research/slr_framework/code/facts$ python3 FACTS.py experiments/temp_exp
EnTK session: re.session.ggg46-vb.ggg46.018477.0001
Creating AppManagerSetting up RabbitMQ system                                 ok
                                                                              ok
Validating and assigning resource manager                                     ok
Setting up RabbitMQ system                                                   n/a
new session: [re.session.ggg46-vb.ggg46.018477.0001]                           \
database   : [mongodb://facts:zChLV6Qd5D3JwAYp@129.114.17.185/facts]          ok
create pilot manager                                                          ok
submit 1 pilot(s)
        pilot.0000   local.localhost           2 cores       0 gpus           ok
closing session re.session.ggg46-vb.ggg46.018477.0001                          \
close pilot manager                                                            \
wait for 1 pilot(s)
              0                                                               ok
                                                                              ok
session lifetime: 19.1s                                                       ok
wait for 1 pilot(s)
              0                                                          timeout
All components terminated
Traceback (most recent call last):
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/execman/rp/resource_manager.py", line 179, in _submit_resource_request
    self._pilot.wait([rp.PMGR_ACTIVE, rp.DONE, rp.FAILED, rp.CANCELED])
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 536, in wait
    time.sleep(0.1)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/appman/appmanager.py", line 416, in run
    self._rmgr._submit_resource_request()
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/execman/rp/resource_manager.py", line 192, in _submit_resource_request
    raise KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "FACTS.py", line 432, in <module>
    run_experiment(args.edir, args.debug, args.no_total)
  File "FACTS.py", line 372, in run_experiment
    amgr.run()
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/appman/appmanager.py", line 441, in run
    raise KeyboardInterrupt
KeyboardInterrupt
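
The three tracebacks above are chained: each "During handling of the above exception" line means a new KeyboardInterrupt was raised inside an `except` block that was still handling the previous one (the `raise KeyboardInterrupt` lines at resource_manager.py:192 and appmanager.py:441). The following is a minimal sketch of that pattern, not the RADICAL code itself; all function names and messages are hypothetical stand-ins. It also shows how to walk the chain back to the innermost exception, which is where the actual trigger sits.

```python
def wait_for_pilot():
    # Hypothetical stand-in for the innermost wait loop. In the log the
    # interrupt arrived during time.sleep(); here it is raised directly.
    raise KeyboardInterrupt("interrupted while waiting")


def submit_resource_request():
    try:
        wait_for_pilot()
    except KeyboardInterrupt:
        # Raising a new exception inside the handler links the original one
        # via __context__, which is printed as "During handling of ...".
        raise KeyboardInterrupt("raised by resource manager")


def run():
    try:
        submit_resource_request()
    except KeyboardInterrupt:
        raise KeyboardInterrupt("raised by app manager")


def exception_chain(exc):
    """Return the messages from the outermost exception to the innermost."""
    messages = []
    while exc is not None:
        messages.append(str(exc))
        exc = exc.__context__
    return messages


def demo():
    try:
        run()
    except KeyboardInterrupt as exc:
        return exception_chain(exc)
```

Calling `demo()` returns the three messages in the same order as the tracebacks are read from bottom to top. The last entry corresponds to the first traceback printed, so that is the one to investigate.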

Below is the result of radical-stack:

(slr_venv3) ggg46@ggg46-vb:~/research/slr_framework/code/facts$ radical-stack

  python               : 3.7.5
  pythonpath           : /home/ggg46/pylibs/ssht
  virtualenv           : /home/ggg46/slr_venv3

  radical.entk         : 1.4.1.post1
  radical.pilot        : 1.4.1
  radical.saga         : 1.4.0
  radical.utils        : 1.4.0

Everything was running as expected from start to finish as recently as 29 July (Wednesday) afternoon. I first noticed the problem on 31 July (Friday). The only change I made was to the workflow configuration file to add additional (tested and working) modules to the workflow. I've also reverted the configuration file back to what it was on 29 July, but the problem persists.

Any insight would be greatly appreciated.

@ggg121
Contributor Author

ggg121 commented Aug 3, 2020

Additional note: The same problem occurs when running locally or on a remote machine.

@andre-merzky
Contributor

Can you please attach the client sandbox? Thanks!

@ggg121
Contributor Author

ggg121 commented Aug 6, 2020

Sorry for the delay. Attached is the sandbox (I think). Please let me know if I got the wrong/incomplete items.
keyboardInterrupt_sandbox.zip

@andre-merzky
Contributor

Hi @ggg121: the problem was caused by a change in pip's command-line options (you will find the error in bootstrap_0.out). We recently merged a fix for this (see here), and the latest release (1.5.2) should no longer have this problem. Can you please give it a try?
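
For anyone hitting a similar failure, here is a rough sketch of how one might scan a sandbox directory for the error mentioned above. It assumes the sandbox has been unpacked locally and that `bootstrap_0.out` sits somewhere below it; the function name and the keyword list are hypothetical, and the exact wording of the pip error is not quoted in this thread.

```python
from pathlib import Path


def find_bootstrap_errors(sandbox, keywords=("error", "pip")):
    """Return (file, line number, text) for matching lines in bootstrap_0.out files."""
    hits = []
    # Search recursively, since the file's depth inside the sandbox is an assumption.
    for path in sorted(Path(sandbox).rglob("bootstrap_0.out")):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(word in lowered for word in keywords):
                    hits.append((str(path), number, line.rstrip()))
    return hits
```

Usage would look like `find_bootstrap_errors("path/to/unpacked/sandbox")`, printing each hit to see which pip option was rejected. The keyword match is deliberately broad, so expect some unrelated lines.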

andre-merzky added the bug label on Aug 6, 2020
@ggg121
Contributor Author

ggg121 commented Aug 6, 2020

Hi @andre-merzky : The 1.5.2 release seems to work for me. I was able to run a complete workflow without incident. I consider this problem solved. If there are no objections, I'll go ahead and close this issue.

Thank you again for your help!
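
As a quick check before re-running, the versions reported by radical-stack can be compared against the fixed release. The sketch below is only an illustration: the helper names are hypothetical, and the thread does not say which package the 1.5.2 release refers to, so it simply compares any version string against that number. Suffixes such as `.post1` are ignored.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release, e.g. '1.4.1.post1' -> (1, 4, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(installed, required):
    """Compare numeric releases, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Versions copied from the radical-stack output earlier in this thread.
reported = {
    "radical.entk": "1.4.1.post1",
    "radical.pilot": "1.4.1",
    "radical.saga": "1.4.0",
    "radical.utils": "1.4.0",
}
outdated = sorted(name for name, ver in reported.items()
                  if not is_at_least(ver, "1.5.2"))
```

With the versions listed in the original report, every package ends up in `outdated`, which matches the advice to upgrade.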
