I'm trying to run a workflow through the current version of FACTS (devel branch; the workflow configuration is included there). The code runs as expected until it tries to submit the pilot. At that point it somehow detects a "KeyboardInterrupt" and shuts everything down. I can assure you that I'm not pressing anything on the keyboard while the code is running.
Below is my call to FACTS and the resulting STDOUT:
(slr_venv3) ggg46@ggg46-vb:~/research/slr_framework/code/facts$ python3 FACTS.py experiments/temp_exp
EnTK session: re.session.ggg46-vb.ggg46.018477.0001
Creating AppManager
Setting up RabbitMQ system ok
ok
Validating and assigning resource manager ok
Setting up RabbitMQ system n/a
new session: [re.session.ggg46-vb.ggg46.018477.0001] \
database : [mongodb://facts:zChLV6Qd5D3JwAYp@129.114.17.185/facts] ok
create pilot manager ok
submit 1 pilot(s)
pilot.0000 local.localhost 2 cores 0 gpus ok
closing session re.session.ggg46-vb.ggg46.018477.0001 \
close pilot manager \
wait for 1 pilot(s)
0 ok
ok
session lifetime: 19.1s ok
wait for 1 pilot(s)
0 timeout
All components terminated
Traceback (most recent call last):
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/execman/rp/resource_manager.py", line 179, in _submit_resource_request
    self._pilot.wait([rp.PMGR_ACTIVE, rp.DONE, rp.FAILED, rp.CANCELED])
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 536, in wait
    time.sleep(0.1)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/appman/appmanager.py", line 416, in run
    self._rmgr._submit_resource_request()
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/execman/rp/resource_manager.py", line 192, in _submit_resource_request
    raise KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "FACTS.py", line 432, in <module>
    run_experiment(args.edir, args.debug, args.no_total)
  File "FACTS.py", line 372, in run_experiment
    amgr.run()
  File "/home/ggg46/slr_venv3/lib/python3.7/site-packages/radical/entk/appman/appmanager.py", line 441, in run
    raise KeyboardInterrupt
KeyboardInterrupt
Everything ran as expected from start to finish as recently as the afternoon of 29 July (Wednesday). I first noticed the problem on 31 July (Friday). The only change I made in between was to the workflow configuration file, adding more (tested and working) modules to the workflow. I've also reverted the configuration file to its 29 July version, but the problem persists.
Any insight would be greatly appreciated.
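A note on why a KeyboardInterrupt can appear with no key pressed: in the traceback, the first one is raised inside time.sleep(0.1) in the pilot's wait() loop, and resource_manager.py then re-raises it. In Python, any thread can make the main thread raise KeyboardInterrupt by calling `_thread.interrupt_main()`, which simulates a SIGINT arriving. The sketch below illustrates that effect under stated assumptions: whether RADICAL-Pilot uses this exact call in this version is an assumption, and `fail_in_background` and `wait_for_pilot` are hypothetical stand-ins, not RADICAL APIs.

```python
import _thread
import threading
import time


def fail_in_background(delay: float) -> None:
    """Hypothetical helper thread that gives up (e.g. on a failed pilot)
    and interrupts the main thread, with the same effect as Ctrl-C."""
    time.sleep(delay)
    _thread.interrupt_main()  # main thread will raise KeyboardInterrupt


def wait_for_pilot(timeout: float) -> str:
    """Hypothetical stand-in for a polling wait loop such as pilot.wait()."""
    deadline = time.monotonic() + timeout
    try:
        while time.monotonic() < deadline:
            time.sleep(0.1)  # the interrupt is delivered around here
        return "timeout"
    except KeyboardInterrupt:
        return "interrupted without any key press"


if __name__ == "__main__":
    threading.Thread(target=fail_in_background, args=(0.3,), daemon=True).start()
    print(wait_for_pilot(timeout=5.0))
```

So a KeyboardInterrupt in this kind of traceback usually means some component asked the main thread to stop; the actual cause has to be found in the component's own logs.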
Hi @ggg121: the problem was caused by a change in pip's command-line options (the error shows up in bootstrap_0.out). We recently merged a fix for this (see here), and the latest release (1.5.2) should no longer have this problem. Can you please give it a try?
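To find that error without opening files by hand, something like the following can scan every pilot's bootstrap_0.out for a session. It is a sketch under assumptions: the sandbox location `~/radical.pilot.sandbox/<session-id>/pilot.*/` is assumed to be the default for local.localhost and may differ with other resource configurations, and the marker strings are illustrative only ("no such option" is how Python's optparse, which pip uses, reports an unknown flag). The helper names are hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Sequence

# ASSUMPTION: default local sandbox layout; adjust for your resource config.
DEFAULT_SANDBOX = Path.home() / "radical.pilot.sandbox"

# Illustrative markers only, matched case-insensitively.
MARKERS = ("error", "traceback", "no such option")


def find_bootstrap_logs(session_id: str, sandbox: Path = DEFAULT_SANDBOX) -> List[Path]:
    """Return each pilot's bootstrap_0.out for one session (empty if none)."""
    base = sandbox / session_id
    if not base.is_dir():
        return []
    return sorted(base.glob("pilot.*/bootstrap_0.out"))


def suspicious_lines(log: Path, markers: Sequence[str] = MARKERS) -> List[str]:
    """Return 'pilot/file:line: text' for every line containing a marker."""
    hits = []
    with log.open(errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            lowered = line.lower()
            if any(marker in lowered for marker in markers):
                hits.append(f"{log.parent.name}/{log.name}:{lineno}: {line.rstrip()}")
    return hits


if __name__ == "__main__":
    # Defaults to the session ID from the EnTK output above.
    session = sys.argv[1] if len(sys.argv) > 1 else "re.session.ggg46-vb.ggg46.018477.0001"
    for log in find_bootstrap_logs(session):
        for hit in suspicious_lines(log):
            print(hit)
```

Matching on broad markers will produce some noise; the point is to surface the first failing command quickly.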
Hi @andre-merzky : The 1.5.2 release seems to work for me. I was able to run a complete workflow without incident. I consider this problem solved. If there are no objections, I'll go ahead and close this issue.
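To confirm that a virtualenv actually picked up the fixed release, a small version check like this may help. It assumes the 1.5.2 release refers to the radical.pilot distribution (the package whose paths appear in the traceback); `parse_release` and `installed_version` are hypothetical helpers, and the numeric parsing is deliberately simple.

```python
from typing import Tuple

# ASSUMPTION: the "1.5.2" release mentioned above is radical.pilot's.
MIN_FIXED: Tuple[int, ...] = (1, 5, 2)


def parse_release(version: str) -> Tuple[int, ...]:
    """'1.5.2' -> (1, 5, 2). Stops at the first non-numeric piece, so
    '1.5.2.post1' and '1.5.2-dev' also map to (1, 5, 2)."""
    parts = []
    for piece in version.replace("-", ".").split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(dist: str = "radical.pilot") -> str:
    """Look up the installed version of a distribution by name."""
    try:
        from importlib.metadata import version  # Python 3.8+
    except ImportError:  # Python 3.7, as in the virtualenv above
        from pkg_resources import get_distribution
        return get_distribution(dist).version
    return version(dist)


if __name__ == "__main__":
    try:
        found = installed_version()
    except Exception as exc:  # not installed, or metadata unavailable
        print(f"could not determine radical.pilot version: {exc!r}")
    else:
        ok = parse_release(found) >= MIN_FIXED
        print(f"radical.pilot {found}: {'includes' if ok else 'predates'} 1.5.2")
```

Tuple comparison keeps "1.5.10" correctly ahead of "1.5.2", which a plain string comparison would get wrong.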