Description
Quite often when I start a sorting job with spykingcircus2, I end up with a lot of "defunct" processes. The output of the ps command looks like this:
PID TTY STAT TIME COMMAND
3213 ? Ss 0:00 /anaconda/envs/azureml_py38/bin/python /usr/local/bin/EDAT_Engine/engine.p
3216 ? Ssl 139:56 /anaconda/envs/jupyter_env/bin/python3.10 /anaconda/envs/jupyter_env/bin/j
4808 pts/0 Ss+ 0:00 /bin/bash -l
5337 pts/0 Sl 0:48 python automate_sorting.py --sorter spykingcircus2
28372 pts/0 Sl 13:11 python hpoptuna.py --sorter spykingcircus2 --dataset hgt19 --ma
29653 pts/0 S 0:00 /anaconda/envs/boc_minimal/bin/python -c from multiprocessing.resource_tra
30090 pts/0 S 0:00 python hpoptuna.py --sorter spykingcircus2 --dataset hgt19 --ma
30092 pts/0 Z 1:24 [python] <defunct>
30094 pts/0 Z 1:22 [python] <defunct>
30096 pts/0 Z 1:21 [python] <defunct>
30098 pts/0 Z 1:22 [python] <defunct>
30100 pts/0 Z 1:22 [python] <defunct>
30102 pts/0 Z 1:22 [python] <defunct>
30104 pts/0 Z 1:23 [python] <defunct>
30106 pts/0 Z 1:23 [python] <defunct>
30108 pts/0 Z 1:22 [python] <defunct>
30110 pts/0 Z 1:22 [python] <defunct>
30112 pts/0 Z 1:23 [python] <defunct>
30114 pts/0 Z 1:25 [python] <defunct>
30116 pts/0 Z 1:23 [python] <defunct>
30118 pts/0 Z 1:24 [python] <defunct>
30120 pts/0 Z 1:23 [python] <defunct>
30122 pts/0 Z 1:23 [python] <defunct>
30124 pts/0 Z 1:22 [python] <defunct>
30126 pts/0 Z 1:22 [python] <defunct>
30128 pts/0 Z 1:23 [python] <defunct>
30130 pts/0 Z 1:25 [python] <defunct>
30132 pts/0 Z 1:22 [python] <defunct>
30134 pts/0 Z 1:21 [python] <defunct>
30136 pts/0 Z 1:24 [python] <defunct>
30138 pts/0 Z 1:23 [python] <defunct>
30140 pts/0 Z 1:24 [python] <defunct>
30142 pts/0 Z 1:24 [python] <defunct>
30144 pts/0 Z 1:23 [python] <defunct>
30146 pts/0 Z 1:22 [python] <defunct>
30148 pts/0 Z 1:20 [python] <defunct>
30150 pts/0 Z 1:21 [python] <defunct>
30152 pts/0 Z 1:22 [python] <defunct>
30154 pts/0 Z 1:23 [python] <defunct>
This is an Azure Linux environment. I'm not sure what's going on.
When I kill the process running the sorting (here 30090), it immediately proceeds with the sorting:
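To check whether the number of defunct workers changes over time, the ps listing can be summarized with a small script. This is only a sketch: it assumes the five-column layout shown above (PID TTY STAT TIME COMMAND), and the helper names `count_states` and `zombie_pids` are made up for illustration.

```python
from collections import Counter

# Excerpt of the listing above; the truncated command is kept as printed by ps.
SAMPLE = """\
PID TTY STAT TIME COMMAND
30090 pts/0 S 0:00 python hpoptuna.py --sorter spykingcircus2 --dataset hgt19 --ma
30092 pts/0 Z 1:24 [python] <defunct>
30094 pts/0 Z 1:22 [python] <defunct>
"""


def _rows(ps_output):
    """Yield (pid, stat) for every data row, skipping the header and malformed lines."""
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 4)  # keep the command as one field
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        yield int(fields[0]), fields[2]


def count_states(ps_output):
    """Count processes by the first letter of STAT (S = sleeping, Z = zombie, ...)."""
    return Counter(stat[0] for _, stat in _rows(ps_output))


def zombie_pids(ps_output):
    """Return the PIDs whose state starts with Z (defunct)."""
    return [pid for pid, stat in _rows(ps_output) if stat.startswith("Z")]


print(count_states(SAMPLE))
print(zombie_pids(SAMPLE))
```

A process in state Z has already exited and is only waiting for its parent to collect its exit status, so the defunct entries themselves are not doing any work. The process to look at is their parent.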
Found 1497965 spikes
scipy.optimize.least_squares error: Initial guess is outside of provided bounds
scipy.optimize.least_squares error: Initial guess is outside of provided bounds
scipy.optimize.least_squares error: Initial guess is outside of provided bounds
scipy.optimize.least_squares error: Initial guess is outside of provided bounds
Kept 61 units after final merging
...
The output makes me think it didn't go beyond this line:
I will try to play around with the mp_context parameters.
If this issue looks familiar and anyone has tips, feel free to share suggestions.
This may have to do with the custom environment.
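For reference, here is a minimal standard-library sketch of what switching the multiprocessing start method looks like, which is what an mp_context setting typically controls. It does not show how the sorter forwards the parameter; `resolve_context`, `run_demo`, and `square` are hypothetical names used only for this example. One possible explanation, which is an assumption and not confirmed here, is that worker pools started with "fork" can hang when the parent process already has threads running, and "spawn" avoids that at the cost of slower startup.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def square(x):
    """Toy task; must live at module level so spawned workers can import it."""
    return x * x


def resolve_context(start_method="spawn"):
    """Return a multiprocessing context for the given start method.

    "spawn" is available on every platform; "fork" and "forkserver" are
    only available on Unix-like systems.
    """
    return mp.get_context(start_method)


def run_demo(start_method="spawn", n_workers=2):
    """Run the toy task in a process pool that uses the chosen start method."""
    ctx = resolve_context(start_method)
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
        return list(pool.map(square, range(4)))
```

With "spawn", `run_demo()` should be called from a script under an `if __name__ == "__main__":` guard, because each worker re-imports the main module on startup.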