Test example applications and rllib in jenkins tests. #707

robertnishihara · 2017-07-04T13:40:48Z

This currently tests A3C and evolution strategies in CI. For some reason the policy gradient example doesn't seem to work in Docker.

This should address #558.

AmplabJenkins · 2017-07-04T13:52:16Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-04T13:52:16Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1169/
Test FAILed.

AmplabJenkins · 2017-07-04T14:18:20Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-04T14:18:20Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1172/
Test FAILed.

AmplabJenkins · 2017-07-05T21:23:55Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-05T21:23:55Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1185/
Test FAILed.

AmplabJenkins · 2017-07-05T22:32:35Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-05T22:32:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1187/
Test FAILed.

AmplabJenkins · 2017-07-05T23:01:24Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-05T23:01:25Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1188/
Test FAILed.

AmplabJenkins · 2017-07-07T15:17:30Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T15:17:30Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1194/
Test FAILed.

AmplabJenkins · 2017-07-07T15:36:21Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T15:36:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1197/
Test FAILed.

AmplabJenkins · 2017-07-07T17:37:24Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T17:37:24Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1198/
Test FAILed.

shaneknapp · 2017-07-07T17:38:56Z

test this please

AmplabJenkins · 2017-07-07T17:56:23Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T17:56:23Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1202/
Test FAILed.

AmplabJenkins · 2017-07-07T20:14:43Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T20:14:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1207/
Test FAILed.

shaneknapp · 2017-07-07T20:30:58Z

btw, when the test ran this morning (https://amplab.cs.berkeley.edu/jenkins/job/Ray-PRB/1198/), it hung and left myriad zombie processes on the worker:

root@amp-jenkins-staging-worker-01:~# ps auxww|grep Z
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     136544  0.0  0.0      0     0 ?        Z    09:05   0:00 [redis-server] <defunct>
root     136548  0.0  0.0      0     0 ?        Z    09:05   0:00 [redis-server] <defunct>
root     136552  0.0  0.0      0     0 ?        Z    09:05   0:00 [python] <defunct>
root     136553  0.0  0.0      0     0 ?        Z    09:05   0:00 [python] <defunct>
root     136561  0.0  0.0      0     0 ?        Z    09:05   0:00 [local_scheduler] <defunct>
root     136562  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136563  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136564  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136565  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136566  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136567  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136568  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136569  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136570  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136571  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136572  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136573  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136574  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136575  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136576  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136577  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136578  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136579  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136580  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136581  0.3  0.0      0     0 ?        Z    09:05   0:17 [python] <defunct>
root     136582  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136583  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136584  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136585  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136586  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136587  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136588  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136589  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136590  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136591  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136592  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136593  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136594  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136595  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136596  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136597  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136598  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136599  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136600  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136601  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136602  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136603  0.2  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136604  0.3  0.0      0     0 ?        Z    09:05   0:17 [python] <defunct>
root     136605  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136606  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136607  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136608  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136609  0.3  0.0      0     0 ?        Z    09:05   0:16 [python] <defunct>
root     136610  0.1  0.0      0     0 ?        Z    09:05   0:06 [jupyter-noteboo] <defunct>
root     136853  1.4  0.0      0     0 ?        Z    09:05   1:17 [python] <defunct>
root     136854  1.4  0.0      0     0 ?        Z    09:05   1:19 [python] <defunct>
root     136855  1.4  0.0      0     0 ?        Z    09:05   1:17 [python] <defunct>
root     136856  1.3  0.0      0     0 ?        Z    09:05   1:15 [python] <defunct>
root     136857  1.3  0.0      0     0 ?        Z    09:05   1:13 [python] <defunct>

this is, um, suboptimal and really bad, as the only way to recover is a hard reboot of the server. it can also keep builds from other projects on the same machine from running: a clipper PRB build that fired off immediately after these zombies appeared hung indefinitely, and i discovered these zombies while investigating that hang.
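One way to triage a worker in this state: a zombie (`<defunct>`) entry stays in the process table until its parent reaps it with `wait()` (or exits, handing it to init), so grouping zombies by parent PID points at the process that stopped reaping. The `ps auxww` listing above has no PPID column, so this minimal sketch assumes procps `ps -eo pid,ppid,stat,comm` output instead. `zombies_by_parent` and `main` are hypothetical helpers, not part of Ray or Jenkins.

```python
# Hypothetical triage helper: group zombie processes by their parent PID.
# Assumes the column layout of procps `ps -eo pid,ppid,stat,comm`.
import subprocess
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Return {ppid: [(pid, name), ...]} for rows whose STAT starts with 'Z'."""
    groups = defaultdict(list)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the "PID PPID STAT COMMAND" header
        fields = line.split(None, 3)  # the command name may contain spaces
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):
            # Normalize "python <defunct>" or "[python] <defunct>" to "python".
            name = comm.replace("<defunct>", "").strip().strip("[]")
            groups[int(ppid)].append((int(pid), name))
    return dict(groups)


def main():
    # Assumes a Linux host with procps `ps`; run it on the affected worker.
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    for ppid, children in sorted(zombies_by_parent(out).items()):
        names = sorted({name for _, name in children})
        print("parent %d: %d zombie(s): %s" % (ppid, len(children), ", ".join(names)))
```

Calling `main()` on the worker prints one line per parent; that parent (for example a container's PID 1 or a hung test driver) is the process to inspect before resorting to a reboot.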

robertnishihara · 2017-07-07T21:18:47Z

Thanks @shaneknapp, it looks like there may have been some corruption in the local scheduler. Or maybe the problem is something else. I'll see if I can reproduce it locally.

AmplabJenkins · 2017-07-07T23:09:08Z

Merged build finished. Test FAILed.

AmplabJenkins · 2017-07-07T23:09:08Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1209/
Test FAILed.

shaneknapp · 2017-07-08T00:06:36Z

yep, it did it again on amp-jenkins-staging-worker-02.amp. please try and reproduce and fix locally before doing more jenkins tests... i'm heading out for the weekend and won't be in until late sunday.

AmplabJenkins · 2017-07-16T01:50:01Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-07-16T01:50:02Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1306/
Test PASSed.

robertnishihara · 2017-07-16T01:51:38Z

@pcmoritz, @ericl let me know if you have any comments about this.

ericl · 2017-07-16T03:19:34Z

test/jenkins_tests/run_multi_node_tests.sh

+#     --iterations=2
+
+docker run --shm-size=10G --memory=10G $DOCKER_SHA \
+    python /ray/python/ray/rllib/evolution_strategies/example.py \


Maybe set --env-name as well here? We could also run rllib/train.py instead, which somewhat supersedes the example.py files.

Otherwise this looks good to me.

robertnishihara · 2017-07-16

Fixed. I agree that switching to rllib/train.py would be a good change, perhaps in a subsequent PR, to make things more uniform.

AmplabJenkins · 2017-07-16T07:55:02Z

Merged build finished. Test PASSed.

AmplabJenkins · 2017-07-16T07:55:02Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1309/
Test PASSed.

robertnishihara added 4 commits July 15, 2017 14:13

Test example applications in Jenkins.

8875415

Fix default upload_dir argument for Algorithm class.

719b28c

Fix evolution strategies.

650a5b1

Comment out policy gradient example which doesn't seem to work.

1caf40e

ericl reviewed Jul 16, 2017

Set --env-name for evolution strategies.

8fc6166

pcmoritz merged commit 80e8426 into ray-project:master Jul 16, 2017

pcmoritz deleted the jenkinsexamples branch July 16, 2017 18:51
