Travis-CI testing #111

ghost · 2018-03-28T17:57:08Z

This PR implements Travis CI for CICE. The configuration is similar to Icepack, and is based on GCC and open-mpi.

There are still a few issues that need to be worked out before I recommend merging this PR. The only tests that currently succeed are the build tests; the run tests all fail. Here is an example build log, with an excerpt below:

#------- 
#repo = https://github.com/anders-dc/CICE.git
#bran = 
#hash = 960aaadcf40762e984dd7a75ea36b96df8feef8b
#hshs = 960aaadcf4
#hshu = Anders Damsgaard <andersd@riseup.net>
#hshd = Wed Mar 28 13:20:19 2018 -0400
#date = 2018-03-28
#time = 17:24:42
#mach = travisCI
#user = travis
#vers = CICE 6.0.0.alpha
#------- 
#---
PASS travisCI_gnu_smoke_gx3_8x2_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_8x2_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_8x2_diag24_medium_run1year build
FAIL travisCI_gnu_smoke_gx3_8x2_diag24_medium_run1year run
#---
PASS travisCI_gnu_smoke_gx3_4x1_debug_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_4x1_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_8x2_debug_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_8x2_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_4x2_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_4x2_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_4x1_diag1_run5day_thread build
FAIL travisCI_gnu_smoke_gx3_4x1_diag1_run5day_thread run
#---
PASS travisCI_gnu_restart_gx3_8x1_diag1 build
PEND travisCI_gnu_restart_gx3_8x1_diag1 exact-restart
FAIL travisCI_gnu_restart_gx3_8x1_diag1 run-initial
#---
PASS travisCI_gnu_restart_gx3_4x2_debug build
PEND travisCI_gnu_restart_gx3_4x2_debug exact-restart
FAIL travisCI_gnu_restart_gx3_4x2_debug run-initial
#---
PASS travisCI_gnu_restart_gx3_8x2_diag1_pondcesm build
PEND travisCI_gnu_restart_gx3_8x2_diag1_pondcesm exact-restart
FAIL travisCI_gnu_restart_gx3_8x2_diag1_pondcesm run-initial
#---
PASS travisCI_gnu_restart_gx3_8x2_diag1_pondtopo build
PEND travisCI_gnu_restart_gx3_8x2_diag1_pondtopo exact-restart
FAIL travisCI_gnu_restart_gx3_8x2_diag1_pondtopo run-initial
#---
PASS travisCI_gnu_smoke_gx1_32x1_diag1_run5day_thread build
FAIL travisCI_gnu_smoke_gx1_32x1_diag1_run5day_thread run
#---
PASS travisCI_gnu_smoke_gx1_16x2_diag1_run5day build
FAIL travisCI_gnu_smoke_gx1_16x2_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx1_8x4_debug_run2day build
FAIL travisCI_gnu_smoke_gx1_8x4_debug_run2day run
#---
PASS travisCI_gnu_restart_gx1_32x1 build
PEND travisCI_gnu_restart_gx1_32x1 exact-restart
FAIL travisCI_gnu_restart_gx1_32x1 run-initial
#---
PASS travisCI_gnu_restart_gx1_13x2 build
PEND travisCI_gnu_restart_gx1_13x2 exact-restart
FAIL travisCI_gnu_restart_gx1_13x2 run-initial

15 of 36 tests PASSED
15 of 36 tests FAILED
6 of 36 tests PENDING
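The summary counts at the end of the excerpt can be reproduced by tallying the status keyword at the start of each line. Here is a minimal Python sketch, assuming status lines always begin with PASS, FAIL, or PEND as in the excerpt above. `tally_results` is a hypothetical helper for illustration and is not part of the CICE scripts.

```python
from collections import Counter


def tally_results(log_text):
    """Count PASS/FAIL/PEND status lines in a results log."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        # Status lines start with one of the three keywords; header lines
        # start with '#' and summary lines start with a number, so both
        # are skipped here.
        if fields and fields[0] in ("PASS", "FAIL", "PEND"):
            counts[fields[0]] += 1
    return counts


# A few lines copied from the excerpt above.
sample = """\
#---
PASS travisCI_gnu_smoke_gx3_8x2_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_8x2_diag1_run5day run
#---
PASS travisCI_gnu_restart_gx3_8x1_diag1 build
PEND travisCI_gnu_restart_gx3_8x1_diag1 exact-restart
FAIL travisCI_gnu_restart_gx3_8x1_diag1 run-initial
"""

counts = tally_results(sample)
total = sum(counts.values())
for status in ("PASS", "FAIL", "PEND"):
    print(f"{counts[status]} of {total} tests {status}")
```

The FAIL count could also serve as a process exit code, so that the CI job is marked as failed whenever any test fails.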

I set ICE_MACHINE_TPNODE = 4 in configuration/scripts/machines/env.travisCI, which makes the build steps succeed. However, Travis-CI does not support the resulting nprocs values during execution. Grepping the generated case scripts shows that nprocs ends up with values of 4, 8, 13, 16, or 32, which far exceeds the capabilities of Travis. I suggest designing tests that are suitable for Travis.

Furthermore, I had to remove -Wextra from the compiler flags (configuration/scripts/machines/Macros.travisCI), as Travis fails a build if the size of STDOUT/STDERR text exceeds 4 megabytes.

Developer(s): Anders Damsgaard, Princeton/NOAA-GFDL (github.com/anders-dc, adamsgaard.dk)

Are the code changes bit for bit, different at roundoff level, or more substantial? There are minor changes to the underlying code which shouldn't affect other uses.

Is the documentation being updated with this PR? (Y/N) No.

If not, does the documentation need to be updated separately? (Y/N) No.

apcraig · 2018-03-28T18:03:21Z

I see this in the build log for several tests
(abort_ice)ABORTED:
(abort_ice) error = ice: Input nprocs not same as system request
That suggests there is an inconsistency in the tasks/threads used for testing and those defined by the test and/or in namelist. What we need to do is make sure we're setting up tests that can be carried out by travisCI.

Does travisCI support MPI and/or openMP and if so, how many tasks and threads can we have?

ghost · 2018-03-28T18:07:36Z

Whoops, I forgot to launch the tests with mpirun. I've fixed that in a143396 and ed04056. TravisCI does support MPI and OpenMP, but the virtual machines have only two cores. However, I think the question is whether we can oversubscribe the system with additional threads, which would presumably result in slower execution.

apcraig · 2018-03-28T18:07:43Z

It looks like the travisCI machine setup has 4 tasks per node and no way to request resources; we are just running interactively. How many resources do we get, just one node? That means we may have to develop a suite that uses no more than 4 tasks*threads for all tests.
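To see which of the existing tests would fit under such a limit, the decomposition can be read out of each test name. The sketch below assumes that the MxN field in a test name (for example 8x2) means M MPI tasks times N threads. The helper names `cores_requested` and `fits` are hypothetical, and the limit is passed in as a parameter because the right value for Travis is still under discussion here.

```python
import re


def cores_requested(test_name):
    """Return (tasks, threads) parsed from the MxN field of a test name."""
    # Match an all-numeric MxN field delimited by underscores or end of string.
    # Grid names such as gx3 do not match because they start with a letter.
    match = re.search(r"_(\d+)x(\d+)(?:_|$)", test_name)
    if match is None:
        raise ValueError(f"no MxN decomposition found in {test_name!r}")
    return int(match.group(1)), int(match.group(2))


def fits(test_name, max_cores):
    """True if tasks * threads does not exceed max_cores."""
    tasks, threads = cores_requested(test_name)
    return tasks * threads <= max_cores


names = [
    "travisCI_gnu_smoke_gx3_8x2_diag1_run5day",
    "travisCI_gnu_smoke_gx3_4x1_debug_diag1_run5day",
    "travisCI_gnu_restart_gx1_32x1",
]
for name in names:
    tasks, threads = cores_requested(name)
    verdict = "ok" if fits(name, max_cores=4) else "too large"
    print(f"{name}: {tasks} x {threads} = {tasks * threads} ({verdict})")
```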

ghost · 2018-03-28T18:14:10Z

Yes, I set it up for 4 tasks per node in order to be able to build. We get just one node with two cores, and I'm pretty sure these are not hyperthreaded. I agree, the best solution would be to have a test suite which is designed for this environment.

apcraig · 2018-03-28T18:19:58Z

We can do that. We'll need to set up a suite of tests that use fewer resources than we have currently defined. It would be nice to get access to more cores, though, so we can test a mix of task and thread counts with different decompositions. 8 or 16 would be great, for instance.

I was just looking to see what VIC is doing and it looks like they use travis for a bunch of unit tests, https://travis-ci.org/UW-Hydro/VIC, but I will try to ask them about whether they are able to test on higher pe counts.

ghost · 2018-03-28T18:43:09Z

Sounds good, thanks!

Meanwhile, it looks like we are getting there. I ran into this error:

Fortran runtime error: Cannot open file '/home/travis/CICE_data/grid/gx3/grid_gx3.bin': No such file or directory

We used a wget call to fetch the external Icepack_data.tar.gz archive from a UCAR FTP server. Is there a similar archive for CICE?

EDIT: Nvm, just found the information in the wiki

ghost · 2018-03-28T22:32:48Z

Excellent, thank you Tony. The new test suite seems to mostly succeed (raw log).

#------- 
#repo = https://github.com/anders-dc/CICE.git
#bran = 
#hash = 76c37bf34ea295fbd2ad889375696104e6e50c7e
#hshs = 76c37bf34e
#hshu = Anders Damsgaard <andersd@riseup.net>
#hshd = Wed Mar 28 18:04:22 2018 -0400
#date = 2018-03-28
#time = 22:08:40
#mach = travisCI
#user = travis
#vers = CICE 6.0.0.alpha
#------- 
#---
PASS travisCI_gnu_smoke_gx3_1x2_diag1_run5day build
PASS travisCI_gnu_smoke_gx3_1x2_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_2x1_debug_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_2x1_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_1x2_debug_diag1_run5day build
FAIL travisCI_gnu_smoke_gx3_1x2_debug_diag1_run5day run
#---
PASS travisCI_gnu_smoke_gx3_1x1_diag1_run5day_thread build
FAIL travisCI_gnu_smoke_gx3_1x1_diag1_run5day_thread run
#---
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread build
PASS travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread run
FAIL travisCI_gnu_smoke_gx3_2x1_diag1_run5day_thread bfbcomp travisCI_gnu_smoke_gx3_1x2_diag1_run5day.travisCItest different-data
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1 build
PASS travisCI_gnu_restart_gx3_2x1_diag1 run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1 run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1 exact-restart
#---
PASS travisCI_gnu_restart_gx3_1x2_diag1 build
PASS travisCI_gnu_restart_gx3_1x2_diag1 run-initial
PASS travisCI_gnu_restart_gx3_1x2_diag1 run-restart
PASS travisCI_gnu_restart_gx3_1x2_diag1 exact-restart
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm build
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondcesm exact-restart
#---
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo build
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo run-initial
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo run-restart
PASS travisCI_gnu_restart_gx3_2x1_diag1_pondtopo exact-restart

23 of 27 tests PASSED
4 of 27 tests FAILED
0 of 27 tests PENDING

However, Travis terminates the job while it loops through the run logs in after_failure, because of the excessive output.

apcraig · 2018-03-28T22:37:57Z

Can we try again, but instead of writing the entire log file at the end, can we just tail -100 each log file?
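For illustration, the idea of printing only the end of each log could look like the Python sketch below. The `*.log` glob pattern and the helper names are assumptions made for this example; the actual change would live in the Travis configuration and would likely call `tail` directly.

```python
from collections import deque
from pathlib import Path


def tail(path, n=100):
    """Return the last n lines of a text file as a list of strings."""
    # A deque with maxlen keeps only the most recent n lines while the
    # file is streamed, so large logs are never held in memory in full.
    with open(path, errors="replace") as handle:
        return list(deque(handle, maxlen=n))


def print_log_tails(log_dir, pattern="*.log", n=100):
    """Print the last n lines of every matching log file in log_dir."""
    for path in sorted(Path(log_dir).glob(pattern)):
        print(f"==> {path} <==")
        print("".join(tail(path, n)), end="")
```

Calling `print_log_tails` on the directory that holds the run logs bounds the output at n lines per file, which should keep the job well below the Travis log limit for a suite of this size.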

ghost · 2018-03-28T23:33:43Z

Great, this is more informative. Here's the raw log.

EDIT: The runtime errors come from Icepack:

At line 783 of file /home/travis/build/anders-dc/CICE/icepack/columnphysics/icepack_zbgc.F90
Fortran runtime error: Array bound mismatch for dimension 1 of array 'kn_bac' (1/3)

apcraig · 2018-03-29T01:02:34Z

We've seen that error before; we probably just need to fix an interface call on the CICE side. There is a different error for the 1x1 case. I'll have to look at that one a little closer.

apcraig · 2018-03-29T22:25:58Z

Just FYI that I have duplicated these errors on another machine with the gnu compiler and am working on them. Hope to have an update soon.

apcraig · 2018-03-30T17:14:27Z

@anders-dc I just updated my travis branch again with several fixes.
https://github.com/apcraig/CICE/tree/travis
The specific commit is
apcraig@ad52ab4
I assume you can pull these updates into your branch and run another test? If you have problems with the pull, let me know. thanks!

update CICE to address test failures, several issues added

ghost · 2018-03-30T17:32:22Z

Thanks @apcraig, here's the newest run.

apcraig · 2018-03-30T17:56:56Z

I'm watching it. We've already hit the log size limit and it has been running for 20 minutes. We need to add an option to the scripts that doesn't write build output to the terminal. I will take care of that next. I've also had an idea that we should be reusing binaries if we can. That's not so easy to do with CICE because the decomposition is built into the build. But maybe we can for travis. Let me prototype that too and see if I can get something that works.

apcraig · 2018-03-30T18:02:48Z

It failed, but we can't tell why. I'll try to fix the length of the logging and propose another pull later today.

ghost · 2018-03-30T18:03:56Z

I agree, although the raw log is still going. The -Wzerotrip compiler warnings (included in -Wall; "Warning: DO loop at (1) will be executed zero times") are also a major issue.

apcraig · 2018-03-30T18:09:10Z

You're right Anders, we can see the raw log. Forgot about that. We're still getting a couple errors. I'll look into those too, but we're getting closer.

ghost · 2018-03-30T18:21:13Z

Yes, we're getting there. Maybe it would be worth suppressing more compiler warnings. The main product of Travis is the boolean yes/no to whether the compilation and runtime tests for a commit are successful. Only rarely will somebody look into the log of a passed build.

apcraig · 2018-03-30T21:17:15Z

@anders-dc OK, there is another set of commits on the travis branch,
https://github.com/apcraig/CICE/tree/travis
specifically
apcraig@6be69e4

You should also add

setenv ICE_MACHINE_QUIETMODE true

to your env.travisCI_gnu file. That will stop the spewing of the build output. If the build fails, it will do a tail -10 automatically on the build log file, so hopefully that will work for us. If not, we'll continue to tweak.
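The quiet-mode behavior described here (send build output to a log file, and echo only the last few lines if the build fails) can be sketched in Python as follows. This is an illustration of the pattern only, not the code in the CICE scripts; `quiet_build` and its arguments are hypothetical.

```python
import subprocess
import sys
from collections import deque


def quiet_build(command, log_path, tail_lines=10):
    """Run command with all output sent to log_path; on failure, echo the tail."""
    # stderr is merged into stdout so that compiler warnings and errors
    # land in the same log file, in order.
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    if result.returncode != 0:
        # Only on failure: show the last few lines of the build log.
        with open(log_path, errors="replace") as log:
            sys.stdout.writelines(deque(log, maxlen=tail_lines))
    return result.returncode
```

A passing build prints nothing to the terminal, so the CI log grows only when something goes wrong.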

In addition to adding the quiet mode, I have also added a couple tests to the travis suite. I want to see what we get. I have not been able to duplicate the error on another machine. I even used the travisCI Macros file just to make sure it wasn't a small diff in the build settings. I am getting some errors with other compilers (pgi) in what seems to be the same point, but I can't be sure it's the same thing. I spent a few minutes looking at the pgi error but it's going to take a little more work to sort out. My plan is to add an issue.

What I propose is we run this next set of tests and see what we get. Then we should turn off, for now, the ones that are failing on travisCI. We can then push this to master and separately work on the outstanding issues. I think we've made some reasonable progress at this point.

update travis suite and add quiet mode to scripts

apcraig · 2018-03-30T22:15:49Z

So, the latest test suite does more or less what I expected. We definitely have some reproducibility problems, and that's one issue we're not seeing on other platforms so far. That's even without OpenMP. There is work to do, but most of that needs to happen outside Travis. I propose the following changes to the travis_suite. Change

smoke gx3 2x1 diag1,run5day smoke_gx3_1x1_diag1_run5day
smoke gx3 1x2 diag1,run5day
smoke gx3 1x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day
smoke gx3 2x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day

to

#smoke gx3 2x1 diag1,run5day smoke_gx3_1x1_diag1_run5day
smoke gx3 2x1 diag1,run5day
smoke gx3 1x2 diag1,run5day
#smoke gx3 1x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day
#smoke gx3 2x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day
smoke gx3 2x1 diag1,run5day,thread

Basically, we're turning off the 1x1 test that fails and turning off all the bfb compares for the other tests. Not ideal, but OK for now. @anders-dc, can you make that change and retest? If you prefer for me to make the change on my branch, just let me know. Thanks!
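To make the effect of the edit concrete, here is a small Python sketch that reads suite lines of this shape. It assumes the columns are test, grid, MxN decomposition, comma-separated options, and an optional fifth column naming the test to compare against bit-for-bit; that reading is inferred from the lines above, and `SuiteEntry` and `parse_suite` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SuiteEntry(NamedTuple):
    test: str
    grid: str
    pes: str
    options: str
    compare: Optional[str]  # test to compare against, or None


def parse_suite(text):
    """Parse suite lines, skipping blank lines and '#' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns: {line!r}")
        compare = fields[4] if len(fields) > 4 else None
        entries.append(SuiteEntry(fields[0], fields[1], fields[2], fields[3], compare))
    return entries


# The proposed suite from the comment above.
proposed = """\
#smoke gx3 2x1 diag1,run5day smoke_gx3_1x1_diag1_run5day
smoke gx3 2x1 diag1,run5day
smoke gx3 1x2 diag1,run5day
#smoke gx3 1x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day
#smoke gx3 2x1 diag1,run5day,thread smoke_gx3_1x2_diag1_run5day
smoke gx3 2x1 diag1,run5day,thread
"""

for entry in parse_suite(proposed):
    print(entry.test, entry.grid, entry.pes, entry.options, entry.compare)
```

Under that reading, the proposed block leaves three active smoke tests, none with a comparison target.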

ghost · 2018-03-30T23:20:25Z

Hooray! The build took quite a long time to complete (33 mins), but passed. Thanks @apcraig!

apcraig · 2018-03-30T23:25:20Z

Great. I think we can execute the PR now. We should have @eclare108213 give a quick review too. There are some code mods. I may further reduce the test list or try to figure out a way for it to go a little faster. 30 minutes seems a little long for a "quick" status test.

Anders Damsgaard and others added 21 commits February 20, 2018 10:55

First attempt at travis CI

0670791

Add Travis-CI badge to README

300c207

Add quotes around ICE_SCRIPTS definition to allow spaces in folder names

8024386

Report the number of failed tests as an exit code

dc10c9b

Rename WKDIR and BASELINE vars

a491970

Merge branch 'master' of github.com:CICE-Consortium/CICE into travisCI

0cb8182

Add conditional for travis-CI

818d63f

Add travisCI as runtime environment

ec3c01b

Increase ICE_MACHINE_TPNODE and add verbose output

b02e7bc

Merge branch 'master' of github.com:CICE-Consortium/CICE into travisCI

377d2b5

Merge branch 'master' of github.com:CICE-Consortium/CICE into travisCI

b108217

Disable verbose flag for batch script

bc4c710

Increase the number of threads on Travis-CI to 4 (may not work)

ece29fa

Install openmpi, remove llvm and Icepack data fetch

29a0292

Update Macros for GCC and openmpi compilers

0c7be40

Remove libnetcdff-dev requirement

72266e4

Merge branch 'master' of github.com:CICE-Consortium/CICE into travisCI

ad1f14e

Disable -Wextra because test output exceeds travis limits

960aaad

Remove debug commands

40e7755

Launch cice with mpirun

a143396

Merge branch 'travisCI' of github.com:anders-dc/CICE into travisCI

e87cc2e

ghost mentioned this pull request Mar 28, 2018

Errors when running CICE test suite on Travis-CI #90

Closed

Fix typo in mpirun call

ed04056

Download and unpack archive with grids and initial conditions

7e945ea

Restrict runlog output to final 100 lines

cd2391a

update CICE to address test failures, several issues added

ad52ab4

Merge pull request #2 from apcraig/travis

b3c781f

update CICE to address test failures, several issues added

update travis suite and add quiet mode to scripts

6be69e4

anders-dc and others added 2 commits March 30, 2018 17:23

Merge pull request #3 from apcraig/travis

b871231

update travis suite and add quiet mode to scripts

Set quiet build mode for Travis

dda56b7

Turn off 1x1 test that fails and turn off BFB compares for other tests

d1d7cf5

Merge branch 'master' into travisCI

76bfcd9

apcraig requested review from eclare108213 and apcraig March 30, 2018 23:25

apcraig approved these changes Mar 30, 2018


eclare108213 approved these changes Mar 31, 2018


eclare108213 merged commit 536e2a6 into CICE-Consortium:master Mar 31, 2018

ghost deleted the travisCI branch March 31, 2018 16:30
