updates to drfini, ufbtab, ufbmex #625

jbathegit · 2024-10-11T15:10:27Z

Fixes #623
including new outtest13 test code.

Also found and fixed a similar do loop issue in ufbtab and added a new test to expand ufbtab code coverage.

Also simplified some coding in calls to x48 and x84.

jbathegit · 2024-10-11T15:47:52Z

Hmm OK, an error that I've never seen before is now popping up with gcov in the test step of the developer check.

This happens after all of the tests themselves have run successfully, beginning at about line 7937 of the test step output. If you scroll all the way to the bottom of the test step output, it mentions that there's a bug in the gcov tool, but if you click on the URL it takes you to a stale "not found" page.

I could try adding the suggested option to the gcov command in the developer.yml file, but if you scroll up there are other diagnostic messages as well that are concerning, which leads me to think that it's now just giving up and excluding some of the test codes altogether from its coverage analysis, and which is kind of the whole point here.

@edwardhartnett @AlexanderRichert-NOAA @aerorahul @jack-woollen have you ever seen anything like this before? I did try rerunning the developer check to see if it was maybe just some one-time glitch with gcov, but no dice.

…han 2^32 times

AlexanderRichert-NOAA · 2024-10-11T17:59:54Z

No, I don't think I've run into that. My initial thought would be to either see if there's been a change in the gcovr version, or see if there's some part of the modified code that's tripping it up (doesn't look it though, since the line in question isn't new).

jbathegit · 2024-10-11T18:00:58Z

I did some more digging and found that there's a block of code in subroutine makestab that's now being hit more than 2^32 times:

      do k=1,maxjl
        knt(k) = 0
      enddo

As far as I can tell this is legitimate code for initializing an internal storage array when building a jump/link table. But with all of the testing we're now doing it's apparently being hit so many times across all of the various test programs that it's blowing past the largest number that an internal variable can hold inside of the gcovr python code, and which in turn throws the error. This is a known issue in gcovr, and I've now adopted their suggested solution of adding an option to gcovr so that it spits out a warning (once :-) and no longer crashes.

@jack-woollen, you're more of an expert than me on the innards of subroutine makestab, so I'm going to defer to you as to whether that above block of code really needs to initialize all maxjl members of the knt array every time through the loop inside that subroutine. If there's maybe some way for the overall logic to still work without continuously looping through the entire array so many times during each call to the subroutine (maybe it only really needs to initialize a smaller subsection of the array to 0, rather than all maxjl of the members?), then maybe that's a worthwhile update at some later date for purposes of efficiency to cut down overall run time? Just let me know - thanks!

@jbathegit It looks like the knt array clearing may be able to move out in front of the loop it's currently in.

! Set up expansion segments for type 'SUB', 'DRP', and 'DRS' nodes.
newn = 0
do k=1,maxjl
knt(k) = 0
enddo
do n=1,ntab
and so on

It passed all the tests when I tried that out.

jack-woollen

@jbathegit I'll comment that the original loop looked like the following, which I think accounts for the changing nval(lun). I like your update better than the original, but I'm not sure where the bug came in!

C COMFORM THE TEMPLATES TO THE DELAYED REPLICATION FACTORS
C --------------------------------------------------------

  M = 0
  N = 0

10 DO N=N+1,NVAL(LUN)
NODE = INV(N,LUN)
IF(ITP(NODE).EQ.1 .AND. TAG(NODE).EQ.DRFTAG) THEN
M = M+1
CALL USRTPL(LUN,N,MDRF(M))
GOTO 10
ENDIF
ENDDO

jbathegit · 2024-10-11T20:00:49Z

Interesting, it looks like that particular change came in sometime between v12.0.0 and v12.1.0, and honestly that was likely my own doing since I was refactoring a lot during that time, including trying to eliminate GOTO statements where possible.

However, what's even more interesting is that the aforementioned IASI encoding logic doesn't work correctly with either v12.0.0 or v12.1.0. But it does work correctly with v11.5.0 and v11.7.0, and which suggests that whatever the real bug was in this particular case may have been introduced sometime during the interval between v11.7.0 and v12.0.0. So that's a bit weird, but in any case thanks for the feedback!

@jbathegit Which code is failing? I think we should find the real bug.

jack-woollen · 2024-10-11T20:05:54Z

@jbathegit It looks like the knt array clearing may be able to move out in front of the loop it's currently in.

! Set up expansion segments for type 'SUB', 'DRP', and 'DRS' nodes.
newn = 0
do k=1,maxjl
knt(k) = 0
enddo
do n=1,ntab
and so on

It passed all the tests when I tried that out.

jbathegit · 2024-10-11T20:22:25Z

Thanks @jack-woollen , I'll look into setting up a PR for that next week.

In the meantime, my merge commit for this branch is failing its automated checkout for the Linux and developer CIs (but not for MacOS or Intel), and even though all of the CIs ran without incident for the latest commit within this PR. The error seems to be Python-related in the install-deps step:

@AlexanderRichert-NOAA @edwardhartnett @aerorahul any ideas what may be going on here? Again, it's odd that all of the same CI tests ran fine just a few hours ago within this PR, but now some of them are failing during the merge commit to the develop branch!?

AlexanderRichert-NOAA · 2024-10-11T23:17:21Z

Looks like a bug with the apt python installations. The ubuntu-latest runner just switched to 24.04, which will require some modifications anyway (24.04 doesn't have gcc 11), so for now you could revert to ubuntu-22.04 and I can look into it more this coming week.

jbathegit · 2024-10-15T22:06:05Z

Thanks for taking a look at this @AlexanderRichert-NOAA

I presume your short-term suggestion means to change:

runs-on: ubuntu-latest

to:

runs-on: ubuntu-22.04

in the developer.yml, Spack.yml, Linux.yml and Intel.yml files? I could do that, but I'm also going to be away at a meeting during the next couple of days anyway, so it might also make sense to just leave it alone for now and give you some time to explore the other modifications you were suggesting. So I'll just sit tight for the time being and check back with you on Friday.

In the meantime, what's really interesting to me is that the Intel checks passed anyway (even with the above line) and it was only the developer, Spack and Linux checks that failed during the merge commit. Furthermore, all of the CI checks passed on this PR barely 2 hours earlier at around 2:00pm EDT, but then suddenly the developer, Spack and Linux ones didn't work anymore at around 4:00pm EDT when I did the merge commit. So I guess that means they must have switched the runner sometime during that 2-hour window.

jbathegit · 2024-10-21T22:50:02Z

Hi @AlexanderRichert-NOAA just checking to see if you've had any luck or made any progress with this? I also saw where you recently opened a separate issue #627, which I presume is related.

Please let me know - thanks!

AlexanderRichert-NOAA · 2024-10-21T23:24:35Z

Yes, there turn out to be a couple issues but all fixable. Before I put in a PR-- the ubuntu-22.04 runner provides gcc's 10-12, whereas ubuntu-latest (24.04) provides 12-14. Do you have any preference toward testing those earlier versions, or are you okay with just testing within the more recent set? We can do all ubuntu-latest, all ubuntu-22.04, or a mix (eventually we'll want to test with gcc 14 once we figure out why that's failing...).

jbathegit · 2024-10-21T23:45:40Z

I'm not particularly partial to either option. But all things being equal, I'd presume it makes more sense to always use ubuntu-latest if/whenever we can.

That said, on WCOSS2 it looks like the only available gcc versions are 10.2.0, 10.3,0, 11.2.0 and 12.1.0. So I'm not sure how much help I could be in testing out gcc versions 13-14, short of just pushing up PRs in trial-and-error mode and seeing what breaks in the runners.

Bottom line, if you think it's better to stick with ubuntu-22.04 or a mix for the time being, then I'm fine with that instead.

jbathegit added 2 commits October 10, 2024 17:11

fix bug in drfini for multiple replication factors

477d5d4

fix similar do loop bug in ufbtab, add ufbtab test, simplify x84 calls

008b273

add gcovr option to prevent gcov error when a statement is hit more t…

2421808

…han 2^32 times

jbathegit requested review from edwardhartnett, jack-woollen and AlexanderRichert-NOAA October 11, 2024 18:08

jack-woollen approved these changes Oct 11, 2024

View reviewed changes

jbathegit merged commit 2d9b1a6 into develop Oct 11, 2024
10 checks passed

jbathegit deleted the jba_delreps branch October 11, 2024 20:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

updates to drfini, ufbtab, ufbmex #625

updates to drfini, ufbtab, ufbmex #625

jbathegit commented Oct 11, 2024 •

edited

Loading

jbathegit commented Oct 11, 2024

AlexanderRichert-NOAA commented Oct 11, 2024

jbathegit commented Oct 11, 2024 •

edited by jack-woollen

Loading

jack-woollen left a comment

jbathegit commented Oct 11, 2024 •

edited by jack-woollen

Loading

jack-woollen commented Oct 11, 2024

jbathegit commented Oct 11, 2024

AlexanderRichert-NOAA commented Oct 11, 2024

jbathegit commented Oct 15, 2024

jbathegit commented Oct 21, 2024

AlexanderRichert-NOAA commented Oct 21, 2024

jbathegit commented Oct 21, 2024

updates to drfini, ufbtab, ufbmex #625

updates to drfini, ufbtab, ufbmex #625

Conversation

jbathegit commented Oct 11, 2024 • edited Loading

jbathegit commented Oct 11, 2024

AlexanderRichert-NOAA commented Oct 11, 2024

jbathegit commented Oct 11, 2024 • edited by jack-woollen Loading

jack-woollen left a comment

Choose a reason for hiding this comment

jbathegit commented Oct 11, 2024 • edited by jack-woollen Loading

jack-woollen commented Oct 11, 2024

jbathegit commented Oct 11, 2024

AlexanderRichert-NOAA commented Oct 11, 2024

jbathegit commented Oct 15, 2024

jbathegit commented Oct 21, 2024

AlexanderRichert-NOAA commented Oct 21, 2024

jbathegit commented Oct 21, 2024

jbathegit commented Oct 11, 2024 •

edited

Loading

jbathegit commented Oct 11, 2024 •

edited by jack-woollen

Loading

jbathegit commented Oct 11, 2024 •

edited by jack-woollen

Loading