Fixed int128t issue. Hopefully. #250

zbeekman · 2016-11-11T19:35:53Z

This PR is to supersede #237 if that's OK with @vehre and everyone else.

codecov-io · 2016-11-11T21:23:43Z

Current coverage is 48.18% (diff: 100%)

Merging #250 into master will not change coverage

@@             master       #250   diff @@
==========================================
  Files             2          2          
  Lines           907        907          
  Methods          42         42          
  Messages          0          0          
  Branches        195        195          
==========================================
  Hits            437        437          
  Misses          392        392          
  Partials         78         78

Powered by Codecov. Last update 611ee79...153efbb

zbeekman · 2016-11-12T15:30:02Z

@vehre thanks for taking a look!

zbeekman · 2016-11-12T19:16:04Z

Re-approve after pulling master:

👍

zbeekman · 2016-11-12T22:48:55Z

LGTM

I confirmed that the only test failure with latest GCC trunk (built today) is still the get_self test (same_loc.f90), more specifically the "get row" portion by compiling the latest GCC trunk and and then MPICH 3.1.4 with GCC. I'll dig into the source of the test failure some more... it could be a problem with the test logic, not sure yet. But I don't see why that needs to block this PR.

$ mpif90 --version
GNU Fortran (GCC) 7.0.0 20161112 (experimental)

rouson · 2016-11-14T09:54:10Z

@zbeekman the latest virtual machine now has a trunk build from the same date cited above and I too confirm that there is only one test failure, but my recollection is that it's a different than the once cited above. In the virtual machine, the one test that fails is get_with_offset_1d.f90, which runs as test 9. I thought that was the same one you were seeing on OS X. No?

@vehre, I believe my previous problems with building in the virtual machine related to some twisted logic in the environment modules. The problem has mostly been sorted out. Would it be helpful for you to download the latest virtual machine and run our tests with the default compiler in the sourcery student account? At least then you would be able to reproduce the exact scenario that I'm seeing.

vehre · 2016-11-14T12:38:13Z

@rouson @zbeekman and I figured, that the testcase # 9 is missing synchronisation. Because everything is parallel in the testcase, the image doing the check, might start doing so before the data has been set by the remote images. So there may still be an issue with the sync all. I will download newest VM and recheck.

zbeekman · 2016-11-14T17:15:40Z

@vehre After looking at the standard some more and discussing with others, I'm not convinced that the issue is with missing synchronization. Gets, e.g., local_b(:) = remote_a(:)[me+1] should be blocking on the local (me) image. So either the sync all is failing before the get, or there is some deeper issue.

Also, I don't think we need explicit synchronization at the end of the test: shouldn't the runtime ensure the buffers are flushed before terminating the program? I think I saw some comments in caf_mpi.c that may pertain to this---perhaps it's not implemented yet.

zbeekman · 2016-11-14T17:16:36Z

I'll do some more testing and reading up on the standard to confirm this, but I think what I said is true.

rouson · 2016-11-14T18:13:55Z

I just reviewed the test. Although I could read the standard some to confirm, my experience with CAF indicates to me that the test is correct and the failure of the test represents a bug either in gfortran or in OpenCoarrays. To corroborate that, I'll mention that @afanfa has previously communicated to me that the standard requires "gets" are blocking, which really is the only thing that could make sense, given the lack of a mechanism in Fortran 2008 for checking whether the communication has completed.

@zbeekman, you raise a very interesting and subtle point regarding whether the second synchronization is needed. I have generally assumed it is, but you caused me to think more about it and I'm not certain for the following reason:

program termination is a three step process: (1) initiate termination, (2) synchronize, (3) terminate. I wonder if an error stop is able to cause error termination of all images during step (2). My suspicion is yes, in which case the second synchronization is not needed.

I'll be seeing Bill Long of Cray, Inc., later today and hope to have a moment to probe him on this. Cray invented coarrays. Bill was the lead author on the TS that added the new parallel features in Fortran 2015 and I assume he was involved in designing the coarray features of 2008 so he's an authority.

rouson · 2016-11-14T18:15:14Z

I'm going to refactor the test to make the code more compact and modern, e.g., using an array statement instead of a loop. My changes should not affect the outcome. At some point soon, I should go through all the tests and make similar changes.

vehre · 2016-11-15T12:02:30Z

Now, as I have explained already, is testcase # 9 using an allocatable array on the lhs, namely the b. This mandates the use of the new API. The new API publishes the array descriptors as mpi-windows in this case the "a". Those array descriptors live on the stack of the main program. Don't mix up array descriptors and array data. The array data is allocatable and allocated using ALLOCATE(). The array descriptor is the meta-data giving the rank the extends in each dimension and the datatype of the array content.

Now when an image leaves the main program, its stack is voided and caf_finalize() is called, which overwrites the memory published as mpi-window. This can cause a crash or inconsistent data.

In principle is the calling sequence starting from invocating the program from a shell in every Fortran program like this:

int main(int argc, char **argv) {
gfortran_caf_init (&argc, &argv);
// do some other init steps.
MAIN_ (); // This is the Fortran main program which creates a new frame on the stack as explained above
_gfortran_caf_finalize ();
return 0;
}

void MAIN__() {
array_descriptor_1 a; // declare the array descriptor of array a on the stack.

a.data = 0B; // Set that the allocatable array is unallocated.
// Do the other code of the main program.
}
In the MAIN__ the array descriptor lives on the stack. As @zbeekman pointed out is the runtime responsible for synchronisation at program termination. It does so, but in a different function which then corrupts the array descriptor published by overwriting the stack with its own data.

I am still thinking of how to fix this. Approaches may be:

make array descriptors of coarrays implicitly save,
add an synchronisation at the end of the function.

I don't like the latter, because it may add to many synchronisation points. What are your opinions?

Andre

zbeekman · 2016-11-15T16:37:56Z

I am still thinking of how to fix this. Approaches may be:

make array descriptors of coarrays implicitly save,

add an synchronisation at the end of the function.

I don't like the latter, because it may add to many synchronisation points. What are your opinions?

At the end of which function? MAIN__()? (caf_finalize() already calls MPI_barrier(), however I guess the problem is that the array descriptors have already been freed/gone out of scope.) Is my understanding her correct? Are you referring to adding synchronization at the end of MAIN__()?

Adding synchronization when exiting the Fortran "main" seems like it might be an acceptable solution to me, but I defer to @afanfa's expertise. The only obvious case that concerns me is if there is a failed image, or unexpected termination of some sort on one of the images, that will then cause all the other images to hang in, while waiting for a sync all or MPI_barrier etc. that is never going to come. However, it seems like this may already be the case due to MPI_barrier() in caf_finalize, and I know failed images etc. are part of @afanfa's ongoing work. In addition it seems as though there may already be issues flushing IO buffers, so that approach could kill two birds with one stone. However, I do agree that we should be very careful not to introduce synchronization in excess of those explicitly required by the standard or implicitly required to ensure correct Coarray Fortran semantics.

I'm not sure I fully understand the implications of approach # 1: the array descriptors (and data, right?) would stick around until the barrier in caf_finalize is hit. Is there a draw back to this? It seems to me as though this might be preferable due to one less synchronization.

vehre · 2016-11-15T17:40:27Z

At the end of which function? MAIN__()? (caf_finalize() already calls MPI_barrier(), however I guess the problem is that the array descriptors have already been freed/gone out of scope.) Is my understanding her correct? Are you referring to adding synchronization at the end of MAIN__()?

Yes, your understanding is correct. In this special case yes to MAIN__(). But in fact would that approach need synchronisation in every function that has coarray stack variables, which makes the approach not feasible.

I did some research what it would need to make the variable static: exactly one statement in the compiler and instantly the issue is gone.

Btw, on my system the testcases changed their numbering: get_with_offset_1d is now # 10. With my newest copy of git:///vehre/coarray only testcase get_self/sameloc.f90 fails.

I'm not sure I fully understand the implications of approach # 1: the array descriptors (and data, right?) would stick around until the barrier in caf_finalize is hit. Is there a draw back to this? It seems to me as though this might be preferable due to one less synchronization.

Right, they would go to an rss-segment in the executable and no longer on the stack. Because gfortran does not have this variable declared as SAVE, the variable will be reinitialized upon reentering a function. Shoot, that's the next issue. Imagine this is done in a loop. Oh no, what a nightmare.

zbeekman · 2016-11-15T18:03:05Z

Btw, on my system the testcases changed their numbering: get_with_offset_1d is now # 10. With my newest copy of git:///vehre/coarray only testcase get_self/sameloc.f90 fails.

yes, sorry for the confusion surrounding the numbering, I re-enabled some more tests in 76c4492 I've got a patch for building GCC from trunk on OS X now, so I'll take another look at the test failures. I'm a little bit surprised the issue you mentioned with array descriptors on the stack getting corrupted by caf_finalize has not shown up sooner... I'm going to try to find some time to scrutinize the tests more carefully.

vehre · 2016-11-15T18:09:02Z

I'm a little bit surprised the issue you mentioned with array descriptors on the stack getting corrupted by caf_finalize has not shown up sooner.

It couldn't. Only with introducing the new API (the caf*_by_ref()-functions) those descriptors need to be published. The old API did not need the descriptor present any time, because those needed to in-sync on all images, so publishing them was not necessary. And with allowing reallocation of the destination buffer by the API-function this information needs to be present (and changed).

vehre · 2016-11-16T09:42:56Z

I'm a little bit surprised the issue you mentioned with array descriptors on the stack getting corrupted by caf_finalize has not shown up sooner...

Those are needed only by the new API of caf*_by_ref() functions. Because this API allows reallocation of result arrays the descriptor needs to be fixed accordingly, which of course needs the descriptor to be available. Furthermore does the new API need the descriptor for reffing into the arrays. The old API implemented reffing only for arrays, that are same sized on all images. Therefore supplying the image's local descriptor to the communication-routine was sufficient. But with the support for allocatable components, each component can have a different size on the different images, which enforces the need for the array descriptor to be accessible. While designing the new API we tried to be as universal as possible and therefore the array descriptor is needed whenever a coarray is registered, even when the descriptor is never used.

vehre · 2016-11-16T12:56:40Z

I have analyzed the issue in sameloc further. With introducing the new API sendget(_by_ref) is used, whenever a coarray is present on the left- and right-hand side of an assignment. Debugging into _caf_sendget() I do not see the stride respected properly. I therefore deem the bug in _caf_sendget(). I tried to fix this by branching to _caf_get() when the image_index_s is this_image. This helped for cases where only get() was involved, unfortunately is also a send involved.

afanfa · 2016-11-16T16:59:21Z

FYI: caf_send does not support the "advanced" strided transfer support implemented in caf_get and caf_send. When a strided transfer is required, caf_sendget performs the transfer element-by-element.

zbeekman · 2016-11-16T23:24:46Z

@afanfa does STRIDED need to be defined to enable the efficient strided transfers? If so is there a reason to ever disable it?

vehre · 2016-11-17T10:01:37Z

@afanfa : I totally understand that caf_sendget does not have an optimized version for strided access, but I claim that caf_sendget does not work at all with strided access. IMO does it not work when the stride is in the get-part nor when it is in the send part.

I have attached a testcase, that shows my claim.
stridedsendget.f90.gz
I had to gzip it or github won't let me attach it.

afanfa · 2016-11-17T13:48:01Z

@zbeekman: yes, STRIDED must be defined to enable the efficient strided transfer (but in caf_sendget is not implemented). You want to disable it when you are dealing with strided transfers of derived type coarrays. In that case, the library is forced to send the data byte-by-byte.

Fixed int128t issue. Hopefully.

2f4f505

zbeekman force-pushed the wrong-preproc-sym-4-haveint-128 branch from 5194fce to 2f4f505 Compare November 12, 2016 00:51

vehre approved these changes Nov 12, 2016

View reviewed changes

Merge branch 'master' into wrong-preproc-sym-4-haveint-128

55ce14f

zbeekman mentioned this pull request Nov 12, 2016

Pull request on wrong preprocessor symbol for HAVE_INT128 #237

Closed

Merge branch 'master' into wrong-preproc-sym-4-haveint-128

153efbb

zbeekman merged commit 3e92b21 into master Nov 12, 2016

zbeekman deleted the wrong-preproc-sym-4-haveint-128 branch November 12, 2016 23:35

zbeekman mentioned this pull request Nov 22, 2016

sendget with strides is failing #254

Closed

Uh oh!

Uh oh!

Fixed int128t issue. Hopefully. #250

Fixed int128t issue. Hopefully. #250

Uh oh!

Conversation

zbeekman commented Nov 11, 2016

Uh oh!

codecov-io commented Nov 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current coverage is 48.18% (diff: 100%)

Uh oh!

zbeekman commented Nov 12, 2016 • edited by rouson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zbeekman commented Nov 12, 2016 • edited by rouson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zbeekman commented Nov 12, 2016 • edited by rouson Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rouson commented Nov 14, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vehre commented Nov 14, 2016

Uh oh!

zbeekman commented Nov 14, 2016

Uh oh!

zbeekman commented Nov 14, 2016

Uh oh!

rouson commented Nov 14, 2016

Uh oh!

rouson commented Nov 14, 2016

Uh oh!

vehre commented Nov 15, 2016

Uh oh!

zbeekman commented Nov 15, 2016

Uh oh!

vehre commented Nov 15, 2016

Uh oh!

zbeekman commented Nov 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vehre commented Nov 15, 2016

Uh oh!

vehre commented Nov 16, 2016

Uh oh!

vehre commented Nov 16, 2016

Uh oh!

afanfa commented Nov 16, 2016

Uh oh!

zbeekman commented Nov 16, 2016

Uh oh!

vehre commented Nov 17, 2016

Uh oh!

afanfa commented Nov 17, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov-io commented Nov 11, 2016 •

edited

Loading

zbeekman commented Nov 12, 2016 •

edited by rouson

Loading

zbeekman commented Nov 12, 2016 •

edited by rouson

Loading

zbeekman commented Nov 12, 2016 •

edited by rouson

Loading

rouson commented Nov 14, 2016 •

edited

Loading

zbeekman commented Nov 15, 2016 •

edited

Loading