-
Notifications
You must be signed in to change notification settings - Fork 931
Use reference-counted immutable strings with info objects #8330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/c7c0b6dee7cd6fc661cff61f6d09c935 |
|
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/067a7731e09c8fd095275e5dfebfd889 |
a7e42a1 to
70b7d7d
Compare
|
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/76fbd5a8ef087d885cdc943eae75d519 |
|
I fixed two issues I didn't see in my local setup (in the gpfs integration and with |
May or may not be true - I'd have to look at it more generally. However, given that this is an MPI_Info problem you are addressing, I would recommend constraining it strictly to that realm and suggest you simply cast the string to silence the warning. |
|
@artemry-mlnx Looks like another issue in the Mellanox CI environment: |
That is not a solution. In fact, the current implementation was not correct either. Casting away It is an unfortunate part of C's history that string literals are of type |
There is such a thing as being too anal. I suggest you rethink your philosophy a bit and take a more practical turn. My point here is that OMPI doesn't dictate to PMIx as to when and how to define its APIs. This is true of all the libraries that OMPI uses. So declaring that you don't like something isn't very motivational to those who write/maintain those libraries. Not everyone feels the need to be a "purist" 😄 |
|
This is not about purism but about correctness and consistency, and about the fact that we, the developers' community, learned to do better. Many major libraries (such as the libc) have updated their API for all string manipulation functions to clearly mark the immutability of strings via There are better practices put forward by the community, it is only constructive to suggest they should be followed more strictly. And for what is worth, even the MPI standard now follows such practices having added |
|
Some of us who have dug deeper into this tend not to view it as strongly as your team clearly does. The Compilers don't offer that stronger guarantee for the simple reason that it is unobtainable. The purpose of the directive is to aid developers of that specific package with their internal work. In other words, it is to assist the developers in ensuring that their own code lives up to certain inter-functional directives. What it cannot do is ensure that any secondary library you call with that variable will behave in a particular manner. The reason is simple: the compiler has no visibility into that library and its layered calls. Thus, it ceases to have any meaning once you transition to a library that is being linked against but lies outside of your own source code. The question raised here regarded precisely this last situation. OMPI sees a warning because it is passing a This is why even the meaning of the I took the time to write all this solely to help educate the community as to why some of us might not feel quite so strongly about the value of |
|
@rhc54 You are right that the compiler cannot enforce that a
The answer is fairly clear: unless you are willing to jeopardize thread-safety and risk access violations (by writing to write-protected memory such as string literals on some platforms) the variable should not be changed, period.
I'd be happy if you could point out such instances in OMPI. Unsafe behavior should not justify more of it... Anyway, I was only trying to be helpful and provide a suggestion (won't happen again, I promise). By all means, keep the PMIx API as it is, I have adjusted the OMPI side in the last commit last night. |
|
bot:ibm:pgi:retest |
|
There are so many incorrect statements above that this discussion is doing more of a disservice to the community than anything else. To give a simple example, if a const argument is altered inside a function segfault can ensure if the caller, thrusting the API calls the function with a pointer to a const variable (because they could be placed in a read-only data segment, and thus protected). As @devreal suggested, if @rhc54 can pinpoint us (and here I am talking the OMPI community not my team) to the places where OMPI internally alters const char* that would be extremely helpful, as this is not only bad practice, but incorrect code. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is great, actually. I think it cleans up the info code tremendously, and actually gives us a path to implementing the cleaned-up MPI-4 info semantics, too.
Per one of my comments, it would be nice to also clean up the _IN_ monstrosity for info keys, but that wasn't technically in the scope of this PR.
| if (0 == strncmp(iterator->ie_key, OPAL_INFO_SAVE_PREFIX, | ||
| strlen(OPAL_INFO_SAVE_PREFIX))) | ||
| // If we see an __IN_<key> key but no <key>, decide what to do based on mode. | ||
| // If we see an __IN_<key> and a <key>, skip since it'll be handled when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate this whole "prefix" thing, but I realize it's not of this PR's making.
...but now that we have proper and info and string classes, can we do something better?
This is a whole hunk of new/heavily modified code here. Do we have any tests that test what the whole "prefix" thing was doing to know if this new code is working properly? (i.e., preserving user-set values, and exercising the different cases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding for having this mechanism is that we allow info subscribers to change the value and are not allowed to hand out a different value in MPI_Info_get. I don't know why this is useful, components that don't support an info value should just ignore it.
@markalle Can you shed some light on why we allow subscribers to change the value? What was the use case for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have dim recollections that this has something to do with distinguishing -- and preserving -- the info values that users set vs. the info values that Open MPI set. I don't remember the exact semantics, but it may have had something to do with what MPI_INFO_GET is supposed to return (or the info that results from MPI_COMM_DUP and the like?) -- i.e., what values the user set vs. the values that Open MPI is actually using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this actually works correctly on master then: MPI_Info_get will only query the key provided by the user and not check whether there is a backup key prefixed with __IN_. So if we set another value for a key we will actually hand out that new value to the user (my reading of the standard is that this is incorrect behavior). The only place where I can see the prefixed key being used is in opal_info_dup_mode which is called from MPI_Info_dup. There it is used to drop new values and use the backup instead.
Allowing subscriber callbacks to alter an info value is problematic for another reason: assuming we have more than one component subscribing to an info value on the same object (which I don't think we have right now), the value they see depends on the order in which they subscribed (the one set by the user or the one changed by any previous component). That doesn't seem right to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this actually works correctly on
masterthen:MPI_Info_getwill only query the key provided by the user and not check whether there is a backup key prefixed with__IN_. So if we set another value for a key we will actually hand out that new value to the user (my reading of the standard is that this is incorrect behavior). The only place where I can see the prefixed key being used is inopal_info_dup_modewhich is called fromMPI_Info_dup. There it is used to drop new values and use the backup instead.
It's quite possible that I'm not remembering correctly.
I think that the intent was to preserve 2 sets of values for cases of conflict... but I don't remember what the precise conflict was -- something to do with what OMPI set vs. what the user set. I don't remember exactly why we had to maintain both values.
Allowing subscriber callbacks to alter an info value is problematic for another reason: assuming we have more than one component subscribing to an info value on the same object (which I don't think we have right now), the value they see depends on the order in which they subscribed (the one set by the user or the one changed by any previous component). That doesn't seem right to me...
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bosilca Pointed me to §12.2.7 (MPI_Win_set_info and MPI_Win_get_info) which says that the info object retrieved through MPI_Win_get_info should contain the actual values used for hints, not necessarily the ones set through MPI_Win_set_info. So it makes sense that the info subscriber can signal the use of a different value.
However, I still don't understand why we store a backup then. The info object returned by MPI_Win_get_info is a new info object that is duplicated from the info object stored in the window object. It is completely independent of the info object used during window creation and if I MPI_Win_dup the info returned by MPI_Win_get_info I still want to see these values, not some backup. @markalle What am I missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- we need to figure out what the right behavior is for dup'ing infos, etc.
That being said, I still kinda hate the whole _IN prefix thing. Can we just have storage for 2 strings -- one that the user set and one that Open MPI is using? In many (most?) cases, they will be the same (and could be ref counted, or one of them could even be NULL or something). But more importantly, it would make the code more clear about the explicit choice to save a 2nd value sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to figure out whether we actually need this functionality at all. If not, we can remove it and don't have to bother fixing it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to figure out whether we actually need this functionality at all. If not, we can remove it and don't have to bother fixing it...
Sorry: I wasn't disagreeing. All I was trying to say was: if we need the "have 2 set possible values" functionality, then it would be better to ditch the _IN prefix stuff and just explicitly store 2 strings (e.g., in two different members/variables).
|
Oops -- I didn't mean to approve yet 😉 -- there's one or two simple issues to resolve before we can merge, but regardless: in general, I am strongly in favor of this PR. |
b119733 to
5b35ebc
Compare
|
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/484f472b5bfc9e235f9491b6a9feed3c |
5b35ebc to
8fddcb8
Compare
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
The key and info strings are now stored as opal_cstring_t objects so returning them works by incrementing their reference count. The caller is responsible for releasing the objects eventually. This avoids many unnecessary string copy operations, which are now only required if the string is to be modified (e.g., passed to strtok). Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
8fddcb8 to
3c5c132
Compare
|
Thanks - will wait for @jsquyres 👍 |
|
bot:aws:retest |
1 similar comment
|
bot:aws:retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
|
This should be cherry-picked to 5.0. |
This PR adds a new OPAL class
opal_cstring_tthat represents a reference-counted immutable string. These are used to store and pass info keys and values, which reduces the instances in which strings actually have to copied.The interface is relatively simple: the
opal_cstring_tstruct contains two relevant memberslengthandstringthat contain the number of characters (excl. the null terminator) and the string itself. They are markedconstso that they cannot be modified once the object has been created. A new object is created usingopal_cstring_create(string)(creates a new cstring withstringas its value) oropal_cstring_create_l(string, len)(creates a new cstring with the firstlencharacters ofstringas its value). The cstring is eventually free'd once a call toOBJ_RELEASEon them finds the object unreferenced.The function
opal_info_getnow provides a pointer to a cstring if the key is found, with its reference count incremented. It is thus important that the caller releases the object again once processing has completed. The same holds for the key returned byopal_info_get_nthkey. If another thread removes the key/value pair from the info object the string remains valid untilOBJ_RELEASEis called. A new functionopal_info_set_cstringallows setting a value directly using a cstring, which avoids unnecessary memory allocation and only causes the cstring's reference count to be incremented.Some notes
The
opal_cstring_tstruct contains a single character member that is meant to force the compiler to add padding bytes at the end, which we are used to null-terminate the string even if it is not created throughopal_cstring_createbut throughOBJ_NEWand thus no additional memory is allocated. The latter will create an empty, null-terminated string. This works because the flexible array member (FAM)stringpoints to the first byte following the members of the struct but the compiler has to insert padding bytes to ensure alignment as if the FAM was empty. This provides us with 3B (32bit systems) or 7B (64bit systems), respectively, which are sufficient to store the null-terminator for empty strings. The padding bytes will also be used for non-empty strings and are thus factored in during memory allocation inopal_cstring_create.The introduction of
const char*info strings had some ripple effects.const char*strings (which it should have been from the beginning) and reusing cstring objects where possible (namely when saving old info values).PMIx_Get_attribute_stringfunction takes achar*argument but is passed string constants in many places. Digging into it, it doesn't seem to actually modify the string (which it shouldn't anyway if passed string literals). It really should take aconst char*argument. Long story short, this currently triggers a warning indpm_convertabout a discardedconstqualifier. @rhc54 do you agree with this analysis?The MPI level info handling functions now deal with cstring objects and handle the strings according to their own requirements (C/Fortran). Implementing MPI-4 info string functions on top of this interface should be straight forward without touching the opal layer again.
I renamed and moved the functions
opal_info_value_to_intandopal_info_value_to_booltoopal_cstring_to_intandopal_cstring_to_bool. They were not exactly info-specific anyway and are imo better suited in the context of the cstring implementation. I did not changeopal_string_copyas that operates onchar*strings.This PR touches quite a few places that I am not familiar with (all places that process info values) so having some more eyes look at these changes would be greatly appreciated :)
This PR supersedes Go back to using opal_string_copy in opal_info_get #7972, which contains the initial discussion on this approach.