-
Notifications
You must be signed in to change notification settings - Fork 931
Go back to using opal_string_copy in opal_info_get #7972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This reverts commit 9b88e60. Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the ompio functionality I am fine with the changes.
|
You need to check with a Fortran expert here before such a major change, as the manipulation of the info in Fortran is "different". Let me tag @jsquyres our in-house Fortran guru. |
|
I tried to understand the Fortran string handling and it looks to me like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I applaud this PR for trying to clean up our info string handling. It's currently somewhat of a mess, and really should be cleaned up. The fact that most real-world MPI applications don't seem to bump up against MPI_MAX_INFO_VAL and MPI_MAX_INFO_KEY doesn't change the fact that our info implementation is... messy.
Second, let me confirm: yes, the Fortran handling is as @devreal describes. The one difference to keep in mind between the C and Fortran handling of strings is that C expects strings to be \0-terminated, but Fortran expects them to be space-padded on the right. Meaning: Fortran treats string copies more like memcpy -- there's a pointer and a length. E.g., if you have an 8-character string in Fortran and you put the value "dog" in there, the memory will be exactly 8 bytes long and its contents are guaranteed to be: dog<SP><SP><SP><SP><SP> (vs. C, where the contents of memory will be dog\0****, where * can be any value).
All this being said, I'm not convinced that this PR is taking the right tact.
The real issue is that MPI-3.1 does not define that info keys or values must be \0-terminated in C. Indeed, it makes very confusing/ambiguous statements about the "end of string character" (it doesn't mandate that they are present, but does insist that the valuelen does not include them). I think some improvements may have been made in MPI-4.0, but we'll still have the legacy issue about what to do for MPI-3.1.
MPI-3.1 MPI_INFO_GET says:
valuelen is the number of characters available in value. If it is less than the actual size of the value, the value is truncated. In C, valuelen should be one less than the amount of allocated space to allow for the null terminator.
Note that this doesn't mandate that \0 is there (yes, it's a mess).
Also note that in Open MPI:
- OPAL_INFO_MAX_VAL is 256
- MPI_INFO_MAX_VAL is 256 (i.e., it's defined to be
OPAL_INFO_MAX_VAL) in C and 255 in Fortran (i.e., it's a constant equal toOPAL_INFO_MAX_VAL-1)
But with this change, we have to add one to OPAL_INFO_MAX_VAL everywhere (in C) to make it long enough. This is problematic for two reasons:
- The name
OPAL_INFO_MAX_VALis a lie: it's not the "max" length. It's one less than the max length. My $0.02 is that this degrades the readability of this already-confusing code. - C MPI applications cannot use
char foo[MPI_INFO_MAX_VAL]because it's too short.
The second point is what kills the approach of this PR for me.
Why don't we change the internals of OPAL info handling to be more like a memcpy with an associated length? E.g., something like:
typedef struct {
char *key, *value;
// MPI-3.1 does not guarantee that key or value are \0-terminated,
// so we maintain their lengths manually. These length values
// will include the \0 _if it is present_.
size_t key_len_in_bytes, value_len_in_bytes;
} opal_info_entry_t;
typedef struct {
opal_list_t info_entries;
// ...
} opal_info_t;
void opal_info_get(opal_info_t *info, const char *key, char *value, int value_len_in_bytes, ...)
{
opal_info_entry_t *entry = opal_find_info_entry(info, key);
if (NULL == entry) return;
if (value_len_in_bytes < entry->value_len_in_bytes) {
memcpy(value, entry->value, value_len_in_bytes);
} else {
memcpy(value, entry->value, entry->value_len_in_bytes);
}
}This way, the OPAL info implementation can store however many bytes are relevant for the info and key, and we always know exactly how long they are (including the \0, if it is present).
...I just read the changes in MPI-4.0 for info:
- MPI_INFO_SET defines that C key and value strings are null-terminated.
- That clears up some of the ambiguities.
- Meaning: if the user declares
char foo[MPI_INFO_MAX_VAL], that must include the\0character.
- MPI_INFO_GET was deprecated (i.e., not fixed / ambiguities not clarified).
- MPI_INFO_GET_VALUELEN was deprecated (i.e., not fixed / ambiguities not clarified).
- MPI_INFO_GET_STRING was introduced. It seems to have clear descriptions of string handling, including the
\0character.
I think that the implementation I proposed above will allow for both the MPI-3.1-style MPI_INFO_GET / MPI_INFO_GET_VALUELEN and the new MPI-4.0-style MPI_INFO_SET / MPI_GET_INFO_STRING. I.e., there can be a little extra logic in the top MPI layer for handling the MPI-3.1 and MPI-4.0 semantics (because the OPAL layer will be a dumb char-and-length storage/lookup layer).
Thoughts?
| * We have found the element, so we can return the value | ||
| * Set the flag and value | ||
| * NOTE: the interface states that "'valuelen' should be one less than | ||
| * the allocated space to allow for the null terminator." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What interface are you referring to here? This is a static function in opal/util/info.c. There is no comment describing the interface to opal_info_get_nolock() as far as I can see...?
Are you referring to MPI_INFO_GET? Because if so, MPI_INFO_GET != opal_info_get_nolock().
| struct opal_info_t { | ||
| opal_list_t super; | ||
| opal_mutex_t *i_lock; | ||
| opal_mutex_t *i_lock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to worry about this PR, but in the future, it would be nice if whitespace/style/comment typo fixes could be separated out into their own commits.
|
|
||
| if (NULL != fh) { | ||
| char char_stripe[MPI_MAX_INFO_VAL]; | ||
| char char_stripe[128]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your discussion of this in the PR description, but I guess I still don't understand why this is necessary...? Hard-coding array lengths is, in general, "bad".
Going from MPI_MAX_INFO_VAL (255) to 128 saves a few bytes, but is it significant? These are stack variables, after all -- it's not like we're allocating these strings and leaving them around/needlessly consuming extra memory that we know is not going to be used.
| char stdin_target[OPAL_MAX_INFO_VAL+1]; | ||
| char params[OPAL_MAX_INFO_VAL+1]; | ||
| char mapper[OPAL_MAX_INFO_VAL+1]; | ||
| char slot_list[OPAL_MAX_INFO_VAL+1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my main question about this PR.
|
Can one of the admins verify this patch? |
|
Sorry for the long silence on this one, too many other things crossed my path. While I agree with @jsquyres that info handling in MPI in particular and string handling in C are a mess, I think both interfaces propose here are not a cure for our problem, but rather a band-aid with plenty of room for error. Even with the interface proposed by @jsquyres the problem that the call has to allocate the right amount of memory remains. Taking a step back here: why do we copy the string in The way to do that is by storing both strings in ref-counted objects and making them struct opal_info_key_t {
opal_object_t super;
const size_t len; // length of the info key
const char key[OPAL_MAX_INFO_KEY + 1];
};
struct opal_info_value_t {
opal_object_t super;
const size_t len; // length of the info key
const char value[]; // value, flexible array member
};
struct opal_info_entry_t {
opal_list_item_t super; /**< required for opal_list_t type */
opal_info_value_t *ie_value; /**< value part of the (key, value) pair. */
opal_info_key_t *ie_key; /**< "key" part of the (key, value)
* pair */
};Setting a new value for an existing info key simply becomes: int opal_info_set_nolock (opal_info_t *info, const char *key, const char *value) {
opal_info_entry_t *old_info;
old_info = info_find_key (info, key);
if (NULL != old_info) {
size_t len = strlen(value);
// allocate space for the object and the string
opal_info_value_t *info_val = malloc(sizeof(*info_val) + len);
OBJ_CONSTRUCT(info_val, opal_info_value_t);
// have to cast away const here
strcpy((char*)&info_val->val, value);
*(size_t*)(&info_val->len) = len;
// release the old object and set the new one
// info_old->ie_value will remain valid until the last user has released it
OBJ_RELEASE(old_info->ie_value);
old_info->ie_value = info_val;
} else {
// set a new value, similar to the above
}Querying an info value will return an opal_info_value_t
opal_info_get_nolock (opal_info_t *info, const char *key)
{
opal_info_entry_t *search;
search = info_find_key (info, key);
if (NULL == search){
return NULL;
} else {
OBJ_RETAIN(search->ie_value);
return search->ie_value;
}
}A caller may then look at the returned object and finally release it: opal_info_value_t *info_val = opal_info_get(info, "info_key");
if (info_val != NULL) {
if (0 == strcmp(info_val->value, "true")) {
// do whatever you have to do
}
OBJ_RELEASE(info_val);
}So: no temporary memory to be allocated (either on stack or heap), no getting the string terminator right, no string copying. Of course, the caller could cast away the As for the upper MPI layer: with the provided I admit that we end up with slightly more objects that have to be allocated but info object modification should not be in the critical path so I don't think that is a major concern. The proposed interface might make info querying faster by avoiding copying of the string. I'd like to get some freedback from @jsquyres and anyone else who wants to chime in before making such a sweeping change (plenty of places to touch throughout the code). I think we should get a fix in for 5.0 and I'm happy to work on it. |
|
In general, I think @devreal's proposal sounds fine: have the OPAL info stuff do some bare-bones minimum of handling, and the MPI layer can add on additional semantics necessary for A few minor points:
|
|
Superseded by #8330, closing. |
This PR fixes the handling of info values throughout the code. In particular:
opal_info_getto always copy the full info value but could lead to out-of-bounds reads and potentially non-terminated strings because it does not match the description of the interface in the header file. The interface expects a buffer ofvaluelen+1bytes, wherevaluelenis the maximum expected length of the info value. The info value may be shorter than that.OPAL_MAX_INFO_VALstring length. In ompio, I opted to limit the length of the buffer to 128B where it is clear that the info value should contain a numeric value (e.g., stripe size). If the value is larger than that it cannot be properly handled anyway. I'm happy to revert that change (the previous implementation was missing the extra byte though).The PR also fixes some whitespace issues I found (tabs and trailing spaces). I'll be happy to take them out if that is an issue.
Fixes #7961