Go back to using opal_string_copy in opal_info_get #7972

devreal · 2020-07-29T16:53:12Z

This PR fixes the handling of info values throughout the code. In particular:

It reverts a "fix" introduced in 9b88e60, which was meant to force opal_info_get to always copy the full info value but could lead to out-of-bounds reads and potentially non-terminated strings because it does not match the description of the interface in the header file. The interface expects a buffer of valuelen+1 bytes, where valuelen is the maximum expected length of the info value. The info value may be shorter than that.
It fixes the allocation of stack buffers in several places to include the extra byte required for the null-terminator while allowing for the full OPAL_MAX_INFO_VAL string length. In ompio, I opted to limit the length of the buffer to 128B where it is clear that the info value should contain a numeric value (e.g., stripe size). If the value is larger than that it cannot be properly handled anyway. I'm happy to revert that change (the previous implementation was missing the extra byte though).
It fixes the info handling in the osc/portals4 component, which was too complicated and tried to read non-existing mca parameters.

The PR also fixes some whitespace issues I found (tabs and trailing spaces). I'll be happy to take them out if that is an issue.

Fixes #7961

This reverts commit 9b88e60. Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

ompi/mca/sharedfp/individual/sharedfp_individual.c

edgargabriel

for the ompio functionality I am fine with the changes.

bosilca · 2020-07-30T03:12:55Z

You need to check with a Fortran expert here before such a major change, as the manipulation of the info in Fortran is "different". Let me tag @jsquyres our in-house Fortran guru.

devreal · 2020-08-05T13:03:22Z

I tried to understand the Fortran string handling and it looks to me like MPI_Info_get in both mpi-f.h and mpi-f08 is mapped to ompi_info_get_f, which allocates a static char array of size MPI_MAX_INFO_VAL+1 that is eventually passed to opal_info_get and later copied to the Fortran string. That should be safe with the changes in this PR. Maybe @jsquyres can confirm that?

jsquyres

First, I applaud this PR for trying to clean up our info string handling. It's currently somewhat of a mess, and really should be cleaned up. The fact that most real-world MPI applications don't seem to bump up against MPI_MAX_INFO_VAL and MPI_MAX_INFO_KEY doesn't change the fact that our info implementation is... messy.

Second, let me confirm: yes, the Fortran handling is as @devreal describes. The one difference to keep in mind between the C and Fortran handling of strings is that C expects strings to be \0-terminated, but Fortran expects them to be space-padded on the right. Meaning: Fortran treats string copies more like memcpy -- there's a pointer and a length. E.g., if you have an 8-character string in Fortran and you put the value "dog" in there, the memory will be exactly 8 bytes long and its contents are guaranteed to be: dog<SP><SP><SP><SP><SP> (vs. C, where the contents of memory will be dog\0****, where * can be any value).

All this being said, I'm not convinced that this PR is taking the right tact.

The real issue is that MPI-3.1 does not define that info keys or values must be \0-terminated in C. Indeed, it makes very confusing/ambiguous statements about the "end of string character" (it doesn't mandate that they are present, but does insist that the valuelen does not include them). I think some improvements may have been made in MPI-4.0, but we'll still have the legacy issue about what to do for MPI-3.1.

MPI-3.1 MPI_INFO_GET says:

valuelen is the number of characters available in value. If it is less than the actual size of the value, the value is truncated. In C, valuelen should be one less than the amount of allocated space to allow for the null terminator.

Note that this doesn't mandate that \0 is there (yes, it's a mess).

Also note that in Open MPI:

OPAL_INFO_MAX_VAL is 256
MPI_INFO_MAX_VAL is 256 (i.e., it's defined to be OPAL_INFO_MAX_VAL) in C and 255 in Fortran (i.e., it's a constant equal to OPAL_INFO_MAX_VAL-1)

But with this change, we have to add one to OPAL_INFO_MAX_VAL everywhere (in C) to make it long enough. This is problematic for two reasons:

The name OPAL_INFO_MAX_VAL is a lie: it's not the "max" length. It's one less than the max length. My $0.02 is that this degrades the readability of this already-confusing code.
C MPI applications cannot use char foo[MPI_INFO_MAX_VAL] because it's too short.

The second point is what kills the approach of this PR for me.

Why don't we change the internals of OPAL info handling to be more like a memcpy with an associated length? E.g., something like:

typedef struct {
    char *key, *value;
    // MPI-3.1 does not guarantee that key or value are \0-terminated,
    // so we maintain their lengths manually.  These length values
    // will include the \0 _if it is present_.
    size_t key_len_in_bytes, value_len_in_bytes;
} opal_info_entry_t;

typedef struct {
    opal_list_t info_entries;
    // ...
} opal_info_t;

void opal_info_get(opal_info_t *info, const char *key, char *value, int value_len_in_bytes, ...)
{
    opal_info_entry_t *entry = opal_find_info_entry(info, key);
    if (NULL == entry) return;
    if (value_len_in_bytes < entry->value_len_in_bytes) {
        memcpy(value, entry->value, value_len_in_bytes);
    } else {
        memcpy(value, entry->value, entry->value_len_in_bytes);
    }
}

This way, the OPAL info implementation can store however many bytes are relevant for the info and key, and we always know exactly how long they are (including the \0, if it is present).

...I just read the changes in MPI-4.0 for info:

MPI_INFO_SET defines that C key and value strings are null-terminated.
- That clears up some of the ambiguities.
- Meaning: if the user declares char foo[MPI_INFO_MAX_VAL], that must include the \0 character.
MPI_INFO_GET was deprecated (i.e., not fixed / ambiguities not clarified).
MPI_INFO_GET_VALUELEN was deprecated (i.e., not fixed / ambiguities not clarified).
MPI_INFO_GET_STRING was introduced. It seems to have clear descriptions of string handling, including the \0 character.

I think that the implementation I proposed above will allow for both the MPI-3.1-style MPI_INFO_GET / MPI_INFO_GET_VALUELEN and the new MPI-4.0-style MPI_INFO_SET / MPI_GET_INFO_STRING. I.e., there can be a little extra logic in the top MPI layer for handling the MPI-3.1 and MPI-4.0 semantics (because the OPAL layer will be a dumb char-and-length storage/lookup layer).

Thoughts?

jsquyres · 2020-09-18T13:13:15Z

opal/util/info.c

         * We have found the element, so we can return the value
         * Set the flag and value
+         * NOTE: the interface states that "'valuelen' should be one less than
+         *       the allocated space to allow for the null terminator."


What interface are you referring to here? This is a static function in opal/util/info.c. There is no comment describing the interface to opal_info_get_nolock() as far as I can see...?

Are you referring to MPI_INFO_GET? Because if so, MPI_INFO_GET != opal_info_get_nolock().

jsquyres · 2020-09-18T13:14:23Z

opal/util/info.h

 struct opal_info_t {
  opal_list_t   super;
-  opal_mutex_t	*i_lock;
+  opal_mutex_t *i_lock;


Not to worry about this PR, but in the future, it would be nice if whitespace/style/comment typo fixes could be separated out into their own commits.

jsquyres · 2020-09-18T13:16:25Z

ompi/mca/common/ompio/common_ompio_file_open.c


   if (NULL != fh) {
-       char char_stripe[MPI_MAX_INFO_VAL];
+       char char_stripe[128];


I see your discussion of this in the PR description, but I guess I still don't understand why this is necessary...? Hard-coding array lengths is, in general, "bad".

Going from MPI_MAX_INFO_VAL (255) to 128 saves a few bytes, but is it significant? These are stack variables, after all -- it's not like we're allocating these strings and leaving them around/needlessly consuming extra memory that we know is not going to be used.

jsquyres · 2020-09-18T13:17:15Z

ompi/dpm/dpm.c

+    char stdin_target[OPAL_MAX_INFO_VAL+1];
+    char params[OPAL_MAX_INFO_VAL+1];
+    char mapper[OPAL_MAX_INFO_VAL+1];
+    char slot_list[OPAL_MAX_INFO_VAL+1];


See my main question about this PR.

lanl-ompi · 2020-10-25T21:48:55Z

Can one of the admins verify this patch?

devreal · 2020-11-22T12:59:12Z

Sorry for the long silence on this one, too many other things crossed my path. While I agree with @jsquyres that info handling in MPI in particular and string handling in C are a mess, I think both interfaces propose here are not a cure for our problem, but rather a band-aid with plenty of room for error. Even with the interface proposed by @jsquyres the problem that the call has to allocate the right amount of memory remains.

Taking a step back here: why do we copy the string in opal_info_get in the first place? From what I can see it is so that a) we maintain the string in a consistent state even if other threads set a new value; and b) to prevent callers from modifying the value and key stored in the info object. For OMPI-internal use a) is legitimate, while I believe that b) is less of a problem (given that we have full control over our code and code reviews should catch any nefarious behavior). At the same time, all of the OMPI-internal use cases I have seen are reading and comparing the info value, never modifying it. So instead of copying the string into a buffer that is provided by the caller, why not return an immutable info value object that is guaranteed to stay valid for the duration the caller uses them?

The way to do that is by storing both strings in ref-counted objects and making them const char[]:

struct opal_info_key_t {
    opal_object_t super;
    const size_t len; // length of the info key
    const char key[OPAL_MAX_INFO_KEY + 1];
};

struct opal_info_value_t {
    opal_object_t super;
    const size_t len; // length of the info key
    const char value[]; // value, flexible array member
};

struct opal_info_entry_t {
    opal_list_item_t super; /**< required for opal_list_t type */
    opal_info_value_t *ie_value; /**< value part of the (key, value) pair. */
    opal_info_key_t *ie_key; /**< "key" part of the (key, value)
                                     * pair */
};

Setting a new value for an existing info key simply becomes:

int opal_info_set_nolock (opal_info_t *info, const char *key, const char *value) {
    opal_info_entry_t *old_info;

    old_info = info_find_key (info, key);
    if (NULL != old_info) {
        size_t len = strlen(value);
        // allocate space for the object and the string
        opal_info_value_t *info_val  = malloc(sizeof(*info_val) + len);
        OBJ_CONSTRUCT(info_val, opal_info_value_t);
        // have to cast away const here
        strcpy((char*)&info_val->val, value);
        *(size_t*)(&info_val->len) = len;
        // release the old object and set the new one
        // info_old->ie_value will remain valid until the last user has released it
        OBJ_RELEASE(old_info->ie_value);
        old_info->ie_value = info_val;
    } else {
        // set a new value, similar to the above
    }

Querying an info value will return an opal_info_value_t object:

opal_info_value_t
opal_info_get_nolock (opal_info_t *info, const char *key)
{
    opal_info_entry_t *search;

    search = info_find_key (info, key);

    if (NULL == search){
        return NULL;
    } else {
        OBJ_RETAIN(search->ie_value);
        return search->ie_value;
    }
}

A caller may then look at the returned object and finally release it:

opal_info_value_t *info_val = opal_info_get(info, "info_key");
if (info_val != NULL) {
    if (0 == strcmp(info_val->value, "true")) {
        // do whatever you have to do
    }
    OBJ_RELEASE(info_val);
}

So: no temporary memory to be allocated (either on stack or heap), no getting the string terminator right, no string copying. Of course, the caller could cast away the const of info_val->value but we should be able to trust our own code not to do such nefarious things. If the caller requires a string that is mutable (e.g., to pass it to strtok) then they can strdup it into their own memory and work on that. For the vast majority of use cases, this is not actually necessary.

As for the upper MPI layer: with the provided opal_info_value_t, the upper layers can handle the string as they need to: the Fortran interface might handle it differently than the C interface and MPI_INFO_GET might handle it differently than MPI_INFO_GET_STRING. OPAL just doesn't have to care anymore and we eliminate all the sources of errors in the internal info handling.

I admit that we end up with slightly more objects that have to be allocated but info object modification should not be in the critical path so I don't think that is a major concern. The proposed interface might make info querying faster by avoiding copying of the string.

I'd like to get some freedback from @jsquyres and anyone else who wants to chime in before making such a sweeping change (plenty of places to touch throughout the code). I think we should get a fix in for 5.0 and I'm happy to work on it.

jsquyres · 2020-11-30T13:31:06Z

In general, I think @devreal's proposal sounds fine: have the OPAL info stuff do some bare-bones minimum of handling, and the MPI layer can add on additional semantics necessary for MPI_Info functionality. I think that was the original design, but the two layers were a little more dependent upon each other than we intended.

A few minor points:

I'm a little confused why there is a different treatment of char array lengths between keys and values. Why is one a hard-coded length and the other is not?
Also, I see the use of OPAL_MAX_INFO_KEY + 1 which I think is misleading. Is it the max length or not?
I have a friendly amendment proposal: what about making an opal_string_t class?
- It is largely like what @devreal proposed for his key/value classes, but with a few minor differences.
- There is no length / limit on the internal char array (it will likely just be a pointer).
- The opal_string_t class will take care of all memory allocation / reallocation / freeing, analogous to C++ std::string and string entities in other languages.
- And since the OPAL info handling will have no concept of max lengths for keys/values, the MPI layer can take care to never set a key longer than MPI_MAX_INFO_KEY or a value longer than MPI_MAX_INFO_VAL per the MPI-3.1 and MPI-4.0 definitions, as relevant.

devreal · 2021-03-12T20:23:31Z

Superseded by #8330, closing.

devreal added 4 commits July 29, 2020 18:49

Revert "info_get: ensure to copy all requested characters"

4ffc8d9

This reverts commit 9b88e60. Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Properly copy all necessary characters of info keys/values

2bda5fe

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Fix buffer sizes for info value handling

d5f5422

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

osc/portals4: fix info handling and remove stale code

3a7909f

Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

devreal added bug enhancement code cleanup Target: main labels Jul 29, 2020

devreal requested review from edgargabriel and jsquyres July 29, 2020 16:53

edgargabriel reviewed Jul 29, 2020

View reviewed changes

ompi/mca/sharedfp/individual/sharedfp_individual.c Show resolved Hide resolved

edgargabriel approved these changes Jul 29, 2020

View reviewed changes

jsquyres reviewed Sep 18, 2020

View reviewed changes

devreal added the ⚠️ WIP-DNM! label Nov 22, 2020

devreal mentioned this pull request Jan 2, 2021

Use reference-counted immutable strings with info objects #8330

Merged

devreal closed this Mar 12, 2021

Go back to using opal_string_copy in opal_info_get #7972

Go back to using opal_string_copy in opal_info_get #7972

Uh oh!

Conversation

devreal commented Jul 29, 2020

Uh oh!

Uh oh!

edgargabriel left a comment

Choose a reason for hiding this comment

Uh oh!

bosilca commented Jul 30, 2020

Uh oh!

devreal commented Aug 5, 2020

Uh oh!

jsquyres left a comment

Choose a reason for hiding this comment

Uh oh!

jsquyres Sep 18, 2020

Choose a reason for hiding this comment

Uh oh!

jsquyres Sep 18, 2020

Choose a reason for hiding this comment

Uh oh!

jsquyres Sep 18, 2020

Choose a reason for hiding this comment

Uh oh!

jsquyres Sep 18, 2020

Choose a reason for hiding this comment

Uh oh!

lanl-ompi commented Oct 25, 2020

Uh oh!

devreal commented Nov 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsquyres commented Nov 30, 2020

Uh oh!

devreal commented Mar 12, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

devreal commented Nov 22, 2020 •

edited

Loading