Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Provided MPI_type_get_content array sizes may exceed actual ones, do not use them #13131

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

EmmanuelBRELLE
Copy link
Contributor

MPI 4.1 specifications, page 166, lines 24-27:
The values given for max_integers, max_addresses, max_large_counts, and max_datatypes must be at least as large as the value returned in num_integers, num_addresses, num_large_counts, and num_datatypes, respectively, in the call MPI_TYPE_GET_ENVELOPE for the same datatype argument.

…s, do not use them

MPI 4.1 specifications, page 166, lines 24-27:
The values given for max_integers, max_addresses, max_large_counts, and max_datatypes
must be at least as large as the value returned in num_integers, num_addresses,
num_large_counts, and num_datatypes, respectively, in the call MPI_TYPE_GET_ENVELOPE
for the same datatype argument.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@eviden.com>
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this ? ompi_datatype_get_args has a protection against this case, that's why the memcpy is based on the internal value and not on the value provided by the user.

@@ -65,6 +66,13 @@ int MPI_Type_get_contents(MPI_Datatype mtype,
}
}

/* Counts may exceed actual ones, we have no choice but to recompute them */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand that comment: what are the actual ones? The count argument the user provided? Or the

Suggested change
/* Counts may exceed actual ones, we have no choice but to recompute them */
/* Counts may exceed the user-provided counts, we have no choice but to recompute them */

Comment on lines +70 to +75
rc = ompi_datatype_get_args(mtype, 0, &max_integers, NULL, &max_addresses, NULL, &max_datatypes,
NULL, &type_code);
if (rc != MPI_SUCCESS) {
OMPI_ERRHANDLER_RETURN(MPI_ERR_INTERN, MPI_COMM_WORLD, MPI_ERR_INTERN, FUNC_NAME);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of querying the sizes if we don't act on that information? If the user told us the arrays are of a certain size and the type requires more then we should error out, raising MPI_ERR_ARG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This updates the user provided max lengths before passing them to ompi_datatype_get_args. As if ompi_datatype_get_args was not correctly handling the case where user provides more space than required (which is not the case as I pointed in my prior comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be worried about the converse: the user provides insufficient space and we write past the bounds of the user buffer. That is what I thought this patch addressed. Otherwise I don't see what this patch accomplishes.

@EmmanuelBRELLE
Copy link
Contributor Author

comment

We need that because of this line

for( i = 0; i < max_datatypes; i++ ) {
. The content copy is fine but after copies we destroy datatypes based on the user max_count not the actual content stored in the derived datatype. It may go out of memory (segv)

@bosilca
Copy link
Member

bosilca commented Mar 10, 2025

Make sense, that code needs to know the correct number of datatypes in order to work properly. As we have control over ompi_datatype_get_args we can make it return the correct number of copied elements in all cases.

Also, the code you pointed at is incorrect, in case of error it only frees the current datatype leading to memory leaks. Let me take a stab at it, I'll get back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants