-
Notifications
You must be signed in to change notification settings - Fork 902
v3.1.x: REF6976 Silent failure of OMPI over OFI with large messages sizes #7003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ompi/mca/mtl/ofi/mtl_ofi.h
Outdated
@@ -247,13 +247,20 @@ ompi_mtl_ofi_send_start(struct mca_mtl_base_module_t *mtl, | |||
endpoint = ompi_mtl_ofi_get_endpoint(mtl, ompi_proc); | |||
|
|||
ompi_ret = ompi_mtl_datatype_pack(convertor, &start, &length, &free_after); | |||
if (OMPI_SUCCESS != ompi_ret) return ompi_ret; | |||
if (OPAL_UNLIKELY(OMPI_SUCCESS != ompi_ret)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a very small number of style guidelines (https://github.com/open-mpi/ompi/wiki/CodingStyle), but one of them is: absolutely no tabs whatsoever. Always spaces. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh! okay.
ompi/mca/mtl/ofi/mtl_ofi.h
Outdated
"message too big", false, | ||
length, endpoint->mtl_ofi_module->max_msg_size); | ||
return OMPI_ERROR; | ||
} else if (OPAL_UNLIKELY(MCA_PML_BASE_SEND_SYNCHRONOUS == mode)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why "else if" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly personal preference - I tend to prefer the form
if (A) { return fail; } else if (B) { return fail; }
to
if (A) { return fail; } if (B) { return fail; }
but that's purely cosmetic. I can change it if you like.
Note that this patch is for 3.1.x only. The patch for 4.x/5.x/master needs to be slightly different. |
@mwheinz first need to open a PR on master then we cherry pick to release branches. |
Okay. |
Are there any other issues I should look at before I submit the master branch version? |
@mwheinz, If you can get the v4.0.x PR done soon (presumably a cherry-pick -x from master if possible), we will get it into the next v4.0.2 rc. |
No, I think you're good. I should have mentioned this in my first review: we generally commit stuff to master first, and then So once the master one is merged, can you amend this commit with a cherry-picked message (like it would have been if you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good code-wise, now; thanks. Just need to get the cherry-pick notice in the commit message once #7004 is merged to master.
INTERNAL: STL-59403 The OFI (libfabric) MTL does not respect the maximum message size parameter that OFI provides in the fi_info data. This patch adds this missing max_msg_size field to the mca_ofi_module_t structure and adds a length check to the low-level send routines. (cherry-picked from commit 3aca4af) Change-Id: I65ab130794574393edab4a318365c1c125e50712 Signed-off-by: Michael Heinz <michael.william.heinz@intel.com> Reviewed-by: Adam Goldman <adam.goldman@intel.com> Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
Will do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Master PR #7004 merged. The cherry-pick message is now good/valid. Good to go.
@mwheinz Do you plan to make a v3.0.x PR? (we usually make v3.0.x, v3.1.x, and v4.0.x PRs unless there's a reason not to go back that far) |
I'd prefer we not merge this into release branches until my comment on the master pull request (#7004) is discussed. I don't think this was the right fix. |
Good to go for v3.1.x. |
INTERNAL: STL-59403
The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.
This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.
(cherry-picked from commit 3aca4af)
Change-Id: I65ab130794574393edab4a318365c1c125e50712
Signed-off-by: Michael Heinz michael.william.heinz@intel.com
Reviewed-by: Adam Goldman adam.goldman@intel.com
Reviewed-by: Brendan Cunningham brendan.cunningham@intel.com