-
Notifications
You must be signed in to change notification settings - Fork 934
osc/sm: Fix a bus error on MPI_WIN_{POST,START}. #1259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@miked-mellanox could you check why the |
|
@kawashima-fj @hjelmn wouldn't it be easier to have the posts at the beginning ? diff --git a/ompi/mca/osc/sm/osc_sm_component.c b/ompi/mca/osc/sm/osc_sm_component.c
index 2e2948e..91da943 100644
--- a/ompi/mca/osc/sm/osc_sm_component.c
+++ b/ompi/mca/osc/sm/osc_sm_component.c
@@ -280,9 +280,9 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit
module->posts = calloc (comm_size, sizeof (module->posts[0]));
if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
- module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
+ module->posts[0] = (uint64_t *) (module->segment_base);
+ module->global_state = (ompi_osc_sm_global_state_t *) (module->posts + comm_size * post_size);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
- module->posts[0] = (uint64_t *) (module->node_states + comm_size);
for (i = 0, total = state_size + posts_size ; i < comm_size ; ++i) {
if (i > 0) { |
|
Yes, it will be good. |
|
In Gilles' patch, diff --git a/ompi/mca/osc/sm/osc_sm_component.c b/ompi/mca/osc/sm/osc_sm_component.c
index 2e2948e..91da943 100644
--- a/ompi/mca/osc/sm/osc_sm_component.c
+++ b/ompi/mca/osc/sm/osc_sm_component.c
@@ -280,9 +280,9 @@ component_select(struct ompi_win_t *win, void **base, size_t size, int disp_unit
module->posts = calloc (comm_size, sizeof (module->posts[0]));
if (NULL == module->posts) return OMPI_ERR_TEMP_OUT_OF_RESOURCE;
- module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base);
+ module->posts[0] = (uint64_t *) (module->segment_base);
+ module->global_state = (ompi_osc_sm_global_state_t *) (module->posts[0] + comm_size * post_size);
module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1);
- module->posts[0] = (uint64_t *) (module->node_states + comm_size);
for (i = 0, total = state_size + posts_size ; i < comm_size ; ++i) {
if (i > 0) { |
|
@hjelmn Can you have a look at this? |
|
:bot:retest: |
|
Hmm, either I can commit the corrected patch or you can update this PR with that patch. |
A bus error occurs in sm OSC under the following conditions. - sparc64 or any other architectures which need strict alignment. - `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created by sm OSC. - The communicator size is odd and greater than 3. The lines 283-285 in current `ompi/mca/osc/sm/osc_sm_component.c` has the following code. ```c module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base); module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1); module->posts[0] = (uint64_t *) (module->node_states + comm_size); ``` The size of `ompi_osc_sm_node_state_t` is multiples of 4 but not multiples of 8. So if `comm_size` is odd, `module->posts[0]` does not aligned to 8. This causes a bus error when accessing `module->posts[i][j]`. This patch fixes the alignment of `module->posts[0]` by setting `module->posts[0]` first.
50bc645 to
ad26899
Compare
|
@hjelmn Please review again. I pushed a new commit based on Gilles' patch. |
|
@hjelmn Can you have a look at this? |
|
@hjelmn what is the status of this PR ? |
|
Need to retest. If it comes back clean will merge and PR to 2.x. :bot:retest: |
osc/sm: Fix a bus error on MPI_WIN_{POST,START}.
…mmon-symbols-if-git Makefile.am: only check for common symbols on dev builds
A bus error occurs in sm OSC under the following conditions.
MPI_WIN_POSTorMPI_WIN_STARTis called for a window created by sm OSC.The lines 283-285 in current
ompi/mca/osc/sm/osc_sm_component.chas the following code.The size of
ompi_osc_sm_node_state_tis multiples of 4 but not multiples of 8. So ifcomm_sizeis odd,module->posts[0]does not aligned to 8. This causes a bus error when accessingmodule->posts[i][j].This patch fixes the alignment of
module->posts[0]by inserting a padding betweennode_statesandmodule->posts[0].@hjelmn Please review.
I don't know whether this is the best fix method.
If you have a better method, tell me or create a new PR please.
v2.x and v1.10 branches have the same problem. I'll create PRs for them if this PR is accepted.