opal/common/ofi: do not compute device distances if process is unbound #11711

Closed
wenduwan wants to merge 2 commits

wenduwan
Contributor

Second attempt to skip the device distance calculation when the process is unbound. The previous change was not sufficient, and we still saw PMIx errors, e.g.:

[queue-c5n18xlarge-st-c5n18xlarge-1:07220] PMIX ERROR: ERROR in file client/pmix_client_topology.c at line 352

wenduwan added 2 commits May 23, 2023 01:23
PMIX_CPUSET_CONSTRUCT is deprecated. Remove its usage

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
If process is unbound there is no deterministic device distance

Signed-off-by: Wenduo Wang <wenduwan@amazon.com>
@wenduwan wenduwan requested a review from lrbison May 23, 2023 01:57
@wenduwan
Contributor Author

@rhc54 Could you take a look at this patch?

@wenduwan wenduwan added the bug label May 23, 2023
@rhc54
Contributor

rhc54 commented May 23, 2023

> @rhc54 Could you take a look at this patch?

Will try to do so later today, but it may be tomorrow.

@rhc54
Contributor

rhc54 commented May 23, 2023

I can't say whether the computation is correct when the distances array is NULL, but this patch is correct that the array should be NULL if the proc isn't bound. You cannot define distances in that situation.
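To illustrate the point about unbound processes, here is a minimal, self-contained sketch of the guard being discussed. It is not the code from this patch and does not use the PMIx or Open MPI APIs: `proc_binding_t`, `device_distance_t`, `compute_device_distances`, and `select_nearest_device` are all hypothetical names, and the NUMA-index difference used as a "distance" is only a placeholder metric. The sketch assumes the caller treats a NULL distances array as "no locality information" rather than as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device distance record. */
typedef struct {
    int device_id;
    unsigned distance;
} device_distance_t;

/* Hypothetical stand-in for the binding state of the calling process. */
typedef struct {
    bool is_bound;
    int numa_node; /* only meaningful when is_bound is true */
} proc_binding_t;

/*
 * Compute one distance per device, or leave the array NULL when the
 * process is unbound. Returns 0 on success and -1 on allocation failure.
 */
int compute_device_distances(const proc_binding_t *binding,
                             const int *device_numa, size_t ndevices,
                             device_distance_t **distances, size_t *ndist)
{
    assert(NULL != binding && NULL != distances && NULL != ndist);

    *distances = NULL;
    *ndist = 0;

    /* An unbound process has no fixed location, so no distance is defined. */
    if (!binding->is_bound || 0 == ndevices) {
        return 0;
    }

    device_distance_t *out = calloc(ndevices, sizeof(*out));
    if (NULL == out) {
        return -1;
    }
    for (size_t i = 0; i < ndevices; ++i) {
        out[i].device_id = (int) i;
        /* Placeholder metric: absolute difference of NUMA node indices. */
        out[i].distance = (unsigned) abs(device_numa[i] - binding->numa_node);
    }
    *distances = out;
    *ndist = ndevices;
    return 0;
}

/*
 * Return the id of the closest device, or -1 when no distances are
 * available (for example, because the process is unbound).
 */
int select_nearest_device(const device_distance_t *distances, size_t ndist)
{
    if (NULL == distances || 0 == ndist) {
        return -1;
    }
    size_t best = 0;
    for (size_t i = 1; i < ndist; ++i) {
        if (distances[i].distance < distances[best].distance) {
            best = i;
        }
    }
    return distances[best].device_id;
}
```

In this sketch a return value of -1 from `select_nearest_device` signals that the caller should fall back to a selection policy that does not depend on locality. Whether the real selection code handles a NULL array correctly is the open question raised in the comment above.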

@wenduwan
Contributor Author

Related to #11637

@wenduwan
Contributor Author

@rhc54 Thanks for the review. Do you see anything amiss? I would like to get this patch in.

Meanwhile, I'm working on #11689, where I might move the bound check even earlier. But that's a separate change.

@wenduwan
Contributor Author

Closing in favor of #11689

@wenduwan wenduwan closed this May 26, 2023

2 participants