-
Notifications
You must be signed in to change notification settings - Fork 898
doc: Make setting MCA parameter more prominent #11459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bot:aws:retest |
bot:ibm:retest |
45d88ee
to
cb0ecbb
Compare
201fa0d
to
2ce5ac7
Compare
8872231
to
c78df2d
Compare
I also added a note in the doc that explains the deprecated --am (AMCA) option. |
c78df2d
to
096d739
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that you verified that --tune a,b,c
and --mca mca_param_files x,t,z
actually work.
Does not exist - only "tune" is recognized |
I verified the tune section, didn't check the mca param files - I can go verify and modify the PR |
I see in the code "mca_param_files" is a deprecated synonym of "mca_base_var_files"...but it doesn't seem like that works either. I can't see why it's not being picked up from the code itself without spending time debugging it. I'll just remove the mca param files section from the quickstart and create an issue about it |
IIRC, it was discussed and decided that "tune" was good enough - there was no value in having multiple names for the exact same thing. |
Issue: #11532 |
@rhc54 that's understandable, but shouldn't the parameter be removed from the code and ompi_info? |
What code are you referring to? I suspect there is some stale entry in the OMPI code base (probably down in OPAL) from the time when the RTE was an integral part of OMPI. I can find nothing in PRRTE (where those params are now read) that references it. |
In opal/mca/base/mca_base_var.c the param is registered:
and there is logic in this file for reading this file and such |
Did the entire mca_param_file concept disappear? IE. is the openmpi-mca-params.conf file now ignored? That's a significant change if so |
Let's not overgeneralize, please. I didn't say it was gone. What I said was that we had three names for the same thing - "tune", "mca_param_files", and "mca_base_param_files". We also have the system default MCA param file, which is the "openmpi-mca-params.conf", and a user-level default MCA param file. We consolidated the first three (tune, etc) in the RTE down to just "tune". Those files and the default param files (both system and user) are read by the RTE (using the PMIx code so it extends to direct launched apps) and propagated. We set an envar to indicate that we have done this so that OPAL won't duplicate the effort. This also provides for more scalable launch as we don't have every process opening these files. The code in OPAL therefore won't do anything if the app is launched via mpirun. It should, however, process the file if it the app is direct launched and the direct launcher and OMPI fail to tell the PMIx library it is an OMPI-based app (in which case the PMIx layer will take care of it). If the schizo/ompi component isn't aliasing the |
096d739
to
4282547
Compare
Updated and removed the mca_param_files options |
This reformats the MCA section to be more inline with the rest of the sections. It also adds a quickstart section for easier user visibility of setting MCA parameters. Also verifies TODO's and incorrect sections as well as adds a brief note about the deprecated --am option. Signed-off-by: William Zhang <wilzhang@amazon.com>
4282547
to
2db3828
Compare
@jsquyres can you take a look again? |
Asking since I made some changes in regards to the mca param files in response to testing + Ralph's comments |
@wckzhang It sounds like the real answer is to fix schitzo:
Sounds like it should be a pretty simple change to make (i.e., add aliases for legacy behavior) -- can you do so? I'd be a bit wary of having |
So a little word of caution. I grok why you want to do this, but here is why we made the change. Imagine the case where a user specifies tune=A.txt, mca_param_files=B.txt, and mca_base_param_files=C.txt. I know that sounds silly, but it actually happens in the wild because the user manually sticks one in their environment, not realizing (or forgetting) that another was set in a login file. Or someone will add one to their login file, forgetting that they already specify the other param elsewhere in the file. Regardless, you now have to deal with the precedence issue. If the files contain no overlap (i.e., the same param isn't set in more than one file), then you can just aggregate them. However, if they do overlap, then you have to decide what to do about it. You could error out, but that breaks some things that were previously allowed - but if you don't, then the final result depends upon the order in which you process them. Add in that just declaring an MCA synonym doesn't help - the system doesn't know what to do with the case where someone sets a param by its new name, and also sets the synonym's name (IIRC, it just keeps whichever one it sees last - worth checking). It's a non-trivial problem. 🤷♂️ |
@jsquyres note that it's not a simple alias change. The format of tune files and mca param files are different, so they can't be aliased |
Well, if it is any comfort, that statement isn't actually supposed to be true. If you look at the code, you'll see that the old aggregate MCA param file is simply deprecated back to the However, amusingly, neither the
So I guess you can do whatever you want with it from the cmd line - either fix the doc or fix the code. I have to correct myself as well. The We do pass The whole thing seems rather a mess, frankly. I can cleanup PMIx and the |
@wckzhang are you looking at this? |
@lrbison agreed to open another issue for mca params and point to this doc issue. Thanks |
There is already an issue opened for this: #11532 |
@jsquyres since we have decided against using mca_param_files, I think this PR can be merged now unless you have any issues? |
@jsquyres Do we still want this change? |
Add link to the quick start doc for more user visibility.