try to resync prrte fork master with upstream #54
Always default the number of slots to the available cpus in the topology. Ensure that we always display some form of the resulting process map, or else we will silently exit. Signed-off-by: Ralph Castain <rhc@pmix.org>
It should be `help-hostfile.txt`, not `help-hostfiles.txt` Signed-off-by: Ralph Castain <rhc@pmix.org>
If we use one cpu from an object, then we will get a NULL response if we ask for the next object of that type within the remaining cpuset, since not all of the cpus in the object are still available. This problem resulted from the recent change to only use available cpus in PRRTE topologies. So instead scan across the cpus and check whether each one is inside the object of interest - if so, then we can bind to that cpu; if not, then we keep searching. Signed-off-by: Ralph Castain <rhc@pmix.org>
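The scan described in this commit can be sketched as follows. This is only an illustration, not the PRRTE code: plain Python sets stand in for hwloc bitmaps, and the function name and the example node layout are hypothetical.

```python
def find_bindable_cpu(available_cpus, object_cpus):
    """Return the lowest available cpu that lies inside the object, or None."""
    for cpu in sorted(available_cpus):
        if cpu in object_cpus:
            return cpu  # this cpu belongs to the object, so we can bind to it
    return None  # no available cpu falls inside the object


# Hypothetical node: core 0 owns cpus {0, 1}, core 1 owns cpus {2, 3}.
cores = [{0, 1}, {2, 3}]
# cpu 0 is already in use, so core 0 is only partially available.
available = {1, 2, 3}

# Scanning cpus still finds a usable cpu in the partially used core.
bound = [find_bindable_cpu(available, core) for core in cores]
```

Because the scan walks individual cpus rather than asking for whole objects, a partially consumed object still yields a binding target as long as one of its cpus remains available.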
Attempt to make it clearer that the binding failed due to a lack of cpus for the given map/bind policies. Signed-off-by: Ralph Castain <rhc@pmix.org>
PRRTE itself no longer requires specific resilience settings. Signed-off-by: Ralph Castain <rhc@pmix.org>
Add a new cmd line option that corresponds to this attribute. Add the attribute to the prun payload. When received, it will default to including in the job info for the spawned job. Add query support for it. Signed-off-by: Ralph Castain <rhc@pmix.org>
Homebrew has broken something and I cannot figure out how to fix it. Signed-off-by: Ralph Castain <rhc@pmix.org>
Changes will need to be made to Open MPI to parse the contents of the OMPI_MCA_mpi_memory_alloc_kinds environment variable to determine how to use the user supplied memory-alloc-kinds information. See section 11.4.3 of the MPI 4.1 standard. Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Get takes a (pmix_value_t**), so don't cast it to (void**) Signed-off-by: Ralph Castain <rhc@pmix.org>
If we haven't requested LSF support, then don't warn about not finding yp_all - we didn't ask for LSF, so no need to warn us if support cannot be built. It will show in the summary at end of configure. Signed-off-by: Ralph Castain <rhc@pmix.org>
Now that we have a broader group of contributors starting to show up, we probably need to start paying more attention to code quality of contributions. Enable devel-check by default in Git clones that are configured with enable-debug. Signed-off-by: Ralph Castain <rhc@pmix.org>
Try adding a build using latest Clang Signed-off-by: Ralph Castain <rhc@pmix.org>
When building against older PMIx Signed-off-by: Ralph Castain <rhc@pmix.org>
Need to unpack the ctrls object to maintain pack/unpack ordering. Update the client example to illustrate that all the modex info for a proc is returned upon first request for that proc's info. Signed-off-by: Ralph Castain <rhc@pmix.org>
Refs open-mpi/ompi#12540 Signed-off-by: Ralph Castain <rhc@pmix.org>
It has been reported (and confirmed) that building against one version of PMIx and then running with another version will cause PRRTE to segfault. This isn't a universal rule. For example, one can switch v5.0 and master without a problem. However, switching v5.0 and v4.2 is a definite segfault. The root cause of the problem is a change in the layout of the base pmix_object_t definition. This renders all PMIx objects binary incompatible when crossing between the v5 and v4 (and below) series. Changing the v5 definition back to match v4 is an overly complex task. The changes were required to accommodate the new shared memory support that was introduced in v5. So instead, we check the runtime version of PMIx against the build version. If the runtime version is incompatible with the build version, then we print an explanatory error message and error out. Signed-off-by: Ralph Castain <rhc@pmix.org> dd Signed-off-by: Ralph Castain <rhc@pmix.org>
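A minimal sketch of such a build-versus-runtime check is shown below. It assumes that compatibility is decided purely by whether the major version is below 5 or not, which is an inference from the commit text rather than the actual PRRTE rule; all function names are hypothetical.

```python
def major_of(version):
    """Extract the major number from a string such as '5.0.2' or 'v4.2.9'."""
    return int(version.lstrip("v").split(".")[0])


def object_layout(major):
    # Assumption drawn from the commit text: v5 changed the pmix_object_t
    # layout, so v4 and below form one group and v5 and above form another.
    return "v5-and-later" if major >= 5 else "v4-and-earlier"


def check_pmix_runtime(build_version, runtime_version):
    """Raise RuntimeError if the two versions fall in different layout groups."""
    build_layout = object_layout(major_of(build_version))
    runtime_layout = object_layout(major_of(runtime_version))
    if build_layout != runtime_layout:
        raise RuntimeError(
            f"PRRTE was built against PMIx {build_version} but is running "
            f"with PMIx {runtime_version}; these are not binary compatible"
        )


# Same series: passes silently.
check_pmix_runtime("5.0.2", "5.0.8")
```

Calling `check_pmix_runtime("5.0.2", "4.2.9")` raises the error, mirroring the "explanatory error message and error out" behavior described above.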
We had problems in the past with quoted params, but stripping quotes also has consequences - the best solution is not clear. For now, let's try going the other way and see how many problems we encounter. Signed-off-by: Ralph Castain <rhc@pmix.org>
Fix the issues with the macOS builds so that they work again in GitHub Actions environments. Signed-off-by: Jeff Squyres <jeff@squyres.com>
Enables build against v1.11.8 and above. Signed-off-by: Ralph Castain <rhc@pmix.org>
If we are trying to bind to an HWLOC object type that is not defined on a given node, then (a) if the binding policy was specified by user, then error out; and (b) if we are using a default binding policy, then simply do not bind. Signed-off-by: Ralph Castain <rhc@pmix.org>
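The two-branch rule in this commit might look like the sketch below. The function name, the string object types, and the exception type are all illustrative assumptions; the real code works on HWLOC object types and PRRTE error codes.

```python
def choose_binding(target, node_object_types, user_specified):
    """Decide what to bind to on one node.

    Returns the target type if the node defines it, None to signal
    "do not bind", and raises if the user explicitly asked for a type
    that the node does not have.
    """
    if target in node_object_types:
        return target
    if user_specified:
        # (a) the user asked for this policy, so failing loudly is correct
        raise ValueError(f"cannot bind to {target}: not defined on this node")
    # (b) the policy was only a default, so quietly skip binding
    return None


# Hypothetical node that exposes no L3 cache objects.
node_types = {"package", "core", "hwthread"}
default_result = choose_binding("l3cache", node_types, user_specified=False)
```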
In some recent Slurm versions, the Slurm runtime is inserting custom arguments to the PRRTE launcher's `srun` cmd line without the user being aware of it. In many cases, this may not be a problem - but in some cases (where the user or the system admin needs/wants particular cmd line arguments used) this can cause problems as it happens silently, without the user being aware of it. Make this visible when it happens, and provide a mechanism by which the user/admin can override it. Provide a fairly long help message explaining what happened and offering advice on resolution, along with a param for disabling the warning. Add a param for overriding the "args" param if necessary, along with a caution as to possible consequences. Signed-off-by: Ralph Castain <rhc@pmix.org>
RTD is rolling out some changes. Per https://about.readthedocs.com/blog/2024/07/addons-by-default/, these are the changes we need to make. Port of open-mpi/ompi#12687 Signed-off-by: Ralph Castain <rhc@pmix.org>
We currently do not support the LTO optimizer as it is incompatible with our plugin component architecture. So detect it has been specified in configure and error out with an explanation. Includes suggestions from @jsquyres Signed-off-by: Ralph Castain <rhc@pmix.org>
Break the multi-loop thru loading of param files that caused us to overwrite values. Defer to the PMIx pmdl components for obtaining envars and for checking MCA param overlaps across projects. Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Luke Robison <lrbison@amazon.com>
Python 3.12 now issues a SyntaxWarning for invalid escape sequences (such as "\d") in ordinary string literals, which affects our regular expressions. Instead, use "r" strings. Signed-off-by: Ralph Castain <rhc@pmix.org>
The formatting is messed up in places, so try and fix it. Signed-off-by: Ralph Castain <rhc@pmix.org>
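As an illustration of the raw-string fix, the sketch below compiles a pattern with an "r" prefix so that the backslashes reach the `re` module untouched. The pattern and the sample text are made up for the example and are not taken from the PRRTE scripts.

```python
import re

# Without the r prefix, the backslash sequences in this pattern are invalid
# string escapes, which Python 3.12 reports with a SyntaxWarning.
# With the r prefix, the string is passed to re exactly as written.
VERSION_RE = re.compile(r"(\d+)\.(\d+)")

match = VERSION_RE.search("PMIx v5.0 detected")
major, minor = (int(group) for group in match.groups())
```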
Cleanup mixing of index vs kernel index when calling interface matching routines. Sanitize the passing of the interface argv-array. Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx supports forward/set, unset, append, and prepend of environmental variables. However, PRRTE didn't provide cmd line parsing support for these operations. PMIx has been extended to do so - add those options to the schizo components.
Forward (-x) of envars can be just the envar name (to pickup the local value and forward it), or can be envar=value to set the envar to a specific value.
Unset (--unset-env) takes just the name of the envar.
Append (--append-env) takes two arguments:
* the name of the envar, appended with a "[c]" where the 'c' is the character to be used as the separator between envar values
* the value to be appended
So it looks like "--append-env FOO[:] 20"
Prepend (--prepend-env) behaves exactly like append except it prepends the value to whatever current envar value it finds
Multiple instances of any of these options may be present on the cmd line. Each instance will have its arguments appended to the parameter's pmix_cli_item_t's values argv-array.
Fix precedence so that app's env overwrites local environment.
Signed-off-by: Ralph Castain <rhc@pmix.org>
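The semantics of the four options can be sketched in a few lines. This is a hedged model only: the function names and the list-of-tuples input are hypothetical, and the choice to treat append or prepend on an unset envar as a plain set is an assumption the commit text does not state.

```python
import re


def split_name_sep(spec):
    """Split a spec such as 'FOO[:]' into ('FOO', ':')."""
    m = re.fullmatch(r"(.+)\[(.)\]", spec)
    if m is None:
        raise ValueError(f"expected NAME[c], got {spec!r}")
    return m.group(1), m.group(2)


def apply_env_ops(app_env, local_env, ops):
    """Apply (option, args) pairs in order and return the resulting env."""
    env = dict(app_env)
    for option, args in ops:
        if option == "x":
            name, has_value, value = args[0].partition("=")
            if has_value:
                env[name] = value  # envar=value form
            elif name in local_env:
                env[name] = local_env[name]  # forward the local value
        elif option == "unset-env":
            env.pop(args[0], None)
        elif option in ("append-env", "prepend-env"):
            name, sep = split_name_sep(args[0])
            current = env.get(name)
            if not current:
                env[name] = args[1]  # assumption: nothing to join with yet
            elif option == "append-env":
                env[name] = current + sep + args[1]
            else:
                env[name] = args[1] + sep + current
        else:
            raise ValueError(f"unknown option {option!r}")
    return env


result = apply_env_ops(
    {"FOO": "10"},
    {"HOME": "/home/user"},
    [
        ("x", ["HOME"]),
        ("x", ["BAR=1"]),
        ("append-env", ["FOO[:]", "20"]),
        ("prepend-env", ["FOO[:]", "5"]),
        ("unset-env", ["BAR"]),
    ],
)
```

With these inputs, `FOO` ends up as `5:10:20`, `HOME` is forwarded from the local environment, and `BAR` is set and then removed again.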
Use the flags to set the PMIx paths so we can simply use the standard compiler to test for PMIx capability flags. Signed-off-by: Ralph Castain <rhc@pmix.org>
If configure cannot find "pmixcc", then don't attempt to create the "pcc" link as that creates an infinite loop when someone attempts to resolve it. Signed-off-by: Ralph Castain <rhc@pmix.org>
Update the PRRTE submodule to track upstream master with PR, including updating PMIx submodule. Test the build for integration problems. Signed-off-by: Ralph Castain <rhc@pmix.org>
Needs a colon at the end. Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx master has deprecated the pmix_show_help_add_dir function, so remove it for now. Will replace it with in-memory help messages in a follow-on PR. Signed-off-by: Ralph Castain <rhc@pmix.org>
It isn't possible to install the environments required to test every launcher in PRRTE. What we can do, though, is provide a new configure option "--enable-testbuild-launchers" that will utilize shim headers to allow the components to at least build. Note that we are NOT testing the components - we only verify that they should build. Signed-off-by: Ralph Castain <rhc@pmix.org>
We no longer support solaris, so remove references to it in the configure code. Delete two m4 files that duplicated OAC functions. Signed-off-by: Ralph Castain <rhc@pmix.org>
Show the build time of the docs. Ported from open-mpi/ompi#13236 Signed-off-by: Ralph Castain <rhc@pmix.org>
Store the show-help strings in memory, thereby removing the need to find/read files to generate the full strings. Only works with PMIx versions greater than v3.x. Signed-off-by: Ralph Castain <rhc@pmix.org>
No longer supported Signed-off-by: Ralph Castain <rhc@pmix.org>
PRRTE now requires Python to build when in a Git clone for building the show-help in-memory text (prte_show_help_content.c) and the Sphinx-based documentation pages. Check for an adequate Python version for these purposes. Signed-off-by: Ralph Castain <rhc@pmix.org>
We don't support endianness mixes. However, we can hit situations where the topology is different across the allocation. This isn't just a case of different chips - for example, if a scheduler is allocating at the CPU instead of node level, it might allocate different CPUs on the various nodes. In the eyes of the runtime, this equates to a hetero node situation since the bitmap within the topology of each node will differ.
Resolving this required:
* fix some logic errors when handling hetero nodes so we don't hang
* Add a new "--hetero-nodes" cmd line option to help optimize DVM startup in the case where allocation is being done by CPU - no point in requesting topology from every node in that case, just have each daemon send its topology
* Add a new "prte_hetero_nodes" MCA param so that sys admins can declare hetero-nodes in the default param file on systems where the scheduler is allocating by CPU
Update show-help and RST files to cover the new option.
Signed-off-by: Ralph Castain <rhc@pmix.org>
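To make the "hetero node" notion concrete, the sketch below treats an allocation as heterogeneous whenever the per-node sets of allocated cpus differ, even on identical hardware. The data layout and function name are hypothetical; PRRTE compares real topology information rather than Python sets.

```python
def is_hetero(allocated_cpus_by_node):
    """Return True if the nodes do not all share the same allocated cpuset."""
    distinct = {frozenset(cpus) for cpus in allocated_cpus_by_node.values()}
    return len(distinct) > 1


# Identical 8-cpu nodes, but the scheduler allocated by CPU, not by node.
by_cpu = {"node0": {0, 1, 2, 3}, "node1": {4, 5}}
# Whole-node allocation on the same hardware.
by_node = {"node0": set(range(8)), "node1": set(range(8))}

hetero_by_cpu = is_hetero(by_cpu)
hetero_by_node = is_hetero(by_node)
```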
Need "--hetero-nodes" Signed-off-by: Ralph Castain <rhc@pmix.org>
Decrease the minimum required Python version to v3.6. Note that this only applies when building from a git clone, not a tarball. Ensure we cleanup the show-help content file when doing "make clean". Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx_Notify_event is a non-blocking API, so we have to "hold" all input data until the callback is received. This includes the procID of the source, so it cannot be a local variable. Signed-off-by: Ralph Castain <rhc@pmix.org>
Correctly implement fwd-env as a runtime-options directive, marking the former "--fwd-environment" cmd line option as deprecated. Let the MCA param set the default behavior. Ensure that child jobs can inherit their parent's setting. Inherit by default unless the spawn request specifies otherwise with "noinherit" directive or provides its own fwd environment directive. Signed-off-by: Ralph Castain <rhc@pmix.org>
Preserve empty lines in the show-help array so that we retain the author's intended formatting when displayed. Signed-off-by: Ralph Castain <rhc@pmix.org>
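A rough sketch of loading help text into memory while keeping blank lines is given below. It assumes a help-file layout of "[topic]" headers with "#" comment lines, and it trims only trailing blank lines of each topic; both the layout handling and the trimming are assumptions for illustration, not the actual generator script.

```python
def load_help_topics(text):
    """Map each topic name to its list of lines, keeping interior blanks."""
    topics = {}
    current = None
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line, not part of any message
        if line.startswith("[") and line.endswith("]"):
            current = topics.setdefault(line[1:-1], [])
            continue
        if current is not None:
            current.append(line)  # empty lines are kept on purpose
    for lines in topics.values():
        while lines and lines[-1] == "":
            lines.pop()  # drop only the blank separator before the next topic
    return topics


SAMPLE = """# hypothetical help file
[no-cpus]
Binding failed.

Not enough cpus were available.

[other]
Second topic.
"""

help_topics = load_help_topics(SAMPLE)
```

The blank line between the two sentences of the first topic survives, so the message is displayed with the paragraph break the author intended.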
It looks like the schizo/ompi/schizo-ompi-cli.rstxt grew a reference to the prrte-rst-content/cli-no-app-prefix.rst file in f7cc125, but this file was mistakenly not included to src/docs/prrte-rst-content/Makefile.am's dist_rst_DATA, even though cli-no-app-prefix.rst was already present in the source tree. Signed-off-by: Jeff Squyres <jeff@squyres.com>
Check that we can build OMPI with external copies of PMIx and PRRTE - ensures that documentation is correct. Signed-off-by: Ralph Castain <rhc@pmix.org>
Thanks to @sonjahapp for the report! Signed-off-by: Ralph Castain <rhc@pmix.org>
If a child inherits the fwd environment directive of its parent, then update the child's attributes as well as forwarding its environment so that any subsequent grandchildren also inherit the flag. Add a user-provided reproducer Signed-off-by: Ralph Castain <rhc@pmix.org>
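The inheritance rules for the fwd environment directive can be modeled as below. The dictionary keys, the function name, and the fallback to the MCA default when "noinherit" is given are assumptions made for the sketch; only the overall precedence comes from the commit messages.

```python
def child_fwd_env(parent_attrs, directives, mca_default=False):
    """Compute the attributes a spawned child job should carry."""
    if "fwd-environment" in directives:
        # The spawn request provided its own setting, which wins.
        fwd = directives["fwd-environment"]
    elif directives.get("noinherit", False):
        # Inheritance was declined; assume we fall back to the default.
        fwd = mca_default
    else:
        # Inherit from the parent by default.
        fwd = parent_attrs.get("fwd-environment", mca_default)
    # Recording the result on the child is what lets grandchildren inherit.
    return {"fwd-environment": fwd}


parent = {"fwd-environment": True}
child = child_fwd_env(parent, {})
grandchild = child_fwd_env(child, {})
isolated = child_fwd_env(parent, {"noinherit": True})
```

The key point from the fix is the returned attribute: if the child only forwarded its environment without updating its own attributes, `grandchild` would have nothing to inherit.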
We use the pthread_setaffinity_np function if it is found in the standard pthread library. Apparently, however, some folks split the definition of that function from pthread.h into a separate header, even though they leave the function itself in the pthread library. Go figure. Port of openpmix/openpmix#3615 Signed-off-by: Ralph Castain <rhc@pmix.org>
Hello! The Git Commit Checker CI bot found a few problems with this PR: ec3f646: Add v4 news file and adjust CI workflows
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!
Pickup latest changes Signed-off-by: Ralph Castain <rhc@pmix.org>
No description provided.