Add support for OAR Scheduler #1744
Hi, thanks for contributing this. The steps to follow are as follows:
If all works well, we can add an entry to the README that points to your plugin.
Hello, thanks for your review and your proposal about the plugin. Here is the repository: https://github.com/ychiat35/submitit_oar. I will try to add some CI/CD actions for tests and package releases. About this point:
Have you thought about CI tests for OAR (and Slurm), similar to what is done for Slurm and SGE clusters in the Dask-jobqueue repository: https://github.com/dask/dask-jobqueue/blob/main/ci/slurm/docker-compose.yml ? It could be a good way to test real jobs launched on OAR/Slurm clusters.
Hello, we'd like to inform you that we have successfully integrated the submitit_oar plugin into the Grid5000 repositories, at this link: Grid5000/submitit_oar. Additionally, we have released a new version of the plugin on PyPI, accessible here: submitit_oar 1.1.1. The integration of the submitit_oar plugin has been smooth, and it aligns seamlessly with Submitit's plugin system. To finalize the pull request, we'd like to confirm that you are still fine with us submitting a PR to update the README to mention our plugin. Thanks a lot for your feedback.
The OAR scheduler is widely used in France, including on mesocentre supercomputers (e.g., GRICAD), INRIA supercomputers, the Grid5000 testbed, and other platforms.
This PR adds support for the OAR scheduler as a plugin. Four main classes have been implemented in `oar.py` (following the previous implementation made for Slurm):

- … `oarstat` command (similar to the `sinfo` command on the Slurm scheduler).

Unit tests were created in `test_oar.py` and `test_auto.py` to ensure that the OAR plugin offers the same basic functionalities as the Slurm plugin.

A few notes about the implementation:

- … `_equivalence_dict` dictionary). Additional OAR parameters can be set with the `additional_parameters` dictionary.
- … `_make_submission_command` method in the `OarExecutor` class is overridden from `PicklingExecutor`. The content of the file is read and the job is submitted using the OAR "inline command" instead of the submission file.
- … `scontrol` (i.e., `oarsub`) is not available on nodes. To automatically requeue the job after preemption, the original job must be submitted with the `idempotent` type and must exit with code `99`.

Our OAR plugin covers most submitit features (e.g., job submission, checkpointing, job arrays). The only feature we did not address is task submission: unlike Slurm, OAR does not provide such a feature. We believe a workaround could be implemented in another iteration. Meanwhile, we raise a "NotImplemented" error if a user attempts to use this feature.
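
To make the implementation notes above more concrete, here is a minimal sketch of how generic parameters might be translated and combined into an inline `oarsub` call. It is not the plugin's actual code: the function `build_oarsub_command`, the `EQUIVALENCE` mapping and its keys, and the `requeue` flag are all hypothetical names chosen for illustration, and the real `_equivalence_dict` may map different parameters. The only details taken from the description are the inline command, the `idempotent` job type, and the exit code `99`.

```python
import shlex

# Hypothetical mapping from generic parameter names to oarsub option names.
# The plugin's real `_equivalence_dict` may contain different entries.
EQUIVALENCE = {"job_name": "n", "queue": "q"}

# Exit code that, per the PR description, an idempotent job must return
# in order to be requeued after preemption.
REQUEUE_EXIT_CODE = 99


def build_oarsub_command(inline_command, params, additional_parameters=None, requeue=False):
    """Return an argv list that submits `inline_command` through oarsub."""
    # Translate generic names; unknown names are passed through unchanged.
    merged = {EQUIVALENCE.get(key, key): value for key, value in params.items()}
    # Extra OAR-specific options are appended as given.
    merged.update(additional_parameters or {})

    argv = ["oarsub"]
    for key, value in merged.items():
        # One-letter keys use the short syntax, longer keys the long syntax.
        flag = f"-{key}" if len(key) == 1 else f"--{key}"
        argv += [flag, str(value)]
    if requeue:
        # Requeueing relies on the job being submitted with the idempotent type.
        argv += ["-t", "idempotent"]
    # The job is passed as an inline command rather than as a submission file.
    argv.append(inline_command)
    return argv


if __name__ == "__main__":
    argv = build_oarsub_command(
        "python -u job.py",
        {"job_name": "train", "queue": "default"},
        additional_parameters={"project": "demo"},
        requeue=True,
    )
    print(shlex.join(argv))
```

In this sketch, a job that wants to be resubmitted after preemption would end with `sys.exit(REQUEUE_EXIT_CODE)`; how the plugin itself triggers that exit is not shown here.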