Add support to validate canonical jobspec and walk resources #21
Add support for validating canonical jobspecs in YAML or JSON format. The Flux Jobspec class has a function that validates canonical jobspec and throws errors with specific reasons why an input jobspec is invalid. Integrate this functionality into the flux-validator. Also add support for walking a canonical jobspec and validating the resource counts via `.resource_walk()`. Outputting the counts will provide feedback for an agent to correct a generated canonical jobspec.
Add instructions for validating and counting resources in a canonical jobspec, including an example for overriding the entrypoint.
vsoch left a comment
This is great! The count is really cool. A few comments below.
| #### Canonical jobspecs in YAML or JSON format
|
| ##### Valid
To follow the structure above, let's put this directly as another example under Valid. A comment that it is for a canonical jobspec in json/yaml will suffice to categorize it.
| ##### Valid
| ```bash
| $ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot.yaml
Please add implicit-slot.yaml to the repository here as an example (and remove from the README below).
| ##### Invalid
| ```bash
| $ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot-invalid.yaml
Let's add implicit-slot-invalid.yaml to the repository too. Feel free to create additional structure for these data files if you think it will organize them better.
|     self._validate_resource(res)
|   File "/usr/lib/python3.10/site-packages/flux/job/Jobspec.py", line 306, in _validate_resource
|     raise ValueError("slots must have labels")
| ValueError: slots must have labels
If this is the output going to an agent, a few thoughts to consider:
- Are we going to be able to control stdout vs. stderr to only provide one to the agent?
- If not, do we want to hide the bulk of the traceback and only show the ValueError: slots must have labels?
- Can we give the agent any more context? (e.g., imagine if there is more than one slot - it will need to deduce which one was missing a label).
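One possible way to address these points, sketched under assumptions: keep the full traceback on stderr, put a single-line reason on stdout, and include the location of each offending slot. The helper names `find_unlabeled_slots` and `report_failure` are hypothetical and are not part of `validate.py` or Flux; the walk assumes child resources are nested under the `with` key.

```python
import sys
import traceback


def find_unlabeled_slots(resources, path="resources"):
    """Return the paths of slot entries that lack a 'label' (hypothetical helper)."""
    missing = []
    for i, res in enumerate(resources):
        here = f"{path}[{i}]"
        if res.get("type") == "slot" and "label" not in res:
            missing.append(here)
        # Assumption: child resources are nested under the "with" key.
        missing.extend(find_unlabeled_slots(res.get("with", []), f"{here}.with"))
    return missing


def report_failure(exc, out=sys.stdout, err=sys.stderr):
    """Write the full traceback to `err` and a one-line reason to `out`."""
    traceback.print_exception(type(exc), exc, exc.__traceback__, file=err)
    print(f"{type(exc).__name__}: {exc}", file=out)


if __name__ == "__main__":
    # Illustrative resources section with one unlabeled slot.
    resources = [
        {"type": "node", "count": 1, "with": [
            {"type": "slot", "count": 2, "with": [{"type": "core", "count": 1}]},
        ]},
    ]
    try:
        paths = find_unlabeled_slots(resources)
        if paths:
            raise ValueError("slots must have labels: " + ", ".join(paths))
    except ValueError as exc:
        report_failure(exc)
```

With this split, an agent that only reads stdout would see the reason and the path of the slot, while the traceback stays available on stderr for debugging.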
I am also getting the exit of the broker for the output:
Nov 03 07:44:12.177820 UTC 2025 broker.err[0]: rc2.0: python3 /code/docker/flux-validator/validate.py validate /data/docker/flux-validator/implicit-slot.yaml Exited (rc=1) 0.1s
Note that it validates when I have a label but I change the name (e.g., default is defined, but then in the resources I called it something else). I don't know if flux checks for that.
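A sketch of the kind of cross-check this would need, assuming the canonical jobspec layout in which slots carry a `label` under `resources` and each entry in `tasks` names a `slot`. The function names are hypothetical; nothing here is known to be part of Flux or `validate.py`.

```python
def slot_labels(resources):
    """Collect the labels of every slot in a nested resource list."""
    labels = set()
    for res in resources:
        if res.get("type") == "slot" and "label" in res:
            labels.add(res["label"])
        # Assumption: child resources are nested under the "with" key.
        labels |= slot_labels(res.get("with", []))
    return labels


def unknown_task_slots(jobspec):
    """Return task slot names that match no slot label in resources."""
    labels = slot_labels(jobspec.get("resources", []))
    return sorted({
        task["slot"]
        for task in jobspec.get("tasks", [])
        if "slot" in task and task["slot"] not in labels
    })


if __name__ == "__main__":
    # Illustrative jobspec: the task refers to "default", the slot is labeled "task".
    jobspec = {
        "resources": [
            {"type": "slot", "count": 1, "label": "task",
             "with": [{"type": "core", "count": 1}]},
        ],
        "tasks": [{"command": ["hostname"], "slot": "default", "count": {"per_slot": 1}}],
    }
    print(unknown_task_slots(jobspec))
```

An empty result means every task points at a labeled slot; any names returned are mismatches that could be reported back to the agent.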
| Note: need to override the entrypoint.
|
| ```bash
| $ docker run --entrypoint flux -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator start python3 /code/docker/flux-validator/validate.py count /data/docker/flux-validator/implicit-slot.yaml
This one is cool!
One, two, three, core... ah ah ah.
I am the count, I love to count! 🦇
| except Exception as e:
|     display_error(content, str(e))
|     sys.exit(1)
| yaml_content = yaml.safe_load(content)
nit: run `pre-commit run --all-files` to fix isort, etc. I know, it should be in CI, and it's not. :)
This PR adds the capability to validate a Flux canonical jobspec via the Flux `Jobspec` member function `validate_jobspec()`. It also adds support for walking a canonical jobspec tree, summing the resource counts by type, and outputting the totals. This output can be used in an agentic framework to correct a generated canonical jobspec.
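To illustrate what summing resource counts by type can look like, here is a minimal standalone sketch. It is not the Flux `resource_walk()` implementation; the function `count_resources` is hypothetical, and it assumes each resource has a `type`, a `count` (an integer, or a mapping with a `min` key), and optional children under `with`, with child counts multiplied by the parent's total.

```python
from collections import Counter


def count_resources(resources, multiplier=1):
    """Sum resource counts by type over a nested resource list (hypothetical helper)."""
    totals = Counter()
    for res in resources:
        count = res["count"]
        # Assumption: a range-style count is reduced to its minimum.
        if isinstance(count, dict):
            count = count["min"]
        total = multiplier * count
        totals[res["type"]] += total
        # Children are counted once per instance of the parent.
        totals.update(count_resources(res.get("with", []), total))
    return totals


if __name__ == "__main__":
    # Illustrative resources: 2 nodes, 4 slots per node, 2 cores per slot.
    resources = [
        {"type": "node", "count": 2, "with": [
            {"type": "slot", "count": 4, "label": "task", "with": [
                {"type": "core", "count": 2},
            ]},
        ]},
    ]
    for rtype, total in count_resources(resources).items():
        print(f"{rtype}: {total}")
```

Per-type totals like these give an agent something concrete to compare against the resources it intended to request.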