@milroy commented Nov 3, 2025

This PR adds the ability to validate a Flux canonical jobspec via the Flux `Jobspec` member function `validate_jobspec()`. It also adds support for walking a canonical jobspec tree, summing the resource counts by type, and printing the totals. This output can be used in an agentic framework to correct a generated canonical jobspec.

Add support for validating canonical jobspecs in YAML or JSON format.
The Flux Jobspec class has a function that validates canonical jobspec
and throws errors with specific reasons why an input jobspec is
invalid. Integrate this functionality into the flux-validator.

Also add support for walking a canonical jobspec and validating the
resource counts via `.resource_walk()`. Outputting the counts will
provide feedback for an agent to correct a generated canonical
jobspec.
Add instructions for validating and counting resources in a canonical
jobspec, including an example for overriding the entrypoint.
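
As a rough illustration of what summing resource counts by type can look like, here is a minimal pure-Python sketch. It is not the PR's implementation and does not call Flux's `.resource_walk()`. It assumes the canonical layout where each resource has a `type`, an optional `count`, and children under `with`. The function name `count_resources` and the example tree are hypothetical.

```python
from collections import Counter


def count_resources(resources, multiplier=1, totals=None):
    """Sum resource counts by type, multiplying through parent counts."""
    if totals is None:
        totals = Counter()
    for res in resources:
        count = res.get("count", 1)
        # Assumption: a range-style count ({"min": ...}) is reduced to its minimum.
        if isinstance(count, dict):
            count = count.get("min", 1)
        total = multiplier * count
        totals[res["type"]] += total
        # Children are assumed to live under "with" in the resource tree.
        count_resources(res.get("with", []), total, totals)
    return totals


# Hypothetical example: one node holding two slots of four cores each.
example = [
    {
        "type": "node",
        "count": 1,
        "with": [
            {
                "type": "slot",
                "count": 2,
                "label": "default",
                "with": [{"type": "core", "count": 4}],
            }
        ],
    }
]

print(dict(count_resources(example)))
```

For this example tree the totals are 1 node, 2 slots, and 8 cores, because each child count is multiplied by the counts of its ancestors.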
@vsoch (Member) left a comment


This is great! The count is really cool. A few comments below.


#### Canonical jobspecs in YAML or JSON format

##### Valid

To follow the structure above, let's put this directly under Valid as another example. A comment noting that it is for a canonical jobspec in JSON/YAML is enough to categorize it.


##### Valid
```bash
$ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot.yaml
```

Please add `implicit-slot.yaml` to the repository here as an example (and remove its contents from the README below).


##### Invalid
```bash
$ docker run -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator /data/docker/flux-validator/implicit-slot-invalid.yaml
```

Let's add `implicit-slot-invalid.yaml` to the repository too. Feel free to create additional structure for these data files if you think it will organize them better.

    self._validate_resource(res)
  File "/usr/lib/python3.10/site-packages/flux/job/Jobspec.py", line 306, in _validate_resource
    raise ValueError("slots must have labels")
ValueError: slots must have labels

If this is the output going to an agent, a few thoughts to consider:

  • Are we going to be able to control stdout vs. stderr so that only one is provided to the agent?
  • If not, do we want to hide the bulk of the traceback and only show `ValueError: slots must have labels`?
  • Can we give the agent any more context? (e.g., if there is more than one slot, it will need to deduce which one is missing a label).

I am also seeing the broker's exit message in the output:

Nov 03 07:44:12.177820 UTC 2025 broker.err[0]: rc2.0: python3 /code/docker/flux-validator/validate.py validate /data/docker/flux-validator/implicit-slot.yaml Exited (rc=1) 0.1s
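
One way to act on the traceback and context questions above is sketched below. This is only an illustration under assumptions, not code from the PR: `find_unlabeled_slots`, `report`, and `fake_validate` are hypothetical helpers, and `fake_validate` merely stands in for the real Flux validation so the example is self-contained. The sketch prints a single-line error to stderr and adds the tree path of each slot that has no label.

```python
import sys


def find_unlabeled_slots(resources, path="resources"):
    """Return tree paths of every slot that lacks a label."""
    missing = []
    for i, res in enumerate(resources):
        here = f"{path}[{i}]"
        if res.get("type") == "slot" and "label" not in res:
            missing.append(here)
        missing.extend(find_unlabeled_slots(res.get("with", []), f"{here}.with"))
    return missing


def report(validate, jobspec):
    """Run a validator callable and emit a one-line error instead of a traceback."""
    try:
        validate(jobspec)
    except ValueError as err:
        where = find_unlabeled_slots(jobspec.get("resources", []))
        hint = f" (unlabeled slots: {', '.join(where)})" if where else ""
        print(f"ValueError: {err}{hint}", file=sys.stderr)
        return 1
    return 0


def fake_validate(jobspec):
    # Hypothetical stand-in for the Flux validation; it raises the same message
    # that appears at the end of the traceback above.
    if find_unlabeled_slots(jobspec.get("resources", [])):
        raise ValueError("slots must have labels")


# Hypothetical jobspec fragment: the slot under the node has no label.
spec = {
    "resources": [
        {"type": "node", "count": 1, "with": [{"type": "slot", "count": 1}]}
    ]
}
exit_code = report(fake_validate, spec)
```

Here the agent would see one line naming `resources[0].with[0]` as the unlabeled slot, and the process could exit with the returned code.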


Note that it still validates when the slot has a label but I change the name (e.g., `default` is defined, but in the resources I called it something else). I don't know if Flux checks for that.
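
If Flux turns out not to check this, a cross-check along the following lines could flag the mismatch. This is a hedged sketch: it assumes the canonical layout in which each entry under `tasks` names its slot via a `slot` key and each slot resource carries a `label`. The helper names `collect_slot_labels` and `unmatched_task_slots` are hypothetical.

```python
def collect_slot_labels(resources):
    """Gather the labels of all slots found anywhere in the resource tree."""
    labels = set()
    for res in resources:
        if res.get("type") == "slot" and "label" in res:
            labels.add(res["label"])
        labels |= collect_slot_labels(res.get("with", []))
    return labels


def unmatched_task_slots(jobspec):
    """Return task slot names that match no slot label in the resources."""
    labels = collect_slot_labels(jobspec.get("resources", []))
    referenced = {t["slot"] for t in jobspec.get("tasks", []) if "slot" in t}
    return sorted(referenced - labels)


# Hypothetical jobspec fragment: the task refers to "default",
# but the slot in the resources is labeled "renamed".
mismatched = {
    "resources": [
        {
            "type": "slot",
            "count": 1,
            "label": "renamed",
            "with": [{"type": "core", "count": 1}],
        }
    ],
    "tasks": [{"command": ["hostname"], "slot": "default", "count": {"per_slot": 1}}],
}
print(unmatched_task_slots(mismatched))
```

An empty result would mean every task refers to a slot label that exists in the resources.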

Note: need to override the entrypoint.

```bash
$ docker run --entrypoint flux -it -v $(pwd):/data ghcr.io/compspec/fractale:flux-validator start python3 /code/docker/flux-validator/validate.py count /data/docker/flux-validator/implicit-slot.yaml
```

This one is cool!

One, two, three, core... ah ah ah.

I am the count, I love to count! 🦇

    except Exception as e:
        display_error(content, str(e))
        sys.exit(1)
    yaml_content = yaml.safe_load(content)

nit: run `pre-commit run --all-files` to fix isort, etc. I know, it should be in CI, and it's not. :)
