vsoch
Collaborator

@vsoch vsoch commented Jan 30, 2024

This proposal defines a simple, easy-to-read compatibility spec that describes metadata attributes for compatibility. It can be paired with a compatibility schema maintained by a compatibility interest group, whose goal is to define the namespace of allowed metadata attributes and the relationships between them for the artifact. The schema uses JSON Graph Format (JGF), so the working group does not need to propose a new structure. These two documents (the schema and the artifact) are complementary and work together to allow validation and understanding of relationships between terms, without adding complexity to the compatibility artifact directly. The expected use cases are image selection and scheduling, both of which I am prototyping and actively running experiments for.
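To make the split between the two documents concrete, here is a minimal sketch rather than the proposal's actual tooling. It assumes a JGF-style schema whose node ids are the allowed attribute names, and a hypothetical artifact layout (`compatibilities`, `name`, `attributes`); the helper `unknown_attributes` is illustrative and not part of compspec.

```python
# Hypothetical schema in JSON Graph Format: node ids are the allowed attribute
# names, and edges carry the relationships between them.
SCHEMA = {
    "graph": {
        "type": "compspec",
        "label": "compatibilities",
        "nodes": {
            "cpu": {"label": "cpu"},
            "cpu.vendor": {"label": "cpu.vendor"},
            "cpu.model": {"label": "cpu.model"},
        },
        "edges": [
            {"source": "cpu", "target": "cpu.vendor", "relation": "contains"},
            {"source": "cpu", "target": "cpu.model", "relation": "contains"},
        ],
    }
}

# Hypothetical artifact: flat attribute values only, no relationships.
ARTIFACT = {
    "compatibilities": [
        {"name": "io.archspec", "attributes": {"cpu.vendor": "GenuineIntel"}},
    ]
}


def unknown_attributes(artifact, schema):
    """Return attribute names used by the artifact that the schema does not declare."""
    allowed = set(schema["graph"]["nodes"])
    unknown = []
    for compat in artifact["compatibilities"]:
        for key in compat["attributes"]:
            if key not in allowed:
                unknown.append(key)
    return unknown
```

An empty result means every attribute in the artifact is declared in the schema namespace; the artifact itself never needs to carry the graph.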

@vsoch vsoch force-pushed the proposal-d branch 3 times, most recently from bd7fe02 to 24fbc14 on January 30, 2024 02:57
@mfranczy
Collaborator

I will wait to review further until you update the proposal as you mentioned in the Slack channel.

@mfranczy
Collaborator

Although I have one more question... Where would the schema and plugins provided by an organisation live? Are you considering a central repo under OCI, or repos dedicated to specific organisations?

@vsoch
Collaborator Author

vsoch commented Jan 30, 2024

> Although I have one more question... Where would the schema and plugins provided by an organisation live? Are you considering a central repo under OCI, or repos dedicated to specific organisations?

Either of these cases - right now I'm storing them at https://github.com/supercontainers/compspec and they are referenced in the generated artifacts shown here.

Collaborator

@sudo-bmitch sudo-bmitch left a comment

I'm definitely concerned that this won't work for runtimes.

At a higher level, I worry that multiple conflicting graphs could be used to deploy workloads on nodes that are unexpected. If different tools parse different parts of the spec, ignoring the parts they don't understand, an attacker could leverage that to sneak a workload onto a cluster bypassing various scanners and checks. This exists with all the proposals, but increases in risk with complexity.

I'd also avoid including the schema in the generated JSON if it's not needed to parse the input. And if it is needed to parse it, then runtimes cannot work when airgapped, and images will break when a 3rd party service has an outage.

@vsoch
Collaborator Author

vsoch commented Jan 30, 2024

> I'm definitely concerned that this won't work for runtimes.

I know this wasn't liked, but I do think we need two separate things here.

@vsoch
Collaborator Author

vsoch commented Feb 2, 2024

Proposal is updated! This is a round 1 update because I have not yet considered the TODO in this issue, needing to represent relationships for preferences in the spec itself.

- [x] As a system runtime administrator, I want to check whether a container is compatible with the nodes I am going to run it on using the provided tool.
- [x] As a system runtime administrator, I would like to fetch additional documentation for understanding specific settings in the compatibility spec.
- [x] As a system runtime administrator, selecting which image to run should only require pulling the Index manifest, and parsing the descriptors listed.
Collaborator

I believe this one is mutually exclusive with line 407.

Collaborator Author

I just copy pasted the contents from the requirements file.

Collaborator

This was a reference to checking the item. If we say "I want to update compatibility independently without having to re-release and re-distribute my image" is provided by this implementation, then I don't think we can also say a runtime can select an image with only the Index manifest. Runtimes would need to pull the associated referrers to support that.

Collaborator Author

I checked it because you still technically could - it would work as it does now.

Collaborator Author

@sudo-bmitch I removed this box, because the implication is "I want to get compatibility information only using the index" and not "I want to still be able to select an image" (how I read it).

### Security Administrator
- [x] As a security administrator, I want predictable behavior from runtimes, which does not change based on unsigned content.
Collaborator

Does this require the compatibility artifact to also be signed?

Collaborator Author

I just copy pasted the contents from the requirements file.

Collaborator

What is the runtime behavior if the image is signed, but the compatibility artifact is not?

Collaborator Author

If the two are pushed from the same build CI I think this case would be unlikely. But if it happened, likely the runtime would not use it. Thankfully in HPC land we rarely do proper signing and checking of things, it's an ideal more than anything else.

Collaborator

For the items here that generated discussion, I think it's useful to capture the thought process around a check (or lack thereof) with a comment for those viewing the merged proposal later. In the other proposals, we've been placing those _(inside parentheses and in italics)_.

For this item, I'd add "runtimes should ignore unsigned or untrusted artifacts if signed images are required, even if the image itself is signed by a trusted authority".

Collaborator Author

You got it!
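The rule suggested in this thread can be sketched as a small policy check. This is an illustration under stated assumptions, not behavior defined by the proposal: the `Artifact` type, its `signed` and `trusted` fields, and the `require_signed` flag are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Minimal stand-in for a compatibility artifact (hypothetical type)."""
    name: str
    signed: bool
    trusted: bool  # True if the signature verified against a trusted authority


def use_compatibility_artifact(compat, require_signed):
    """Decide whether a runtime should consult the compatibility artifact.

    The trust state of the image is deliberately not an input: when signed
    content is required, the artifact must be signed and trusted on its own.
    """
    if not require_signed:
        return True
    return compat.signed and compat.trusted
```

With `require_signed=True`, an unsigned artifact is ignored even when the image it describes is signed by a trusted authority.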

Collaborator

@mfranczy mfranczy left a comment

The proposal is definitely interesting. I think we have to discuss image selection influenced by compatibility defined in the artifact. There are many concerns around that.

@vsoch
Collaborator Author

vsoch commented Feb 13, 2024

Proposal is updated to include plugin design (not required, but introspection for future work) and an explicit answer to the question about needing graphs.

"cpu.vendor": "GenuineIntel"
}
},
{
Collaborator

If multiple compatibilities are listed here, should all of them be met, just one of them, or some of them?

Collaborator Author

That is up to the tool using the artifact. The metadata is provided with flexibility in mind.
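As an illustration of what "up to the tool" could mean, the sketch below lets the consuming tool choose between an "any" and an "all" policy. The artifact layout (a list of entries with an `attributes` mapping), the flat `node` dictionary, and the function names are assumptions made for the example.

```python
def matches(attributes, node):
    """True if every attribute in one compatibility entry equals the node's value."""
    return all(node.get(key) == value for key, value in attributes.items())


def is_compatible(compatibilities, node, policy="any"):
    """Apply a tool-chosen policy across all listed compatibility entries."""
    results = [matches(entry["attributes"], node) for entry in compatibilities]
    if policy == "all":
        return all(results)
    if policy == "any":
        return any(results)
    raise ValueError(f"unknown policy: {policy}")
```

A scheduler might use "any" to accept a node that satisfies at least one entry, while a stricter validator might require "all"; the artifact is identical in both cases.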

@vsoch
Collaborator Author

vsoch commented Feb 20, 2024

Note for the working group that I started a (more properly plugin based) Python module tonight, and moved compspec-go there as well: https://github.com/compspec/. I'll eventually put more about the specification we decided upon under spec, and likely write some nice tools to make graphs and other visualizations (web and static) there.

It doesn't belong well under supercontainers because a compatibility specification can be used to describe other kinds of applications (binaries for the HPC use case). I'll be developing this library more this week, but as an example, let's say we have application metadata about I/O needs via IOR. My plan would be to allow installing the plugin and main library:

pip install compspec
pip install compspec-ior

And then the extraction UI would be similar to Go, something like:

compspec-py extract --name ior ...

And the main library would discover the modules by their names, like how we did in snakemake. Any library could write a simple interface (that would be well defined) to work with the main library to plug-in to (likely) still compspec-go that can be used in "all the go places that containers like to be."
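Name-based discovery of this kind can be sketched with the standard library alone. This is not the compspec implementation: the `compspec_` module prefix (so that a `compspec-ior` package would be importable as `compspec_ior`) and both function names are assumptions for illustration.

```python
import importlib
import pkgutil

PLUGIN_PREFIX = "compspec_"  # assumed naming convention, e.g. compspec_ior


def plugin_names(module_names, prefix=PLUGIN_PREFIX):
    """Map a short plugin name (e.g. "ior") to its module name, by prefix."""
    return {
        name[len(prefix):]: name
        for name in module_names
        if name.startswith(prefix) and len(name) > len(prefix)
    }


def discover_plugins(prefix=PLUGIN_PREFIX):
    """Import every installed top-level module whose name starts with the prefix."""
    installed = (info.name for info in pkgutil.iter_modules())
    return {
        short: importlib.import_module(full)
        for short, full in plugin_names(installed, prefix).items()
    }
```

Under that naming assumption, `discover_plugins()` would return the IOR plugin under the key `ior`, which a command such as `extract --name ior` could then look up.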

For some background on that original compspec, before I started the converged computing work at the lab I was a bit bored and got hugely into answer set programming (ASP) and wrote this generic library (that used it) "to compare things." ASP (with clingo) is actually the base of the solver in spack. It's not the speediest thing (if I needed to write one I'd use Rust) but it's kind of fun as an exercise to write these little programs.

I'm off to bed - more on this in the coming weeks.

@vsoch vsoch mentioned this pull request Feb 20, 2024
8 tasks
@mfranczy
Collaborator

mfranczy commented Feb 20, 2024

Is that only information about an experiment, or should we treat it as something related to the proposal itself?

For now, I am assuming the latter.

> Any library could write a simple interface (that would be well defined) to work with the main library to plug-in to (likely) still compspec-go

Do I understand correctly that you want to allow plugins to be developed in any language? If yes, then I have a few questions (some of which I also got on my proposal):

  • Who would maintain the libraries?
  • In the example you developed plugins with Python. What if I cannot install Python on the host because of a very strict environment and still want to use plugins developed by some org?
  • How do you make sure that plugins are secure for consumers?
  • In the example you used pip to distribute plugins. If I use a different language, would the main library have to find a plugin by an executable name that has to be added to the $PATH? Additionally, how do I verify the plugins?
  • Do you also plan to use plugins (for instance extractors) to generate node labels that can later be matched by a scheduler?

@vsoch
Collaborator Author

vsoch commented Feb 20, 2024

> Is #9 (comment) only information about an experiment, or should we treat it as something related to the proposal itself?

It's a quick and easy example showing that I am empowered to build things using it!

> Do I understand correctly that you want to allow plugins to be developed in any language?

Given that the artifacts can be used for off table use cases, I don't see why not.

> Who would maintain the libraries?

Whoever has a vested interest in doing so: communities, companies, individuals, it doesn't matter.

> In the example you developed plugins with Python. What if I cannot install Python on the host because of a very strict environment and still want to use plugins developed by some org?

You could ask this about any language. Generally speaking, if something isn't allowed there needs to be a creative way to still run it (e.g., somewhere else), or you go to leadership, argue the case, and inspire change. This was the container story in HPC - nobody let us run them on clusters at first.

> How do you make sure that plugins are secure for consumers?

How do you make sure any software is secure for consumers?

> In the example you used pip to distribute plugins. If I use a different language, would the main library have to find a plugin by an executable name that has to be added to the $PATH?

You use whatever package manager makes sense. If you need that level of checking you'd probably want to verify the sha. And for the second question, every language has a different strategy for plugins - Python's just happens to be more flexible. Nushell is a cool example that allows for any language.
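For the "verify the sha" suggestion, a minimal sketch using only the Python standard library could look like the following. The function names and the idea of a separately published SHA-256 digest are assumptions; how that digest is distributed and trusted is outside what this thread specifies.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_matches(actual_hex, expected_hex):
    """Compare two hex digests case-insensitively in constant time."""
    return hmac.compare_digest(actual_hex.lower(), expected_hex.lower())


def verify_plugin(path, expected_hex):
    """True if the plugin file at `path` has the expected SHA-256 digest."""
    return digest_matches(sha256_file(path), expected_hex)
```

A tool could call `verify_plugin` on a downloaded plugin before loading it and refuse to continue on a mismatch.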

> Additionally, how do I verify the plugins?

However you decide to.

I don't know what a lot of these questions have to do with the proposal, or any proposal here. We design the compatibility artifact, and people are empowered to build things with it. The things they build are not under the control / decision of us here, but have to grow organically to adopt good practices.

> Do you also plan to use plugins (for instance extractors) to generate node labels that can later be matched by a scheduler?

I already am.

@mfranczy
Collaborator

mfranczy commented Feb 20, 2024

> I don't know what a lot of these questions have to do with the proposal, or any proposal here. We design the compatibility artifact, and people are empowered to build things with it. The things they build are not under the control / decision of us here, but have to grow organically to adopt good practices.

This has a lot to do with the proposals presented here. We don't only design the compatibility artifact, but also how it can be used later if we decide to release an official OCI tool or libraries for it, especially if we plan to enable that in container runtimes for the image selection use case. That's why I asked: to find out whether you have some nice ideas for that.

My questions were only meant to explore how we could build things around the tool, not to jeopardize the proposal. If you propose that users can do anything, then I have no more questions. Let it be.

mfranczy previously approved these changes Feb 20, 2024
@vsoch
Collaborator Author

vsoch commented Feb 20, 2024

> This has a lot to do with the proposals presented here. We don't only design the compatibility artifact, but also how it can be used later if we decide to release an official OCI tool or libraries for it, especially if we plan to enable that in container runtimes for the image selection use case. That's why I asked: to find out whether you have some nice ideas for that.

I think we should scope plugin discussion to some phase II of our group work - arguably if we make plugins for any artifact (or non-artifact) spec we will have similar questions. The plugin design here was an attempt to think through some of my ideas and anticipate that, but doesn't need to be considered formally part of the proposal. My main reason was that I was going to start working on tools / plugins and wanted to write down the design. It might help to scope initial discussion to just the "json parts" and acknowledge the desire for plugins (and come back to it).

"type": "compspec",
"label": "compatibilities",
"nodes": {
"mpi": {
Collaborator

Another question about the separation of the schema and the compatibility artifact. Because compatibilities rely on the schema, suppose many compatibility artifacts have been delivered to production environments, all referring to an already published schema version. If a big issue is later found in the schema and fixed in a new version, the problematic relationships are still used by the delivered compatibility artifacts. To fix the issue, all compatibility artifacts would have to be re-released using the new schema.

From this perspective, keeping the schema (customizable node relationships) and the compatibility self-contained will reduce issue propagation and reduce the cost of fixes.

Collaborator Author

That's generally how it works with software too though. It's better to have versioning than not I think. Technically speaking, if you just ignore the schema (and version) you could have a "self-contained" schema. Having the entire schema within the artifact is not reasonable from a practicality standpoint. For an example, here is just the start of IOR, it doesn't even include output types yet. compspec/schemas@764520d

Collaborator Author

And that's just one namespace (I/O); there would be multiple defined for one file. It's huge redundancy for very little benefit IMHO. And if there were some issue with the schema, instead of updating one place you'd still need to update the many (thousands?) of artifacts. We likely just need some way to patch / give a directive to those using the old schema.

Collaborator Author

@ChaoyiHuang you make a good point for why we don't want to embed logic for the actual choice in the artifact, because there could be some "big issue" - the implication being that it is with the logic of the selection. That is why I advocate for an approach where the compatibility specification is just that - an artifact with information, and the way that information is used to decide on image selection (the algorithm / logic) is not hard coded there. That's the main way you'd run into some issue like you are describing. If it's just adding / removing fields that is much less likely to warrant some crisis.

Collaborator Author

> if we want to fix the issue, all compatibility artifacts have to be re-released using new schema.

That's also the case with any of these proposals if there is a change to anything (field, logic, etc.) that warrants a change to the artifact or image manifest. I would argue my approach is more flexible here, because often you can change the schema, have some way to say "support previous versions", and there is no need to touch the artifacts. The other proposals, if they require everything hard coded into the artifact, cannot support that. :)
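One way to read "support previous versions" is that the consuming tool, not the artifact, absorbs schema changes. The sketch below is purely illustrative: the version strings, the `MIGRATIONS` table, and the renamed attribute `cpu.manufacturer` are invented for the example and do not come from the compspec schemas.

```python
CURRENT_VERSION = "0.0.2"  # hypothetical current schema version

# Hypothetical renames introduced since each older schema version.
MIGRATIONS = {
    "0.0.1": {"cpu.manufacturer": "cpu.vendor"},
}


def normalize(attributes, schema_version):
    """Translate attributes written against an older schema to current names."""
    if schema_version == CURRENT_VERSION:
        return dict(attributes)
    if schema_version not in MIGRATIONS:
        raise ValueError(f"unsupported schema version: {schema_version}")
    renames = MIGRATIONS[schema_version]
    return {renames.get(key, key): value for key, value in attributes.items()}
```

Artifacts published against the older version stay untouched in the registry; only the tool's migration table changes when the schema is fixed.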

Collaborator

The question is about how to build the artifact's specific compatibility relationships into the schema:

If the first io.archspec is about CPU, and the second one is about mpi and GPU, how do you express the compatibility requirements: 1) cpu GenuineIntel amd64 + mpi v1.1 + nvidia GPU + (nvidia infiniband or arista infiniband), 2) cpu GenuineAMD amd64 + mpi v1.2 + AMD GPU + arista infiniband?

> That is also up to the tool. The relationships between things are defined by the upper level schema, if that is desired, but it doesn't have to be used

These kinds of compatibilities, relationships, and combinations are image specific and change more easily than a standard. If they were built into the schema, the problem is the one I mentioned in this question.

Collaborator Author

@vsoch vsoch Feb 21, 2024

We can disagree then. Thanks for your feedback.

I'll note that the metadata values themselves are still in the artifact. It's just the namespace, declarations, and relationships that are in the schema. There is exactly the same metadata in the artifact here as there is in, for example, proposal A, but it's extended to be much more useful in scenarios beyond "Match this one tag."

Collaborator Author

Your argument is also akin to saying we should put the schema for an SBOM in every SBOM because it might change. That doesn't make sense to me.

mfranczy previously approved these changes Feb 21, 2024
Proposal D is an extension to Proposal C. Proposal C defines an explicit
example of a compatibility artifact, meaning what a single artifact would
look like paired alongside an image in a registry (in some way) to describe
its compatibility for image selection or similar. Proposal D defines a
compatibility schema that is maintained by a compatibility interest group,
for which the goal is to define the namespace of allowed metadata attributes
and relationships between them. These two proposals are complementary and
would work together to allow for validation and understanding of relationships
between terms, but without adding complexity to the compatibility artifact
(Proposal C) directly.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
@mfranczy mfranczy merged commit 1fddd9d into opencontainers:main Feb 21, 2024