File Lineage Support Layered Media #424

Closed
WhisperingChaos opened this issue Oct 28, 2016 · 8 comments

@WhisperingChaos

I searched the issues for the keyword "lineage" and found only one closed issue in this repository that referenced it. Lineage is a concept akin to a "family tree" that tracks the evolution of a component and its offspring. Since an image layer captures a component's state at a "moment in time", its position relative to other layers may reflect its position in that component's family tree. Is there therefore an effort to represent this notion of lineage more directly in the image spec? It could assist forensics, which would benefit, for example, security or the ability to gauge a component's variation against its reliability.

Thanks!

@wking
Contributor

wking commented Oct 28, 2016

On Fri, Oct 28, 2016 at 10:13:12AM -0700, Rich Moyse wrote:

… is there an effort to more directly represent this notion of
lineage into this image spec…

You don't need a structure for lineage to use an image, so there isn't
a structured field for it at the moment. There was some discussion of
using parent manifests in the child's ‘layers’ (e.g. [1]), but the
consensus was that you could accomplish the same thing with less work
by inlining the parent's layers directly [2]. So currently folks who
want to distribute this sort of information should use ‘history’ [3]
or add their own annotation keys along the lines of [4].

If you want to leverage CAS, you could also define a new commit-like
media type and have a commit-DAG pointing at manifests (or manifest
lists, or whatever) as the payload. With type-map-based type-handling
logic like #403, plugging that sort of third-party type into the OCI
tooling should be fairly straightforward.
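
To make the commit-DAG idea concrete, here is a minimal sketch, assuming a toy in-memory content-addressable store. The media type string, the `payload` and `parents` field names, and the `Store`, `commit`, and `lineage` helpers are all hypothetical; nothing here is defined by the OCI image spec or its tooling.

```python
import hashlib
import json

# Hypothetical media type for illustration; not defined by the OCI image spec.
COMMIT_MEDIA_TYPE = "application/vnd.example.lineage.commit.v1+json"


def digest(blob: bytes) -> str:
    """Content address of a blob, in the usual algorithm:hex form."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


class Store:
    """Toy in-memory CAS: JSON blobs keyed by their own digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, obj: dict) -> str:
        blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
        key = digest(blob)
        self.blobs[key] = blob
        return key

    def get(self, key: str) -> dict:
        return json.loads(self.blobs[key])


def commit(store: Store, manifest_digest: str, parents=()) -> str:
    """Store a commit-like object pointing at a manifest and at parent commits."""
    return store.put({
        "mediaType": COMMIT_MEDIA_TYPE,
        "payload": manifest_digest,   # assumed field name
        "parents": list(parents),     # assumed field name
    })


def lineage(store: Store, commit_digest: str) -> list:
    """Walk the DAG from a commit back to its roots, collecting payload digests."""
    seen, stack, payloads = set(), [commit_digest], []
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        obj = store.get(key)
        payloads.append(obj["payload"])
        stack.extend(obj["parents"])
    return payloads


# Placeholder manifests standing in for real image manifests.
store = Store()
base_manifest = store.put({"note": "placeholder base manifest"})
child_manifest = store.put({"note": "placeholder child manifest"})
base_commit = commit(store, base_manifest)
child_commit = commit(store, child_manifest, [base_commit])
history = lineage(store, child_commit)
```

Because the commits live beside the manifests rather than inside them, the manifests and their digests are untouched by the lineage records.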

@WhisperingChaos
Author

@wking

Thanks for your thoughtful answer and references! It might be interesting to use custom properties, mentioned by one of your provided references, to store lineage info.

@stevvooe
Contributor

@WhisperingChaos "Classically", history has been a component of container images. The implementations embedded lineage directly in the format. However, these features come at great cost in distribution, security and runtime (I can elaborate on these costs if you don't agree with the premise).

A much better system would maintain lineage externally to artifacts as they are built. Such a system would be much more secure without incurring the distribution and runtime costs.

The history field is maintained to provide the notion of lineage, until such metadata systems exist.

@WhisperingChaos
Author

@stevvooe

Yes - I agree, detailed lineage information should be available as a separate artifact from the resulting image. However, it would be helpful to encode a form of DNA (densely encoded, and small) within an image to identify each artifact in an image and the "mutations" between parents and offspring in runtime images.

As you discuss in your post, build-time history incurs "great cost" to the runtime image and system. For this reason, many have built pipelines whose resulting runtime images no longer contain any build-time artifacts or history. Given this desire to eliminate build-time artifacts, there are probably a large number of images, both existing and yet to be built, whose lineage won't be easily traceable. Therefore, it would be beneficial to define a means to record lineage in a runtime image.

@stevvooe
Contributor

@WhisperingChaos We already have this DNA: the components of an image are content addressable. Links between components use these content addresses to maintain these relationships. Fields like ChainID, Parent, and DiffID provide this. To the casual observer, these may not look useful but, in fact, they provide the aspects of lineage that affect runtime.
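
As a rough illustration of how these fields carry lineage, here is a minimal sketch of the ChainID recurrence as I understand it from the image spec's config document: the ChainID of the first layer is its DiffID, and each later ChainID is the digest of the previous ChainID, a space, and the next DiffID. The `shared_base_depth` helper and the DiffID values are made up for the example.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """Return the ChainID for each prefix of an ordered list of DiffIDs."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # ChainID(L0) = DiffID(L0)
            chain.append(diff_id)
        else:
            # ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
            chain.append(sha256_digest(f"{chain[-1]} {diff_id}".encode()))
    return chain


def shared_base_depth(diff_ids_a, diff_ids_b):
    """Hypothetical helper: count the leading layers two images share, by ChainID."""
    depth = 0
    for a, b in zip(chain_ids(diff_ids_a), chain_ids(diff_ids_b)):
        if a != b:
            break
        depth += 1
    return depth


# Placeholder DiffIDs; real ones are digests of uncompressed layer tars.
base = [sha256_digest(b"base-layer-0"), sha256_digest(b"base-layer-1")]
image_a = base + [sha256_digest(b"app-a-layer")]
image_b = base + [sha256_digest(b"app-b-layer")]
common = shared_base_depth(image_a, image_b)
```

Two images built from the same base agree on their leading ChainIDs, so a shared ancestor can be inferred from the configs alone, without any explicit parent pointer.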

However, as I stated above, encoding lineage isn't free and different people have different ideas about the granularity of data required. Even here, with the tools for lineage existing already, you have concluded that there are no tools for lineage built into the image specification, when, in fact, they already exist.

Why burden the format when users will disagree on the level of granularity of lineage required? Why make all users pay this cost when only some users will need it? It seems much more prudent to leave this decision to packaging systems, as we have done in the past. The set of images can then be curated within this packaging system, providing a level of guarantee (and curation) fit for purpose.

That said, I'm not sure how constructive we can be speaking in the abstract. Let's focus on the following:

  1. What specific use cases for lineage are not possible with the current specification?
  2. What additions are you proposing to address the missing functionality?

@vbatts
Member

vbatts commented Mar 9, 2017

@stevvooe related to #600 ?

@stevvooe
Contributor

stevvooe commented Mar 9, 2017

@vbatts I did not intend for these to be related. #600 is just "append a string to keep us all sane".

I still maintain that we have the necessary structural information to recover lineage, even if there aren't explicit pointers. That said, the general problem of label propagation (and "lifting") could be used to address this issue. My general aversion to heading down that path is that labels may introduce hash instability, although that is less of a problem at the manifest/config level than it is for layers.
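
The hash instability concern can be sketched as follows. This is only an illustration, assuming a manifest is hashed over a canonical JSON serialization (a real registry hashes the exact bytes it stores); the annotation key and all digests are placeholders.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    # Assumption: canonical JSON stands in for the manifest's stored bytes.
    blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Placeholder manifest; the digests are dummy values, not real content.
manifest = {
    "schemaVersion": 2,
    "config": {"digest": "sha256:" + "0" * 64},
    "layers": [{"digest": "sha256:" + "1" * 64}],
}

# Add a lineage label under a hypothetical annotation key.
labeled = dict(
    manifest,
    annotations={"org.example.lineage.parent": "sha256:" + "2" * 64},
)

before = manifest_digest(manifest)
after = manifest_digest(labeled)
```

Here `before` and `after` differ, so anything that pinned the manifest by digest no longer matches, while the layer digests listed inside are unchanged. A label placed inside a layer would instead change that layer's digest and everything derived from it.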

@sudo-bmitch
Contributor

I believe this was answered in the thread above (the answer being that it's something that could be inferred but not directly part of the spec). I'm closing this out since it is rather stale, but feel free to follow up if there's more needed.
