Description
Background
The current model-spec (as outlined in docs/spec.md) supports bundling model artifacts in various formats, including .tar and potentially .raw. For .tar archives, per-file metadata such as file mode, ownership, and modification time is stored in each entry's tar header, which is useful for runtimes that rely on this information to handle the files properly (e.g., setting executable permissions or preserving ownership).
However, the .raw type, being a flat, uncompressed file format, does not inherently provide a mechanism to store such metadata. This poses a challenge for runtimes that may depend on these attributes to function correctly, especially in scenarios where permissions or other file properties are critical (e.g., executable scripts or binaries within a model artifact).
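To make the contrast concrete, here is a minimal sketch using Python's standard tarfile module. The file name run.sh, the uid/gid values, and the helper names are illustrative assumptions and do not come from the model-spec. It shows that a .tar entry carries mode, ownership, and mtime in its header, while a raw payload is just bytes.

```python
import io
import tarfile

# Illustrative payload: a small shell script that should be executable.
SCRIPT = b"#!/bin/sh\necho hello\n"


def build_tar_with_script() -> bytes:
    """Build an in-memory tar archive containing one executable script."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="run.sh")  # hypothetical file name
        info.size = len(SCRIPT)
        info.mode = 0o755   # executable bit lives in the tar header
        info.uid = 1000     # illustrative ownership values
        info.gid = 1000
        info.mtime = 0
        tar.addfile(info, io.BytesIO(SCRIPT))
    return buf.getvalue()


def read_header_metadata(archive: bytes) -> dict:
    """Return the per-file metadata recorded in the tar headers."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {
            m.name: {
                "mode": oct(m.mode),
                "uid": m.uid,
                "gid": m.gid,
                "mtime": m.mtime,
            }
            for m in tar.getmembers()
        }


# A raw payload is only the file content; nothing here records mode or owner.
raw_payload = SCRIPT
```

With the tar variant, a runtime can recover the intended 755 mode from the header. With raw_payload alone, it has no such source to consult.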
Problem
When using the .raw type, there is no clear way to preserve or convey file metadata that was previously available in .tar headers. This limitation could lead to:
- Loss of critical information (e.g., file modes like 755 for executables).
- Inconsistent behavior across runtimes that expect this metadata.
- Potential security or operational issues if the runtime cannot infer the intended file attributes.
For example, a model artifact might include a script or binary that needs to be executable, but without metadata, the runtime would either need to assume defaults (potentially incorrect) or fail to execute it properly.
Questions for Discussion
- Should .raw artifacts be required to preserve the same metadata as .tar artifacts? If so, what’s the minimum set of attributes we need (e.g., mode, ownership, timestamps)?
- How should runtimes handle cases where metadata is missing for .raw files? Should there be a fallback mechanism?
- Are there existing standards or tools (e.g., OCI image spec) we can borrow from to address this?
- What would be the best way to store this metadata for .raw files while aligning with the goals of simplicity and portability in the model-spec?
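As a starting point for discussion, one possible direction is to carry the attributes next to the raw blob as a JSON-encoded annotation and let the runtime fall back to a default when it is absent. The sketch below is purely illustrative: the annotation key, the field names, and the 0o644 default are hypothetical and are not defined by the model-spec or the OCI image spec. It assumes a POSIX filesystem.

```python
import json
import os
import stat

# Hypothetical annotation key; not defined by any existing spec.
ANNOTATION_KEY = "example.hypothetical/file-metadata"

# Hypothetical fallback used when no metadata accompanies a raw file.
DEFAULT_MODE = 0o644


def capture_metadata(path: str) -> dict:
    """Collect the attributes a tar header would otherwise carry."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime": int(st.st_mtime),
    }


def to_annotation(meta: dict) -> dict:
    """Encode the metadata as a single string-valued annotation."""
    return {ANNOTATION_KEY: json.dumps(meta, sort_keys=True)}


def restore_mode(path: str, annotations: dict) -> int:
    """Apply the recorded mode, or DEFAULT_MODE if none was recorded."""
    raw = annotations.get(ANNOTATION_KEY)
    mode = json.loads(raw)["mode"] if raw else DEFAULT_MODE
    os.chmod(path, mode)
    return mode
```

A producer would call capture_metadata and to_annotation when packaging a raw file, and a runtime would call restore_mode after writing the blob to disk. Whether ownership and timestamps should also be restored, and whether a missing annotation should be an error rather than a fallback, are exactly the open questions above.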