Fix non-deterministic OCI digests from time.Now() in creation timestamp#654
Conversation
Replace time.Now() with deterministic timestamps in the builder and HuggingFace model packages so that pulling the same model revision always produces the same OCI digest. For HuggingFace models, the repo's lastModified date is fetched via the API and used as the creation timestamp. For the builder APIs, new optional WithCreated/WithCreatedTime options allow callers to supply a fixed timestamp. Fixes docker#647
Manual test resultsAll distribution tests pass: New test details:
Backward compatibility: |
Summary of ChangesHello @veeceey, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves the problem of non-deterministic OCI digests when pulling HuggingFace models. By incorporating the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
GetRepoInfo, you rely on the defaulttime.TimeJSON decoding forlastModified; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes. - The logic for selecting the
createdtimestamp is now duplicated infromFormatandFromDirectory; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `GetRepoInfo`, you rely on the default `time.Time` JSON decoding for `lastModified`; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes.
- The logic for selecting the `created` timestamp is now duplicated in `fromFormat` and `FromDirectory`; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.
## Individual Comments
### Comment 1
<location> `pkg/distribution/huggingface/client.go:186-191` </location>
<code_context>
+// GetRepoInfo fetches repository metadata from the HuggingFace API.
+// This returns information such as the last modified timestamp, which is useful
+// for producing deterministic OCI digests.
+func (c *Client) GetRepoInfo(ctx context.Context, repo, revision string) (*RepoInfo, error) {
+ if revision == "" {
+ revision = "main"
+ }
+
+ url := fmt.Sprintf("%s/api/models/%s/revision/%s", c.baseURL, repo, revision)
+
+ req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
</code_context>
<issue_to_address>
**issue (bug_risk):** Revision (and possibly repo) should be URL-escaped to handle slashes and special characters correctly.
Since `revision` is interpolated directly into the path, values like `refs/pr/1` will be interpreted as extra path segments instead of a single revision. Percent-encode the revision (and likely `repo`) using `url.URL`/`path.Join` and `url.PathEscape`, consistent with other endpoints, so repos/revisions containing slashes or special characters work correctly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Code Review
This pull request effectively addresses the issue of non-deterministic OCI digests by introducing a mechanism to provide a fixed creation timestamp. The implementation uses the functional options pattern cleanly in the builder, and gracefully falls back to the old behavior if a timestamp cannot be fetched from the HuggingFace API. The changes are well-tested. I have a couple of minor suggestions to improve consistency and observability.
| // WithCreatedTime sets a specific creation timestamp for the model artifact | ||
| // built from a directory. When not set, the current time (time.Now()) is used. | ||
| // This is useful for producing deterministic OCI digests when the same directory | ||
| // content should always yield the same artifact regardless of when it was built. | ||
| func WithCreatedTime(t time.Time) DirectoryOption { | ||
| return func(opts *DirectoryOptions) { | ||
| opts.Created = &t | ||
| } |
There was a problem hiding this comment.
For consistency with the BuildOption defined in pkg/distribution/builder/builder.go, it would be better to name this function WithCreated instead of WithCreatedTime. Both functions serve the same purpose of setting a creation timestamp, and using the same name improves code clarity and maintainability. You will need to update the call sites in pkg/distribution/builder/from_directory_test.go as well.
| // WithCreatedTime sets a specific creation timestamp for the model artifact | |
| // built from a directory. When not set, the current time (time.Now()) is used. | |
| // This is useful for producing deterministic OCI digests when the same directory | |
| // content should always yield the same artifact regardless of when it was built. | |
| func WithCreatedTime(t time.Time) DirectoryOption { | |
| return func(opts *DirectoryOptions) { | |
| opts.Created = &t | |
| } | |
| // WithCreated sets a specific creation timestamp for the model artifact | |
| // built from a directory. When not set, the current time (time.Now()) is used. | |
| // This is useful for producing deterministic OCI digests when the same directory | |
| // content should always yield the same artifact regardless of when it was built. | |
| func WithCreated(t time.Time) DirectoryOption { | |
| return func(opts *DirectoryOptions) { | |
| opts.Created = &t | |
| } | |
| } |
| var createdTime *time.Time | ||
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | ||
| if err == nil && !repoInfo.LastModified.IsZero() { | ||
| createdTime = &repoInfo.LastModified | ||
| } |
There was a problem hiding this comment.
The error from client.GetRepoInfo is silently ignored. While falling back to time.Now() is the correct behavior, logging this error at a warning level would improve observability. This would help users understand why they might not be getting deterministic OCI digests if they were expecting them.
| var createdTime *time.Time | |
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | |
| if err == nil && !repoInfo.LastModified.IsZero() { | |
| createdTime = &repoInfo.LastModified | |
| } | |
| var createdTime *time.Time | |
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | |
| if err != nil { | |
| log.Printf("Warning: could not fetch repo info for deterministic timestamp: %v. Falling back to current time.", err) | |
| } else if !repoInfo.LastModified.IsZero() { | |
| createdTime = &repoInfo.LastModified | |
| } |
|
This looks ok, so far. Could you go through the Gemini comments @veeceey ? |
- URL-escape the revision parameter in GetRepoInfo to handle slashes and special characters (e.g. refs/pr/1) correctly - Log a warning when GetRepoInfo fails instead of silently falling back to time.Now(), improving observability for deterministic digest issues
|
Thanks @ericcurtin! Addressed the Gemini and Sourcery review comments:
|
|
Thank you @veeceey while testing this PR I noticed we have (at least) another non-deterministic field that causes different manifest hash: |
|
I’ll take a look at it and fix :)
…________________________________
From: Ignasi ***@***.***>
Sent: Friday, February 13, 2026 2:00:49 AM
To: docker/model-runner ***@***.***>
Cc: Varun Chawla ***@***.***>; Mention ***@***.***>
Subject: Re: [docker/model-runner] Fix non-deterministic OCI digests from time.Now() in creation timestamp (PR #654)
[https://avatars.githubusercontent.com/u/1451887?s=20&v=4]ilopezluna left a comment (docker/model-runner#654)<#654 (comment)>
This is the follow up issue: #669<#669> in case you are interested @veeceey<https://github.com/veeceey> its all yours :)
—
Reply to this email directly, view it on GitHub<#654 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AIE72BC7I663IBUIUWTI5AT4LWOFDAVCNFSM6AAAAACUXC5XFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTQOJWGIYDGMRTHA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
|
Nice catch on the ModTime in layer annotations! I'll pick up #669 — should be a straightforward extension of the same approach, passing the timestamp through to the annotation layer as well. |
Pulling the same HuggingFace model at different times produces different OCI digests because
time.Now()is used as the creation timestamp in the config. This makes content-addressability unreliable since identical model content yields different manifests.Changes
WithCreated(time.Time)build option toFromPath/FromPathsso callers can supply a deterministic timestamp. Usestime.Now()as fallback when not specified, keeping backward compatibility via variadic params.WithCreatedTime(time.Time)directory option for the same purpose.GetRepoInfo()to fetch repo metadata includinglastModified.lastModifiedfrom the HF API after downloading files and pass it as the creation timestamp. Falls back gracefully totime.Now()if the API call fails.Test plan
GetRepoInfo(success, default revision, 404)go test ./pkg/distribution/...)Fixes #647