Commit cce2cb3

Implement image pull retry for tink-worker image
There can be a race in which the linuxkit network or DNS is not yet fully set up and functional, and the image pull fails as a result.

Signed-off-by: Pooja Trivedi <tripooja@amazon.com>
1 parent bb4bad8 commit cce2cb3

bootkit/main.go

Lines changed: 23 additions & 3 deletions
@@ -47,6 +47,9 @@ type tinkConfig struct {
 	tinkServerTLS string
 }
 
+const imagePullRetryAttempts = 10
+const retrySleepSeconds = 5
+
 func main() {
 	fmt.Println("Starting BootKit")
 
@@ -146,11 +149,28 @@ func main() {
 
 	fmt.Printf("Pulling image [%s]", imageName)
 
-	out, err := cli.ImagePull(ctx, imageName, pullOpts)
-	if err != nil {
-		panic(err)
+	// TODO: Ideally if this function becomes a loop that runs forever and keeps retrying
+	// anything that failed, this retry would not be needed. For now, this addresses the specific
+	// race condition case of when the linuxkit network or dns is in the process of, but not quite
+	// fully set up yet.
+
+	failedImagePull := true
+
+	var out io.ReadCloser
+	for i := 0; i < imagePullRetryAttempts; i++ {
+		out, err = cli.ImagePull(ctx, imageName, pullOpts)
+		if err == nil {
+			failedImagePull = false
+			break
+		}
+		fmt.Printf("Error pulling image [%s] [%v]. Retrying after %d seconds...\n", imageName, err, retrySleepSeconds)
+		time.Sleep(time.Second * retrySleepSeconds)
 	}
 
+	if failedImagePull {
+		panic(err)
+	}
+
 	_, err = io.Copy(os.Stdout, out)
 	if err != nil {
 		panic(err)
