Commit c4da6b1

Implement image pull retry for tink-worker image
There could be races where the linuxkit network or DNS has not been fully set up and functional yet, and the image pull fails because of that.

Signed-off-by: Pooja Trivedi <tripooja@amazon.com>
1 parent bb4bad8 commit c4da6b1

1 file changed: +22 -2 lines changed

bootkit/main.go

Lines changed: 22 additions & 2 deletions
@@ -47,6 +47,9 @@ type tinkConfig struct {
 	tinkServerTLS string
 }
 
+const imagePullRetryAttempts = 10
+const retrySleepSeconds = 5
+
 func main() {
 	fmt.Println("Starting BootKit")
 
@@ -146,8 +149,25 @@ func main() {
 
 	fmt.Printf("Pulling image [%s]", imageName)
 
-	out, err := cli.ImagePull(ctx, imageName, pullOpts)
-	if err != nil {
+	// TODO: Ideally if this function becomes a loop that runs forever and keeps retrying
+	// anything that failed, this retry would not be needed. For now, this addresses the specific
+	// race condition case of when the linuxkit network or dns is in the process of, but not quite
+	// fully set up yet.
+
+	failedImagePull := true
+
+	var out io.ReadCloser
+	for i := 0; i < imagePullRetryAttempts; i++ {
+		out, err = cli.ImagePull(ctx, imageName, pullOpts)
+		if err == nil {
+			failedImagePull = false
+			break
+		}
+		fmt.Printf("Error pulling image [%s] [%v]. Retrying after %d seconds...\n", imageName, err, retrySleepSeconds)
+		time.Sleep(time.Second * retrySleepSeconds)
+	}
+
+	if failedImagePull {
 		panic(err)
 	}
 
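The retry loop in this commit could also be factored into a small reusable helper. The following is only a sketch of that idea, not code from the commit: `retry` and `errNotReady` are hypothetical names, and the closure in `main` simulates a pull that fails twice instead of calling `cli.ImagePull`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical stand-in for a pull failure caused by
// networking or DNS that is not up yet.
var errNotReady = errors.New("network not ready")

// retry calls fn up to attempts times, sleeping delay between failed
// attempts. It returns nil on the first success, or the last error if
// every attempt fails. (Hypothetical helper, not part of the commit.)
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		return errors.New("retry: attempts must be at least 1")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Skip the sleep after the final attempt.
		if i < attempts-1 {
			fmt.Printf("attempt %d failed: %v; retrying in %s\n", i+1, err, delay)
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	calls := 0
	// Simulated pull that fails twice before succeeding.
	err := retry(10, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Unlike the loop in the diff, this sketch does not sleep after the last failed attempt, so the caller sees the final error without an extra delay. In `bootkit/main.go` the closure would wrap the `cli.ImagePull` call and assign `out` from inside it.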