Copilot AI commented Nov 3, 2025

The install-slices helper script fails intermittently when Ubuntu archive manifests change mid-test, causing digest mismatches like error: cannot fetch from archive: expected digest 053d..., got 58b.... This was already fixed for the Python CI workflow in canonical#721 but not for Spread tests.

Changes

  • tests/spread/lib/install-slices: Wrap chisel cut in a retry loop
    • Make up to 3 attempts, retrying only on archive fetch errors
    • Exit immediately on any other error, so genuine failures are not retried
    • Log retry attempts to stderr
    • Fix argument quoting: $@ → "$@"
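
max_attempts=3
attempt=1
fetch_error="error: cannot fetch from archive"

while [ "$attempt" -le "$max_attempts" ]; do
    # The script runs under bash -e: capture the exit status via `||` so a
    # failed `chisel cut` does not abort the script before it can be retried.
    exit_code=0
    output=$(chisel cut --release "$PROJECT_PATH" --root "$tmpdir" "$@" 2>&1) || exit_code=$?

    if [ "$exit_code" -eq 0 ]; then
        exit 0
    fi

    # Retry only on transient archive fetch errors (e.g. digest mismatches).
    if echo "$output" | grep -qF -- "$fetch_error"; then
        if [ "$attempt" -lt "$max_attempts" ]; then
            echo "Fetch error on attempt $attempt/$max_attempts, retrying..." >&2
            attempt=$((attempt + 1))
            continue
        fi
    fi

    # Not a fetch error, or out of retries: surface chisel's output and fail.
    echo "$output" >&2
    exit "$exit_code"
done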
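The retry behaviour can be checked without network access by replacing chisel with a stub that fails a fixed number of times. This is a minimal sketch, not part of the PR: the `chisel` shell function, `install_with_retry`, `reset_stub`, the `stub_*` variables and the `hello_bins` slice name are all hypothetical, and the stub only mimics the error text quoted above. It keeps the call count in a file because each `chisel cut` runs inside a `$(...)` subshell, where variable changes would be lost.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the real chisel binary. Calls happen inside $(...)
# subshells, so the call count is kept in a file rather than a variable.
calls_file="$(mktemp)"
trap 'rm -f "$calls_file"' EXIT
stub_failures=2
stub_msg="error: cannot fetch from archive: expected digest 053d..., got 58b..."

chisel() {
    calls=$(( $(cat "$calls_file") + 1 ))
    echo "$calls" > "$calls_file"
    # Fail with $stub_msg until $stub_failures calls have been made.
    if [ "$calls" -le "$stub_failures" ]; then
        echo "$stub_msg" >&2
        return 1
    fi
}

reset_stub() { echo 0 > "$calls_file"; }

# The install-slices retry loop, wrapped in a function so it can be run repeatedly.
install_with_retry() {
    max_attempts=3
    attempt=1
    fetch_error="error: cannot fetch from archive"
    while [ "$attempt" -le "$max_attempts" ]; do
        exit_code=0
        output=$(chisel cut "$@" 2>&1) || exit_code=$?
        if [ "$exit_code" -eq 0 ]; then
            return 0
        fi
        if printf '%s\n' "$output" | grep -qF -- "$fetch_error" &&
            [ "$attempt" -lt "$max_attempts" ]; then
            echo "Fetch error on attempt $attempt/$max_attempts, retrying..." >&2
            attempt=$((attempt + 1))
            continue
        fi
        echo "$output" >&2
        return "$exit_code"
    done
}

reset_stub
```

Calling `install_with_retry hello_bins` directly under `set -e` with two stubbed failures should succeed on the third call; if the `|| exit_code=$?` guard were dropped, the first failure would abort the whole script instead.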

Application

This change needs to be applied to tests/spread/lib/install-slices on all Ubuntu release branches: 20.04, 22.04, 24.04, 24.10, 25.04, 25.10.
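One way to carry the change to each of those branches is to cherry-pick the fix commit onto each one. The sketch below only prints the commands (a dry run) rather than executing them. It assumes branches follow an `ubuntu-<release>` naming scheme, and `print_backport_plan` is a hypothetical helper that takes the fix commit SHA as its argument.

```shell
#!/bin/sh
set -e

# Dry-run sketch (hypothetical helper): print the git commands that would port
# the install-slices fix to each release branch. The ubuntu-<release> branch
# naming is an assumption; review the plan before running any of it.
print_backport_plan() {
    fix_commit="$1"
    for release in 20.04 22.04 24.04 24.10 25.04 25.10; do
        printf 'git switch ubuntu-%s && git cherry-pick -x %s\n' "$release" "$fix_commit"
    done
}

print_backport_plan "${1:-HEAD}"
```

Printing the plan instead of running it keeps the sketch safe to try; a cherry-pick onto an older branch may still conflict and need manual resolution.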

Original prompt

This section details the original issue you should resolve

<issue_title>ci: add retry mechanism to install-slices helper in spread tests</issue_title>
<issue_description>## Problem

The install-slices helper function used in Spread integration tests still experiences manifest digest errors, even though this was fixed for the CI workflow in cjdcordeiro/chisel-releases#721.

Error Example

From https://github.com/canonical/chisel-releases/actions/runs/18732102302/job/53431205076?pr=648:

error: cannot fetch from archive: expected digest 053d09654f2bbeec59605a7d948f2e6ee80b9218978211c53c8aa2879a13d7ea, got 58b46c55d471ac98b59bb31555ef9d87dd5754956472c34a70f3bd11e65fd7ab

Root Cause

The issue occurs in tests/spread/lib/install-slices, which is a bash helper script used by Spread integration tests:

#!/bin/bash -ex

# Installs one or more slices into a dynamically created temporary path.
# Usage: install-slices [<slice>...]
# Returns the path of the chiselled rootfs

tmpdir="$(mktemp -d)"

echo "${tmpdir}"
chisel cut --release "$PROJECT_PATH" --root "$tmpdir" $@

The problem: This script calls chisel cut directly without any retry mechanism.

When Ubuntu archives are updated, there is a race: the package indexes can change between the moment Chisel fetches the Release file and the moment it fetches those indexes, so the downloaded index no longer matches the digest listed in the Release file. This causes transient failures such as the one above.
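To make the failure mode concrete, here is a simplified simulation of that race. It is not Chisel's actual code: it records the SHA-256 of a toy index file (standing in for the digest a Release file would list), rewrites the file as if the archive had republished it, and then compares digests. The `Packages` file name and its contents are illustrative only.

```shell
#!/bin/sh
set -e

# Simplified simulation of the race, NOT Chisel's actual implementation: the
# digest of a package index is recorded (as a Release file would list it), then
# the archive republishes the index before the client downloads it.
workdir="$(mktemp -d)"
index="$workdir/Packages"

printf 'Package: hello\nVersion: 1.0\n' > "$index"
expected=$(sha256sum "$index" | cut -d' ' -f1)   # digest recorded at Release time

printf 'Package: hello\nVersion: 1.1\n' > "$index" # archive updated in between
got=$(sha256sum "$index" | cut -d' ' -f1)        # digest of the index actually fetched

mismatch=0
if [ "$expected" != "$got" ]; then
    echo "error: cannot fetch from archive: expected digest $expected, got $got" >&2
    mismatch=1
fi

rm -r "$workdir"
```

Because the next attempt normally sees a consistent Release file and index again, a bounded retry is enough to ride out the window.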

What Was Already Fixed

PR canonical#721 fixed this issue for the Python CI script (.github/scripts/install-slices/install_slices.py) by:

  1. Creating a chisel_cut() wrapper function
  2. Adding retry logic (up to 3 attempts)
  3. Detecting the specific error: "error: cannot fetch from archive"

However, this fix only applies to CI workflows, not to Spread integration tests.

Proposed Solution

Apply the same retry mechanism to tests/spread/lib/install-slices. The script should:

  1. Wrap the chisel cut call in a retry loop (3 attempts)
  2. Only retry on fetch errors from archives
  3. Add logging for retry attempts

Example Implementation

#!/bin/bash -ex

# Installs one or more slices into a dynamically created temporary path.
# Usage: install-slices [<slice>...]
# Returns the path of the chiselled rootfs

tmpdir="$(mktemp -d)"
echo "${tmpdir}"

max_attempts=3
attempt=1
fetch_error="error: cannot fetch from archive"

while [ "$attempt" -le "$max_attempts" ]; do
    # `|| exit_code=$?` keeps bash -e from aborting on the first failure
    exit_code=0
    output=$(chisel cut --release "$PROJECT_PATH" --root "$tmpdir" "$@" 2>&1) || exit_code=$?

    if [ "$exit_code" -eq 0 ]; then
        exit 0
    fi

    # Check if it's a fetch error that we should retry
    if echo "$output" | grep -qF -- "$fetch_error"; then
        if [ "$attempt" -lt "$max_attempts" ]; then
            echo "Fetch error on attempt $attempt/$max_attempts, retrying..." >&2
            attempt=$((attempt + 1))
            continue
        fi
    fi

    # If we get here, either it's not a fetch error or we're out of retries
    echo "$output" >&2
    exit "$exit_code"
done

Impact

This will make Spread integration tests more resilient to transient archive update race conditions, matching the behavior already implemented in the CI workflow.

Copilot AI and others added 6 commits November 3, 2025 10:58
Co-authored-by: cjdcordeiro <4047767+cjdcordeiro@users.noreply.github.com>
Copilot AI changed the title from "[WIP] ci: add retry mechanism to install-slices helper in spread tests" to "ci: add retry mechanism to install-slices helper in spread tests" Nov 3, 2025
Copilot AI requested a review from cjdcordeiro November 3, 2025 11:18
@cjdcordeiro cjdcordeiro marked this pull request as ready for review November 3, 2025 11:19
lczyk commented Nov 21, 2025

Also closing this particular one, as we already have canonical#743 to track the issue and this PR is not even targeting the correct branches.

EDIT: this is not canonical/chisel-releases 😅

Development

Successfully merging this pull request may close these issues.

ci: add retry mechanism to install-slices helper in spread tests
