
Conversation


@Damien-DAGORN commented on Dec 29, 2025

Provider SQL: Default OCI packaging + stabilized E2E on Kind

Summary

This PR standardizes provider installation via OCI package (tag latest) and hardens local end-to-end tests on a Kind cluster. It removes the temporary MariaDB skip option and ensures both MariaDB (no-TLS & TLS) and PostgreSQL suites run reliably with clear diagnostics and robust Helm-based setup.

Key Changes

  • Default OCI packaging flow:
    • Build and push the .xpkg to an in-cluster registry.
    • Install the Provider from the OCI reference using the :latest tag.
  • Retained local cache fallback (PVC + .gz) when USE_OCI=false.
  • Hardened integration scripts with stricter shell behavior and clearer logs:
    • set -euo pipefail, step-by-step logging, extended waits, and better error messages.
    • helm upgrade --install --wait --timeout and repo add/update for deterministic installs (see the sketch after this list).
  • MariaDB:
    • Fixed Kubernetes secret creation syntax and improved install flow.
    • Always run both no-TLS and TLS tests (removed SKIP_MARIADB).
    • Using Bitnami chart version 24.0.2 (MariaDB 12.1.2).
  • PostgreSQL:
    • Switched chart values to auth.postgresPassword, disabled persistence for Kind, and added readiness checks.
    • Port-forward readiness loop and password generation hardened under pipefail.
    • Using Bitnami chart version 18.2.0 (PostgreSQL 18.1.0).
  • Cleanup reliability:
    • Guarded deletion where variables may not be defined (e.g., ProviderConfig during teardown).
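
A minimal sketch of the deterministic Helm flow mentioned above. The release name postgresdb and the chart version come from this PR; the exact values files and the persistence key (primary.persistence.enabled) are assumptions, not copied from integration_tests.sh:

# Sketch only: deterministic install of the PostgreSQL test database.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm upgrade --install postgresdb bitnami/postgresql \
  --version 18.2.0 \
  --set auth.postgresPassword="${postgres_root_pw}" \
  --set primary.persistence.enabled=false \
  --wait --timeout 10m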

Implementation Details

  • OCI mode is the default path: an in-cluster registry is launched, the provider .xpkg is pushed with both version and latest tags, and Crossplane installs the Provider from the OCI reference (a sketch follows this list).
  • Local cache mode remains supported for offline/debug scenarios by setting USE_OCI=false.
  • Bitnami Helm charts are used for database deployments:
    • MariaDB: version 24.0.2 (MariaDB 12.1.2)
    • PostgreSQL: version 18.2.0 (PostgreSQL 18.1.0)
    • Persistence disabled for Kind cluster compatibility
  • Provider installation includes additional health checks and timeouts to avoid flakiness.
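
As a rough illustration of the OCI install path described above, the Provider object applied by the script might look like the following. The registry hostname and the provider-sql repository path are assumptions, not copied from integration_tests.sh, and a plain-HTTP in-cluster registry typically needs extra configuration on the Kind nodes:

# Sketch only: install the provider from the in-cluster registry using :latest.
cat <<EOF | kubectl apply -f -
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-sql
spec:
  package: registry.crossplane-system.svc.cluster.local:5000/provider-sql:latest
  packagePullPolicy: Always
EOF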

Relevant scripts:

  • cluster/local/integration_tests.sh
  • cluster/local/postgresdb_functions.sh

Test Plan

Run the full local end-to-end suite on Kind.

Prerequisites:

  • Docker, kind, kubectl, and helm available on your machine.

Examples:

# Default: OCI mode (in-cluster registry, provider installed via :latest)
bash cluster/local/integration_tests.sh

# Explicit OCI mode
env USE_OCI=true bash cluster/local/integration_tests.sh

# Local cache mode (PVC + .gz) for offline/debugging
env USE_OCI=false bash cluster/local/integration_tests.sh

Expected outcome:

  • Kind cluster is created, Crossplane installed via Helm.
  • Provider package is pushed to the local registry and installed.
  • MariaDB no-TLS & TLS tests run and pass.
  • PostgreSQL tests run and pass.
  • Cleanup completes without errors.

Flags & Configuration

  • USE_OCI (default: true), illustrated by the sketch after this list:
    • true: in-cluster registry + OCI install with :latest tag.
    • false: local cache path via PVC referencing the .gz package.
  • Optional environment defaults exist for Kind and node image (e.g., KIND_VERSION, KIND_NODE_IMAGE_TAG).
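
Purely illustrative sketch of the USE_OCI switch; the helper function names below are hypothetical and not taken from integration_tests.sh:

# Hypothetical sketch of how the script branches on USE_OCI.
USE_OCI="${USE_OCI:-true}"
if [ "${USE_OCI}" = "true" ]; then
  start_in_cluster_registry_and_push_xpkg   # hypothetical helper: registry + push, install via :latest
else
  install_provider_from_local_cache         # hypothetical helper: PVC + .gz fallback
fi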

CI Impact

  • Aligns CI with the default OCI packaging flow for consistency.
  • Reduces transient failures via deterministic Helm operations and clearer readiness checks.

Breaking Changes

  • None. The temporary SKIP_MARIADB flag was removed; the suite now always runs both MariaDB and PostgreSQL as intended.

Checklist

  • Provider installed via OCI :latest by default
  • Local cache fallback supported (USE_OCI=false)
  • MariaDB (no-TLS & TLS) tests passing
  • PostgreSQL tests passing
  • Cleanup is robust

Notes

If you encounter chart/image pull issues on Kind, ensure the Bitnami repos are added/updated and avoid pinning to stale chart versions. The integration tests use the following verified chart versions:

  • MariaDB: 24.0.2 (latest stable as of December 2025)
  • PostgreSQL: 18.2.0 (latest stable as of December 2025)
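
If a chart version cannot be pulled, refreshing the repo and listing the versions that are actually available usually makes a stale pin obvious, for example:

helm repo update
helm search repo bitnami/mariadb --versions | head -n 5
helm search repo bitnami/postgresql --versions | head -n 5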

Increased waits and explicit readiness checks are in place, but logs from the integration script will guide further troubleshooting.

Copilot AI review requested due to automatic review settings on December 29, 2025 at 17:18

Copilot AI left a comment


Pull request overview

This PR refactors the end-to-end testing infrastructure to improve reliability and debugging capabilities. The changes introduce OCI-based package delivery as the default mode, add a flag to skip MariaDB tests for focused debugging, and harden PostgreSQL test setup with better error handling and diagnostics.

Key Changes:

  • Switched to OCI mode by default for provider package delivery, using an in-cluster registry instead of mounted cache volumes
  • Added SKIP_MARIADB flag to enable PostgreSQL-only test runs
  • Improved PostgreSQL installation robustness with helm upgrade pattern, disabled persistence, and enhanced readiness checks

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 15 comments.

  • cmd/provider/main.go: Added configurable cache-sync-timeout flag for controller startup
  • cluster/local/postgresdb_functions.sh: Hardened PostgreSQL setup with pipefail-safe password generation, helm upgrade pattern, port-forward readiness checks, and safer cleanup
  • cluster/local/integration_tests.sh: Implemented OCI mode with in-cluster registry, added SKIP_MARIADB flag, improved provider health checks, and removed hardcoded Helm chart versions
  • .github/workflows/ci.yml: Added tmate debugging session for interactive troubleshooting


"${KUBECTL}" get all -A || true
"${KUBECTL}" get events --sort-by=.lastTimestamp | tail -n 200 || true
"${HELM}" status postgresdb || true
echo_error "Helm install/upgrade of PostgresDB failed"

Copilot AI Dec 29, 2025


The error message uses echo_error but then doesn't exit. After checking for helm install failure and dumping diagnostics, the script should exit with a non-zero status code since the installation failed. Currently, the script continues executing even after the Helm install has failed.

Suggested change:

- echo_error "Helm install/upgrade of PostgresDB failed"
+ echo_error "Helm install/upgrade of PostgresDB failed"
+ return $rc


# Start port-forward; ignore output filtering failures under pipefail
echo_sub_step "Start port-forward to svc/postgresdb-postgresql:5432"
( "${KUBECTL}" port-forward --namespace default svc/postgresdb-postgresql 5432:5432 | grep -v "Handling connection for" || true ) &

Copilot AI Dec 29, 2025


The port-forward command is being launched in a subshell with error suppression that could hide issues. The grep filter failure is silenced with '|| true', and any kubectl errors are redirected but not logged. If the port-forward fails to start, the script will proceed to the readiness check loop which will eventually timeout without clear indication of what went wrong. Consider capturing and logging stderr from kubectl separately before discarding it.
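
A possible shape of what this comment asks for, sketched here rather than taken from the PR; the log path under ${projectdir}/.work is an assumption:

# Sketch only: keep kubectl's stderr in a log file so a failed port-forward
# can be diagnosed later, while still filtering the noisy connection messages.
( "${KUBECTL}" port-forward --namespace default svc/postgresdb-postgresql 5432:5432 \
    2>"${projectdir}/.work/postgres-pf.log" \
  | grep -v "Handling connection for" || true ) &
PORT_FORWARD_PID=$!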

Comment on lines +49 to +58
# Wait for local port to be ready
for i in {1..20}; do
if nc -z 127.0.0.1 5432 >/dev/null 2>&1; then
echo_sub_step "Port-forward ready (PID: ${PORT_FORWARD_PID})"
echo_step_completed
break
fi
sleep 0.5
if [ $i -eq 20 ]; then

Copilot AI Dec 29, 2025


The hardcoded timeout of 10 seconds (20 iterations × 0.5 seconds) may be too short for slower environments or loaded systems. The port-forward can take longer to establish in resource-constrained CI environments. Consider making this configurable or increasing the timeout to a more conservative value like 30 seconds.

Suggested change:

- # Wait for local port to be ready
- for i in {1..20}; do
-   if nc -z 127.0.0.1 5432 >/dev/null 2>&1; then
-     echo_sub_step "Port-forward ready (PID: ${PORT_FORWARD_PID})"
-     echo_step_completed
-     break
-   fi
-   sleep 0.5
-   if [ $i -eq 20 ]; then
+ # Wait for local port to be ready; allow overriding max attempts via env var
+ PORT_FORWARD_MAX_ATTEMPTS="${POSTGRES_PORT_FORWARD_MAX_ATTEMPTS:-60}"
+ for ((i=1; i<=PORT_FORWARD_MAX_ATTEMPTS; i++)); do
+   if nc -z 127.0.0.1 5432 >/dev/null 2>&1; then
+     echo_sub_step "Port-forward ready (PID: ${PORT_FORWARD_PID})"
+     echo_step_completed
+     break
+   fi
+   sleep 0.5
+   if [ "$i" -eq "$PORT_FORWARD_MAX_ATTEMPTS" ]; then

mkdir -p "${projectdir}/.work"
( kubectl -n crossplane-system port-forward svc/registry 5000:5000 >/dev/null 2>&1 & echo $! >"${projectdir}/.work/registry-pf.pid" )
for i in {1..20}; do nc -z localhost 5000 && break || sleep 0.5; done
}

Copilot AI Dec 29, 2025


The registry port-forward process is launched in a background subshell with all output redirected to /dev/null, making it difficult to diagnose failures. The PID is captured, but if the port-forward fails immediately after starting, the subsequent nc check loop will timeout without useful diagnostic information about why the port-forward failed.

Suggested change:

- }
+ ( kubectl -n crossplane-system port-forward svc/registry 5000:5000 >"${projectdir}/.work/registry-pf.log" 2>&1 & echo $! >"${projectdir}/.work/registry-pf.pid" )

Comment on lines +7 to +9
set +o pipefail
postgres_root_pw=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32 || true)
set -o pipefail

Copilot AI Dec 29, 2025


The pipefail handling for password generation is overly complex. Temporarily disabling pipefail with 'set +o pipefail' and then re-enabling it just to handle the 'head' command's SIGPIPE is a workaround that adds complexity. A cleaner approach would be to check if the password was generated successfully after the command, or use a different method that doesn't trigger SIGPIPE.

Suggested change:

- set +o pipefail
- postgres_root_pw=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32 || true)
- set -o pipefail
+ postgres_root_pw=$(LC_ALL=C tr -cd "A-Za-z0-9" </dev/urandom | head -c 32 || true)
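
One way to sidestep SIGPIPE entirely, sketched as an example of the "different method" the comment alludes to (not what the PR implements): read a bounded amount of randomness first, then trim in the shell so no pipe is closed early.

# Sketch only: head finishes on its own and base64 reads to EOF, so nothing
# in the pipeline is killed by SIGPIPE even with "set -o pipefail" active.
raw=$(head -c 256 /dev/urandom | base64)
postgres_root_pw=$(printf '%s' "$raw" | LC_ALL=C tr -cd 'A-Za-z0-9')
postgres_root_pw=${postgres_root_pw:0:32}   # trim to 32 characters in the shell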

@Damien-DAGORN force-pushed the fix/end-to-end-tests branch 2 times, most recently from f938334 to f8266a6, on December 29, 2025 at 17:57
@Damien-DAGORN merged commit ad0b78c into master on Dec 30, 2025. 6 checks passed.
