Summary
The Examples Test workflow fails completely after PR #524 was merged: every curl-based test exits with code 35 (CURLE_SSL_CONNECT_ERROR).
Failed Run Details
- Run: 21726883581
- Workflow: Examples Test
- Commit: 54fd26e
- PR: #524 - "fix: remove HTTP_PROXY/HTTPS_PROXY env vars from agent container"
- Date: 2026-02-05T20:13:44Z
Failure Pattern
ALL curl tests failing with exit code 35:
- ✅ basic-curl.sh setup - containers start successfully
- ❌ basic-curl.sh execution - `curl -s https://api.github.com` → exit code 35
- ✅ using-domains-file.sh setup - containers start successfully
- ❌ using-domains-file.sh execution - `curl -s https://api.github.com` → exit code 35
- ✅ debugging.sh setup - containers start successfully
- ❌ debugging.sh execution - `curl -s https://api.github.com/zen` → exit code 35
- ⏭️ blocked-domains.sh - skipped due to previous failures
Key Observations
✅ What Works
- Container creation and startup
- Squid healthcheck passes
- iptables rules applied successfully
- Network configuration (172.30.0.0/24)
- Docker Compose orchestration
❌ What Fails
- curl SSL/TLS connection to HTTPS endpoints
- 100% failure rate across all example tests
- Deterministic failure (not intermittent)
🔍 Container Logs Analysis
From the debugging.sh test logs:
``````
[entrypoint] Proxy configuration:
[entrypoint] HTTP_PROXY=
[entrypoint] HTTPS_PROXY=
[entrypoint] Network information:
[entrypoint] IP address: 172.30.0.20
[entrypoint] Hostname: 83c0d2a68e4f
[entrypoint] Executing command: /bin/bash -c curl -s https://api.github.com/zen
[DEBUG] Agent exit code: 35
``````
## Root Cause Analysis
### Curl Exit Code 35 Meaning
From curl documentation:
``````
CURLE_SSL_CONNECT_ERROR (35)
A problem occurred somewhere in the SSL/TLS handshake.
You really want the error buffer and read the message there
as it pinpoints the problem slightly more.
``````
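For triage it can help to separate exit code 35 from neighbouring curl failures, since each one points at a different layer (DNS, TCP connect, TLS handshake, certificate trust). The following is a minimal sketch of such a lookup; the function name `curl_exit_meaning` is hypothetical, and the table covers only a small subset of curl's documented exit codes.

```shell
# Hypothetical helper: map a curl exit code to a short description.
# Only a handful of curl's documented exit codes are listed here.
curl_exit_meaning() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "failed to connect to host" ;;
    28) echo "operation timed out" ;;
    35) echo "SSL/TLS handshake error" ;;
    60) echo "peer certificate could not be verified" ;;
    *)  echo "unlisted exit code $1" ;;
  esac
}

# Example: the code observed in the failed run.
curl_exit_meaning 35
```

Under this reading, a certificate trust problem would normally surface as exit code 60 rather than 35, which may help narrow down the hypotheses.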
### PR #524 Changes
The merged PR removed HTTP_PROXY/HTTPS_PROXY environment variables:
**Rationale from PR**:
- "Intercept mode (iptables DNAT 80/443 → squid:3129) handles all routing transparently"
- "Port 3128 is unreachable from the agent container, causing Codex (Rust/reqwest) to fail"
**Changes made**:
1. Removed `HTTP_PROXY` and `HTTPS_PROXY` from agent container environment
2. Added proxy vars to `EXCLUDED_ENV_VARS` to prevent leaking via `--env-all`
3. Updated entrypoint.sh logging to show empty proxy vars
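To confirm that the proxy variables are really absent inside a running agent container, a small check along these lines could be run there. This is only a sketch: the function name `report_proxy_env` is hypothetical and is not part of entrypoint.sh.

```shell
# Hypothetical helper: report whether each proxy variable is set.
# Uses eval for indirect lookup so the function stays POSIX sh compatible.
report_proxy_env() {
  for name in HTTP_PROXY HTTPS_PROXY; do
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name is set to: $value"
    else
      echo "$name is empty or unset"
    fi
  done
}

# Example: print the state of both variables in the current shell.
report_proxy_env
```

With PR #524 applied, both variables would be expected to report as empty or unset, matching the entrypoint log output.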
### Hypothesis
The SSL connection failure suggests one of these issues:
1. **iptables DNAT not working correctly**: Traffic to port 443 may not be redirecting to Squid's intercept port (3129)
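2. **Squid intercept mode misconfiguration**: Squid may not be handling intercepted HTTPS traffic properly. With DNAT interception the client sends a raw TLS ClientHello to the intercept port rather than an explicit CONNECT request.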
3. **Certificate verification issue**: curl in the agent container may not trust the connection without explicit proxy env vars
4. **Squid → External SSL handshake failure**: Squid may be failing to establish the outbound SSL connection
## Evidence Timeline
``````
20:14:47.6048977Z Container awf-agent Started
20:14:47.6247851Z [entrypoint] Agentic Workflow Firewall - Agent Container
20:14:47.6251074Z [iptables] NOTE: Host-level DOCKER-USER chain handles egress filtering
20:14:47.6252390Z [iptables] Squid proxy: squid-proxy:3128 (intercept: 3129)
20:14:47.6733845Z [iptables] Redirect HTTP (80) and HTTPS (443) to Squid intercept port...
20:14:47.6906418Z [iptables] NAT rules applied successfully
20:14:47.7058208Z [entrypoint] Executing command: /bin/bash -c curl -s https://api.github.com/zen
20:14:47.8345512Z [DEBUG] Agent exit code: 35
``````
Only about 0.13 seconds elapse between command execution (20:14:47.7058) and failure (20:14:47.8346), which suggests the SSL handshake fails immediately rather than timing out.
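The gap between the two log timestamps can be computed directly rather than estimated by eye. The sketch below assumes timestamps in the `HH:MM:SS.fraction` form shown in the timeline (with an optional trailing `Z`) and that both fall on the same day; the function name `elapsed_seconds` is hypothetical.

```shell
# Hypothetical helper: print the difference in seconds between two
# HH:MM:SS.fraction timestamps (trailing Z allowed, same day assumed).
elapsed_seconds() {
  awk -v a="$1" -v b="$2" '
    function secs(t,   p) {
      sub(/Z$/, "", t)          # drop the trailing Z, if any
      split(t, p, ":")          # p[1]=hours, p[2]=minutes, p[3]=seconds
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    BEGIN { printf "%.3f\n", secs(b) - secs(a) }'
}

# "Executing command" line vs. "Agent exit code" line from the timeline.
elapsed_seconds 20:14:47.7058208Z 20:14:47.8345512Z
```

For the two timestamps above this prints 0.129, which is consistent with an immediate handshake failure rather than a timeout.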
Impact
- 🔴 CRITICAL - All example tests blocked
- Cannot verify basic firewall functionality
- CI pipeline broken for main branch
- User-facing examples do not work
Recommended Investigation Steps
- Check Squid access logs for connection attempts:
  `sudo cat /tmp/squid-logs-1770322481586/access.log`
- Verify iptables DNAT rules are redirecting 443 traffic:
  `docker exec awf-agent iptables -t nat -L -n -v`
- Test curl with verbose output to see SSL handshake details:
  `sudo awf --allow-domains api.github.com --keep-containers -- curl -v https://api.github.com/zen`
- Compare with pre-PR #524 behavior: check whether containers built from the parent commit (769a6f5) work correctly.
- Verify Squid intercept port configuration:
  `docker exec awf-squid grep -E '(http_port|ssl_bump)' /etc/squid/squid.conf`
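The DNAT check among the steps above can be made scriptable by filtering rule-spec output (as printed by `iptables -t nat -S`) instead of reading the table by eye. The following is a rough sketch: the function name `has_dnat_rule` is hypothetical, and the sample rules, including the `OUTPUT` chain and the address 172.30.0.10, are illustrative assumptions rather than values taken from the failed run.

```shell
# Hypothetical helper: succeed if the NAT rules on stdin contain a DNAT rule
# for destination port $1 that targets intercept port $2.
has_dnat_rule() {
  dport="$1"
  intercept_port="$2"
  grep -Eq -e "--dport ${dport} .*-j DNAT .*--to-destination [0-9.]+:${intercept_port}( |\$)"
}

# Illustrative sample only; the chain name and Squid address are assumptions.
sample_rules='-A OUTPUT -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.0.10:3129
-A OUTPUT -p tcp -m tcp --dport 443 -j DNAT --to-destination 172.30.0.10:3129'

if printf '%s\n' "$sample_rules" | has_dnat_rule 443 3129; then
  echo "443 -> 3129 rule present"
else
  echo "443 -> 3129 rule missing"
fi
```

Against a kept container, the same function could be fed live output, for example `docker exec awf-agent iptables -t nat -S | has_dnat_rule 443 3129`. A present rule would shift suspicion from the first hypothesis toward the Squid side.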
Related Issues
- PR #524 ("fix: remove HTTP_PROXY/HTTPS_PROXY env vars from agent container") - The change that introduced this regression
- Issue #523 ("Codex smoke test fails: HTTP_PROXY points to port 3128 but explicit proxy is unreachable from agent container") - Original issue that PR #524 was trying to fix (Codex/Rust connection refused)
- Issue #505 ("🏥 CI Failure: squid container intermittent startup failure in debugging.sh test") - Previous Squid startup failures (different failure mode)
Files to Review
- `containers/agent/setup-iptables.sh` - NAT redirection rules
- `src/docker-manager.ts` - Container environment configuration
- `src/squid-config.ts` - Squid proxy configuration
- `containers/squid/squid.conf` - Squid intercept port setup
🏥 Automatically investigated by CI Doctor