
Custom CA in pki setting occasionally fails #4556

@brent-at-aam


Image I'm using:

Bottlerocket OS 1.39.0 (aws-k8s-1.32)

What I expected to happen:

We use a bootstrap container to download and configure an internal CA certificate, along with proxy settings, so that we can pull all container images through a caching proxy.

#!/bin/bash
apt-get update
apt-get install -y curl
ca=$(curl -s http://our.internal.proxy.com/ca.crt | base64 -w 0)
echo "$ca"
/.bottlerocket/rootfs/bin/apiclient set --json "{\"pki\": {\"internal-ca\": {\"data\": \"$ca\", \"trusted\": true}}, \"network\": {\"https-proxy\": \"https://our.internal.proxy.com/\", \"no-proxy\": [\"172.20.0.0/16\", \"localhost\", \"127.0.0.1\", \".amazonaws.com\", \"169.254.169.254\", \"10.0.0.0/8\", \".internal\"]}}" 

Most of the time, this works without issue. The CA cert is configured and images pull from our proxy successfully!
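One thing that may be worth ruling out is a silent fetch failure. In the script above, `curl -s` without `--fail` exits 0 on an HTTP error response, and without `pipefail` the pipeline reports only the exit status of `base64`, so a bad download could still reach `apiclient set`. Whether that is what happens on the affected nodes is not confirmed. The following is a minimal sketch of stricter helpers. The function names `fetch_ca`, `looks_like_pem_cert`, and `pki_settings_json` are hypothetical, and the sketch assumes the CA is served as a PEM file.

```shell
#!/bin/bash
# Sketch only: stricter helpers for the bootstrap script (names are hypothetical).
set -euo pipefail

# Download URL $1 to file $2. --fail makes curl exit non-zero on HTTP errors
# instead of saving the error page; --retry covers transient failures.
fetch_ca() {
  local url="$1" dest="$2"
  curl --fail --silent --show-error --retry 5 --retry-delay 2 \
    --output "$dest" "$url"
}

# Succeed only if the file contains PEM certificate begin and end markers.
# Assumption: the CA is served in PEM format.
looks_like_pem_cert() {
  local file="$1"
  grep -q -- '-----BEGIN CERTIFICATE-----' "$file" &&
    grep -q -- '-----END CERTIFICATE-----' "$file"
}

# Print the pki part of the settings JSON for PEM file $1.
# base64 output contains no characters that need JSON escaping.
pki_settings_json() {
  local file="$1" encoded
  encoded=$(base64 -w 0 < "$file")
  printf '{"pki": {"internal-ca": {"data": "%s", "trusted": true}}}' "$encoded"
}
```

In the bootstrap script these could be chained as `fetch_ca "$url" /tmp/ca.crt`, then `looks_like_pem_cert /tmp/ca.crt`, then `apiclient set --json "$(pki_settings_json /tmp/ca.crt)"`, with the network settings from the original call applied the same way. With `set -e` active, any failing step ends the script with a non-zero status.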

What actually happened:

Occasionally, the above process "works", but image pulls fail with a bad issuer cert error message. If we inspect the settings on the Bottlerocket node, we can see that the CA configuration is in place, but it does not seem to take effect. The bootstrap container itself is set as essential, so any failure in the above script should fail the node, but that does not seem to matter in this case, since we can visually confirm that the settings are in place on the node.

I am not sure what steps I can take to troubleshoot the issue, but I am happy to collect further information the next time we get a node in this state.
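Two checks that might narrow this down on an affected node are whether the stored `data` value decodes to the same bytes as the real CA file, and whether the CA actually appears in the trust bundle that the container runtime reads. The sketch below uses only generic tools. The function names are hypothetical, and the bundle path is passed in as an argument because its location on the node is not stated in this report.

```shell
#!/bin/bash
# Diagnostic sketch (function names are hypothetical).

# Succeed if base64 string $1 decodes to exactly the bytes of file $2.
setting_matches_file() {
  local data="$1" file="$2" want got
  want=$(sha256sum < "$file" | cut -d' ' -f1)
  got=$(printf '%s' "$data" | base64 -d 2>/dev/null | sha256sum | cut -d' ' -f1)
  [ "$want" = "$got" ]
}

# Print the base64 body of the first certificate in PEM file $1 on one line.
pem_body() {
  awk '/-----BEGIN CERTIFICATE-----/ {on = 1; next}
       /-----END CERTIFICATE-----/   {exit}
       on' "$1" | tr -d '\r\n'
}

# Succeed if the first certificate in PEM file $2 appears in bundle file $1.
# Line breaks are stripped from both sides so wrapping differences do not matter.
bundle_contains_cert() {
  local bundle="$1" pem="$2" body
  body=$(pem_body "$pem")
  [ -n "$body" ] || return 2
  # No -q here: grep reads all of its input, so tr never sees a closed pipe.
  tr -d '\r\n' < "$bundle" | grep -F -- "$body" > /dev/null
}
```

If the setting matches the file but the bundle does not contain the certificate, that would point at the step that applies the setting rather than at the download.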

How to reproduce the problem:

Wait patiently for one of the nodes to trip on the error condition.
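Since the failure is intermittent, it may help to have a small capture helper ready for the next affected node, so that the same information is collected each time. This is a sketch only. The `capture` function and the output layout are hypothetical, and which commands to record is a judgment call.

```shell
#!/bin/bash
# Sketch: record a command's output and exit status for later comparison.
# Usage: capture <output-dir> <name> <command> [args...]
capture() {
  local outdir="$1" name="$2" status=0
  shift 2
  mkdir -p "$outdir"
  # Keep stdout and stderr in separate files; remember the exit status.
  "$@" > "$outdir/$name.out" 2> "$outdir/$name.err" || status=$?
  # Append one summary line per command, with a UTC timestamp.
  printf '%s exit=%s cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$*" >> "$outdir/index.log"
  return 0
}
```

For example, `capture /tmp/diag pki apiclient get settings.pki` and `capture /tmp/diag network apiclient get settings.network` would record the settings as the node sees them, wherever `apiclient` is available.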

Metadata

Assignees: No one assigned

Labels: status/needs-triage (Pending triage or re-evaluation), type/bug (Something isn't working)
