Elastic CI agent fails to start when AWS Linux kernel is updated #992

@gueletk-affirm

Describe the bug
When the stack is launched from an AMI that has had all security updates applied (which brings the kernel up to 4.14.262-200.489), the template fails during creation of the AgentAutoScaleGroup.

Steps To Reproduce
Steps to reproduce the behavior:

  1. Launch an EC2 instance using the Buildkite AMI (with a public IP)
  2. SSH into the server and follow these instructions to update the kernel with the security patches
  3. Make a new AMI from that server
  4. Launch the Elastic CI stack using the new AMI

Expected behavior
The stack is created successfully.

Actual behaviour
The stack fails during the AgentAutoScaleGroup creation.

[Screenshot: Screen Shot 2022-03-01 at 10 01 58 AM]

Stack parameters (please complete the following information):

  • AWS Region: us-east-1
  • Version: Buildkite stack v5.7.2

Additional context

  • Instance screenshot from the failing AutoScalingGroup

[Screenshot: Screen Shot 2022-03-01 at 10 17 30 AM]

  • Output of dmesg | grep docker:

    • [ 63.925260] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
  • Version of buildkite-agent inside instance: version 3.33.3, build 4013

  • I have tried launching the Buildkite template with the stable, beta, and edge values for BuildkiteAgentRelease

  • The closest thing I could find to this error was here, but I'm not sure how to change those settings in the AMI or the template
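
For anyone trying to reproduce this, here is a minimal sketch of commands that could gather more detail from the failing instance alongside the dmesg output above. It assumes Docker runs as a systemd unit named docker, which is an assumption and not something confirmed in this report. The helper names run_if_present and collect_diagnostics are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: gathers basic kernel and Docker state from an instance.
# Assumption: Docker is managed by a systemd unit named "docker".

# Hypothetical helper: run a command only if its binary exists,
# and never abort the script if the command itself fails.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || true
  else
    echo "skipped: $1 not found"
  fi
}

# Hypothetical helper: print the pieces of state most relevant to this report.
collect_diagnostics() {
  echo "== kernel =="
  uname -r                      # running kernel release after the update
  echo "== docker service =="
  run_if_present systemctl status docker --no-pager
  echo "== docker logs =="
  run_if_present journalctl -u docker --no-pager -n 50
  echo "== docker kernel messages =="
  run_if_present dmesg | grep -i docker || true
}

collect_diagnostics
```

Comparing this output between an instance built from the unmodified AMI and one built from the patched AMI may show whether the Docker service is what stops the agent from starting.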

Would really appreciate any guidance here.
