All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
v5.7.0 (2021-09-29)
- Support for storing builds, git-mirrors, and Docker on NVMe Instance Storage #557 (@lox)
- Retried login for ECR and generic Docker registries #930
- Experimental CloudFormation service role, listing the IAM Actions required to create, update, and delete the template #926
- A README feature matrix for Linux and Windows #910
- qemu and binfmt hooks for cross-architecture Docker image builds #903
- Tag pins for the included plugin #906 (@nitrocode)
- Support for AWS SSM sessions #905 (@xiaket)
- Included buildkite-agent from v3.32.3 to v3.33.3 #932
- `EnableDockerExperimental` also enables Docker CLI experimental mode #911
- A frequent source of build interruption caused by scale-in #923
- A resource ordering issue preventing instances from self terminating when a stack #928
- Support for `BuildkiteAdditionalSudoPermissions` with spaces #916 (@twunderlich-grapl)
- Finish the git lfs install #912 (@pauldraper)
v5.6.1 (2021-09-02)
- Missed parameter `BuildkiteAgentTokenParameterStoreKMSKey` in `Autoscaling` nested CloudFormation template #901
v5.6.0 (2021-08-31)
- Cross-region secrets bucket support to git-credentials-s3-secrets elastic-ci-stack-s3-secrets-hooks#48
- AssumeRole support in the ECR Login plug-in ecr-buildkite-plugin#69
- Instance IAM Profile role permissions to be more tightly scoped #800 (@nitrocode)
- Import buildkite-lambda-scaler from the Serverless Application Repository #685
- The built-in environment hook no longer overwrites `AWS_REGION` and `AWS_DEFAULT_REGION` if already present #892 (@toothbrush)
- Included buildkite-agent from 3.32.1 to 3.32.3
- Hourly disk check script on Linux #898
- git-credentials-s3-secrets on Windows elastic-ci-stack-s3-secrets-hooks#47
- PowerShell hook support on Windows agent#1497
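
The "no longer overwrites `AWS_REGION` and `AWS_DEFAULT_REGION` if already present" behaviour noted for this release can be illustrated with a minimal Python sketch. This is not the stack's actual environment hook; the function name `apply_region_defaults` and its arguments are hypothetical and exist only for the example.

```python
from typing import Dict


def apply_region_defaults(env: Dict[str, str], region: str) -> Dict[str, str]:
    """Set the AWS region variables only when the job has not already set them.

    `env` stands in for the job environment and `region` for the instance's
    region; both names are assumptions made for this illustration.
    """
    for name in ("AWS_REGION", "AWS_DEFAULT_REGION"):
        # setdefault leaves an existing value untouched and fills in a missing one.
        env.setdefault(name, region)
    return env


if __name__ == "__main__":
    job_env = {"AWS_REGION": "eu-west-1"}
    print(apply_region_defaults(job_env, "us-east-1"))
```

With this approach, a pipeline that pins its own region keeps it, while jobs that set nothing still receive a default.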
v5.5.1 (2021-08-06)
- Included buildkite-agent from 3.32.0 to 3.32.1
- A source of unexpected instance termination causing build failures #888
v5.5.0 (2021-07-30)
- Template validation rules for the Buildkite Agent token #873
- Secret redaction in build logs agent#1452
- Support for the `pre-bootstrap` Buildkite Agent Lifecycle Hook agent#1456
- Included buildkite-agent from 3.30.0 to 3.32.0 #876 (keithduncan)
- Remove logging of the Buildkite Agent token to CloudWatch Logs #879
- Cross-region S3 bucket access for secrets #875
- An error when handling zero length `environment` files elastic-ci-stack-s3-secrets-hooks#42
- A hang when loading ssh keys without a trailing newline elastic-ci-stack-s3-secrets-hooks#44
v5.4.0 (2021-06-30)
- Update Buildkite Agent to version 3.30.0 #868 (@esalter)
- The HttpPutResponseHopLimit from 1 to 2 #858
- The default cost allocation tag value #859
v5.3.2 (2021-06-11)
- Fix s3secrets-helper for Windows #846 (DuBistKomisch)
- Pin Docker systemd configuration to the same Docker version #849 (cmanou)
- Excessive instance scaling while waiting for instances to boot
v5.3.1 (2021-05-05)
- Allow dashes and multiple forward slashes (/) in BuildkiteAgentTokenParameterStorePath #835 #837 (nitrocode)
v5.3.0 (2021-04-28)
- Replace awslogs with the cloudwatch-agent #811 (yob)
- Avoid scaling down too aggressively when there are pending jobs in certain conditions #823 (yob)
- Bump docker from 19.03.x to 20.10.x #826 (yob)
- Bump docker-compose on all operating systems to 1.28.x #825 (yob)
- Bump agent from 3.27.0 to 3.29.0 #827 (yob)
- Bump lifecycled from 3.0.2 to 3.2.0 #824 (yob)
- Bump git on windows from 2.22.0 to 2.31.0 #819 (yob)
- Bump ECR plugin to v2.3.0 #816 (chloeruka)
- Documentation improvements #815 #810 (acaire)
v5.2.0 (2021-02-08)
- agent names use client-side `%spawn` not server-side `%n` for numbering #794 (pda)
- `IMDSv2Tokens` parameter: optional / required #786 (holmesjr) → #788 & #789 (pda)
- Default to `gp3` volumes, previously `gp2` #784 (yob)
- `c6gn.*` instances recognized as ARM #785 (yob)
- `s3secrets-helper` installation more resilient #783 (shevaun)
v5.1.0 (2020-12-11)
- Experimental support for ARM instance types (linux only) #758 (yob)
- Support up to four instance types and mixed combinations of Spot/OnDemand instances #710 (yob)
- The `InstanceType` stack parameter can now be a CSV with up to 4 types
- The new `OnDemandPercentage` stack parameter can be reduced from 100% (the default) to allow some Spot instances
- Update Buildkite Agent to v3.26.0 #778 (JuanitoFatas)
- Speed up secret downloads from S3 (from ~8 seconds to under 1 second) #772 (pda)
- ECR plugin now has its own log group header to make run time visible #773 (pda)
- Avoid IAM changes for some kinds of stack updates (like changing InstanceType) #781 (yob)
- Improved documentation
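
To make the `InstanceType` CSV rule from this release concrete (a comma-separated value with up to 4 types), here is a minimal Python sketch of how such a value could be checked before a stack update. The helper name `parse_instance_types` is hypothetical, and the stack itself may validate the parameter differently.

```python
from typing import List

MAX_INSTANCE_TYPES = 4  # limit stated in the changelog entry above


def parse_instance_types(csv_value: str) -> List[str]:
    """Split a comma-separated InstanceType value and enforce the 4-type limit."""
    # Strip whitespace around each entry and drop empty fragments.
    types = [part.strip() for part in csv_value.split(",") if part.strip()]
    if not types:
        raise ValueError("at least one instance type is required")
    if len(types) > MAX_INSTANCE_TYPES:
        raise ValueError(
            f"expected at most {MAX_INSTANCE_TYPES} instance types, got {len(types)}"
        )
    return types


if __name__ == "__main__":
    print(parse_instance_types("t3.large, m5.large"))
```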
v5.0.1 (2020-11-09)
- Retrieve agent token from parameter store on windows agents #762 (chrisfowles)
v5.0.0 (2020-10-26)
- Our previously experimental blazing fast lambda scaler is now the default which makes for much faster scaling in response to pending jobs #575 (@lox)
- EXPERIMENTAL Windows support on a new Windows Server 2019 based image #546, #632, #595, #628, #614, #633 (jeremiahsnapp) #670 (pda) #600 (tduffield)
- There is a known issue with graceful handling of spot instances under Windows. Agents may not disconnect gracefully and may appear in the Buildkite UI for a few minutes after they terminate #752
- Support for buildkite/image-builder which can enable you to customize AMIs based off the ones we ship #692 (keithduncan)
- Support for multiple security groups on instances #667 (jdub)
- AMI and Lambda Scaler support more regions: ap-east-1 (Hong Kong), me-south-1 (Bahrain), af-south-1 (Cape Town), eu-south-1 (Milan) #718 (JuanitoFatas)
- Support for loading BuildkiteAgentTokenPath from AWS Parameter Store #601 (jradtilbrook), #625 (jradtilbrook)
- Docker configuration is now isolated per-step #678 (patrobinson) #756 (yob)
- Use EC2 LaunchTemplate instead of a LaunchConfiguration #589 (lox)
- InstanceType default is now `t3.large` (was `t2.nano`) #699 (pda)
- Made ECR hook an `environment` hook (was `pre-command`). #677 (pda)
- Mappings file format has changed to list both Linux and Windows AMIs #569 (lox)
- We now warn instead of hard-fail when there's no configured SSH keys #669 (pda)
- We now only set git-mirrors-path when EnableAgentGitMirrorsExperiment is set #698 (pda)
- Set RootVolumeName appropriately and allow it to be overridden #593 (jeremiahsnapp)
- Disable AZRebalancing to prevent running instances being terminated unnecessarily #751
- Stop trying to call poweroff after the agent shuts down #728 (yob)
- Update agent config to use `tags-from-ec2-meta-data` #727 (yob)
- Set correct content-type on YAML template files shipped to S3 #683 (kyledecot)
- Fixed introduced issue with SSM permissions #657 (kushmansingh)
- Add correct cost tags to S3 #602 (hawkowl)
- Fix incorrect yaml syntax for spot instances #591 (lox)
- Bump Buildkite Agent to v3.25.0 #749 (JuanitoFatas)
- Bump Buildkite Agent Scaler to v1.0.2 #724 (JuanitoFatas) 4fafd8e (JuanitoFatas)
- Bump docker to v19.03.13 (linux) and v19.03.12 (windows) and docker-compose to v1.27.4 (linux, windows uses latest choco version) #719 (yob) #723 (JuanitoFatas)
- Bump bundled plugins to the latest versions secrets ecr docker login
- Remove AWS autoscaling in favor of buildkite-agent-scaler #575 (lox) #588 (jeremiahsnapp)
- Multiple parameters! See below
The following parameters have been removed or reworked:
- `EnableExperimentalLambdaBasedAutoscaling` was removed (it's the default now)
- `BuildkiteOrgSlug` was removed - the statistics reported by buildkite-agent-scaler make it redundant, but consider buildkite-agent-metrics if you need more detailed metric monitoring
- `BuildkiteTerminateInstanceAfterJobTimeout` is replaced by the more concise `ScaleInIdlePeriod` #586 (jeremiahsnapp)
- `BuildkiteTerminateInstanceAfterJobDecreaseDesiredCapacity` and `ScaleDownAdjustment` were removed - instances will now always try to decrement the ASG desired count when their waiting period for new jobs has elapsed
- `ScaleUpAdjustment` is replaced by `ScaleOutFactor` as the new lambda scaler calculates how many agents are needed at the time
- `ScaleDownPeriod` and `ScaleCooldownPeriod` are replaced by `ScaleInIdlePeriod`
The following other parameters have been added:
- `ScaleOutFactor` (default: `1.0`) is a multiplier that allows you to add extra agents when scaling up is needed
- `ScaleInIdlePeriod` (default: `600` seconds) is used for scale-in by letting idle agents remove themselves from the ASG
- `InstanceOperatingSystem` (default: `linux`) can be used to specify Windows if you need Windows Server 2019 instances
- Windows-only `BuildkiteWindowsAdministrator` (default: `true`) adds the local "buildkite-agent" user account to the local Windows Administrator group
- optional `BuildkiteAgentTokenParameterStorePath` and `BuildkiteAgentTokenParameterStoreKMSKey` are for storing your token in SSM Parameter Store and are an alternative to `BuildkiteAgentToken`
- optional `ScaleOutForWaitingJobs` (default: `false`) can help anticipate future job load and get your instances ready ahead of time
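
As a rough illustration of how a multiplier such as `ScaleOutFactor` could influence scale-out, here is a minimal Python sketch. It is not the buildkite-agent-scaler implementation: the function name `desired_instances`, the `agents_per_instance` argument, and the rounding rule are all assumptions made for the example.

```python
import math


def desired_instances(
    pending_jobs: int,
    agents_per_instance: int,
    scale_out_factor: float = 1.0,
) -> int:
    """Hypothetical estimate of how many instances to request for a job backlog."""
    if agents_per_instance <= 0:
        raise ValueError("agents_per_instance must be positive")
    if pending_jobs <= 0:
        return 0
    # Instances needed to give every pending job an agent.
    base = math.ceil(pending_jobs / agents_per_instance)
    # A factor above 1.0 over-provisions; a factor below 1.0 scales out more cautiously.
    return math.ceil(base * scale_out_factor)


if __name__ == "__main__":
    print(desired_instances(10, 2))       # factor 1.0 (the default)
    print(desired_instances(10, 2, 1.5))  # over-provision by 50%
```

Under these assumptions, the default factor of `1.0` requests just enough capacity for the current backlog, while a larger factor adds headroom.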
v4.5.0 (2020-07-10)
- Added ImageIdParameter CloudFormation parameter for SSM Parameter Store image lookup #691 (@keithduncan)
v4.4.0 (2020-05-21)
- Increase the threshold for disk cleanup to 5GB free for 4.3 #646 (@huonw)
- Updated buildkite-agent to version 3.21.1 #687 (@denbeigh2000)
- Updated docker-compose to version 1.25.1 #660 (@dreyks)
- Updated git lfs to 2.10.0 #668 (@kushmansingh)
v4.3.5 (2019-11-01)
- Bump buildkite-agent to v3.13.2 #644 (@lox)
- Prune docker builder cache in cleanup #642 (@sj26)
- Power off immediately if cloud-init fails #638 (@dbaggerman)
- Replaced Linux fixed AMI source with source AMI filter #636 (@cawilson)
- Bump docker version to 19.03.2 #634 (@PaulLiang1)
- Add cloudformation output exports #616 (@jradtilbrook)
- Add python3 and future lib to allow prepping for Python2 EOL #583 (@GreyKn)
- Add missing eu-north-1 to lambda mapping #613 (@lox)
- Docker experimental needs boolean not string #611 (@lox)
- Update ArtifactBucketPolicy to match docs #607 (@gough)
v4.3.4 (2019-07-28)
- Bump agent to v3.13.2, docker to 19.03 and compose to 1.24.1 #609 (@lox)
- Docker experimental needs boolean not string #610 (@lox)
v4.3.3 (2019-06-01)
- Bump agent to 3.12.0 #594 (@lox)
v4.3.2 (2019-04-16)
- Bump agent scaler to support newer regions #566 (@lox)
v4.3.1 (2019-04-09)
- Add back us-east-1 to regions #563 (@ksindi)
v4.3.0 (2019-04-06)
- Add EnableAgentGitMirrorsExperiment parameter #555 (@lox)
- Remove temporary packer key #551 (@lox)
- Updated experimental lambda-based auto-scaler, respect ScaleDownPeriod #559 (@lox)
- Bump agent to 3.10.3 #558 (@lox)
- Install pigz for parallel decompression in docker pull #560 (@lox)
- Use spawn vs multiple systemd units #552 (@lox)
- Write cloudwatch metrics from lambda scaler #541 (@lox)
- Bump docker-login, ecr and secrets plugins to latest #550 (@lox)
- Bump lifecycled to v3.0.2 #548 (@lox)
- Restart agent on SIGPIPE (journald restart) #545 (@lox)
- Set the priority of the agent to its instance integer #539 (@tduffield)
v4.2.0 (2019-02-25)
- Add an experimental lambda scaler #529 (@lox)
- Add helpers to Makefile for building packer image #535 (@tduffield)
- Allow users to configure the root block device #534 (@tduffield)
- Fix typo in CF setting #537 (@tduffield)
- Make sure we reload the systemd unit files #533 (@tduffield)
v4.1.0 (2019-02-11)
- Bump docker to 18.09.2 to fix CVE-2019-5736 #532 (@lox)
- Fix typo in docker experimental config #528 (@lox)
- Allow users to specify additional sudo permissions #527 (@tduffield)
- Add new "TerminateInstanceAfterJob" configuration #523 (@tduffield)
- Add Buildkite Org to Cloudwatch Metrics as a Dimension to support multiple orgs per AWS account #510 (@lox)
v4.0.4 (2019-01-29)
- Fix bug where lifecycled logs aren't flushed to cloudwatch logs #524 (@lox)
- Prevent systemd from killing agent process group #521 (@lox)
- Expose AgentLifecycleTopic for programmatic scaling #522 (@tduffield)
v4.0.3 (2019-01-18)
- Bump docker to 18.09.1 #516 (@lox)
- Bump agent to 3.8.2 #514 (@lox)
- Tunable knob for ASG Cooldown period #495 (@prateek)
v4.0.2 (2018-12-20)
- Set a region for awslogsd #508 (@dgarbus)
- Fix bug where lifecycled didn't pick up handler script #507 (@lox)
v4.0.1 (2018-11-30)
- Show correct stack version in log output #503 (@lox)
- Remove duplicate AssociatePublicIpAddress
v4.0.0 (2018-11-28)
No changes from v4.0.0-rc3.
v4.0.0-rc3 (2018-11-05)
- Use rsyslogd+awslogs for logs #498 (@lox)
- Remove the dash in description to be consistent with v3 #499 (@lox)
- Goss specs #497 (@lox)
- Bump lifecycled to v3.0.0 #496 (@lox)
- Support timestamp-lines #494 (@raylu)
- Add docs for using the bootstrap script #493 (@toolmantim)
- Start logging daemons as soon as possible during bootstrap #492 (@zsims)
- Merge template files into a single file #487 (@lox)
- Move AMI copy into a dedicated step #486 (@lox)
- Update AMI to latest packages #480 (@lox)
v4.0.0-rc2 (2018-09-04)
- Install Git LFS #468 (@lox)
- Update to the very latest aws-cli #478 (@lox)
- Bump lifecycled to 2.0.2 #475 (@lox)
- Default BuildkiteAgentRelease to stable #474 (@lox)
- Added InstanceCreationTimeout as parameter #476 (@RexChenjq)
- Update README.md to reflect Amazon Linux 2 #470 (@alexjurkiewicz)
- Clean up docker login hooks #466 (@lox)
- Rename the log group name we are using for elastic-stack.log file so we are consistent #463 (@arturopie)
- Update to latest Amazon Linux 2 LTS #462 (@lox)
v4.0.0-rc1 (2018-07-18)
- Use Amazon Linux 2 as base AMI #363 (@lox)
- Bump docker-login and ecr plugin to latest #454 (@lox)
- Bump docker to 18.03.1-ce and docker-compose to 1.22.0 #455 (@lox)
- Support attaching multiple policies via the parameter #446 (@zsims)
- Make KeyName optional #444 (@zsims)
- Provide InstanceRoleName as Output #438 (@lox)
v3.3.1 (2018-09-13)
- Bump lifecycled to v2.1.1 #488 (@lox)
v3.3.0 (2018-09-04)
- Bump Amazon Linux to 2018.03 #471 (@lox)
- Bump docker to 18.03.1-ce and docker-compose to 1.22.0 #455 (@lox)
- Support attaching multiple policies via the parameter #446 (@zsims)
- Set correct variable to pass to upstream ecr plugin #453 (@bshelton229)
- Use exit instead of return in bk-check-disk-space.sh script #440 (@arturopie)
- Move cleanup cron jobs to run hourly #429 (@arturopie)
v3.2.1 (2018-05-24)
- Support enabling agent experiments #423 (@lox)
- Use the docker directory to check for disk space #418 (@arturopie)
- Set InstanceRoleName as stack template output #421 (@dblandin)
v3.2.0 (2018-05-17)
- Updated stable agent to buildkite-agent v3.1.2
- Default EnableDockerUserNamespaceRemap to true #417 (@lox)
- Bump the minimum inodes to 250K to allow for big docker images #416 (@lox)
- Update to the new secrets hooks repo URL #414 (@toolmantim)
v3.1.1 (2018-05-02)
- Updated stable agent to buildkite-agent v3.1.1
- Bump docker to 18.03.0-ce and docker-compose to 1.21.1 #411 (@lox)
v3.1.0 (2018-04-30)
- Allow userns remapping to be disabled #410 (@lox)
- Update lifecycled to 2.0.1 #407 (@lox)
- Fix cfn stack instance profile name #395 (@chandanadesilva)
v3.0.0 (2018-04-18)
v3.0.0-rc1 (2018-04-18)
- Use new Metrics API, drop requirement for org-slug and api-token #405 (@lox)
- Bump Lifecycled to v2.0.0 #404 (@lox)
- Add support for billing tags #398 (@tduffield)
- Drop support for buildkite-agent v2, stable is 3.0.0 #400 (@lox)
- Don't blow up when no plugins are enabled #394 (@haines)
- Fail install if docker hasn't started #387 (@lox)
- Update docker to stable 17.12.1-ce #391 (@lox)
- Configure docker before it starts to avoid corruption #377
- Show elastic stack logs in Instance Terminal for easier debugging
- Collect cron output in elastic-stack.log
- Check (and free) diskspace before builds
- Amazon Linux 2017.09.1 (to mitigate Meltdown/Spectre)
- Docker 17.12.0-ce and Compose 1.18.0
- Bump metrics lambda version to v2.0.2
- Bump ECR plugin to 1.1.3
- Updated to latest buildkite-metrics lambda version (v2.0.0) that respects rate limiting headers #357
- Added a new parameter for adding extra buildkite-agent tags/metadata #359
- Autoscaling is suspended when lifecycled crashes #344
- Optimize the permissions check script to only fix the current pipeline’s build dir #340 (@toolmantim)
- CloudWatch Logs namespaced #342
- Docker 17.09.0-ce #350 (@lox)
- Buildkite Agent v2.6.6 and v3.0.0-beta34
- Optionally run docker as buildkite agent with userns-remap #341 (@lox)
- Bump buildkite-metrics to v1.5.0 (retry on error)
- Replace shudder with new lifecycled that supports spot notifications
- Re-added deprecated DOCKER_HUB_USER variables
- Move ecr, secrets and docker-login to plugins
- Add a signature llama to the environment hook
- Show stack version in the environment hook
- Move pipeline to yaml, json version is deprecated
- Use Shudder tool to handle autoscaling events and spot notifications
- Docker 17.06.0-ce
- Remove deprecated DOCKER_HUB_USER variables
- Buildkite Agents v3.0.0-beta28
- Edge agent version is downloaded when instances boot rather than baked in AMI
- Added SECRETS_PLUGIN_ENABLED to allow secrets downloading to be disabled
- Updated to latest Amazon Linux 2017.03.1 (see security advisory AWS-2017-007)
- Updated docker-compose to 1.14.0
- Using an env secrets bucket hook caused builds to fail with an undefined variable error
- 🐳 Docker-Compose 1.14.0-r2 (with support for cache_from directive)
- Buildkite Agents v2.6.3 and v3.0.0-beta27
- Agent version defaults to beta rather than stable
- Using git-credentials was broken (#290)
- Managed secrets bucket failed to create (#282)
- A secrets bucket is created automatically if left blank
- Git over HTTPS is supported via a git-credentials file
- A customisable ScaleDownPeriod parameter is available to prevent rapid scale downs
- 🐳 Docker 17.05.0-ce and Docker-Compose 1.13.0
- Buildkite Agents v2.6.3 and v3.0.0-beta23
- Latest aws-cli
- Autoscaling group is replaced on update, for smoother updates in large groups
- Fixed a bug where the stack would scale up faster than instances were launching
- 🕷 Avoid restarting docker whilst it's initializing to try and avoid corrupting it (#236)
- 🆙 Includes new Buildkite Agent v2.5.1 (stable) and v3.0-beta.19 (beta)
- ⏰ Increase the polling duration for scale down events to prevent hitting api limits (#263)
- Docker 17.03.0-ce and Docker-Compose 1.11.2
- Metrics are collected by a Lambda function, so no more metrics sub-stack 🎉
- Secrets bucket uses KMS-backed SSE by default
- Support authenticated S3 paths for BootstrapScriptUrl and AuthorizedUsersUrl
- New regions (US Ohio)
- ECRAccessPolicy parameter for easy Amazon ECR configuration
- Fixed size stacks are possible, and don't create auto-scaling resources
- Added version number to stack description and agent metadata
- Optionally non-public agent instances
- Improved scale-up/scale-down logic
- Cloudwatch logs are sent to correct region
- Fixed size stacks are supported
- Correct release names for beta and edge agent
- Better error handling for when fetching env or private-key fails
- Regions that require v4 signatures are better handled
- Working docker-gc script
- Autoscaling is suspended during stack updates
- Breaking changes
- Initialization logs have moved to /var/log/elastic-stack.log
- ManagedPolicyARNs has been removed, a singular version exists now: ManagedPolicyARN
- 👭 If you run multiple agents per instance, chmod during build environment setup no longer clashes (#143)
- 🔐 The AWS_ECR_LOGIN_REGISTRY_IDS option has been fixed, so it now calls aws ecr get-login --registry-ids correctly (#141)
- 📡 Buildkite Agent has been updated to the latest builds
- 🐳 Docker has been upgraded to 1.12.1
- 🐳 Docker Compose has been upgraded to 1.8.0
- 🔒 Can now add a custom managed policy ARN to apply to instances to add extra permissions to agents
- 📦 You can now specify a BootstrapScriptUrl to a bash script in an S3 bucket for performing your own setup and install tasks on agent machine boot
- 🔑 Added support for a single SSH key at the root of the secrets bucket (and SSH keys have been renamed)
- 🐳 Added support for logging into any Docker registry, and built-in support for logging into AWS ECR (N.B. the docker login environment variables have been renamed)
- 📄 Docker, cloud-init and CloudFormation logs are sent to CloudWatch logs
- 📛 Instances now have nicer names
- ⚡ Updating stack parameters now triggers instances to update, no need for deleting and recreating the stack
- 🚥 The "queue" parameter is now "default" by default, to make it easier and less confusing to get started. Make sure to update it to "elastic" if you want to continue using that queue name.
- 🐳 Jobs sometimes starting before Docker had started has been fixed
- ⏰ Rolling upgrades and stack updates are now more reliable, no longer should you get stack timeouts
- Initial release! 🎂🎉