Replace awslogs with the cloudwatch-agent #811

Merged 1 commit on Mar 14, 2021

yob commented Mar 14, 2021

We've long used awslogs to send logs from elastic stack instances to cloudwatch logs. However, it's deprecated and AWS now recommend using the "cloudwatch agent" [2]. For example, there's currently a banner in the awslogs docs [1] that says:

> This reference is for the older CloudWatch Logs agent, which is on the
> path to deprecation. We strongly recommend that you use the unified
> CloudWatch agent instead

Our windows AMIs already use the cloudwatch agent, and this finally updates the two linux AMIs (amd64/arm64) to use it as well. Although windows was already using cloudwatch logs, I have renamed a few files to keep them consistent across linux and windows.

The new linux config file was generated by booting the 5.2.0 linux/amd64 stack AMI, installing the agent via yum, and then running the wizard to convert the legacy awslogs.conf:

    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
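
To give a rough idea of what that conversion involves, here is a minimal sketch that maps an awslogs-style INI file onto the `logs` section of a cloudwatch agent JSON config. It is not the wizard's actual implementation, and the file paths and log group names are hypothetical placeholders rather than the ones in the stack AMI. The field mapping (`file` to `file_path`, with `log_group_name` and `log_stream_name` carried over) is my assumption about the minimum needed.

```python
import configparser
import json

# Hypothetical legacy config; the real awslogs.conf from the AMI is not shown in this PR.
LEGACY_AWSLOGS_CONF = """
[general]
state_file = /var/lib/awslogs/agent-state

[/var/log/messages]
file = /var/log/messages
log_group_name = /example/system
log_stream_name = {instance_id}

[/var/log/example-agent.log]
file = /var/log/example-agent.log
log_group_name = /example/agent
log_stream_name = {instance_id}
"""


def convert_awslogs_conf(text: str) -> dict:
    """Map each awslogs log section to an entry in the agent's collect_list."""
    # Disable interpolation so placeholders such as {instance_id} pass through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    collect_list = []
    for section in parser.sections():
        if section == "general":
            # [general] holds agent-wide settings, not a log file definition.
            continue
        entry = parser[section]
        collect_list.append(
            {
                "file_path": entry["file"],
                "log_group_name": entry["log_group_name"],
                "log_stream_name": entry["log_stream_name"],
            }
        )

    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


if __name__ == "__main__":
    print(json.dumps(convert_awslogs_conf(LEGACY_AWSLOGS_CONF), indent=2))
```

Other awslogs options (for example `datetime_format`) are deliberately left out of the sketch because their equivalents would need checking against the agent's config reference.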

The new agent can apparently do tracing and metrics (including running a local statsd compatible interface). I've left them disabled for now, and focused just on replacing the deprecated awslogs.
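
For anyone curious about the statsd side, here is a minimal sketch of what a client would send if that interface were turned on. It is not part of this PR: the metric names are hypothetical, and the address `127.0.0.1:8125` assumes the agent's statsd listener is enabled on its default port, which this change does not do.

```python
import socket


def format_statsd(name: str, value: float, metric_type: str = "c") -> str:
    """Build one statsd datagram: <name>:<value>|<type> (c = counter, g = gauge, ms = timer)."""
    return f"{name}:{value}|{metric_type}"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a single metric over UDP. Assumes a statsd listener on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, purely for illustration.
    send_statsd(format_statsd("builds.started", 1))
```

Because statsd is UDP, the send is fire-and-forget: nothing is recorded if no listener is running.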

We believe this might also resolve a known issue (#709) where the awslogs tool calls the CreateLogGroup API endpoint over and over. For large Buildkite customers, this can result in the regional quota for CreateLogGroup being hit and some logs not being recorded.

Fixes #713
Fixes #709

[1] https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
[2] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html

yob force-pushed the cloudwatch-agent branch from 50e0703 to 4a1f223 on March 14, 2021 20:52
yob marked this pull request as ready for review on March 14, 2021 22:01

yob commented Mar 14, 2021

While developing this, I created a new elastic stack based on this branch and confirmed the 7 log groups are still being updated. I wasn't sure if extra permissions would be required on the instance IAM role, but apparently not.

I assume we might need extra permissions if we wanted to experiment with tracing or metrics.
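
As a rough way to reason about that, the sketch below compares a set of granted IAM actions against what each agent feature would plausibly need. The action lists are my assumption, based on the actions in AWS's managed CloudWatchAgentServerPolicy, and the granted set is hypothetical (the stack's actual instance role is not shown in this PR). Tracing is left out because I have not checked what it requires.

```python
# Assumed action sets per feature; verify against the AWS docs before relying on them.
REQUIRED_ACTIONS = {
    "logs": {
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogStreams",
    },
    "metrics": {
        "cloudwatch:PutMetricData",
    },
}


def missing_actions(granted: set, feature: str) -> set:
    """Return the actions the feature needs that the role does not grant."""
    return REQUIRED_ACTIONS[feature] - granted


if __name__ == "__main__":
    # Hypothetical role that only covers log shipping.
    granted = set(REQUIRED_ACTIONS["logs"])
    for feature in sorted(REQUIRED_ACTIONS):
        print(feature, sorted(missing_actions(granted, feature)))
```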

chloeruka left a comment

Nice! Looks good to me! 🚀


# Start logging daemons as soon as possible to ensure failures in this script get sent
systemctl restart rsyslog
systemctl restart awslogsd

yob (Contributor Author) commented on these lines:

I removed all this on-boot stuff because the cloudwatch-agent is installed and configured in the AMI, and there should be no further config required at boot time.

yob merged commit 9ac9cdc into master on Mar 14, 2021
yob deleted the cloudwatch-agent branch on March 14, 2021 22:38

pda commented Mar 16, 2021

Nice! 👌🏼

Successfully merging this pull request may close these issues.

CreateLogGroup service limits
Migrate from awslogs to CloudWatch Agent