Replace awslogs with the cloudwatch-agent #811
We've long used awslogs to send logs from elastic stack instances to cloudwatch logs. However, it's deprecated and AWS now recommend using the cloudwatch agent[2]. For example, there's currently a banner in the awslogs docs[1] that says:

> This reference is for the older CloudWatch Logs agent, which is on the
> path to deprecation. We strongly recommend that you use the unified
> CloudWatch agent instead

Our windows AMIs already use the cloudwatch agent, and this finally updates the two linux AMIs (amd64/arm64) to use it as well. Although windows was already using cloudwatch logs, I have renamed a few files to keep them consistent across linux and windows.

The new linux config file was generated by booting the 5.2.0 linux/amd64 stack AMI, installing the agent via yum, and then running the wizard to convert the legacy awslogs.conf:

    /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

The new agent can apparently do tracing and metrics (including running a local statsd-compatible interface). I've left them disabled for now, and focused just on replacing the deprecated awslogs.

We believe this might also resolve a known issue (#709) where the awslogs tool calls the CreateLogGroup API endpoint over and over. For large Buildkite customers, this can result in the regional quota for CreateLogGroup being hit and some logs not being recorded.

Fixes #713
Fixes #709

[1] https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
[2] https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html
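For anyone who hasn't seen the unified agent's config format, here is a minimal sketch of the kind of `logs` section the wizard emits. The log file path, the log group name, and the output location are illustrative assumptions, not the values committed in this PR.

```shell
#!/bin/sh
# Sketch only: write a minimal CloudWatch agent config that ships one log file.
# The file path and log group name are hypothetical examples, not the values
# generated for the elastic stack AMIs.
set -eu

config_file="${CONFIG_FILE:-/tmp/amazon-cloudwatch-agent.example.json}"

# Only a "logs" section is written; no metrics or traces sections are included.
cat > "$config_file" <<'EOF'
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/messages",
            "log_group_name": "/example/system",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
EOF

echo "wrote $config_file"
```

Each entry in `collect_list` corresponds roughly to one section of the old awslogs.conf. Leaving out the `metrics` and `traces` sections should keep those features off, which matches the scope of this change.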
While developing this, I created a new elastic stack based on this branch and confirmed the 7 log groups are still being updated. I wasn't sure if extra permissions would be required on the instance IAM role, but apparently not. I assume we might need extra permissions if we wanted to experiment with tracing or metrics.
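A check like the one described above could be scripted roughly as follows. This is a sketch, not what was actually run: it assumes the AWS CLI is installed with credentials allowed to call `logs:DescribeLogStreams`, and the group names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the most recent event timestamp for each log group passed
# as an argument. Group names are supplied by the caller.
set -eu

# Epoch milliseconds of the newest event in a log group.
# The CLI prints "None" when the group has no streams yet.
last_event_ms() {
  aws logs describe-log-streams \
    --log-group-name "$1" \
    --order-by LastEventTime \
    --descending \
    --limit 1 \
    --query 'logStreams[0].lastEventTimestamp' \
    --output text
}

# One tab-separated line per group: name, then last event timestamp.
report_log_groups() {
  for group in "$@"; do
    printf '%s\t%s\n' "$group" "$(last_event_ms "$group")"
  done
}

# Example (hypothetical group names):
#   report_log_groups /example/system /example/agent
if [ "$#" -gt 0 ]; then
  report_log_groups "$@"
fi
```

CloudWatch updates the last event timestamp on an eventually consistent basis, so a slightly stale value right after boot does not necessarily mean the agent is failing to ship logs.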
Nice! Looks good to me! 🚀
    # Start logging daemons as soon as possible to ensure failures in this script get sent
    systemctl restart rsyslog
    systemctl restart awslogsd
I removed all this on-boot stuff because the cloudwatch-agent is installed and configured in the AMI, and there should be no further config required at boot time.
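To illustrate the build-time approach, here is a rough sketch of installing and enabling the agent while the AMI is being built. It is not the script from this PR: the source config path is hypothetical, the destination is the agent's default JSON config location as I understand it, and `DRY_RUN` defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: install and enable the CloudWatch agent during the AMI build so
# the boot script has nothing left to restart. Run as root with DRY_RUN=0 to
# actually execute; by default the commands are only printed.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Hypothetical location of the config file uploaded by the image build.
SRC_CONFIG="${SRC_CONFIG:-/tmp/conf/cloudwatch-agent/config.json}"
# Assumed default location the agent reads its JSON config from.
DEST_CONFIG="/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_cloudwatch_agent() {
  run yum install -y amazon-cloudwatch-agent
  run cp "$SRC_CONFIG" "$DEST_CONFIG"
  # Enable rather than start: systemd launches the agent when an instance boots.
  run systemctl enable amazon-cloudwatch-agent
}

install_cloudwatch_agent
```

Because the unit is enabled in the image, systemd starts the agent on every instance boot without any help from the boot script.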
Nice! 👌🏼