-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Cloudwatch output plugin Connect() #3335
Improved Cloudwatch output plugin Connect() #3335
Conversation
Fixes influxdata#3316. Use STS to validate the provided credentials work, rather than `ListMetricData` which requires a separate permission, and document the required permission. - [x] Signed [CLA](https://influxdata.com/community/cla/). - [x] Associated README.md updated. - [x] Has appropriate unit tests. I'm not sure about unit tests since none exist using mock AWS, but I did some manual testing. I made a test IAM user with only `cloudwatch:PutMetricData` permission, policy: ``` { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData" ], "Resource": [ "*" ] } ] } ``` I then used its access key as environment variables locally and ran telegraf in this environment, with this config: ```ini [agent] debug = "true" [[outputs.cloudwatch]] region = "eu-west-1" namespace = "InfluxData/Telegraf" [[inputs.cpu]] ``` I then successfully saw data flushed when running telegraf like so: ``` $ ./telegraf --config test.conf 2017-10-12T22:23:07Z D! Attempting connection to output: cloudwatch 2017-10-12T22:23:07Z D! Successfully connected to output: cloudwatch 2017-10-12T22:23:07Z I! Starting Telegraf v1.5.0~c74c29b1 2017-10-12T22:23:07Z I! Loaded outputs: cloudwatch 2017-10-12T22:23:07Z I! Loaded inputs: inputs.cpu 2017-10-12T22:23:07Z I! Tags enabled: host=Adams-MacBook-Pro-2.local 2017-10-12T22:23:07Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"Adams-MacBook-Pro-2.local", Flush Interval:10s 2017-10-12T22:23:20Z D! Output [cloudwatch] buffer fullness: 9 / 10000 metrics. 2017-10-12T22:23:21Z D! Output [cloudwatch] wrote batch of 9 metrics in 976.776485ms ^C2017-10-12T22:23:26Z I! Hang on, flushing any cached metrics before shutdown 2017-10-12T22:23:26Z D! Output [cloudwatch] buffer fullness: 0 / 10000 metrics. ``` And I also checked with bad credentials and saw the error message: ``` $ AWS_ACCESS_KEY_ID=foobar ./telegraf --config test.conf 2017-10-12T22:24:50Z E! cloudwatch: Cannot use credentials to connect to AWS : InvalidClientTokenId: The security token included in the request is invalid. status code: 403, request id: 2824d227-af9c-11e7-aa9b-f5e6ab3e0c4c 2017-10-12T22:24:50Z E! Failed to connect to output cloudwatch, retrying in 15s, error was 'InvalidClientTokenId: The security token included in the request is invalid. status code: 403, request id: 2824d227-af9c-11e7-aa9b-f5e6ab3e0c4c' ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, one thing that comes to mind is that previously we knew if we had ListMetricData authorization, but now we only know if we can authenticate. Granted, knowing about this permission was not helpful because it was the wrong one, but I wonder if there is a way to test up front if the user has the correct permissions setup.
Ultimately it will be reported on write so I don't think it is crucial.
There is no way to test for specific permissions on AWS except to try the action. Some EC2 endpoints have a "dry run" mode for checking but they seem to have ditched this idea. |
Thank you! |
Fixes #3319. Use STS to validate the provided credentials work, rather than
ListMetricData
which requires a separate permission, and document the required permission.I'm not sure about unit tests since none exist using mock AWS, but I did some manual testing.
I made a test IAM user with only
cloudwatch:PutMetricData
permission, policy:I then used its access key as environment variables locally and ran telegraf in this environment, with this config:
I then successfully saw data flushed when running telegraf like so:
And I also checked with bad credentials and saw the error message: