
New component: AWS Cloudwatch Logs Receiver #14449

Closed · 2 tasks
cpheps opened this issue Sep 22, 2022 · 9 comments · Fixed by #14730
Labels
Accepted Component (New component has been sponsored)

Comments

@cpheps
Contributor

cpheps commented Sep 22, 2022

The purpose and use-cases of the new component

Collect log events from AWS CloudWatch and convert them into OTel pdata logs. One use case: users with multi-cloud environments can export logs from AWS into their desired log platform.

Example configuration for the component

receivers:
    awscloudwatch:
        region: <aws_region>
        profile: <aws_profile> # optional; if omitted, the default profile is used when the AWS credential environment variables are not set
        logs:
            polling_interval: 60s # interval to poll for logs
            event_limit: 10000 # number of log events to process at a time; valid values are 1-10,000

            # These two fields can be used in conjunction with each other or separately to allow flexibility
            log_groups: [] # explicit list of log groups to filter on
            log_group_prefix: [] # list of log group prefixes that can be used to search for log groups

            # Optional stream filtering fields. These are mutually exclusive
            log_stream_names: [] # explicit list of log streams
            log_stream_name_prefix: <log_stream_name_prefix> # logs streams whose name has this prefix will be collected

The configuration has a logs sub-config because CloudWatch also supports metrics; this leaves the config flexible enough to add metric collection later if desired. CloudWatch metric scraping is currently a feature request in #6226.
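
For illustration, the Go config structs for this might look roughly like the sketch below. Field names and mapstructure tags are assumptions on my part, not a final implementation:

    // Sketch only: hypothetical config structs mirroring the YAML above.
    package awscloudwatchreceiver

    import "time"

    type Config struct {
        Region  string     `mapstructure:"region"`
        Profile string     `mapstructure:"profile"`
        Logs    LogsConfig `mapstructure:"logs"`
    }

    type LogsConfig struct {
        PollingInterval     time.Duration `mapstructure:"polling_interval"`
        EventLimit          int           `mapstructure:"event_limit"`
        LogGroups           []string      `mapstructure:"log_groups"`
        LogGroupPrefix      []string      `mapstructure:"log_group_prefix"`
        LogStreamNames      []string      `mapstructure:"log_stream_names"`
        LogStreamNamePrefix string        `mapstructure:"log_stream_name_prefix"`
    }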

Also, just including a link on AWS Profiles.

Telemetry data types supported

logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

@cpheps added the needs triage label Sep 22, 2022
@djaglowski
Member

I'll sponsor this.

@djaglowski added the Accepted Component label Sep 23, 2022
@codeboten removed the needs triage label Sep 23, 2022
@schmikei
Contributor

schmikei commented Oct 4, 2022

For the config, I have some suggestions after looking deeper.

CloudWatch has a hierarchy of log groups and log streams. Because of this, taking a flat list of log stream filters and applying it to all log groups gives the user less control.

[Example hierarchy image: log streams nested under their log groups]

Taking that EKS example further, if I had two dev clusters:

  • dev-eks-0
  • dev-eks-1

and I only cared about the log streams prefixed with kube-api-server in dev-eks-0, I would have no way of doing that with what is currently proposed, right? Please correct me if I am missing something, but I believe this is a use case customers would run up against.

This leads me to believe the more correct solution is an array of log group configs, each specifying its own log stream names, so as to better emulate CloudWatch's hierarchy.

A config proposal I think is more appropriate would be:

region: ""  # required
profile: ""  # optional to specify AWS creds
logs:
    poll_interval: 1m0s
    event_limit: 1000  # limit the number of events we process per poll
    log_group_limit: 50.  # limit the number of log groups we spin up goroutines for and limit autodiscovery
    log_group_prefix: ""  # if no log groups are specified, this can limit discovery of log groups to match a prefix
    log_groups: 
    - name: "" # name of the log group, required if not using autodiscover
       stream_prefix: ""  #  prefix to filter log streams of the log group to those that match
       log_streams: [""]  # manually specify the names of the streams desired to poll from 
       #### if both stream_prefix and log_streams are empty, collect all
   ### if both log_groups and log_group_prefix are empty, collect all groups and all subsequent streams until 
   ### limits are reached. 

For that example, I think a config like this is preferable, as it properly contextualizes which streams come from which groups.

receivers:
  awscloudwatch:
    region: us-east-1
    logs:
      log_groups:
      - name: /aws/eks/dev-eks-0/cluster
        log_streams: [kube-api-server]
      - name: /aws/eks/dev-eks-1/cluster
        # collect all streams

Just making sure this is the preferred approach; do either of you have insight, @djaglowski @cpheps?

@cpheps
Contributor Author

cpheps commented Oct 5, 2022

@schmikei I think your proposal makes sense. The only way to achieve the control you described with the originally proposed config is to create multiple instances of the receiver, which is not user friendly. It makes sense to change the config to feel more natural to a CloudWatch user.

@djaglowski
Member

I generally agree with organizing streams under groups. A couple of related thoughts:

  1. It's all under the context of logs already, so can we drop log_ from the keys?
  2. When defining the groups or streams, the prefix and limit options are directly related to each other. Can these be a single struct?
  3. When limiting, is there an implicit ordering? If not, do we need to provide a way to order?
  4. Can a user specify both names and prefixes? Could we have groups and streams be slices that allow both names and prefixes to be specified? If not, should we have group_names, group_prefixes, stream_names, stream_prefixes?

With most of these suggestions applied:

  region: ""  # required
  profile: ""  # optional to specify AWS creds
  logs:
    poll_interval: 1m0s
    event_limit: 1000
    groups:
      - name: group1
        streams:
          - name: foo1
          - name: foo2
      - prefix: group_foo_*
        limit: 50
        order_by: name_desc
        streams:
          - name: stream1
          - prefix: stream_*
            limit: 40
            order_by: name_desc
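
For illustration, a sketch of how these slices might decode in Go, with each entry allowing either a name or a prefix (all identifiers here are assumptions, not final):

    // Sketch only: one entry type covers both named and prefix-matched
    // groups/streams, mirroring the YAML above.
    type StreamConfig struct {
        Name    string `mapstructure:"name"`
        Prefix  string `mapstructure:"prefix"`
        Limit   int    `mapstructure:"limit"`
        OrderBy string `mapstructure:"order_by"`
    }

    type GroupConfig struct {
        Name    string         `mapstructure:"name"`
        Prefix  string         `mapstructure:"prefix"`
        Limit   int            `mapstructure:"limit"`
        OrderBy string         `mapstructure:"order_by"`
        Streams []StreamConfig `mapstructure:"streams"`
    }

A per-entry Validate() could then reject entries that set both name and prefix.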

@schmikei
Contributor

schmikei commented Oct 5, 2022

1. It's all under the context of `logs` already, so can we drop `log_` from the keys?

Agreed, that would probably simplify things 👍

2. When defining the groups or streams, the `prefix` and `limit` options are directly related to each other. Can these be a single struct?

Probably a good idea! I'm thinking an embedded struct that decodes consistently would be the preference?

3. When limiting, is there an implicit ordering? If not, do we need to provide a way to order?

AFAIK the API gives you a window of start time and end time, and the results come back oldest-first, with no control over ordering. Here is the struct we have to work with when requesting information: https://github.com/aws/aws-sdk-go/blob/2389894af92747d3ccd38b691ab977789ad489b1/service/cloudwatchlogs/api.go#L7039-L7096
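
For context, a rough sketch of what a single poll against that API could look like with aws-sdk-go; the group and stream names here are illustrative, not the receiver's actual code:

    // Sketch only: fetch one window of events, oldest-first, with pagination.
    package main

    import (
        "fmt"
        "time"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/cloudwatchlogs"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String("us-east-1"),
        }))
        svc := cloudwatchlogs.New(sess)

        end := time.Now()
        start := end.Add(-time.Minute) // one poll interval's window

        input := &cloudwatchlogs.FilterLogEventsInput{
            LogGroupName:        aws.String("/aws/eks/dev-eks-0/cluster"),
            LogStreamNamePrefix: aws.String("kube-api-server"),
            StartTime:           aws.Int64(start.UnixMilli()),
            EndTime:             aws.Int64(end.UnixMilli()),
            Limit:               aws.Int64(1000),
        }
        err := svc.FilterLogEventsPages(input,
            func(page *cloudwatchlogs.FilterLogEventsOutput, lastPage bool) bool {
                for _, e := range page.Events {
                    fmt.Println(aws.StringValue(e.Message))
                }
                return true // keep paging until the window is exhausted
            })
        if err != nil {
            fmt.Println("filter error:", err)
        }
    }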

As far as attaching limits, it appears there are three things we want to limit:

  1. The number of log groups (I imagine this should go under the logs section and should only be relevant for autodiscovery)

  2. The number of streams (per log group)

  3. The number of events per stream (as a side effect, this would get rid of the top-level event_limit parameter)

4. Can a user specify both names and prefixes? Could we have `groups` and `streams` be slices that allow both names and prefixes to be specified? If not, should we have `group_names`, `group_prefixes`, `stream_names`, `stream_prefixes`?

One thing to note: prefixes are not regex-based; in my experience it is a plain string-prefix match, so we'll need to document that. The API also documents:

// If you specify a value for both logStreamNamePrefix and logStreamNames, the
// action returns an InvalidParameterException error.

Allowing both would also significantly increase the number of requests. If possible, I'd like to keep the number of requests per stream to a minimum, since the API has a somewhat opinionated way of using prefixes to grab multiple streams. I'm a tad hesitant about whether this is a good thing to let users do, but I will look further. Let's be sure to readdress this in the upcoming code review to see if we can come up with something that makes sense.
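
To make that constraint concrete, the config validation could mirror the API rule; a sketch only (field names follow the proposal above, nothing here is final):

    // Sketch only: reject configs the API would reject anyway.
    package awscloudwatchreceiver

    import "fmt"

    type LogGroupConfig struct {
        Name         string   `mapstructure:"name"`
        StreamPrefix string   `mapstructure:"stream_prefix"`
        LogStreams   []string `mapstructure:"log_streams"`
    }

    func (c LogGroupConfig) Validate() error {
        if c.StreamPrefix != "" && len(c.LogStreams) > 0 {
            return fmt.Errorf("log group %q: stream_prefix and log_streams are mutually exclusive", c.Name)
        }
        return nil
    }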

Overall I think this config is cleaner, so I'll start spiking out something closely aligned with what you suggest.

@lewis262626
Member

Might a config property for start time be useful? You don't always want logs from the beginning of time; sometimes you want them from, say, the year 2020 onward.

@djaglowski
Member

Might a config property for start time be useful? You don't always want logs from the beginning of time; sometimes you want them from, say, the year 2020 onward.

This is a good point.

@schmikei, correct me if I'm wrong, but I think the way this should work is that logs are collected for the window between polls. Basically:

var endTime time.Time // shared between Start and poll

func Start() {
    // Initialize the end time from scratch
    endTime = time.Now().Add(-pollInterval)

    // Check if a checkpoint was previously stored
    // (e.g. the collector was down for some time and we need to catch up)
    if end, ok := storage.Get("last_checkpoint"); ok {
        endTime = end
    }
}

func poll() {
    startTime := endTime // from the last poll
    endTime = time.Now()
    // collect events in [startTime, endTime), then persist the checkpoint,
    // e.g. storage.Set("last_checkpoint", endTime)
}

To @lewis262626's point, it would be useful to be able to pull logs from earlier time windows. I would consider this an alternate mode of operation, though, something that could be added later. An optional config section could allow the user to specify the historical time window that should be pulled, and possibly some settings to control how aggressively the logs are pulled.

@schmikei
Contributor

Yes, I believe something like this could work logically; however, I have some reservations.

One thing I'm struggling with is how a parameter like this would look in the config. Would it be preferred to have some kind of start_time that allows the user to specify a timestamp?

awscloudwatch:
  polling_interval: 5m
  region: us-west-1
  logs:
    start_time: "2022-10-07T16:50:52.664-0400"

This doesn't seem super user friendly. We could accept a time.Duration as an alternative, but that doesn't feel great either, so I'm not sure what direction to take in providing these levers. Perhaps I'm misunderstanding what you are suggesting, @djaglowski; could you give me an example of how such a config parameter would look on the first run?
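
For instance, a hypothetical helper (purely illustrative, not part of any proposal) could accept either a timestamp or a relative duration:

    // Sketch only: treat start_time as either an RFC 3339 timestamp or a
    // duration meaning "this long ago".
    package main

    import (
        "fmt"
        "time"
    )

    func parseStartTime(s string) (time.Time, error) {
        if t, err := time.Parse(time.RFC3339, s); err == nil {
            return t, nil
        }
        if d, err := time.ParseDuration(s); err == nil {
            return time.Now().Add(-d), nil
        }
        return time.Time{}, fmt.Errorf("start_time %q is neither a timestamp nor a duration", s)
    }

    func main() {
        t, err := parseStartTime("2022-10-07T16:50:52-04:00")
        fmt.Println(t, err)
    }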


@lewis262626

I'm also looking into your suggestions here, and I feel like we could expose the endpoint as a config parameter! I will look into adding this.

@djaglowski
Member

@schmikei, what I'm suggesting primarily is that this should be a future enhancement. I think the appropriate next step for that enhancement would be for someone to open an issue proposing a specific config format and any other relevant details about the capability.
