Add S3 input to retrieve logs from AWS S3 buckets #12640

Merged
merged 59 commits into from
Aug 2, 2019


kaiyan-sheng
Contributor

@kaiyan-sheng kaiyan-sheng commented Jun 21, 2019

Different logs from different services can be stored in S3. For example:

  • S3 server access logs: provide detailed records for the requests that are made to a bucket.
  • VPC flow logs: records for all of the monitored network interfaces are published to a series of log file objects that are stored in the bucket.
  • ELB access logs: capture detailed information about requests sent to the load balancer. Each log contains information such as the time the request was received, the client's IP address, latencies, request paths, and server responses.
  • CloudWatch logs: users can choose to export all data from an Amazon CloudWatch log group to a specific S3 bucket.

With logs from so many services landing in S3, a dedicated Filebeat input that retrieves raw lines from S3 objects would be useful. To avoid the significant lag of a polling-only S3 input, we agreed on a combination of notification-based and polling-based approaches: an s3-sqs Filebeat input. This requires extra setup in AWS S3 and SQS: a notification configuration must be added that asks S3 to publish specific types of events to an SQS queue.
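
As a rough illustration of the extra setup mentioned above, the sketch below builds a bucket notification configuration document that asks S3 to publish object-created events to an SQS queue. The queue ARN is a placeholder and the helper names are hypothetical; this is not part of the PR. A document of this shape can be supplied to S3, for example with `aws s3api put-bucket-notification-configuration`, and the queue's access policy also has to allow S3 to send messages to it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queueConfiguration mirrors one entry of the "QueueConfigurations" list in an
// S3 bucket notification configuration.
type queueConfiguration struct {
	ID       string   `json:"Id,omitempty"`
	QueueArn string   `json:"QueueArn"`
	Events   []string `json:"Events"`
}

// notificationConfiguration is the top-level document sent to S3.
type notificationConfiguration struct {
	QueueConfigurations []queueConfiguration `json:"QueueConfigurations"`
}

// buildNotificationConfig (hypothetical helper) returns the JSON document that
// asks S3 to publish the given event types to the queue identified by queueArn.
func buildNotificationConfig(queueArn string, events ...string) (string, error) {
	cfg := notificationConfiguration{
		QueueConfigurations: []queueConfiguration{
			{QueueArn: queueArn, Events: events},
		},
	}
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Placeholder ARN (assumption): substitute the ARN of your own queue.
	doc, err := buildNotificationConfig(
		"arn:aws:sqs:us-east-1:123456789012:example-queue",
		"s3:ObjectCreated:*",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
}
```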

With this PR, when the s3 input is enabled, events for log messages retrieved from S3 buckets start showing up in ES:

{
  "_index": "filebeat-8.0.0-2019.07.12-000001",
  "_type": "_doc",
  "_id": "efb4287666",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2019-07-12T19:50:30.106Z",
    "aws": {
      "s3": {
        "bucket": {
          "name": "test-s3-ks-2",
          "arn": "arn:aws:s3:::test-s3-ks-2"
        },
        "object.key": "test-log-12.txt"
      }
    },
    "cloud": {
      "region": "ap-southeast-1",
      "provider": "aws"
    },
    "input": {
      "type": "s3"
    },
    "ecs": {
      "version": "1.0.1"
    },
    "host": {
      "architecture": "x86_64",
      "name": "KaiyanMacBookPro",
      "os": {
        "kernel": "17.7.0",
        "build": "17G7024",
        "platform": "darwin",
        "version": "10.13.6",
        "family": "darwin",
        "name": "Mac OS X"
      },
      "id": "9C7FAB7B-29D1-5926-8E84-158A9CA3E25D",
      "hostname": "KaiyanMacBookPro"
    },
    "agent": {
      "hostname": "KaiyanMacBookPro",
      "id": "7578d49c-6588-4843-85cc-ad3859f99ed1",
      "version": "8.0.0",
      "type": "filebeat",
      "ephemeral_id": "6b32568f-d728-49b4-b5bb-db0157fd3102"
    },
    "message": "test12\n",
    "log": {
      "offset": 7,
      "file.path": "https://test-s3-ks-2.s3-ap-southeast-1.amazonaws.com/test-log-12.txt"
    }
  },
  "fields": {
    "@timestamp": [
      "2019-07-12T19:50:30.106Z"
    ],
    "suricata.eve.timestamp": [
      "2019-07-12T19:50:30.106Z"
    ]
  },
  "sort": [
    1562961030106
  ]
}
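
The sample event hints at how an object's content maps to events: one event per line, log.offset 7 for the 7-byte line "test12\n", and log.file.path set to the object's URL. Below is a minimal sketch of that mapping. It assumes the offset counts the bytes consumed up to and including the line, which is consistent with the sample but not confirmed in this thread. lineEvent, objectURL, readLines and eventsFor are hypothetical names, not the PR's code.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineEvent is a hypothetical, simplified stand-in for a Filebeat event.
type lineEvent struct {
	Message string // raw line, including its trailing newline if present
	Offset  int64  // bytes consumed so far, including this line (assumption)
	Path    string // URL of the S3 object the line came from
}

// objectURL builds a URL in the same shape as log.file.path in the sample.
// The key is inserted as-is; escaping of special characters is not handled.
func objectURL(bucket, region, key string) string {
	return fmt.Sprintf("https://%s.s3-%s.amazonaws.com/%s", bucket, region, key)
}

// readLines emits one event per line of body and tracks a running byte offset.
func readLines(body io.Reader, path string) ([]lineEvent, error) {
	r := bufio.NewReader(body)
	var events []lineEvent
	var offset int64
	for {
		line, err := r.ReadString('\n')
		if len(line) > 0 {
			offset += int64(len(line))
			events = append(events, lineEvent{Message: line, Offset: offset, Path: path})
		}
		if err == io.EOF {
			return events, nil
		}
		if err != nil {
			return events, err
		}
	}
}

// eventsFor runs readLines over in-memory content, for demonstration only.
func eventsFor(bucket, region, key, content string) ([]lineEvent, error) {
	return readLines(strings.NewReader(content), objectURL(bucket, region, key))
}

func main() {
	events, err := eventsFor("test-s3-ks-2", "ap-southeast-1", "test-log-12.txt", "test12\n")
	if err != nil {
		panic(err)
	}
	for _, e := range events {
		fmt.Printf("%q offset=%d path=%s\n", e.Message, e.Offset, e.Path)
	}
}
```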

For configuration:

filebeat.inputs:
- type: s3
  queue_url: https://sqs.ap-southeast-1.amazonaws.com/627959692251/test-s3-notification

Ideally, with this config, the s3 input polls the specified queue URLs for messages and reads them. If a message has eventSource == aws:s3 && eventName == ObjectCreated:Put and its bucket.name is included in bucketNames from the config, the input reads the S3 object specified in that SQS message.
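
The filter described above can be sketched as follows. This is only an illustration of the stated condition, not the PR's implementation: it decodes an S3 event notification from an SQS message body and keeps the records whose eventSource, eventName and bucket name match. The type and function names are hypothetical, and bucketNames is passed as a plain slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// s3Notification models the parts of an S3 event notification used here.
type s3Notification struct {
	Records []record `json:"Records"`
}

type record struct {
	EventSource string `json:"eventSource"`
	EventName   string `json:"eventName"`
	AwsRegion   string `json:"awsRegion"`
	S3          struct {
		Bucket struct {
			Name string `json:"name"`
			Arn  string `json:"arn"`
		} `json:"bucket"`
		Object struct {
			Key string `json:"key"`
		} `json:"object"`
	} `json:"s3"`
}

// s3Object identifies one object the input should go and read.
type s3Object struct {
	Bucket, Region, Key string
}

// objectsToRead (hypothetical helper) returns the objects named in an SQS
// message body that pass the filter: eventSource == aws:s3, eventName ==
// ObjectCreated:Put, and a bucket name listed in bucketNames.
// Keys are returned as received; S3 URL-encodes them in notifications, and
// decoding is left out of this sketch.
func objectsToRead(body string, bucketNames []string) ([]s3Object, error) {
	var n s3Notification
	if err := json.Unmarshal([]byte(body), &n); err != nil {
		return nil, fmt.Errorf("decoding SQS message body: %w", err)
	}
	allowed := make(map[string]bool, len(bucketNames))
	for _, name := range bucketNames {
		allowed[name] = true
	}
	var out []s3Object
	for _, r := range n.Records {
		if r.EventSource != "aws:s3" || r.EventName != "ObjectCreated:Put" {
			continue
		}
		if !allowed[r.S3.Bucket.Name] {
			continue
		}
		out = append(out, s3Object{Bucket: r.S3.Bucket.Name, Region: r.AwsRegion, Key: r.S3.Object.Key})
	}
	return out, nil
}

func main() {
	body := `{"Records":[{"eventSource":"aws:s3","eventName":"ObjectCreated:Put",
	  "awsRegion":"ap-southeast-1",
	  "s3":{"bucket":{"name":"test-s3-ks-2"},"object":{"key":"test-log-12.txt"}}}]}`
	objs, err := objectsToRead(body, []string{"test-s3-ks-2"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", objs)
}
```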

@kaiyan-sheng kaiyan-sheng self-assigned this Jun 21, 2019
@kaiyan-sheng kaiyan-sheng added the Team:Integrations, Filebeat, and [zube]: In Progress labels Jun 21, 2019
Contributor

@exekias exekias left a comment

Nice to see this going on! I left a few comments & questions

@kaiyan-sheng
Contributor Author

Question: When an SQS message points to an S3 object that is not readable or no longer exists, we get the error message s3/input.go:220 s3 get object request failed: NoSuchKey: The specified key does not exist.
Should we still report an event with this error message, or skip it when the error occurs?

@exekias
Contributor

exekias commented Jun 26, 2019

Question: When an SQS message points to an S3 object that is not readable or no longer exists, we get the error message s3/input.go:220 s3 get object request failed: NoSuchKey: The specified key does not exist.
Should we still report an event with this error message, or skip it when the error occurs?

This sounds like the kind of error I would like to be notified about, probably something is not configured well if this happens?

@kaiyan-sheng
Contributor Author

Question: When an SQS message points to an S3 object that is not readable or no longer exists, we get the error message s3/input.go:220 s3 get object request failed: NoSuchKey: The specified key does not exist.
Should we still report an event with this error message, or skip it when the error occurs?

This sounds like the kind of error I would like to be notified about, probably something is not configured well if this happens?

I saw it happen when:

  1. I deleted the old objects from S3 but not the old SQS messages, so an SQS message pointed the s3 Filebeat input at an object that no longer exists. This should (hopefully) not happen in practice; I was just cleaning up manually.
  2. I manually uploaded a .png file to the S3 bucket. SQS then received a message saying a new object had been created in S3, but when Filebeat went to read that object, it was not readable because it is a .png.

Maybe I should log a WARN message when this happens and report an event with an empty message field?

@exekias
Contributor

exekias commented Jun 26, 2019

It sounds like this should be at ERROR level, as it's really something the user should look into. As we don't have any message retrieved I don't think we need to send anything to the output.

@kaiyan-sheng
Contributor Author

kaiyan-sheng commented Jun 26, 2019

It sounds like this should be at ERROR level, as it's really something the user should look into. As we don't have any message retrieved I don't think we need to send anything to the output.

Sounds good to me! Thanks! This is what I currently have so I will keep the same behavior :-)
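
The behavior agreed on above (log at ERROR level and send nothing to the output when the object cannot be retrieved) might look roughly like this. It is a sketch under stated assumptions: fetchFunc stands in for the real S3 GetObject call, and missingObject, errNoSuchKey and processObject are hypothetical names rather than code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// fetchFunc is a hypothetical stand-in for retrieving an object's lines from S3.
type fetchFunc func(bucket, key string) ([]string, error)

// errNoSuchKey imitates the error text quoted in the discussion.
var errNoSuchKey = errors.New("NoSuchKey: The specified key does not exist.")

// missingObject simulates an SQS message that points at a deleted object.
func missingObject(bucket, key string) ([]string, error) {
	return nil, errNoSuchKey
}

// processObject returns the lines to publish. If the object cannot be
// fetched, it logs the failure at ERROR level and returns nothing, so no
// event reaches the output.
func processObject(fetch fetchFunc, bucket, key string, logf func(format string, args ...interface{})) []string {
	lines, err := fetch(bucket, key)
	if err != nil {
		logf("ERROR s3 get object request failed for %s/%s: %v", bucket, key, err)
		return nil
	}
	return lines
}

func main() {
	lines := processObject(missingObject, "test-s3-ks-2", "deleted.txt", log.Printf)
	fmt.Println("events to publish:", len(lines))
}
```

Passing the logger in as a function keeps the sketch free of any particular logging package; a real input would use its own logger.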

@kaiyan-sheng kaiyan-sheng marked this pull request as ready for review June 26, 2019 22:11
@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner June 26, 2019 22:11
@Conky5

Conky5 commented Jul 8, 2019

jenkins test this please

@kaiyan-sheng
Contributor Author

Jenkins, test this please

Member

@andrewkroh andrewkroh left a comment

I'm excited to see this feature. I left a few minor issues and some questions.

@kaiyan-sheng kaiyan-sheng requested a review from a team as a code owner August 2, 2019 13:29
@kaiyan-sheng kaiyan-sheng merged commit e00271b into elastic:master Aug 2, 2019
@kaiyan-sheng kaiyan-sheng deleted the s3_sqs_input branch August 2, 2019 18:02
@kaiyan-sheng kaiyan-sheng mentioned this pull request Aug 7, 2019
2 tasks
Labels
Filebeat, review, Team:Integrations, v7.4.0

10 participants