
Conversation

remeh
Contributor

@remeh remeh commented Nov 26, 2020

What does this PR do?

This PR adds support for Logs collection in the Datadog Agent Extension.

Log collection

To collect logs from the environment, the extension adds a /lambda/logs route to the extension HTTP server (running on port 8124), on which it receives logs from the AWS environment. These logs are the logs of the function, of the extensions, and of the platform.

The pkg/serverless/aws package has been introduced, with helpers to get the ARN of the running function and to parse the log messages received on the HTTP server.

Logs aggregation / processing

An instance of the Logs Agent is started in the extension if the DD_LOGS_ENABLED flag is set. This Logs Agent uses the regular pipeline, except that:

  • it uses a newly introduced NullAuditor (which does not write any registry).
  • a new input source has been introduced, capable of receiving log messages from AWS through a channel (see pkg/logs/input/channel). Note that it is specialized for AWS log messages but could be made more generic with additional code if needed.
  • the JSON encoder has been modified to support providing a timestamp to messages, and, when built with the serverless build tag, it does not append any hostname to the logs.

Synchronous flush of the pipeline

The Extension needs to do a synchronous flush of the buffered data on a signal. Today, it is used to naively flush at the end of the execution of the function; however, we are already working on a smarter flush mechanism.

In order to do so, I had to adapt different parts of the pipeline:

  • Add Pipeline.Flush, triggering a flush of the Processor and of the Sender.
  • Add Processor.Flush, sending buffered messages to the processor / encoder.
  • Add Sender.Flush, sending the buffered data to the BatchStrategy sender.
  • Add BatchStrategy.Flush, synchronously sending the HTTP request to the intake to flush the data.

A mutex has been introduced and the select statements were all modified to support a flush signal. This may have some performance impact, for which I want to add benchmarks.

Miscellaneous changes

  • Add message.NewMessageWithTime to send a log with a known/given timestamp into the pipeline.
  • Auditor has been renamed RegistryAuditor, and an Auditor interface has been introduced instead (to allow creating the NullAuditor).
  • Some of the messages contain useful information that we turn into metrics and send directly into the aggregator: see here.

remeh added 30 commits October 12, 2020 15:11
…s to the logs agent

Introducing the `logs/input/channel` implementation supporting receiving logs through
a Go channel.
…ssages.

This has been introduced to not change the original pipeline process.
The original `Auditor` has been renamed `RegistryAuditor`.
`Auditor` is now an interface for both `RegistryAuditor` and `NullAuditor`.
This commit is also adding the reading of the function ARN from the invoke event.
…sing the DogStatsD server.

Currently supported enhanced metrics: billed_duration, duration, init_duration, max_memory_used.
This commit also adds the parsing of the time in the AWS log.
…d ARN support.

This way, we can use the timestamp from the AWS LogMessage while sending the logs
to the intake. This is what this commit is doing.
@remeh remeh requested a review from a team as a code owner November 26, 2020 15:24
…rics and logs.

For feature-parity with the Datadog Forwarder, we want to use this
configuration entry / environment variable to append tags to every
metric and log sent by the extension.

// Loop terminates when the channel is closed.
for logline := range t.inputChan {
	origin := message.NewOrigin(t.source)
Member

I think we should also update t.source.BytesRead to be consistent with the other tailers.

@remeh remeh modified the milestones: 7.25.0, 7.26.0 Dec 9, 2020
Contributor

@prognant prognant left a comment


Overall it looks good to me and fits pretty well within the existing Logs Agent code.

Instead of spawning a new HTTP server (using yet another TCP port), we are re-using
the already spawned HTTP server, which is already used to communicate with the libraries.

It also means there is no longer a configuration field to change the HTTP port. This may
have to be addressed if 8124 is not available for some reason.
@remeh remeh force-pushed the remeh/serverless-logs branch from e1ac48c to 7f24782 Compare January 7, 2021 09:21
@remeh remeh requested a review from a team as a code owner January 7, 2021 12:57
Contributor

@prognant prognant left a comment


For the logs agent part it looks good to me. I think it will work pretty well.

Comment on lines +12 to +15
// buildCommonFormat returns the log common format seelog string
func buildCommonFormat(loggerName LoggerName) string {
	return fmt.Sprintf("%%Date(%s) | %s | %%LEVEL | %%Msg%%n", getLogDateFormat(), loggerName)
}
Contributor


I'm just curious: why do we remove contextual information from the agent log when running in the serverless context?
I guess it's not a problem here not to check for JMXFetch.

Contributor Author


We are mimicking the logs that users are used to having while running functions in AWS Lambda (through our client libraries, for instance).

Comment on lines -56 to -58
if loggerName == "JMXFETCH" {
return `%Msg%n`
}
Contributor


I think this should be included in pkg/config/log_format.go, for both buildCommonFormat and buildJSONFormat.

Contributor Author


Very nice catch, this must be a merge error or merge artifact.

@KSerrania KSerrania removed the request for review from a team January 11, 2021 10:51
Contributor

@prognant prognant left a comment


LGTM, just a quick question about tagging: will there be any tag to specifically isolate one lambda execution stream?

@remeh
Contributor Author

remeh commented Jan 12, 2021

@prognant thanks for the reviews... 🙇

will there be any tag to specifically isolate one lambda execution stream?

You mean, to identify that data sent by the Agent came through a Lambda using the Extension? The functionname tag, with the function name as its value, is set on logs and on the enhanced metrics, but for now there is not much more to differentiate logs emitted by a regular agent from logs emitted by the extension. Is this what you were wondering?

@prognant
Contributor

@prognant thanks for the reviews... 🙇

will there be any tag to specifically isolate one lambda execution stream?

You mean, to identify that data sent by the Agent came through a Lambda using the Extension? The functionname tag, with the function name as its value, is set on logs and on the enhanced metrics, but for now there is not much more to differentiate logs emitted by a regular agent from logs emitted by the extension. Is this what you were wondering?

I was thinking about isolating/filtering the logs (could also be metrics) emitted by one single execution of a given lambda function. I.e. if a lambda function has a huge number of concurrent invocations, having a tag per lambda execution (like a unique UUID for each execution) would be useful, WDYT?

@remeh
Contributor Author

remeh commented Jan 12, 2021

I was thinking about isolating/filtering the logs (could also be metrics) emitted by one single execution of a given lambda function. I.e. if a lambda function has a huge number of concurrent invocations, having a tag per lambda execution (like a unique UUID for each execution) would be useful, WDYT?

Oh sorry, I read your question too quickly! We already support that with the request ID, which is set here. This request ID is then sent in the JSON and is available in the Datadog app, for instance in the log explorer, to filter by a given request ID, etc. 👍

@remeh remeh merged commit 519c2cc into master Jan 12, 2021
@remeh remeh deleted the remeh/serverless-logs branch January 12, 2021 11:17
sergei-deliveroo pushed a commit to deliveroo/datadog-agent that referenced this pull request Mar 24, 2021
… Extension (DataDog#6861)

* Add the Serverless Datadog Agent implementation.

See https://docs.datadoghq.com/serverless/datadog_lambda_library/extension/

* Fix 3rd party license list since new pkg are imported from the AWS SDK.

* forwarder: adapt the SyncDefaultForwarder to recent changes in the Forwarder interface.

* aggregator: fix unit tests with Flush now being a public function.

* serverless: linter compliance.

* serverless: errcheck compliance.

* serverless: unused code linter.

* serverless: linter compliance.

* serverless: add http server listening for server logs and sending logs to the logs agent

Introducing the `logs/input/channel` implementation supporting receiving logs through
a Go channel.

* serverless/logs: introduce a `NullAuditor` not doing anything with messages.

This has been introduced to not change the original pipeline process.
The original `Auditor` has been renamed `RegistryAuditor`.
`Auditor` is now an interface for both `RegistryAuditor` and `NullAuditor`.

* serverless: remove debug messages

* serverless/logs: add a sync `Flush()` method to the logs pipeline.

This commit is also adding the reading of the function ARN from the invoke event.

* serverless/logs: remove unwanted debug lines.

* serverless/logs: read REPORT messages and send AWS enhanced metrics using the DogStatsD server.

Currently supported enhanced metrics: billed_duration, duration, init_duration, max_memory_used.

* serverless/logs: enhanced metrics are distributions.

* serverless/logs: comment on the sync flush of logs then statsd.

* serverless/logs: configurable logs type we're subscribing to.

This commit also adds the parsing of the time in the AWS log.

* serverless/arn: remove version from the ARN string if any.

* serverless/logs: create aws package containing both AWS LogMessage and ARN support.

This way, we can use the timestamp from the AWS LogMessage while sending the logs
to the intake. This is what this commit is doing.

* serverless/logs: parse time up to the millisecond.

* serverless/logs: send the enhanced metrics with the proper timestamp.

* serverless/logs: comments and send the report log to the intake.

* serverless/logs: send aws.lambda.enhanced.memorysize

* serverless/logs: send formatted platform logs.

* serverless/logs: cleaning comments here and there.

* config: fix log conflict while adding the serverless format.

* serverless/logs: linter compliance.

* cmd/serverless: start -> run

* dogstatsd: add a unit test in serverless mode.

* tasks: remove serverless test tag

* forwarder: rename `SyncDefaultForwarder` to `SyncForwarder` since it's not a default forwarder.

* [cmd/serverless] change default port for the HTTP server collecting logs

* serverless: support using DD_TAGS to add extra tags while sending metrics and logs.

For feature-parity with the Datadog Forwarder, we want to use this
configuration entry / environment variable to append tags to every
metric and log sent by the extension.

* serverless/logs: send the logs with the ARN and RequestId information.

* serverless: linter compliance

* serverless/logs: use the function name if available as the "service".

* serverless: hit the correct domain using the `AWS_LAMBDA_RUNTIME_API` env var.

* serverless: re-use the daemon http server to receive the logs from AWS.

Instead of spawning a new HTTP server (using yet another tcp port), we are re-using
the already spawned HTTP server which is already used to communicate with the libraries.

It also means there is no longer a configuration field to change the HTTP port. This may
have to be addressed if 8124 is not available for some reason.

* dogstatsd: fix a unit test.

* dogstatsd: fix a unit test.

* tasks/dogstatsd: slightly increase the maximum valid size of the dogstatsd binary.

* misc: use the correct log-format for non-serverless logs.

This must be a merge error/artifact.
Labels: changelog/no-changelog, team/agent-core