Missing error metrics on datadog_logs sink #18296

Closed
bosouza opened this issue Aug 17, 2023 · 3 comments
Labels
domain: external docs Anything related to Vector's external, public documentation sink: datadog_logs Anything `datadog_logs` sink related type: bug A code related bug.

Comments

@bosouza

bosouza commented Aug 17, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I was trying to use Vector sink metrics to determine a "sink health", but when looking at the metrics emitted by the datadog_logs sink when configured with incorrect credentials I didn't find component_errors_total or component_discarded_events_total.

Some other metrics for this sink are working as expected though: component_received_events_total increases while component_sent_events_total is not emitted at all (as none of the logs are actually being sent to Datadog).

There are also some metrics which are not documented in the telemetry section for the sink: http_client_requests_sent_total and http_client_responses_total.

Please run the config provided in this report to reproduce the behavior I'm seeing.

Configuration

{
    "sources": {
        "demo-logs": {
            "type": "demo_logs",
            "format": "json"
        },
        "internal-metrics": {
            "type": "internal_metrics"
        }
    },
    "transforms": {
        "internal-metrics-as-logs": {
            "type": "metric_to_log",
            "inputs": ["internal-metrics"]
        }
    },
    "sinks": {
        "datadog-logs": {
            "default_api_key": "test-api-key",
            "inputs": ["demo-logs"],
            "site": "datadoghq.com",
            "type": "datadog_logs"
        },
        "sink-metrics": {
            "type": "file",
            "encoding": {"codec": "json"},
            "path": "./datadog-sink-metrics.log",
            "inputs": ["internal-metrics-as-logs"]
        }
    }
}

Version

vector 0.32.0 (x86_64-unknown-linux-gnu 1b403e1 2023-08-15 14:56:36.089460954)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@bosouza bosouza added the type: bug A code related bug. label Aug 17, 2023
@pront pront added sink: datadog_logs Anything `datadog_logs` sink related domain: external docs Anything related to Vector's external, public documentation labels Aug 17, 2023
@pront
Contributor

pront commented Aug 17, 2023

Hi @bosouza, thanks for raising this issue.

Regarding component_errors_total or component_discarded_events_total:

  • Requests are retried (not dropped completely) and by default we will retry indefinitely. Assuming you didn't change this.
  • If you passed --require-healthy these metrics won't be emitted from the sink because the healthchecks failed.

Regarding http_client_responses_total, this is documented here.

Regarding http_client_requests_sent_total, this is indeed not documented. It is emitted by our internal HTTP client.

@bosouza
Author

bosouza commented Aug 17, 2023

Requests are retried (not dropped completely) and by default we will retry indefinitely. Assuming you didn't change this.

I see, by explicitly setting a low number of retries I can now see the missing metrics reporting errors and discarded events. This is a bit different from other sinks tho, like elasticsearch, cloudwatch and grafana that seem to recognize the non-retriable 403 error and emit the component_errors_total and component_discarded_events_total.

Overall seems like I can indeed rely on component_errors_total to track the sink health, as long as retry_attempts is configured with a lower value.
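
For anyone reproducing this, the workaround described above might look like the following fragment of the configuration from the report. It is only a sketch: it assumes that the sink's `request.retry_attempts` option is the setting referred to as retry_attempts, and the value 3 is an arbitrary illustration rather than a value taken from this thread.

```json
{
    "sinks": {
        "datadog-logs": {
            "type": "datadog_logs",
            "inputs": ["demo-logs"],
            "default_api_key": "test-api-key",
            "site": "datadoghq.com",
            "request": {
                "retry_attempts": 3
            }
        }
    }
}
```

With a finite retry budget, requests that keep failing are eventually given up on, which is the point at which component_errors_total and component_discarded_events_total showed up in the test described above.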

@jszwedko
Member

I see, by explicitly setting a low number of retries I can now see the missing metrics reporting errors and discarded events. This is a bit different from other sinks tho, like elasticsearch, cloudwatch and grafana that seem to recognize the non-retriable 403 error and emit the component_errors_total and component_discarded_events_total.

Yeah, this inconsistency is a source of a good amount of confusion. #10870 is intended to improve that situation.
