
New component: Log-based metrics processor #18269

Closed

weyert opened this issue Feb 3, 2023 · 19 comments


weyert commented Feb 3, 2023

The purpose and use-cases of the new component

Log-based Metrics (logmetrics) analyses the received log records and generates metrics from them.

Example configuration for the component

processors:
  logmetrics:
    pg_permissions_errors:
      type: counter
      filter:
        - from_attribute: textPayload
          match: permission denied for table (?P<tableName>.*)
          action:
            - type: add_attribute
              name: db.table.name
              value: $tableName

Telemetry data types supported

This processor would accept logs and create metrics.

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

The idea behind this proposed processor is to allow generating metrics from logs that have been sent to the collector using the new logs functionality. For example, in Google Cloud you have the ability to generate metrics based on the logs. I would love to have a similar solution for the collector.

A potential use case would be the ability to send all logs to a dedicated logs collector that passes them through to the appropriate logs backend (e.g. Google Cloud Logging), while at the same time generating log-based metrics that are then sent to Prometheus via the Prometheus Remote Write exporter.
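
For illustration, that dual pipeline could be wired roughly as follows. This is only a sketch: a processor cannot feed a second pipeline, so the component is shown as a hypothetical logmetrics connector, and the receiver and endpoint are placeholders.

receivers:
  otlp:
    protocols:
      grpc:

connectors:
  logmetrics:        # hypothetical component from this proposal

exporters:
  googlecloud:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write   # placeholder

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [googlecloud, logmetrics]
    metrics/from-logs:
      receivers: [logmetrics]
      exporters: [prometheusremotewrite]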

weyert added the needs triage label Feb 3, 2023
atoulme added the Sponsor Needed label and removed the needs triage label Mar 7, 2023

atoulme commented Mar 7, 2023

You might want to look at connectors as a way to achieve this effect.


atoulme commented Mar 21, 2023

Closing, as connectors perform this work now; please take a look there. Comment or reopen for clarification or if I missed something.

atoulme closed this as completed Mar 21, 2023
cwegener commented

Only the spanmetricsconnector exists at the moment.
Has a new logmetricsconnector been discussed at all before?


atoulme commented Jan 18, 2024

Depends what you're looking for; countconnector has log use cases.

cwegener commented

> Depends what you're looking for; countconnector has log use cases.

Yes, the counting of logs by attributes is covered in that connector and is the most widely applicable use case.

One additional feature that the spanmetricsconnector does is calculate duration.

To compare this to the logs signal, take the example of web servers and application servers (WebSphere, Tomcat, HTTPd, IIS, etc.).

Such server logs oftentimes have duration measurements included in the log body as well, and they often carry additional attributes associated with those duration measurements that would otherwise be unavailable from other sources, for example "client IP" and "user name".

The metrics generated from the same web servers and application servers via the respective otelcol receiver would typically not include these attributes.

So, much like spanmetrics, logmetrics would provide the opportunity to create (request) duration histograms with the useful attribute labels given above. Span attributes are unlikely to include "client IP" and "user name", so these examples would be exclusive to log attributes.
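
For illustration, a fabricated HTTPd access log line with a trailing response-time field might look like this:

10.0.0.5 - alice [19/Jan/2024:10:12:01 +0000] "GET /api/orders HTTP/1.1" 200 4523 0.128

Here the client IP (10.0.0.5), user name (alice), and request duration (0.128 s) all sit in the one record, so a logmetrics component could build a duration histogram labeled by client IP and user that the server's native metrics would not expose.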

cwegener commented

Hmm... so I just found that there is an even older issue for this exact topic: #13530

And in the discussion in that other issue, nobody ever created a 'New component' issue.

But it looks like this issue is actually the 'New Component' issue. Looks like maybe some wires got crossed.

djaglowski commented

I'm reopening this to represent the formal "New Component" issue requested on #13530. Please continue conversation about this proposal here.

djaglowski reopened this Jan 18, 2024
verejoel commented

Hi folks, we're currently interested in building a logs -> metrics connector. We have a few specific use cases where we want to first parse values from logs, and then generate completely new metrics. Some examples include:

  • extract the request duration from HTTP request logs and build a histogram
  • extract the request size in bytes from an HTTP request log and increment a bytes_total counter
  • build CPU/Memory gauges from legacy systems that report these data as log lines

I'm currently working on the design for our specific use-cases, but I can see a need for a generic connector that can generate metrics from any telemetry signal.

I think a good approach would be to require that parsing and filtering of telemetry be handled by dedicated processors (i.e. the transform and filter processors). The connector would then only build metrics based on attributes and resource attributes present in the telemetry payload. This reduces the connector's scope to manipulating attributes and emitting the configured metrics, leaving parsing and filtering of telemetry to other components.

In our specific use case, we would like aggregated metrics to be flushed periodically to a prometheusremotewrite endpoint, which then ships metrics into Thanos in our particular setup. However, I think a more useful approach would be to have the connector emit delta metrics like the count connector, so that it is inherently stateless, and then rely on the accepted deltatocumulative and metricaggregation processors (#29300 and #29461) to convert the metrics into a Prometheus-compatible format. In this way, the combination of this connector and those two processors should meet a wide range of potential use cases, as sketched below.
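
For illustration, that wiring might look roughly like this (a sketch only: the logmetrics connector name is hypothetical, and deltatocumulative refers to the processor proposed in #29300):

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [logmetrics]             # hypothetical connector emitting delta metrics
    metrics:
      receivers: [logmetrics]
      processors: [deltatocumulative]     # converts delta to cumulative for Prometheus
      exporters: [prometheusremotewrite]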


weyert commented Jan 19, 2024

I would be interested in a counter for matching strings of a log record, e.g. to count the number of errors in the Postgres log file so it can trigger Alertmanager alerts.


atoulme commented Jan 19, 2024

You could have a log pipeline where you filter the logs down to what you want to alert on, and then use the countconnector.
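
For example (a sketch; the log path and match pattern are placeholders), the filter processor can drop everything except the records of interest, and the count connector then counts what remains:

receivers:
  filelog:
    include: [/var/log/postgresql/*.log]   # placeholder path

processors:
  filter/errors-only:
    logs:
      log_record:
        # drop every record that does not look like an error
        - not IsMatch(body, "ERROR|FATAL")

connectors:
  count:

exporters:
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/v1/write   # placeholder

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter/errors-only]
      exporters: [count]
    metrics:
      receivers: [count]
      exporters: [prometheusremotewrite]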

cwegener commented

> • extract the request duration from HTTP request logs and build a histogram
> • extract the request size in bytes from an HTTP request log and increment a bytes_total counter
> • build CPU/Memory gauges from legacy systems that report these data as log lines

Those describe the use cases that I am after as well. And I think the first two will have broad appeal. And variations of the third use case will occur quite often in my world as well.


djaglowski commented Jan 19, 2024

> You could have a log pipeline where you filter the logs down to what you want to alert on, and then use the countconnector.

I don't think the count connector can handle many of the use cases suggested here. It really is only useful for counting the number of instances of telemetry items that match some criteria. What I'm understanding from the mentioned use cases is that we need the ability to aggregate values within the telemetry.

Edit: I see that the suggestion was likely towards this comment:

> I would be interested in a counter for matching strings of a log record, e.g. to count the number of errors in the Postgres log file so it can trigger Alertmanager alerts.

The count connector should be able to support this today, and you don't need to pre-filter the data; just specify matching criteria for the count metric you want to generate.
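
For instance, the matching criteria can be expressed directly on the connector (a sketch; the metric name and pattern are illustrative):

connectors:
  count:
    logs:
      postgres.error.count:
        description: Number of Postgres log records containing permission-denied errors
        conditions:
          - IsMatch(body, "permission denied")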

djaglowski commented

> I think a good approach would be to require that parsing and filtering of telemetry be handled by dedicated processors (i.e. the transform and filter processors).

This probably goes a little further than necessary and may actually complicate the problem. Specifically, filtering of telemetry should not be necessary ahead of time, since any criteria which would be used to filter can also be used to select the appropriate telemetry. This can be done with OTTL in the same way as count connector.

OTTL can also help us with accessing fields, and I don't think we necessarily need to place constraints on where the field is found. Instead, we should only constrain the type of value we expect to find at any field accessible by OTTL. So for example I may have a numeric field in the body of my logs and just want to refer to it using OTTL.
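
To make that concrete, a hypothetical configuration in that spirit might look like the following. Nothing here is an existing component: the connector name, metric keys, and field paths are all illustrative, with values read via OTTL-style expressions.

connectors:
  logmetrics:                                  # hypothetical
    metrics:
      http.server.duration:
        type: histogram
        value: Double(body["duration_ms"])     # numeric field taken from the log body
        conditions:
          - attributes["http.method"] != nil   # select only HTTP request logs
        attributes:
          - key: client.ip                     # becomes a metric dimension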

manojksardana commented

I am also interested in this, especially since there is no official support for events in OpenTelemetry. We get events from various sources that report sales, orders, etc. These events are currently ingested as log entries, and there is a need to create metrics like total sales or orders over a period of time. Such a connector would help achieve that goal.
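
Under the same hypothetical connector sketched above, a running total over an event attribute might be expressed as (the attribute names are illustrative):

connectors:
  logmetrics:                                        # hypothetical
    metrics:
      orders.total:
        type: sum
        value: Double(attributes["order.amount"])    # value summed across matching events
        conditions:
          - attributes["event.name"] == "order.created"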

github-actions commented

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions commented

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Aug 18, 2024