Add isolating log record processor #4062

pellared · 2024-05-28T07:15:26Z

Why

Fixes #4010 (read the issue for more details)

The SDK SHOULD offer a way to make independent log processing. I find it as a missing feature that should be covered by by the specification.

What

This PR tries to achieve it in the following way:

SDK SHOULD provide an isolating processor allowing independent log processing pipelines. Source: Logs: Allow the registered processor to be an independent pipeline #4010 (comment)

This approach should be solving the issue at least for Java, .NET, Python, JavaScript, Go. This proposal should not make any existing log SDKs not compliant with the specification.

I think that the same approach could be also adopted in tracing for span processors (but I have not yet investigated it in depth).

Remarks

I created a separate issue for specifying whether (by default) a change to a record made in a processor is propagated to next registered processor. See #4065

pellared · 2024-05-28T15:24:34Z

SIG meeting notes.
TODOs based on quick review during the meeting:

Ensure that there is no conflict with Add ability for SpanProcessor to mutate spans on end #4024
Expand the description (e.g. describe the reasons, alternatives) to make reviewing easier.

yurishkuro

High level feedback:

(1) I think so far the specification for the SDKs was far more opinionated than this PR, by describing not only capabilities but the desired architecture, i.e. standard components like processor, exporter, and standard ways to wire them together.

I think "multiple pipelines" is a significant capability change that requires a description of the architecture. E.g. it may require bifurcating the processing chain, so how would that be achieved? I don't think this is enough to just say "allow making a copy of log record".

(2) Whose responsibility should it be to decide on parallel pipelines? Should it be happening just because of the implementation of specific components? Or should it be user-driven decision by setting up the parallel pipelines accordingly?

(3) Does this capability only make sense to logs, or to other signals as well? Why or why not? I think we should be consistent.

pellared · 2024-05-28T15:41:02Z

Regarding the relation to #4024

I would say that #4024 is a "retrofit" as the original OnEnd does not allow mutation of the span when it occurs that it is needed in some cases. The OnEmit already allows mutation of the log records. I do not see how this PR is in conflict with #4024.

CC @jmacd

specification/logs/sdk.md

pellared · 2024-05-28T16:52:03Z

High level feedback:

👍

(1) I think so far the specification for the SDKs was far more opinionated than this PR, by describing not only capabilities but the desired architecture, i.e. standard components like processor, exporter, and standard ways to wire them together.

I think "multiple pipelines" is a significant capability change that requires a description of the architecture. E.g. it may require bifurcating the processing chain, so how would that be achieved? I don't think this is enough to just say "allow making a copy of log record".

The biggest problem here is that almost stable Logs SDKs have already different architectures (and different ways of processing logs). I am not sure if e.g. Java, JavaScript .NET would like to say e.g. "before making a change in a processor, make a copy so that other processors are not affected". E.g. in Go we have a comment like this: "Before modifying a Record, the implementation must use Record.Clone to create a copy that shares no state with the original".

(2) Whose responsibility should it be to decide on parallel pipelines? Should it be happening just because of the implementation of specific components? Or should it be user-driven decision by setting up the parallel pipelines accordingly?

I am not sure if I follow. I think it can be both of "because of the implementation of specific components" and "user-driven decision by setting up the parallel pipelines" at the same time 😉

(3) Does this capability only make sense to logs, or to other signals as well? Why or why not? I think we should be consistent.

I think it makes sense for all signals. I will look into this.

At last, I think that this PR needs more feedback from other SDK maintainers.

jmacd

@pellared and @JonasKunz I'm trying to bring together your two proposals. It seems after reading this and #4024, there is a desire for OTel SDKs to support automatic "fanout" processors. I think I would rather see us specify that SDKs should support an idiomatic fanout processor which hides details related to avoiding shared-mutable references, copy-on-write protections, and so on.

specification/logs/sdk.md

CodeBlanch · 2024-05-29T22:26:27Z

My thoughts and perspective from .NET...

Originally we had an immutable structure LogRecord. Users asked for it to become mutable so we did that. .NET has a chained processor pipeline so all processors are called in the order added and all see the same mutable LogRecord.

No one that I can recall has complained about that. But one request which has come up a bunch is the ability to filter.

ProcessorA(Export records matching ABC) -> ProcessorB(Export all records)

There is no good way to accomplish that currently in .NET.

I always envisioned some kind of fork processor might help with that. Thinking out loud and combining it with this idea to copy the LogRecord it could look like:

LogRecordA emitted
  -> ForkProcessor(for log records matching ABC, LogRecordB = LogRecordA.Copy()) -> Export(LogRecordB)
  -> Processor(add some data to LogRecordA)
  -> ForkProcessor(for all log records, LogRecordC = LogRecordA.Copy()) -> Processor(redact LogRecordC) -> Export(LogRecordC)
  -> Processor(add more data to LogRecordA)
  -> Export(LogRecordA)

PS: When I say "Copy" we do have a pool so it isn't necessarily an allocation.

pellared · 2024-05-31T07:35:40Z

@CodeBlanch

When I say "Copy" we do have a pool so it isn't necessarily an allocation.

I just want to share that making pools for log records may be "dangerous" as the user-defines processor/exporter may want to try to use the log records after OnEmit has returned (e.g. asynchronous processing, batching etc). I did my best to describe it here: https://github.com/open-telemetry/opentelemetry-go/blob/main/log/DESIGN.md#record-attributes-as-slice

pellared · 2024-06-04T16:51:44Z

I just want to add that even for implementation that does not "fully" share the log record between processors (see: #4067) this processor may still be handy. E.g. currently in Go most of the log record data is not shared. But there is an exception: log attributes. This is done for performance optimization to reduce the heap allocations (allows zero-heap allocation).

github-actions · 2024-06-13T03:17:20Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2024-06-26T03:17:23Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

carlosalberto · 2024-06-27T13:58:48Z

This has been open for a little while, and albeit many people are on holidays, we are safe as this is a Development addition. Merging.

Follows #4062 ## Why 1. There is a need to make a deep copy of a `ReadWriteRecord` in order to implement an experimental isolating processor outside of the SDK (as it is experimental, we prefer to land it to contrib repository first). 2. Allow fine-grained control for the users to make a deep copy only when necessary. This allows users to have more control on the processing e.g. pass a clone to asynchronous processor to avoid race conditions or make performance improvements.

## Why Fixes open-telemetry#4010 (read the issue for more details) The SDK SHOULD offer a way to make independent log processing. I find it as a missing feature that should be covered by by the specification. ## What This PR tries to achieve it in the following way: - SDK SHOULD provide an isolating processor allowing independent log processing pipelines. Source: open-telemetry#4010 (comment) This approach should be solving the issue at least for Java, .NET, Python, JavaScript, Go. This proposal should not make any existing log SDKs not compliant with the specification. I think that the same approach could be also adopted in tracing for span processors (but I have not yet investigated it in depth). ## Remarks I created a separate issue for specifying whether (by default) a change to a record made in a processor is propagated to next registered processor. See open-telemetry#4065

Follows open-telemetry#4062 ## Why 1. There is a need to make a deep copy of a `ReadWriteRecord` in order to implement an experimental isolating processor outside of the SDK (as it is experimental, we prefer to land it to contrib repository first). 2. Allow fine-grained control for the users to make a deep copy only when necessary. This allows users to have more control on the processing e.g. pass a clone to asynchronous processor to avoid race conditions or make performance improvements.

pellared added 7 commits May 28, 2024 09:00

Allow independent log processing pipelines

3cc32ab

Clarify

e405c6d

Refactor

91f47dc

Rephrase

ee539c5

Add changelog

05e6061

Format

721f185

Remove junk

a6ee1e5

pellared marked this pull request as ready for review May 28, 2024 07:49

pellared requested review from a team May 28, 2024 07:49

pellared added the spec:logs Related to the specification/logs directory label May 28, 2024

github-actions bot assigned jmacd May 28, 2024

pellared mentioned this pull request May 28, 2024

Logs: Allow the registered processor to be an independent pipeline #4010

Closed

jmacd mentioned this pull request May 28, 2024

Add ability for SpanProcessor to mutate spans on end #4024

Merged

5 tasks

reyang previously approved these changes May 28, 2024

View reviewed changes

yurishkuro reviewed May 28, 2024

View reviewed changes

jmacd reviewed May 28, 2024

View reviewed changes

specification/logs/sdk.md Outdated Show resolved Hide resolved

specification/logs/sdk.md Outdated Show resolved Hide resolved

specification/logs/sdk.md Outdated Show resolved Hide resolved

pellared requested a review from jmacd May 28, 2024 16:28

Merge branch 'main' into fix-4010

bab1025

jmacd reviewed May 29, 2024

View reviewed changes

specification/logs/sdk.md Outdated Show resolved Hide resolved

specification/logs/sdk.md Outdated Show resolved Hide resolved

specification/logs/sdk.md Outdated Show resolved Hide resolved

pellared changed the title ~~Allow independent log processing pipelines~~ [WIP] Allow independent log processing pipelines May 31, 2024

pellared added 3 commits May 31, 2024 09:50

Update sdk.md

f3ced5c

Update sdk.md

ddeddbe

Update sdk.md

d086e7c

pellared requested review from yurishkuro and reyang June 4, 2024 14:42

Format

dfd03e9

Update CHANGELOG.md

1b911cc

lmolkova approved these changes Jun 5, 2024

View reviewed changes

pellared added 3 commits June 5, 2024 07:50

Merge branch 'main' into fix-4010

320bf6e

experimental to development

6bacc83

Fix typo

d3954d5

github-actions bot added the Stale label Jun 13, 2024

Merge branch 'main' into fix-4010

e9925dd

pellared removed the Stale label Jun 18, 2024

MSNev approved these changes Jun 18, 2024

View reviewed changes

github-actions bot added the Stale label Jun 26, 2024

pellared removed the Stale label Jun 26, 2024

pellared mentioned this pull request Jun 27, 2024

Log record mutations are visible in next registered processors #4067

Merged

Merge branch 'main' into fix-4010

e95c895

carlosalberto merged commit 3266af0 into open-telemetry:main Jun 27, 2024
6 checks passed

pellared deleted the fix-4010 branch June 27, 2024 14:07

This was referenced Jun 28, 2024

Add isolating log record processor open-telemetry/opentelemetry-go-contrib#5823

Open

Allow cloning of ReadWriteLogRecord #4090

Merged

carlosalberto mentioned this pull request Jul 9, 2024

Release 1.35.0 #4136

Merged

pellared mentioned this pull request Jul 11, 2024

Refine only logs reads are permitted section #4141

Open

pellared mentioned this pull request Oct 2, 2024

Specify how Logs SDK implements Enabled #4207

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add isolating log record processor #4062

Add isolating log record processor #4062

pellared commented May 28, 2024 •

edited

Loading

pellared commented May 28, 2024

yurishkuro left a comment •

edited

Loading

pellared commented May 28, 2024

pellared commented May 28, 2024 •

edited

Loading

jmacd left a comment

CodeBlanch commented May 29, 2024

pellared commented May 31, 2024 •

edited

Loading

pellared commented Jun 4, 2024

github-actions bot commented Jun 13, 2024

github-actions bot commented Jun 26, 2024

carlosalberto commented Jun 27, 2024

Add isolating log record processor #4062

Add isolating log record processor #4062

Conversation

pellared commented May 28, 2024 • edited Loading

Why

What

Remarks

pellared commented May 28, 2024

yurishkuro left a comment • edited Loading

Choose a reason for hiding this comment

pellared commented May 28, 2024

pellared commented May 28, 2024 • edited Loading

jmacd left a comment

Choose a reason for hiding this comment

CodeBlanch commented May 29, 2024

pellared commented May 31, 2024 • edited Loading

pellared commented Jun 4, 2024

github-actions bot commented Jun 13, 2024

github-actions bot commented Jun 26, 2024

carlosalberto commented Jun 27, 2024

pellared commented May 28, 2024 •

edited

Loading

yurishkuro left a comment •

edited

Loading

pellared commented May 28, 2024 •

edited

Loading

pellared commented May 31, 2024 •

edited

Loading