Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New component: Routing connector #19738

Closed
2 tasks
djaglowski opened this issue Mar 15, 2023 · 22 comments · Fixed by #36143
Closed
2 tasks

New component: Routing connector #19738

djaglowski opened this issue Mar 15, 2023 · 22 comments · Fixed by #36143
Assignees
Labels
Accepted Component New component has been sponsored

Comments

@djaglowski
Copy link
Member

djaglowski commented Mar 15, 2023

The purpose and use-cases of the new component

We should develop a connector version of the routingprocessor (and deprecate the processor version).

The connector implementation would differ only in a couple minor ways:

  • Where the processor configuration references exporters, the connector would instead reference pipelines.
  • Ideally the connector would depend only on OTTL conditions in the routing table.
  • In order to support all of OTTL's contexts, it may be necessary to replace the table key with context-specific keys.

Example configuration for the component

receivers:
  foo:

processors:
  sample:
  filter:
  batch:

exporters:
  bar/0:
  bar/1:
  bar/2:

connectors:
 routing:
    default_pipelines: [traces/default]
    error_mode: ignore
    spans:
      - statement: resource.attributes["env"] == 1
        pipelines: [traces/1]
      - statement: attributes["id"] == 2
        pipelines: [traces/2]

service:
  pipelines:
    traces/in:
      receivers: [foo]
      exporters: [routing]
    traces/default:
      receivers: [routing]
      processors: [sample]
      exporters: [bar/0]  
    traces/1:
      receivers: [routing]
      processors: [batch]
      exporters: [bar/1]    
    traces/2:
      receivers: [routing]
      processors: [filter]
      exporters: [bar/2]

Telemetry data types supported

traces, metrics, logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

@djaglowski djaglowski added the needs triage New item requiring triage label Mar 15, 2023
@djaglowski
Copy link
Member Author

I believe @kovrus has expressed interest in developing this, but I wanted to create an issue for tracking.

@kovrus, please let me know if you think this config makes sense or if you had other ideas about how it should look.

@atoulme atoulme added Sponsor Needed New component seeking sponsor and removed needs triage New item requiring triage labels Mar 15, 2023
@kovrus
Copy link
Member

kovrus commented Mar 17, 2023

      - statement: attributes["id"] == 2
        exporters: [traces/2]

I guess, exporters should be pipelines here too.

It looks good to me! @jpkrohling can you take a look at the configuration?

I am happy to work on it, but I probably will be able to start on it sometime in the second half of April.

@djaglowski
Copy link
Member Author

@kovrus, good catch, just missed that one.

I can start work on this today if you and/or @jpkrohling are willing to sponsor/co-own the component.

@kovrus
Copy link
Member

kovrus commented Mar 17, 2023

I am happy to sponsor it!

@djaglowski djaglowski added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Mar 17, 2023
@jpkrohling
Copy link
Member

This looks good, but I would keep routing based on attributes out of the initial version, only based on resources and, eventually, context values (like auth or HTTP header).

@djaglowski
Copy link
Member Author

I have unfortunately not made much progress on this. I began an implementation but my proposed redesign of the configuration seems to have introduced many complications I did not anticipate. I do not believe this is a problem related to converting this component to a connector - just a problem with trying to redesign the configuration and evaluation of routing rules. Perhaps it is best to change the config as little as possible, though I do believe we should drop support for non-ottl statements.

I don't expect to have additional time for this until May, so @kovrus, if you are still able to work on this please feel free to pick this up. Otherwise, I will circle back to this when possible.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@mwear
Copy link
Member

mwear commented Jun 14, 2023

The PR for the initial version of the routing connector is in review. It supports routing based on resource attributes via OTTL statements. I know that we would like to add context based routing as a follow up PR, but I wanted to gather some information on this issue on how we'd like to approach it. Do we foresee adding additional configuration for context based routing? If so, how would we like that to look? Is the plan for us to extend OTTL to be able to match contexts used by the connector, or something else?

I just want to make sure we have some idea of how we'd like this to work so that we don't paint ourselves into a corner with the initial implementation.

@djaglowski
Copy link
Member Author

Do we foresee adding additional configuration for context based routing? If so, how would we like that to look?

My understanding is that we should support context-based routing, but I'm not an expert on the subject. @jpkrohling and @kovrus, as code owners of the routing processor, can you weigh in on this?

Is the plan for us to extend OTTL to be able to match contexts used by the connector, or something else?

Can you elaborate on this question?

@jpkrohling
Copy link
Member

For context, I imagine something like:

connectors:
 routing:
    spans:
      - statement: metadata["X-Tenant-ID"] == "acme"
        pipelines: [traces/1]

Perhaps client would be better than metadata? Context here means data that we obtained from the request's context (metadata) and placed into the clien.Info object.

We use something like that in the header setter extension: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/headerssetterextension#configuration

func (ts *ContextSource) Get(ctx context.Context) (string, error) {
cl := client.FromContext(ctx)
ss := cl.Metadata.Get(ts.Key)
if len(ss) == 0 {
return "", nil
}
if len(ss) > 1 {
return "", fmt.Errorf("%d source keys found in the context, can't determine which one to use", len(ss))
}
return ss[0], nil
}

The routing processor looks up only gRPC metadata in the context, which predates the client.Info facility.

djaglowski added a commit that referenced this issue Jul 6, 2023
**Description:** 

This PR introduces a connector version of the routing processor. It
currently supports routing based on resource attributes OTTL statements
only. Context based routing will be added later in a followup PR.

There are two issues regarding fanout consumers that are being addressed
in the core repo.
* The routing connector needs to be a consumer in multiple pipelines
(the routing processor can route to a single exporter).
* Will be resolved by:
open-telemetry/opentelemetry-collector#7840
* We need the ability to create connector routers to facilitate testing.
I had to temporarily copy the fanoutconsumer package into the routing
connector codebase due to package visibility issues.
* Will be resolved by:
open-telemetry/opentelemetry-collector#7673

We can address these issues as the relevant PRs land on the core repo.

cc: @djaglowski, @jpkrohling, @kovrus 

**Link to tracking Issue:** 
#19738

**Testing:** 
Unit / manual

**Documentation:**
The readme has been ported from the routing processor

---------

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Ruslan Kovalov <ruslan.kovalov@grafana.com>
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Copy link
Contributor

github-actions bot commented Jan 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Jan 1, 2024
@djaglowski djaglowski removed the Stale label Jan 8, 2024
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@verejoel
Copy link

verejoel commented Jun 10, 2024

Would be interested to broach the topic of context-aware routing again. My experience so far is that the routing connector works well, but necessarily destroys context. Having a context-aware routing connector would be much more efficient, as we don't need to unpack any of the payload in order to route it, and more easily support multi-tenant setups.

As for how to approach it - how about running the connector in either context or resource modes? context mode allows to route based on client metadata keys, and preserves context as telemetry passes through the connector. resource mode is the current behaviour.

I think this would be preferable, as I can't quite tell how the connector would work in a "joint mode" where we have routes that match both metadata and resource attributes:

connectors:
  routing:
    default_pipelines: [traces/jaeger]
    error_mode: ignore
    match_once: false
    table:
      - statement: route() where metadata["X-Tenant"] == "acme"
        pipelines: [traces/jaeger-acme]
      - statement: where attributes["job"] == "firewall"
        pipelines: [traces/jaeger-firewall]

I see context mode as supporting multi-tenancy based on request headers (think Loki/Thanos tenant headers). It would be fine, for example, to have an initial context-based routing stage, and then a secondary resource-based routing stage as needed.

@djaglowski
Copy link
Member Author

@verejoel, your proposal makes sense to me, and sounds fairly straightforward. Will we need a new config parameter to indicate the context, or where you envisioning that we could infer & enforce one or the other from the statements?

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@djaglowski
Copy link
Member Author

djaglowski commented Oct 29, 2024

I've opened a draft PR with a proof of concept for routing by request context. I'm looking for early feedback on the design.

  1. I was able to integrate the new "request" context into the existing routing table structure, so that any route can use ay context, and routes with different contexts can be mixed together in any order in the routing table. e.g. First, route all debug logs (using log context) to cheap storage. Then, route the entire remaining payload by request context. I think this is a big win but let me know if I'm missing something here.
  2. In order to make routing by request context feel similar to other contexts, I thought it would be nice to define it in terms of the condition field. To that end, I've stubbed in a very basic OTTL-like grammar for referencing request fields. It's strictly request["foo"] == "bar" or request["hello"] != "world". My thinking here is that in addition to feeling like other contexts, in the future if OTTL supports request context natively this can be rebased to pick up more flexibility. An alternative solution here would be to just add a new field to routes, which is only usable for request context.

@djaglowski
Copy link
Member Author

With #36067 merged, I think this issue can be closed once similar functionality is implemented for traces and metrics. I plan to work on this soon.

@verejoel
Copy link

verejoel commented Nov 6, 2024

@djaglowski Thanks for this - I will be very happy to test it with the next release :) awesome work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Component New component has been sponsored
Projects
None yet
6 participants