New component: Failover Connector #20766

djaglowski · 2023-04-08T19:15:43Z

The purpose and use-cases of the new component

A connector that routes data based on the current health status of a downstream component, typically an exporter.

I have heard several users ask for the ability to send data to a backup exporter if a primary exporter fails. I believe this could be implemented as a routing connector.

The user would specify at least one pipeline to which data would typically be routed. Additionally, the user must specify one or more backup pipelines, which would be used when an error is encountered.

Initially, I think the trigger for routing to a backup pipeline could be based on backpropagated errors, though this is not yet very robust (see open-telemetry/opentelemetry-collector#7460). At a later time, I imagine this could be based on the health status of an exporter (see open-telemetry/opentelemetry-collector#6344).
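
The error-triggered routing described above can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the collector's Go API: `FailoverRouter`, `Consumer`, and `failing_exporter` are hypothetical names, and a pipeline is modeled as a plain callable that raises when delivery fails.

```python
from typing import Callable, List

# Hypothetical stand-in for a downstream pipeline: a callable that raises on failure.
Consumer = Callable[[List[str]], None]


class FailoverRouter:
    """Sends each batch to the primary consumer and falls back to the secondary on error."""

    def __init__(self, primary: Consumer, secondary: Consumer) -> None:
        self.primary = primary
        self.secondary = secondary

    def consume(self, batch: List[str]) -> str:
        """Deliver a batch and return the name of the route that accepted it."""
        try:
            self.primary(batch)
            return "primary"
        except Exception:
            # The error propagated back from the primary pipeline is the trigger.
            # If the secondary also fails, its error propagates to the caller.
            self.secondary(batch)
            return "secondary"


def failing_exporter(batch: List[str]) -> None:
    """Simulates a primary exporter that is currently unavailable."""
    raise ConnectionError("primary unavailable")


if __name__ == "__main__":
    delivered: List[str] = []
    router = FailoverRouter(failing_exporter, delivered.extend)
    print(router.consume(["log line 1"]), delivered)
```

This sketch retries the primary on every batch. A health-status trigger, as mentioned above, would instead let the router skip a known-bad primary without attempting delivery first.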

Example configuration for the component

receivers:
  foo:

exporters:
  bar/main:
  bar/backup:

connectors:
  failover:
    primary: logs/main
    secondary: logs/backup

service:
  pipelines:
    logs/in:
      receivers: [foo]
      exporters: [failover]
    logs/main:
      receivers: [failover]
      exporters: [bar/main]
    logs/backup:
      receivers: [failover]
      exporters: [bar/backup]

Telemetry data types supported

traces->traces
metrics->metrics
logs->logs

Is this a vendor-specific component?

This is a vendor-specific component
If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

sethallen · 2023-04-09T03:53:46Z

I'm glad you made this @djaglowski! I was just chatting with @atoulme about adding failover and circuit breaker support for exporters a couple days ago. The connector seems like a great method to add broad failover support.

How about tweaking this slightly to support 1..N entries as a YAML flow sequence? It would reduce complexity in the failover connector by removing the need for keys (primary, secondary, etc.) to choose the next pipeline to fail over to.

Example:

receivers:
  foo:

exporters:
  bar/main:
  bar/backup:
  bar/backup2:

connectors:
  failover: [logs/main, logs/backup, logs/backup2] # ... up to N entries
#    primary: logs/main
#    secondary: logs/backup

service:
  pipelines:
    logs/in:
      receivers: [foo]
      exporters: [failover]
    logs/main:
      receivers: [failover]
      exporters: [bar/main]
    logs/backup:
      receivers: [failover]
      exporters: [bar/backup]
    logs/backup2:
      receivers: [failover]
      exporters: [bar/backup2]

fatsheep9146 · 2023-04-09T04:21:34Z

I'd like to sponsor this.

djaglowski · 2023-04-09T18:45:15Z

@sethallen, I like the idea of allowing a priority list, but I think we should leave room for other parameters as well. I also think we need to allow multiple pipelines per "level".

connectors:
  failover: 
    priority:
      - [logs/main]
      - [logs/backup, logs/backup2]
      - [logs/backup/3]
    min_failover_interval: 2m # Possibly would add this in future

cparkins · 2023-04-12T03:23:51Z

@djaglowski How would the multiple pipelines be used? As a fan-out, or in order as Priority 1, Priority 2-1, Priority 2-2, ... Priority N?

djaglowski · 2023-04-12T08:41:42Z

@cparkins, when there are multiple pipelines at the same priority level, it would fan out data to those pipelines.
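
The combination of priority levels and fan-out can be sketched as follows. This is an illustration in Python, not the connector's actual implementation: `route_by_priority` is a hypothetical name, pipelines are modeled as callables that raise on failure, and the rule that any error within a level triggers failover to the next level is an assumption that the thread does not settle.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a downstream pipeline: a callable that raises on failure.
Consumer = Callable[[List[str]], None]


def route_by_priority(levels: Sequence[Sequence[Consumer]], batch: List[str]) -> int:
    """Fan the batch out to the first priority level whose consumers all succeed.

    Returns the index of the level that accepted the batch.
    Raises RuntimeError if every level fails.
    """
    for index, level in enumerate(levels):
        try:
            for consumer in level:
                # Each pipeline at the same level receives its own copy (fan-out).
                consumer(list(batch))
        except Exception:
            # Assumption: any error at this level moves on to the next level.
            # Pipelines that already succeeded at this level keep their copy.
            continue
        return index
    raise RuntimeError("all priority levels failed")
```

With levels shaped like the `priority` list above, a failing first level would cause the batch to be delivered to both pipelines of the second level, and the third level would not be touched.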

github-actions · 2023-06-12T03:32:42Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

akats7 · 2023-07-12T14:32:41Z

@djaglowski I also had this feature request. I'd be happy to work on it or support it in any way I can.

djaglowski · 2023-07-12T14:39:55Z

@akats7, any help moving this forward would be great. I'll be happy to review any PRs.

akats7 · 2023-07-12T14:49:50Z

@djaglowski sounds good, can I please be assigned this issue?

sethallen · 2023-07-12T16:38:25Z

@djaglowski / @akats7 / @atoulme - Perhaps this work effort can be merged with what @cparkins has been working on internally for us over the last few months. He added resiliency features (Failover, Circuit Breaker) to the Splunk HEC Exporter for the OTel Collector and submitted them in the PR below:

Add Resiliency Features to the Splunk HEC Exporter #23821

djaglowski · 2023-07-12T17:08:35Z

@sethallen, I'm supportive of the idea. In my opinion, failover at least should be implemented as a connector, because in many cases it may be appropriate to fail over to a different type of exporter. If I recall correctly, you and/or @cparkins looked into the idea of implementing other resiliency features in a connector. Do you still see that as a viable path? Either way, I think the failover connector should move forward and we can add additional capabilities based on a proposal.


akats7 · 2023-09-11T03:49:51Z

^ I was able to begin looking into this recently and will open a first pass PR for this shortly.

sethallen · 2023-09-12T05:31:25Z

That's exciting, @akats7. We've been maintaining an internal fork of resiliency features added to the Splunk HEC Exporter and would love to get these features somewhere into the mainline collector. It will be great to see your PR for a connector, and hopefully we can help with it. Cheers!


@djaglowski

This is the Part 1 PR for the Failover Connector (split according to the CONTRIBUTING.md doc).
Link to tracking Issue: #20766
Testing: Added factory test
Note: Full functionality PR exists [here](#27641) and will likely be refactored to serve as the part 2 PR.
cc: @djaglowski @sethallen @MovieStoreGuy


@djaglowski

This is the 2nd PR for the failover connector that implements the core failover functionality. It is currently in place for traces and, once solidified, will be repeated for metrics and logs.
Link to tracking Issue: #20766
Note: Will add traces tests today but pushing up to begin review.
cc: @djaglowski @fatsheep9146

@djaglowski

This is the 3rd PR for the failover connector. This PR adds support for metric and log pipelines.
Link to tracking Issue: #20766
cc: @djaglowski @fatsheep9146




verejoel · 2024-03-25T14:40:23Z

Would love to revive this, definitely interested in this topic.

djaglowski · 2024-03-25T15:22:31Z

Thanks for pinging this @verejoel.

An implementation is in place but stability is still marked as development. @akats7, do you recall what is left to do? If we have a minimally functional component then I think we should move it to alpha status, close this issue and open issues for any additional functionality we would like.

akats7 · 2024-03-25T18:55:32Z

Hey @djaglowski @verejoel,

Yep, the MVP functionality is in place. I did have one more change I've been planning to push, so I'll push that along with the update to alpha.


djaglowski · 2024-05-28T13:06:56Z

@akats7, should we close this issue as completed?

akats7 · 2024-06-04T13:42:00Z

Hey @djaglowski, yep I think we can close this

djaglowski · 2024-06-04T13:43:50Z

Thanks @akats7!

djaglowski added the Sponsor Needed and needs triage labels Apr 8, 2023

djaglowski added the Accepted Component label and removed the Sponsor Needed and needs triage labels Apr 10, 2023

github-actions bot added the Stale label Jun 12, 2023

fatsheep9146 removed the Stale label Jun 12, 2023

djaglowski assigned akats7 Jul 12, 2023

atoulme mentioned this issue Sep 8, 2023

[exporter/splunkhec] Add the ability to specify a Failover Endpoint and Enable a Circuit Breaker #23822

Closed

github-actions bot added the Stale label Sep 11, 2023

github-actions bot removed the Stale label Sep 11, 2023

crobert-1 mentioned this issue Oct 9, 2023

Add Resiliency Features to the Splunk HEC Exporter #23821

Closed

akats7 mentioned this issue Oct 12, 2023

Initial failover connector PR #27641

Closed

akats7 mentioned this issue Oct 31, 2023

First PR - Failover Connector skeleton #28818

Merged

github-actions bot added the Stale label Nov 13, 2023

fatsheep9146 removed the Stale label Nov 13, 2023

akats7 mentioned this issue Nov 29, 2023

Failover Connector PR2 - core failover functionality #29557

Merged

akats7 mentioned this issue Dec 18, 2023

Added metric and log support #30019

Merged

djaglowski pushed a commit that referenced this issue Jan 8, 2024

Added metric and log support (#30019)

f8a040f


github-actions bot added the Stale label Jan 15, 2024

fatsheep9146 removed the Stale label Jan 15, 2024

djaglowski added the Accepted Component label and removed the Accepted Component label Jan 16, 2024

github-actions bot added the Stale label Mar 18, 2024

github-actions bot removed the Stale label Mar 26, 2024

github-actions bot added the Stale label May 27, 2024

github-actions bot removed the Stale label May 29, 2024

djaglowski closed this as completed Jun 4, 2024
