
Logs do not reach any endpoints if one ClusterOutput is configured with an invalid endpoint #2013

@adamkpickering

Describe the bug:

When one ClusterOutput has an invalid endpoint, all logs cease to reach their destinations, including logs destined for a valid endpoint configured in another ClusterOutput. The fluentbit Pods produce errors like the following:

[2025/04/14 21:37:36] [error] [upstream] connection #129 to tcp://10.43.190.83:24240 timed out after 10 seconds (connection timeout)
[2025/04/14 21:37:36] [error] [upstream] connection #130 to tcp://10.43.190.83:24240 timed out after 10 seconds (connection timeout)
[2025/04/14 21:37:36] [error] [upstream] connection #64 to tcp://10.43.190.83:24240 timed out after 10 seconds (connection timeout)
[2025/04/14 21:37:36] [ warn] [engine] failed to flush chunk '1-1744666485.766626455.flb', retry in 46 seconds: task_id=281, input=tail.0 > output=forward.0 (out_id=0)
[2025/04/14 21:37:36] [error] [output:forward:forward.0] no upstream connections available
[2025/04/14 21:37:36] [error] [output:forward:forward.0] no upstream connections available
[2025/04/14 21:37:36] [error] [output:forward:forward.0] no upstream connections available
[2025/04/14 21:37:36] [ warn] [engine] failed to flush chunk '1-1744666391.837929658.flb', retry in 147 seconds: task_id=170, input=tail.0 > output=forward.0 (out_id=0)
[2025/04/14 21:37:36] [ warn] [engine] failed to flush chunk '1-1744666392.763873065.flb', retry in 23 seconds: task_id=172, input=tail.0 > output=forward.0 (out_id=0)
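
For reference, these errors can be pulled straight from the agent Pods. A minimal sketch, assuming the operator labels the fluentbit DaemonSet Pods with app.kubernetes.io/name=fluentbit (adjust the selector if your labels differ):

# List the fluentbit agent Pods created by the operator
kubectl get pods -n cattle-logging-system -l app.kubernetes.io/name=fluentbit

# Tail their logs and filter for the connection/flush errors shown above
kubectl logs -n cattle-logging-system -l app.kubernetes.io/name=fluentbit --tail=200 \
  | grep -E '\[error\]|\[ warn\]'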

Expected behaviour:

Logs destined for a valid endpoint configured in one ClusterOutput should still reach their destination even if another ClusterOutput is configured with an invalid endpoint.

Steps to reproduce the bug:

Install something that will act as a destination for the logs. I used Kibana and Elasticsearch, but my understanding is that it doesn't matter what you use, as long as you have a valid place to send and view logs:

  1. Install ECK operator as per https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/install-using-yaml-manifest-quickstart
  2. Deploy an Elasticsearch cluster as per https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/elasticsearch-deployment-quickstart. I used the following manifest:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: cattle-logging-system
spec:
  version: 8.17.4
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
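
Before moving on, it is worth waiting for the cluster to become healthy. A minimal check via the ECK custom resource (the HEALTH column should eventually read green):

kubectl get elasticsearch quickstart -n cattle-logging-system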
  3. Deploy Kibana as per https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/kibana-instance-quickstart. I used the following manifest:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
  namespace: cattle-logging-system
spec:
  version: 8.17.4
  count: 1
  elasticsearchRef:
    name: quickstart
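
Kibana can be checked the same way, and optionally exposed locally. A sketch assuming the usual ECK Service naming convention quickstart-kb-http (this name may differ in other setups):

kubectl get kibana quickstart -n cattle-logging-system

# Optional: make the Kibana UI reachable on https://localhost:5601
kubectl port-forward -n cattle-logging-system service/quickstart-kb-http 5601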

The following is the part that is relevant to this issue:

  1. helm upgrade --install --wait --create-namespace --namespace cattle-logging-system logging-operator --version 4.10.0 oci://ghcr.io/kube-logging/helm-charts/logging-operator
  2. Apply the following Logging and FluentbitAgent:
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: rancher-logging-root
  namespace: cattle-logging-system
spec:
  controlNamespace: cattle-logging-system
  fluentd:
    disablePvc: true
    livenessProbe:
      initialDelaySeconds: 30
      periodSeconds: 15
      tcpSocket:
        port: 24240
    metrics:
      prometheusRules: false
      serviceMonitor: false
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: rancher-logging-root
  namespace: cattle-logging-system
spec:
  metrics:
    prometheusRules: false
    serviceMonitor: false
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    value: "true"
  - effect: NoExecute
    key: node-role.kubernetes.io/etcd
    value: "true"
  3. Apply the following ClusterOutput:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: elasticsearch
  namespace: cattle-logging-system
spec:
  elasticsearch:
    host: quickstart-es-http.cattle-logging-system.svc.cluster.local
    port: 9200
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    user: elastic
    password:
      valueFrom:
        secretKeyRef:
          name: quickstart-es-elastic-user
          key: elastic
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true
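
Recent operator versions surface an active/problems summary on the ClusterOutput resource, which is useful later when the broken output is added (the exact columns may vary by version):

kubectl get clusteroutputs -n cattle-logging-system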
  4. Apply the following ClusterFlow:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: elasticsearch
  namespace: cattle-logging-system
spec:
  globalOutputRefs:
    - elasticsearch
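
The same check works for the flow; it should show up as active once the operator has reconciled it:

kubectl get clusterflows -n cattle-logging-system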
  5. Check that logs are coming through in Elasticsearch/Kibana.
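
One way to do this without touching Kibana is to port-forward the Elasticsearch Service referenced in the ClusterOutput above and list the indices. A sketch using the quickstart credentials (the index names depend on the output configuration; the point is that indices exist and their document counts grow while logs flow):

kubectl port-forward -n cattle-logging-system service/quickstart-es-http 9200 &

# Read the elastic user's password from the same Secret the ClusterOutput uses
PASSWORD=$(kubectl get secret quickstart-es-elastic-user -n cattle-logging-system \
  -o go-template='{{.data.elastic | base64decode}}')

curl -sk -u "elastic:$PASSWORD" "https://localhost:9200/_cat/indices?v"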
  6. Apply the following ClusterOutput:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: badelasticsearch
  namespace: cattle-logging-system
spec:
  elasticsearch:
    host: invalidaddress.cattle-logging-system.svc.cluster.local
    port: 9200
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    user: elastic
    password:
      valueFrom:
        secretKeyRef:
          name: quickstart-es-elastic-user
          key: elastic
    buffer:
      timekey: 1m
      timekey_wait: 30s
      timekey_use_utc: true
  7. Apply the following ClusterFlow:
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: badelasticsearch
  namespace: cattle-logging-system
spec:
  globalOutputRefs:
    - badelasticsearch
  8. Note that logs are no longer coming through in Elasticsearch/Kibana, and that the fluentbit agent Pods now log errors like the ones shown above.
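
To confirm where the pipeline stalls, the fluentd side can be checked as well. A sketch assuming the default StatefulSet naming of <Logging name>-fluentd (the Pod name may differ):

# fluentbit -> fluentd forwarding errors (same label assumption as above)
kubectl logs -n cattle-logging-system -l app.kubernetes.io/name=fluentbit --tail=50

# fluentd's own view: expect retries/errors for the invalid elasticsearch host
kubectl logs -n cattle-logging-system rancher-logging-root-fluentd-0 --tail=50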

Additional context:

This was first noticed in the Rancher Logging helm chart, which repackages this project for easy use in Rancher. The reported issue is rancher/rancher#26771.

Environment details:

  • Kubernetes version (e.g. v1.15.2): v1.31.6
  • Cloud-provider/provisioner (e.g. AKS, GKE, EKS, PKE etc): k3s
  • logging-operator version (e.g. 2.1.1): 4.10.0
  • Install method (e.g. helm or static manifests): Rancher Logging
  • Logs from the misbehaving component (and any other relevant logs): see above
  • Resource definition (possibly in YAML format) that caused the issue, without sensitive data: see above

/kind bug
