@leandroberetta leandroberetta commented Sep 11, 2025

Description

Support for writing logs to Loki (distributor) using gRPC.

Ref: https://issues.redhat.com/browse/NETOBSERV-584

Dependencies

netobserv/loki-client-go#3
netobserv/flowlogs-pipeline#1086
netobserv/network-observability-console-plugin#1021

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

openshift-ci bot commented Sep 11, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Sep 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from leandroberetta. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov bot commented Sep 11, 2025

Codecov Report

❌ Patch coverage is 26.03550% with 125 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.27%. Comparing base (20eb307) to head (7eea6fb).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/controller/flp/flp_pipeline_builder.go | 26.59% | 66 Missing and 3 partials ⚠️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 0.00% | 23 Missing and 1 partial ⚠️ |
| internal/pkg/helper/loki_config.go | 45.71% | 16 Missing and 3 partials ⚠️ |
| .../controller/consoleplugin/consoleplugin_objects.go | 18.75% | 11 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1973      +/-   ##
==========================================
- Coverage   71.02%   70.27%   -0.75%     
==========================================
  Files          75       75              
  Lines       10011    10143     +132     
==========================================
+ Hits         7110     7128      +18     
- Misses       2513     2622     +109     
- Partials      388      393       +5     
| Flag | Coverage Δ |
|---|---|
| unittests | 70.27% <26.03%> (-0.75%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) |
| .../controller/consoleplugin/consoleplugin_objects.go | 82.55% <18.75%> (-2.13%) ⬇️ |
| internal/pkg/helper/loki_config.go | 79.34% <45.71%> (-20.66%) ⬇️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 38.83% <0.00%> (-0.97%) ⬇️ |
| internal/controller/flp/flp_pipeline_builder.go | 67.80% <26.59%> (-6.22%) ⬇️ |

... and 1 file with indirect coverage changes

@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 11, 2025

New images:

  • quay.io/netobserv/network-observability-operator:7c27f48
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7c27f48
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7c27f48

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:7c27f48 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-7c27f48

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-7c27f48
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 11, 2025
@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 11, 2025

New images:

  • quay.io/netobserv/network-observability-operator:6425002
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-6425002
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-6425002

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:6425002 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-6425002

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-6425002
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@leandroberetta leandroberetta marked this pull request as ready for review September 15, 2025 11:46
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 23, 2025
@leandroberetta leandroberetta changed the title Support for writing logs to Loki (distributor) using gRPC NETOBSERV 584: Support for writing logs to Loki (distributor) using gRPC Sep 23, 2025
@leandroberetta leandroberetta changed the title NETOBSERV 584: Support for writing logs to Loki (distributor) using gRPC NETOBSERV-584: Support for writing logs to Loki (distributor) using gRPC Sep 23, 2025
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Sep 23, 2025

@leandroberetta: This pull request references NETOBSERV-584 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@leandroberetta
Contributor Author

This PR adds support for writing logs to Loki using gRPC. In terms of performance, results are similar to the existing HTTP implementation.

While investigating this, I found that the current implementation uses HTTP but already sends messages as snappy-compressed protocol buffers, so I guess it already uses less memory.

On the other hand, gRPC establishes a connection and has mechanisms to reconnect and keep it alive. That takes more resources (according to what I read).

We should probably run more thorough performance tests before we decide.

@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 23, 2025

New images:

  • quay.io/netobserv/network-observability-operator:770611c
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-770611c
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-770611c

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:770611c make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-770611c

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-770611c
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

Comment on lines 887 to 907
type LokiGRPCConfig struct {
	//+kubebuilder:validation:Minimum=1024
	//+kubebuilder:validation:Maximum=67108864
	//+kubebuilder:default:=67108864
	// `maxRecvMsgSize` is the maximum message size in bytes the gRPC client can receive. Default: 64MB.
	MaxRecvMsgSize int `json:"maxRecvMsgSize,omitempty"`

	//+kubebuilder:validation:Minimum=1024
	//+kubebuilder:validation:Maximum=67108864
	//+kubebuilder:default:=16777216
	// `maxSendMsgSize` is the maximum message size in bytes the gRPC client can send. Default: 16MB.
	MaxSendMsgSize int `json:"maxSendMsgSize,omitempty"`

	//+kubebuilder:default:="30s"
	// `keepAlive` is the gRPC keep-alive interval.
	KeepAlive *metav1.Duration `json:"keepAlive,omitempty"`

	//+kubebuilder:default:="5s"
	// `keepAliveTimeout` is the gRPC keep-alive timeout.
	KeepAliveTimeout *metav1.Duration `json:"keepAliveTimeout,omitempty"`
}
Contributor

I'm not sure we need those exposed in the API.

On top of that, what's the difference between maxSendMsgSize and the existing writeBatchSize?
Maybe we can reuse the same field here and pass the other options as an Env map[string]string in the advanced section.

Here is an example for eBPF agent consuming envs: https://github.com/netobserv/network-observability-operator/blob/main/internal/controller/ebpf/agent_controller.go#L33-L75

Member

Yes, I think we can probably start by not exposing anything at all (not even the clientType) and put it behind an env-based feature gate. Users probably want us to make the choice for them, whether to use it or not.
If, after more tests, we see that there is no clear winner, just pros and cons, then maybe we will expose it?
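
An env-based gate of this kind could look roughly like the sketch below. The key name `LOKI_USE_GRPC` and the `"http"`/`"grpc"` values are assumptions made for illustration; nothing in this PR defines them.

```go
package main

import (
	"fmt"
	"strconv"
)

// envLokiGRPC is a hypothetical feature-gate key, not an existing operator setting.
const envLokiGRPC = "LOKI_USE_GRPC"

// clientType names the transport used to write to Loki.
// The concrete values are assumptions.
type clientType string

const (
	clientHTTP clientType = "http"
	clientGRPC clientType = "grpc"
)

// lokiClientType returns gRPC only when the gate is explicitly set to a true
// value; an absent or unparsable value keeps the existing HTTP behavior.
func lokiClientType(env map[string]string) clientType {
	enabled, err := strconv.ParseBool(env[envLokiGRPC])
	if err != nil || !enabled {
		return clientHTTP
	}
	return clientGRPC
}

func main() {
	fmt.Println(lokiClientType(nil))
	fmt.Println(lokiClientType(map[string]string{envLokiGRPC: "true"}))
}
```

Defaulting to HTTP on any missing or malformed value keeps the gate opt-in, so existing deployments are unaffected until the choice is made deliberately.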

openshift-ci bot commented Oct 9, 2025

@leandroberetta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-operator | 8334899 | link | false | /test e2e-operator |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
