Skip to content

Simplify the configuration loading #437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

petrkotas
Copy link
Member

@petrkotas petrkotas commented May 7, 2025

The code simplifies the ocm config loading. It also moves the env variables right to the start,
so it is obvious when they are used.

JIRA: https://issues.redhat.com/browse/SREP-71

@openshift-ci openshift-ci bot requested review from joshbranham and RaphaelBut May 7, 2025 14:15
Copy link
Contributor

openshift-ci bot commented May 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: petrkotas
Once this PR has been reviewed and has the lgtm label, please assign nikokolas3270 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov-commenter
Copy link

codecov-commenter commented May 7, 2025

Codecov Report

Attention: Patch coverage is 0.44248% with 225 lines in your changes missing coverage. Please review.

Project coverage is 30.48%. Comparing base (0445703) to head (36bc03c).

Files with missing lines Patch % Lines
cadctl/config/config.go 0.00% 91 Missing ⚠️
cadctl/cmd/investigate/investigate.go 0.00% 57 Missing ⚠️
pkg/ocm/ocm_config.go 0.00% 24 Missing ⚠️
pkg/ocm/ocm.go 0.00% 17 Missing ⚠️
pkg/metrics/metrics.go 0.00% 10 Missing ⚠️
pkg/k8s/client.go 0.00% 6 Missing ⚠️
cadctl/main.go 0.00% 4 Missing ⚠️
pkg/logging/logging.go 20.00% 4 Missing ⚠️
cadctl/cmd/root.go 0.00% 3 Missing ⚠️
...rrorbudgetburn/clustermonitoringerrorbudgetburn.go 0.00% 3 Missing ⚠️
... and 6 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
- Coverage   32.08%   30.48%   -1.60%     
==========================================
  Files          35       36       +1     
  Lines        2425     2529     +104     
==========================================
- Hits          778      771       -7     
- Misses       1587     1699     +112     
+ Partials       60       59       -1     
Files with missing lines Coverage Δ
...tigations/apierrorbudgetburn/apierrorbudgetburn.go 0.00% <0.00%> (ø)
...tions/insightsoperatordown/insightsoperatordown.go 8.45% <0.00%> (ø)
...e/machinehealthcheckunterminatedshortcircuitsre.go 50.29% <0.00%> (ø)
...cfailureover4hr/upgradeconfigsyncfailureover4hr.go 20.28% <0.00%> (ø)
pkg/managedcloud/managedcloud.go 0.00% <0.00%> (ø)
pkg/pagerduty/pagerduty.go 63.33% <0.00%> (+1.03%) ⬆️
cadctl/cmd/root.go 0.00% <0.00%> (ø)
...rrorbudgetburn/clustermonitoringerrorbudgetburn.go 10.86% <0.00%> (-0.50%) ⬇️
cadctl/main.go 0.00% <0.00%> (ø)
pkg/logging/logging.go 57.89% <20.00%> (-9.55%) ⬇️
... and 6 more
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@petrkotas petrkotas force-pushed the simplify-ocm-config branch 2 times, most recently from f661634 to f2f64c8 Compare May 13, 2025 11:44
@petrkotas petrkotas requested review from bergmannf and typeid May 13, 2025 11:46
@petrkotas
Copy link
Member Author

/retest

@petrkotas petrkotas force-pushed the simplify-ocm-config branch 3 times, most recently from 676689d to b6d8bbd Compare May 14, 2025 08:47
@petrkotas petrkotas force-pushed the simplify-ocm-config branch 4 times, most recently from 1341444 to 537e812 Compare May 14, 2025 11:04
@petrkotas petrkotas requested a review from typeid May 14, 2025 11:43
@RaphaelBut
Copy link
Contributor

Looks good and generally makes sense to me.
We have a couple more env variables flying around, but with this change it will be easy to integrate them later on or when needed.

Good stuff!

@petrkotas petrkotas force-pushed the simplify-ocm-config branch from 537e812 to 042af60 Compare May 19, 2025 12:02
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2025
@petrkotas petrkotas force-pushed the simplify-ocm-config branch 2 times, most recently from 99d38c9 to 5de9384 Compare May 19, 2025 13:26
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 19, 2025
@petrkotas petrkotas force-pushed the simplify-ocm-config branch 3 times, most recently from 582f9b1 to d65ed9a Compare May 20, 2025 09:12
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2025
@petrkotas
Copy link
Member Author

/retest

1 similar comment
@petrkotas
Copy link
Member Author

/retest

@petrkotas petrkotas force-pushed the simplify-ocm-config branch from d65ed9a to 16ddf2c Compare May 20, 2025 09:32
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 20, 2025
Moves the options to flags. Also moves the
env variables to the main command, so there are
no sideeffects when running the commands.

Signed-off-by: Petr Kotas <pkotas@redhat.com>
@petrkotas petrkotas force-pushed the simplify-ocm-config branch from 16ddf2c to 36bc03c Compare May 21, 2025 04:37
@petrkotas petrkotas requested review from RaphaelBut and typeid May 21, 2025 04:37
Copy link
Contributor

openshift-ci bot commented May 21, 2025

@petrkotas: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +43 to +73
PreRunE: func(cmd *cobra.Command, _ []string) error {
var err error
var c config.Config

c, err = config.BuildConfig(cmd)
if err != nil {
return fmt.Errorf("failed to build config: %w", err)
}

// Initialize the logger here because it depends on user input
logging.RawLogger = logging.InitLogger(c.LogLevel)
logging.RawLogger = logging.WithPipelineName(logging.RawLogger, c.PipelineName)

// configure the RunE command using a wrapper to
// append the configuration into the command
cmd.RunE = func(cmd *cobra.Command, args []string) error {
if err := run(c, cmd, args); err != nil {
logging.Error(err)
}
return err
}

return nil
},
RunE: func(_ *cobra.Command, _ []string) error {
// DO NOT REMOVE
// this ensures RunE work as expected
// RunE is set in the PreRunE function
// this works because the RunE is always called after PreRunE
return nil
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason we need this vs just loading the config & initialize the logger in in run()? It seems like an unnecessary hack to use RunE, maybe I'm missing something.

Comment on lines +90 to +96
var err error

if c.PrometheusPushGateway == "" {
logging.Warn("metrics disabled, set env 'CAD_PROMETHEUS_PUSHGATEWAY' to push metrics")
}

err = metrics.Push(c.PrometheusPushGateway)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the loose var err error here, this would help with readability.

Suggested change
var err error
if c.PrometheusPushGateway == "" {
logging.Warn("metrics disabled, set env 'CAD_PROMETHEUS_PUSHGATEWAY' to push metrics")
}
err = metrics.Push(c.PrometheusPushGateway)
if c.PrometheusPushGateway == "" {
logging.Warn("metrics disabled, set env 'CAD_PROMETHEUS_PUSHGATEWAY' to push metrics")
}
err := metrics.Push(c.PrometheusPushGateway)

See google golang styleguide on error handling: https://google.github.io/styleguide/go/guide#maintainability

}

// early init of logger for logs before clusterID is known
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover? I don't see the relation to the code around it.

Suggested change
// early init of logger for logs before clusterID is known

logging.Warn("metrics disabled, set env 'CAD_PROMETHEUS_PUSHGATEWAY' to push metrics")
}

err = metrics.Push(c.PrometheusPushGateway)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any metrics yet at this point in the code? The old logic pushed metrics after they were set (post run logic), now we push before we run anything IIUC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another issue is that we first check if the PushGateway is empty, but the proceed to push metrics to it, even if it is empty.

Comment on lines +108 to +113
if c.PagerDutyToken == "" {
return fmt.Errorf("missing PagerDuty token")
}
if c.PagerDutySilentPolicy == "" {
return fmt.Errorf("missing PagerDuty silent policy")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intent behind checking this again here?

We already checked that we loaded something while building the config. If we want to check for completeness/format, this should ideally also go to the config building to allow us to exit as early as possible once we detect we're missing something required.

ocmConfigFile = file
}

func HasConfigFile() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function used? If not, can we remove it? I couldn't find any references.

return nil, fmt.Errorf("one of the required envvars in the list '(CAD_SILENT_POLICY CAD_PD_TOKEN)' is missing")
}

func GetPDClient(webhookPayload []byte, cadPD string, cadSilentPolicy string) (*SdkClient, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat.

Comment on lines +13 to +14
// Initialize the logger
logging.RawLogger = logging.InitLogger("info")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. Thought: Hmm, this is a bit of an unwanted side effect of the ptr, but I suppose we'll get rid of this in a later refactor.
  2. Nitpick: this is a useless comment & we don't comment it in other files either. We can remove it for consistency.
Suggested change
// Initialize the logger
logging.RawLogger = logging.InitLogger("info")
logging.RawLogger = logging.InitLogger("info")

Comment on lines +185 to +193
if c.BackplaneURL == "" {
return fmt.Errorf("missing backplane URL")
}
if c.BackplaneProxyURL == "" {
logging.Warn("missing backplane proxy URL, using default")
}
if c.BackplaneInitialARN == "" {
return fmt.Errorf("missing backplane initial ARN")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit late to check these. Would we not have caught this when loading the config at the start?

What's the default for BackplaneProxyURL?

}

client := SdkClient{}
var err error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we introducing this specific line?

@typeid
Copy link
Member

typeid commented May 26, 2025

The latest review is mostly style/chore type changes. The one major issue I see is that the metrics are likely broken by moving the pushgateway push() to the start of the investigation run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants