Skip to content

ROX-12487: Fix handling of proxy-related environment variables in the operator #2988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

misberner
Copy link
Contributor

Description

See bug for a more detailed description of the issue.

This PR changes the proxy-related environment variable injection logic to always give precedence to environment variables specified in the CR in the customize section, ignoring the respective values from the Operator's environment if both are set.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on openshift/openshift-docs and merge into rhacs-docs)

If any of these don't apply, please comment below.

Testing Performed

  • Unit tests
  • CI (tbd)

Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one thing that jumped at me, unfortunately the soonest I'd be able to take a proper look will be Monday.

@@ -52,7 +48,33 @@ func InjectProxyEnvVars(translator values.Translator, proxyEnv map[string]string
return nil, err
}

proxyVals, _ := getProxyConfigHelmValues(obj, proxyEnv) // ignore errors for now
return chartutil.CoalesceTables(vals, proxyVals), nil
proxyEnvVars, _ := getProxyConfigEnvVars(obj, proxyEnv) // ignore errors for now
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, what do you mean by "for now"? 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid question, given that the comment had been there for over a year now.

I've added an explicit log plus an explanation as to why we're not bubbling up the error.

@ghost
Copy link

ghost commented Sep 7, 2022

Images are ready for the commit at 6eca9b5.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.72.x-493-g6eca9b54de.

Comment on lines 58 to 59
// While this log can be spammy (emitted on every reconciliation), an error here is extremely unlikely
// and thus we deem this acceptable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree. I'm worried about the conversion to unstructured failing for some obscure reason. My experience with the kubernetes types/converstions voodoo is that it is notorious for causing failures at a distance - you forget to register some scheme somewhere, and another part of your program blows up with mysterious message :-(
Ignoring an error here, causing part of the functionality to disappear, would be even worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but corev1.EnvVar is a fairly simple structure that is also unlikely to change significantly in the future. I find it unlikely that something goes wrong at runtime that is not detected in the unit test (*) (because the configuration surface error exposed to the user is even smaller). If this breaks, something must be really weird, and I don't know if just disrupting the entire functionality of the operator is a better choice, especially given that there exists an escape hatch of specifying proxy env vars via spec.customize.

However, I do admit that given that we have a separate status condition for proxy configuration, it would have been better to surface any errors there rather than just in the logs. But I would still say that this is unlikely to be consequential.

(*) Yes, I realize(d after typing the above) that we don't actually check for an error in the unstructured conversion, but we do check that values are correctly piped through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the status would require a helm operator library change, unfortunately..

@misberner misberner force-pushed the mi/rox-12487-operator-proxy-env-vars branch from b7b8ad9 to c61eb96 Compare November 4, 2022 00:35
@misberner
Copy link
Contributor Author

@porridge I'd like to get this merged, as I think it is better to have this change than not have this change. As per your concern regarding just logging the error, I would still say that this is better than breaking reconciliation entirely - because if there's some unexpected error, I don't know what a user could do to get things back to a working state, whereas manually fixing proxy config by setting env vars in the CR is doable. LMK if you agree.

@openshift-ci
Copy link

openshift-ci bot commented Nov 4, 2022

@misberner: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-ui-e2e-tests b7b8ad9 link false /test gke-ui-e2e-tests
ci/prow/gke-upgrade-tests b7b8ad9 link false /test gke-upgrade-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@porridge porridge mentioned this pull request Nov 4, 2022
5 tasks
@porridge
Copy link
Contributor

porridge commented Nov 4, 2022

@porridge I'd like to get this merged, as I think it is better to have this change than not have this change. As per your concern regarding just logging the error, I would still say that this is better than breaking reconciliation entirely - because if there's some unexpected error, I don't know what a user could do to get things back to a working state, whereas manually fixing proxy config by setting env vars in the CR is doable. LMK if you agree.

I think the errors that ToUnstructured could return are of the "should never happen" kind, i.e. something is very broken, and we should not ship an operator in that state.

However, how about this approach?

* Fix a call site.

* Remove the need to return an error.
@misberner
Copy link
Contributor Author

However, how about this approach?

Yeah, that should work 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants