[chore] Fix E2E autoscale test for OpenShift #1365
Conversation
Until when do we support HPA v1?

@iblancasa while we support K8s 1.23, if I'm not wrong
@@ -0,0 +1,7 @@
apiVersion: kuttl.dev/v1beta1
Why is this needed?
name: tracegen
The problem is that, depending on the cluster where you run the test, the `duration` parameter previously provided to `tracegen` may not be enough to trigger the HPA and scale the collector. With this approach, we run `tracegen` until the number of replicas increases, and then we stop `tracegen` to reduce the metrics and make the HPA scale down.
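For illustration, here is a minimal sketch of that idea using `client-go`: poll the collector Deployment until the HPA has scaled it above one replica, then delete the `tracegen` Deployment so the metric drops and the HPA can scale back down. This is not the actual test code from this PR; the `default` namespace, the Deployment names, the timeout, and the kubeconfig handling are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location (~/.kube/config); the real test may build its config differently.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Wait until the HPA scales the collector Deployment past one replica
	// while tracegen keeps generating load.
	err = wait.Poll(time.Second, 15*time.Minute, func() (done bool, err error) {
		dep, getErr := client.AppsV1().Deployments("default").Get(context.Background(), "simplest-collector", metav1.GetOptions{})
		if getErr != nil {
			return false, nil // keep polling on transient errors
		}
		return dep.Status.ReadyReplicas > 1, nil
	})
	if err != nil {
		panic(err)
	}

	// Remove the load generator so the metrics drop and the HPA scales back down.
	if err := client.AppsV1().Deployments("default").Delete(context.Background(), "tracegen", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	fmt.Printf("collector scaled up; tracegen removed\n")
}
```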
From the issue
@iblancasa do we know the root cause why only
I didn't research it much, but I think it can be related to this comment in the test:

# TODO: these tests use .Spec.MaxReplicas and .Spec.MinReplicas. These fields are
# deprecated and moved to .Spec.Autoscaler. Fine to use these fields to test that old CRD is
# still supported but should eventually be updated.

If you want, I can create a new issue for that.
"k8s.io/client-go/util/homedir" | ||
) | ||
|
||
func main() { |
Please add `\n` as the last character of all `Printf` statements.
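A tiny illustration of the nit; the message and variable here are hypothetical, not taken from the PR:

```go
package main

import "fmt"

func main() {
	hpaName := "simplest-collector" // hypothetical value, for illustration only
	// Ending each Printf with \n keeps successive poll messages on separate lines.
	fmt.Printf("HPA %s is not ready yet\n", hpaName)
}
```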
pollInterval := time.Second

// Search in v2 and v1 for an HPA with the given name
err = wait.Poll(pollInterval, 0, func() (done bool, err error) {
I am still not sure why both v1 and v2 HPAs are created in the test. My understanding is that only a single HPA version should be used in a given cluster. Could you please explain why both are created?
I didn't write the test. I'm just trying to make it work on OpenShift. But this is what I found while checking the purpose of this E2E test:

- The test creates 2 `OpenTelemetryCollector` instances to test 2 ways of creating HPAs. From the `00-install.yaml` file:
  # This creates two different deployments:
  # * The first one will be used to see if we scale properly
  # * The second is to check the targetCPUUtilization option
- This creates 2 HPAs (I wait for their creation and metrics reporting in steps 1 and 2)
- We start `tracegen` in step 3 and wait for one of the `OpenTelemetryCollector` instances to scale up to 2 replicas
- We remove the `tracegen` deployment in step 4 to stop reporting traces
- We wait until the `OpenTelemetryCollector` scales down in step 5
When the HPAs are created, they will be created using `autoscaling/v1` or `autoscaling/v2beta2`. If you check how this test was written before my changes, you can see how in `00-assert.yaml` the test tries to assert the `simplest-collector` HPA with `autoscaling/v1` and the `simplest-set-utilization-collector` HPA with `autoscaling/v2beta2`.

When I ran this in OpenShift 4.11, both of them were created using `autoscaling/v2beta2` (as I pointed out in this comment).

So, since in KUTTL there is no way to conditionally check for one resource or another, I created the `wait-until-hpa-ready.go` script to do the following dynamically, given a name (sketched below):

- Look for an HPA in the `autoscaling/v2beta2` API. If found, check that the HPA status is different from unknown.
- If the HPA was not found in `autoscaling/v2beta2`, look for it in `autoscaling/v1`. If found, check that the HPA status is different from unknown.
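A minimal sketch of that dual-version lookup with `client-go` typed clients. It is not the exact `wait-until-hpa-ready.go` from this PR: the `default` namespace is an assumption, and "reporting current metrics" is used here as an approximation of the "status is not unknown" check described above.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	hpaName := os.Args[1] // name of the HPA to wait for

	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	err = wait.Poll(time.Second, 5*time.Minute, func() (done bool, err error) {
		// 1. Try autoscaling/v2beta2 first.
		if hpa, getErr := client.AutoscalingV2beta2().HorizontalPodAutoscalers("default").Get(context.Background(), hpaName, metav1.GetOptions{}); getErr == nil {
			// The HPA exists in v2beta2: consider it ready once it reports current metrics.
			return len(hpa.Status.CurrentMetrics) > 0, nil
		} else if !apierrors.IsNotFound(getErr) {
			return false, nil // transient error, keep polling
		}

		// 2. Fall back to autoscaling/v1 if the HPA was not found in v2beta2.
		hpa, getErr := client.AutoscalingV1().HorizontalPodAutoscalers("default").Get(context.Background(), hpaName, metav1.GetOptions{})
		if getErr != nil {
			return false, nil
		}
		return hpa.Status.CurrentCPUUtilizationPercentage != nil, nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("HPA %s is ready\n", hpaName)
}
```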
Another thing: why are the HPAs created using different `autoscaling` API versions (as we can see in `00-assert.yaml`) across the Kubernetes versions tested in CI? I think this is because one collector sets the `.spec.minReplicas` and `.spec.maxReplicas` values while the other sets them in `.spec.autoscaler` (as the comment in `00-install.yaml` explains). If you want, I can investigate why this happens more deeply, but in a separate issue since it is not related to the current PR.
@iblancasa could you please fix the CI?
I broke something in my branch. Fixing...
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Fixed!
Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Let's merge this and book a ticket to make sure only a single HPA version is used by the operator for a given k8s version.
* Improve the reliability of the autoscale E2E test
* Revert change

Signed-off-by: Israel Blancas <iblancasa@gmail.com>
Signed-off-by: Israel Blancas iblancasa@gmail.com
Fixes #1364
Also, this PR makes the test more reliable.