-
Notifications
You must be signed in to change notification settings - Fork 733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement a Keptn metrics provider #1630
Conversation
docs/gitbook/usage/metrics.md
Outdated
query: | | ||
keptnmetric/my-namespace/response-time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query: | | |
keptnmetric/my-namespace/response-time | |
query: "keptnmetric/my-namespace/response-time" |
I would make this a one-liner to suggest you can't have more than one object reference in there. I would also add an example for a real Istio metric like Istio 404 rate https://docs.flagger.app/usage/metrics#prometheus
pkg/controller/controller.go
Outdated
@@ -18,6 +18,7 @@ package controller | |||
|
|||
import ( | |||
"fmt" | |||
"k8s.io/client-go/rest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please sort the imports into groups, run make fmt
202e636
to
7e567f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @bacherfl for this! i did a quick first pass, will take a second look soon.
pkg/metrics/providers/keptn.go
Outdated
|
||
// delete the created analysis at the end of the function | ||
defer func() { | ||
_ = k.client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of ignoring the error, we should log it.
} | ||
return &KeptnProvider{ | ||
client: client, | ||
analysisTimeout: 10 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has this been chosen randomly? how does this play with the Canary analysis's interval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the maximum time the provider will wait for the result of a Keptn Analysis to be available - I added this to prevent the provider from blocking the flagger controller indefinitely in case the created Analysis cannot be completed.
I chose this value as this should be sufficient in most cases for the created Keptn Analysis to be reconciled by the Keptn Analysis controller, however I can also adjust this If you'd like, or if there are some guidelines on how long a provider might take to retrieve a value.
069ebd3
to
8b93127
Compare
Thanks for the review @aryan9600! I just addressed your comments and adapted the code accordingly - If there are some further questions feel free to reach out :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took a second pass, its looking good, just had some minor nits.
can the docs be improved a bit more? a few suggestions:
- the
query
field inMetricTemplate
example mentions the timeframe and arguments as well - make it clearer that resource name is actually that of the
AnalysisDefinition
and not the to be createdAnalysis
. - an example around
Analysis
, which shows a sampleAnalysisDefinition
and what theAnalysis
object generated by Flagger would look like
pkg/metrics/providers/keptn.go
Outdated
} | ||
|
||
var analysisResource = schema.GroupVersionResource{ | ||
Group: "metrics.keptn.sh", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets define a constant for this as well
pkg/metrics/providers/keptn.go
Outdated
} | ||
} | ||
} | ||
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found int resource %s/%s", queryObj.Namespace, queryObj.ResourceName) | |
return 0, fmt.Errorf("could not retrieve KeptnMetric - no value found in resource %s/%s", queryObj.Namespace, queryObj.ResourceName) |
pkg/metrics/providers/keptn.go
Outdated
select { | ||
case <-ctx.Done(): | ||
return 0, fmt.Errorf("encountered timeout while waiting for Keptn Analysis %s/%s to be finished", obj.Namespace, obj.ResourceName) | ||
case <-time.After(500 * time.Millisecond): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case <-time.After(500 * time.Millisecond): | |
case <-time.After(time.Second): |
as mentioned in the above comment
pkg/metrics/providers/keptn_test.go
Outdated
require.NotNil(t, provider) | ||
|
||
isOnline, err := provider.IsOnline() | ||
require.Nil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.Nil(t, err) | |
require.NoError(t, err) |
pkg/metrics/providers/keptn_test.go
Outdated
func TestNewKeptnProviderNoKubeConfig(t *testing.T) { | ||
provider, err := NewKeptnProvider(nil) | ||
|
||
require.NotNil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require.NotNil(t, err) | |
require.Error(t, err) |
pkg/metrics/providers/keptn_test.go
Outdated
require.True(t, isOnline) | ||
} | ||
|
||
func TestNewKeptnProviderNoKubeConfig(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func TestNewKeptnProviderNoKubeConfig(t *testing.T) { | |
func TestNewKeptnProvider_NoKubeConfig(t *testing.T) { |
pkg/metrics/providers/keptn_test.go
Outdated
require.Nil(t, provider) | ||
} | ||
|
||
func TestKeptnProvider_RunQueryKeptnMetric(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func TestKeptnProvider_RunQueryKeptnMetric(t *testing.T) { | |
func TestKeptnProvider_RunQuery_KeptnMetric(t *testing.T) { |
pkg/metrics/providers/keptn_test.go
Outdated
tests := []struct { | ||
name string | ||
setupClient func() *fake.FakeDynamicClient | ||
args args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just put query string
here?
pkg/metrics/providers/keptn_test.go
Outdated
tests := []struct { | ||
name string | ||
setupClient func() *fake.FakeDynamicClient | ||
// verificationFunc() will run in a separate go routing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// verificationFunc() will run in a separate go routing | |
// verificationFunc() will run in a separate go routine |
pkg/metrics/providers/keptn_test.go
Outdated
// verificationFunc() will run in a separate go routing | ||
// and check if the expected resources are created | ||
verificationFunc func(fakeClient *fake.FakeDynamicClient) error | ||
args args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
@aryan9600 could you please push your suggestions on top of this PR. I see that @bacherfl has enabled us to push commit. |
Hi @stefanprodan @aryan9600 sorry for the late reply - I had a busy schedule recently. I'm not sure if i can get to incorporate the suggested changes this week, but I may have some time to do so next week - otherwise, feel free to add directly to the PR, as @stefanprodan suggested |
c559cdf
to
29c8bf5
Compare
PR updated, the docs could still use a bit more info around |
@aryan9600 please sync the CRDs, the helm chart has the old one. |
Add a Keptn metrics provider for two resources: * KeptnMetric: Verify the value of a single metric. * Analysis (via AnalysisDefinition): Run a Keptn analysis over an interval validating SLOs. Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
29c8bf5
to
ce976e2
Compare
kuma e2e test failure is unrelated, happening because the tarball has moved to another location. i think we should not let it block this PR. we can fix and bump the kuma version later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @bacherfl and @aryan9600
Closes #1627
This PR adds the implementation of a
keptn
metrics providerThe main difference to other providers in Flagger is that this one is using a Kubernetes client to interact with Keptn resources (
KeptnMetrics
andAnalyses
). Therefore, in addition to the code inkeptn.go
, theClusterRole
needed to be adapted to be able to access the required Keptn resources.The provider supports the use of the following resources:
KeptnMetric
: This resource represents the current value of a certain metric. The metric is retrieved by Keptn from one of the following data sources:prometheus
,thanos
,cortex
,dynatrace
,datadog
.Analysis
: This Keptn resource can be used to apply a more complex grading logic over a set of different metrics. More information about this concept can be found in the Keptn Docs