-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Jaeger CR #193
Update Jaeger CR #193
Conversation
This PR is an early draft, meant only to get feedback about the direction. The main changes here are:
Still missing:
Things to decide:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good approach - only comment is that it seems like any deployments no longer required will be deleted at the same time others are being created/updated. Would it be better to defer the deletions until the create/updates are ready?
Good point. This validates then that going for concrete types ( |
98773cc
to
91c9df4
Compare
I did some manual tests with this PR in its current state and I believe it's already at a state where we can consider merging, even if updating other components are missing (services, config maps, ...) Here's what I did for testing:
|
Codecov Report
@@ Coverage Diff @@
## master #193 +/- ##
==========================================
- Coverage 95.18% 90.35% -4.83%
==========================================
Files 35 59 +24
Lines 1848 2645 +797
==========================================
+ Hits 1759 2390 +631
- Misses 75 164 +89
- Partials 14 91 +77
Continue to review full report at Codecov.
|
@jpkrohling Looks promising, although main concern is that the new deployment is not confirmed as available before the old is removed - so there will be a time period where no traces are collected. If we could solve this problem, then atleast if the user is using the same backend storage, it should be able to cope with continously collecting spans through the transition. It also means that if the new deployment was to fail, we can abort and still have the old deployment available and continuing to collect. One option is to deal with this scenario as a separate PR, which would be fine - as long as it was addressed soon. |
Right, I think we discussed that but I missed it. I'll implement this next, still in this PR. |
bf746f5
to
32671d8
Compare
Marking as WIP, as there seems to be a problem with the unit test. |
@objectiser this is now ready to be reviewed. |
@jpkrohling Tested by deploying
UPDATE: Ignore, this was using the wrong image. |
4cb9cd8
to
5e260e1
Compare
As talked on IRC, this PR is missing the basic functionality of creating/updating non-deployment objects :-) I'm updating it with the other parts, but wanted to leave it for a short review while I fix a bug with config maps. The second commit from this PR contains the changes to support other non-deployment objects (missing a few ones, WIP). We talked about having a generic vs. concrete approach and decided in favor of the concrete approach. The second PR shows that there's almost a 1-to-1 code duplication for the other resources, but I think the code is simple enough to justify this duplication. Making this code more generic would make things less flexible and would increase complexity, as there's no common interface between all the objects other than Even though I believe the way it's done in this PR is the right choice, I'd like to get your opinion before moving forward. |
The third commit in this PR shows what's the required code for individual object types (ingresses, in the example). |
c42cbd7
to
e427561
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the separate types for now - possibly some of the more straightforward types could be handled in a generic way eventually.
8f3e828
to
764f6d0
Compare
@pavolloffay : could you please review this one as @objectiser is away? |
sure I will have a look. I would like to first fix the e2e tests so ES can be merged. |
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
444a1a6
to
9063595
Compare
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
I fixed the OpenShift problems (v3.11), but there's still one point open that might be a bug in the operator SDK. When running in Kubernetes, the reconcile loop is being called only when the CR is changed, while in OpenShift it's being called every second. Not sure yet why, but I opened a thread in the operator-framework mailing list addressing that. I also addressed the comments from the last review, except one (left a question on the item itself). |
It would be good to add a link to operator ML to this thread. Do you want me to re-review? |
https://groups.google.com/d/msg/operator-framework/DZynMacOgoY/RWaSFxFUBwAJ I'll track this in a separate issue so that it won't block this PR (#203 is probably related)
Yes, please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests are a bit shallow. They do tests change in annotations - basically ObjectMeta
which fine but they should more importantly tests if core fields of objects get updated.
It seems that my previous comments were not fixed, there is repeated logging and missing documentation for the update - how to update and what is supported.
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
running on minishift produces following errors. THe serviceaccount error repeates over time.
|
Are you running on a clean OpenShift, or could this be something left behind from previous iterations? I do get the service account one from time to time, but should be relatively innocuous, once #231 is fixed. |
It's clean OpenShift. The #231 causes the error to pop up more ofter, maybe there is something wrong with SA bc other objects are not logged with the same modified error. |
The problem is that |
The resource version field is the reason we update individual fields :) We are not supposed to touch that one: https://groups.google.com/forum/#!topic/operator-framework/KqgrZ3v2P9U |
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
I added a few more tests for fields within the |
thanks for the pointer. There is definitelly something fishy with SA or secrets. I have noticed that every loop creates new secrets.
|
That's a great hint! I did not notice this before, let me see if I can figure out what's wrong. |
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
So, I tracked the problem down to the reconcile logic, where we store back the CR as we might have done some reconciliation. Once that is done, I don't see #231 anymore and the number of service accounts becomes stable. I'll track that in a different issue, to unblock this PR. |
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
e2e tests are passing, will merge this:
|
} | ||
objs, err := es.CreateElasticsearchObjects(cDep.Spec.Template.Spec.ServiceAccountName, queryDep.Spec.Template.Spec.ServiceAccountName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling this PR broke Elasticsearch functionality, note that es.CreateElasticsearchObjects
was returning ES CR in the object array. Now the ES CR is not being processed.
Signed-off-by: Juraci Paixão Kröhling juraci@kroehling.de