Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose feature flags for collectors, diagnosers, exporters #52

Closed
wants to merge 14 commits into from

Conversation

davidkydd
Copy link
Collaborator

This allows selection of which collectors and diagnosers are enabled via changes to the deployment yaml only.

@davidkydd davidkydd requested a review from Tatsinnit May 6, 2021 03:23
Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thank you David, I have added some basic suggestions/comments, configurability sounds good, it will have some impact on existing 2 PR. Mainly the comments is for code movement and possibly adding more comments.

Most important will be local test to make sure this works from the image. (Checklist covers that point)

Lets make sure we did these checks from checklist to make sure you have covered existing behaviours are: (before we close this PR)

✅ Make sure you locally run CI.
✅ Tip to locally test changes via image reside here: https://github.com/Azure/aks-periscope#programming-guide
* Above will be a very important scenario to test, and feel free to reach out to us for anything.

✅ If you find any new functionality could be easily extracted to UT (unit test) then feel enabled to do so.
✅ Once we have done all the above, we should have enough confidence to open PR for review.

(At least these are my initial thoughts) I will add more in case I forget anything.

Really appreciate it. Other folks will kindly add more. 🙏 cc: @JunSun17 + @arnaud-tincelin

cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
- support multiple exporters
- refactor main() to better encapsulate logic in sub-funcs
- add multierror package, currently only used when collector.export errors; to prevent one failing exporter preventing the others from running
- removed zip and export mode flag for now
@davidkydd davidkydd changed the title Expose a feature flag for collectors and diagnosers Expose feature flags for collectors, diagnosers, exporters May 7, 2021
@davidkydd
Copy link
Collaborator Author

Hi @Tatsinnit this is done and ready to merge IMO - sorry for rushing you but would like to get this into master ASAP so our partners in OSM and Arc team can start using some of the new functionality.

Manually tested against my single node test AKS cluster which has gitops and some other stuff on it.

In case you want to do manual testing yourself, build is pushed to my public acr here:
dakyddacr.azurecr.io/periscope/featureflag:v4

SAS Url for the outputs (earlier run 03-06 is from the new build, later run 03-24 is from current master): https://dakydddevauseast2.blob.core.windows.net/dakydd-test-eastus-dns-d0daedb9?sv=2019-12-12&st=2021-04-07T03%3A14%3A00Z&se=2022-05-08T03%3A14%3A00Z&sr=c&sp=rl&sig=3x%2FL0r%2FCXCf7YMunfiKTSIk5GlTasG9GAc0LLRI10i0%3D

Logs from the periscope pod are as attached, "dakydd" is the new build (can tell from slightly different log lines at end):
dakydd-PodLogs.txt
master-PodLogs.txt

@Tatsinnit
Copy link
Member

Tatsinnit commented May 7, 2021

Hiya, @davidkydd , I can take a look in few hours, as far as you have made sure to go through this check list #52 (review) (And things look good, I would take a quick peek and if nothing bad we can definitely go to master)

  • basically point 2: of the checklist is key aka - Manually tested with docker image? From kubect deploy -f deployment-file.yaml
  • Can you please add more in description as well along with what key areas these changes will affect?

I will take a look and approve this. 🙏 thank you!

@Tatsinnit
Copy link
Member

Tatsinnit commented May 7, 2021

Hiya, @davidkyd, thanks also I see/noticed you have made 19 files aka some more changes until which was in last review so please wait for quick eyes on this.

cc: @JunSun17 , @arnaud-tincelin and just in case if @Porges for eyes.

Thanks.

@davidkydd
Copy link
Collaborator Author

@Tatsinnit yes I have been through the checklist, changes are mostly just refactoring.

Yes I have manually tested by deploying to a test cluster - outputs of the test run vs. master are in the comment I posted above. Confident there are no regressions.

@davidkydd
Copy link
Collaborator Author

also note merging to master won't actually deploy or do anything other than update the code

@Tatsinnit
Copy link
Member

Tatsinnit commented May 7, 2021

Cool, @davidkydd sounds good, I will take a good look in few hours, thanks, cheers for making sure and testing it.

Feel free to add anything else in description of this PR, which will help me to understand any specific “exporter” change. 🙏

@davidkydd
Copy link
Collaborator Author

davidkydd commented May 7, 2021

Sounds good :) - ping me later on teams when you are around and I can try answer any outstanding Qs.

Exporter change is to support multiple exporters for a collector, simply iterates through them and calls export on each as it does currently for the single exporter, collecting any errors in a multierror as it iterates and then returning them. Also lets exporters be configured (enabled / disabled) via the yaml config.

Otherwise its just refactoring of the main func.

Should also mention I tested the version of the yaml which will end up in master runs fine - this includes the new configMaps but targets the current v0.3 version of the containerImage. Bit backwards as strictly we should push the new containerImage first and include the new image tag in the same commit as the other changes to the config, but this way makes things a bit easier for partners.

Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 looks good!! Thanks. it looks close to approve but just noticed 2 things, which need your eyes: I've spend last hour and half to identify the change and test in various way, but seems like with image: dakyddacr.azurecr.io/periscope/featureflag:v4.

I manually tested and also tested from local file hooked to vscode.

Since the impact of this will be multiple tools which consumes this I have added all the test what I have done below. like vscode and cli.

2 things are:

  • New changes in Yaml will have delete failure or config-maps: Delete -f *.yaml - in vscode we use the namesake delete so its safe, and in cli we use that as well, I wonder do we have any other tool where this will enable exitCode == non-zero.

  • For some reason, when I clear the namespace using kubectl delete -f aks-periscope I get stuck every time to Teminating state and I had to do manual termination. (by using the kubectl proxy with json data)

Flagging it to be sure that you don't see this behaviour at your end. screenshot below for the error.

Delete of Kubectl will fail with error of new config-maps as well:

Screen Shot 2021-05-07 at 7 00 03 PM

Only at one occasion I was able to get this error but rest of the time it worked so I believe it must be the azure-sdk or storage.

Screen Shot 2021-05-07 at 8 02 17 PM


2021/05/07 08:00:39 Collector: containerlogs, export data
2021/05/07 08:00:40 Collector: containerlogs, export data failed: 1 error occurred:
	* Failed to create blob with storage error: -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /app/vendor/github.com/Azure/azure-storage-blob-go/azblob/zc_storage_error.go:42 

Possible Solution: (to move forward)

  • If you are happy/agree/ready for urgent revert in case we found any depending tool failure then we can squash merge this as long as you are happy.

Otherwise, I would recommend exploring the above 2 errors and make sure they will not hinder any tool etc. Open for ideas.

Thanks heaps. 🙏

cmd/aks-periscope/aks-periscope.go Show resolved Hide resolved
pkg/collector/collector.go Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
@davidkydd
Copy link
Collaborator Author

Many thanks @Tatsinnit - yeah I admit the errors from additional configmaps on delete are a bit ugly - vs code runs "delete ns aks-periscope" and CLI has --ignore-not-found set so shouldn't show up in either - perhaps older versions might not handle though..?

Haven't seen either the storage error or delete hanging in Terminating you mention but they do give me pause - suggest holding off for now then until we can try repro. Thanks again bud!

@Tatsinnit
Copy link
Member

Tatsinnit commented May 7, 2021

Anytime @davidkydd, no worries and sounds good: pause it is. 👍 thanks. Sounds worth to just double check in another machine for sure. In case you have same issue, I did this manual work to terminate the namespace :

thanks 🙏

sophsoph321 pushed a commit to sophsoph321/aks-periscope that referenced this pull request May 7, 2021
# This is the 1st commit message:

Revert "introduce arcmode"

This reverts commit 5f4fed4.

# This is the commit message #2:

remove secrets

# This is the commit message Azure#3:

add print statement

# This is the commit message Azure#4:

update print statement

# This is the commit message Azure#5:

committed

# This is the commit message Azure#6:

committed

# This is the commit message Azure#7:

committed

# This is the commit message Azure#8:

committed

# This is the commit message Azure#9:

remove print statements

# This is the commit message Azure#10:

add helm collector

# This is the commit message Azure#11:

change helm command

# This is the commit message Azure#12:

add helm 3 installation

# This is the commit message Azure#13:

add curl command installation

# This is the commit message Azure#14:

change helm command

# This is the commit message Azure#15:

remove helm history

# This is the commit message Azure#16:

debug helm history

# This is the commit message Azure#17:

add repo command

# This is the commit message Azure#18:

change stable repo name

# This is the commit message Azure#19:

add write to file

# This is the commit message Azure#20:

add kured

# This is the commit message Azure#21:

change

# This is the commit message Azure#22:

changes

# This is the commit message Azure#23:

add default namespace

# This is the commit message Azure#24:

change

# This is the commit message Azure#25:

add integration test

# This is the commit message Azure#26:

changes

# This is the commit message Azure#27:

add helm test

# This is the commit message Azure#28:

change print statement to error

# This is the commit message Azure#29:

change

# This is the commit message Azure#30:

more changes

# This is the commit message Azure#31:

add go installation

# This is the commit message Azure#32:

fix unit test

# This is the commit message Azure#33:

iptables to Helm

# This is the commit message Azure#34:

add custom resource collector

# This is the commit message Azure#35:

add new exporter, diagnoser, collector

# This is the commit message Azure#36:

comment unused variables

# This is the commit message Azure#37:

debug exporter

# This is the commit message Azure#38:

filenames

# This is the commit message Azure#39:

test zip function

# This is the commit message Azure#40:

list files

# This is the commit message Azure#41:

fmt to log

# This is the commit message Azure#42:

delete lines

# This is the commit message Azure#43:

changed

# This is the commit message Azure#44:

get current directory

# This is the commit message Azure#45:

remove some print statements

# This is the commit message Azure#46:

test zip

# This is the commit message Azure#47:

changes

# This is the commit message Azure#48:

add windir check

# This is the commit message Azure#49:

minor fix

# This is the commit message Azure#50:

get hostname

# This is the commit message Azure#51:

add expose in dockerfile

# This is the commit message Azure#52:

add exec collector

# This is the commit message Azure#53:

mitigate exit code 126

# This is the commit message Azure#54:

change curl url from example.com to dp endpoint

# This is the commit message Azure#55:

changes

# This is the commit message Azure#56:

uncomment exec

# This is the commit message Azure#57:

add new diagnoser

# This is the commit message Azure#58:

debugging

# This is the commit message Azure#59:

debug

# This is the commit message Azure#60:

debugging

# This is the commit message Azure#61:

remove print statements

# This is the commit message Azure#62:

remove print

# This is the commit message Azure#63:

add back crd print statement

# This is the commit message Azure#64:

change

# This is the commit message Azure#65:

change

# This is the commit message Azure#66:

update dataPoint name

# This is the commit message Azure#67:

modify forloop

# This is the commit message Azure#68:

add filename to datapoint

# This is the commit message Azure#69:

add back log prints

# This is the commit message Azure#70:

test

# This is the commit message Azure#71:

add fields to diagnostic signal

# This is the commit message Azure#72:

add config content to diagnoser

# This is the commit message Azure#73:

change format from yaml to json

# This is the commit message Azure#74:

add parameters for kubeobject config map

# This is the commit message Azure#75:

Revert "introduce arcmode"

This reverts commit 5f4fed4.

# This is the commit message Azure#76:

fix helm collector style

# This is the commit message Azure#77:

revert changes that test arc customizations

# This is the commit message Azure#78:

fix merge conflicts

# This is the commit message Azure#79:

fix merge conflicts

# This is the commit message Azure#80:

Revert "Add v0.3 acr image for Private cluster fix. (Azure#22)"

This reverts commit 49dd302.

# This is the commit message Azure#81:

fix merge conflicts

# This is the commit message Azure#82:

fix merge conflicts

# This is the commit message Azure#83:

add print statement

# This is the commit message Azure#84:

update print statement

# This is the commit message Azure#85:

committed

# This is the commit message Azure#86:

committed

# This is the commit message Azure#87:

committed

# This is the commit message Azure#88:

committed

# This is the commit message Azure#89:

remove print statements

# This is the commit message Azure#90:

fix merge conflicts

# This is the commit message Azure#91:

fix merge conflicts

# This is the commit message Azure#92:

fix merge conflicts

# This is the commit message Azure#93:

add repo command

# This is the commit message Azure#94:

change stable repo name

# This is the commit message Azure#95:

add write to file

# This is the commit message Azure#96:

add kured

# This is the commit message Azure#97:

change

# This is the commit message Azure#98:

changes

# This is the commit message Azure#99:

add default namespace

# This is the commit message Azure#100:

change

# This is the commit message Azure#101:

add integration test

# This is the commit message Azure#102:

changes

# This is the commit message Azure#103:

add helm test

# This is the commit message Azure#104:

change print statement to error

# This is the commit message Azure#105:

change

# This is the commit message Azure#106:

more changes

# This is the commit message Azure#107:

add go installation

# This is the commit message Azure#108:

fix unit test

# This is the commit message Azure#109:

add custom resource collector

# This is the commit message Azure#110:

fix merge conflicts

# This is the commit message Azure#111:

comment unused variables

# This is the commit message Azure#112:

debug exporter

# This is the commit message Azure#113:

filenames

# This is the commit message Azure#114:

test zip function

# This is the commit message Azure#115:

list files

# This is the commit message Azure#116:

fmt to log

# This is the commit message Azure#117:

delete lines

# This is the commit message Azure#118:

changed

# This is the commit message Azure#119:

get current directory

# This is the commit message Azure#120:

remove some print statements

# This is the commit message Azure#121:

test zip

# This is the commit message Azure#122:

changes

# This is the commit message Azure#123:

add windir check

# This is the commit message Azure#124:

minor fix

# This is the commit message Azure#125:

get hostname

# This is the commit message Azure#126:

add expose in dockerfile

# This is the commit message Azure#127:

add exec collector
davidkydd and others added 4 commits May 8, 2021 09:28
…p level deployment yaml - to avoid changing the one used by downstream tools (vscode / cli)
(cherry picked from commit 910ed97)
@@ -0,0 +1,192 @@
apiVersion: v1
Copy link
Member

@Tatsinnit Tatsinnit May 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 We really need to think more in depth about this please. This should be thought well with specific concrete scenarios and some kind of design doc as to advantages and disadvantages. We are duplicating deployment files and this is kind of a design thought, we/I need to think more and not making this part of a refactor. This should be more broader discussed before we go ahead and duplicate deployment files for small use-cases.

Pros and Cons, with clear advantage will give much better direction, because all these deployment files will still need to handled from single code logical flow, so under the hood all this will go through same logic flow, hence 1 deployment file like it is makes more sense.

📓 Rest looks good I liked the very initial refactor. we can try scoping the configMap issue which we can see, and if there is a way not to get this delete error. I mean if its just for delete yaml and we make it sure that its 100% safe operation, we could move forward. But creating separate yaml will introduce bigger design challenge for tools which consume this.

Possibly exploring “ --ignore-not-found” to be added to the delete command for yaml, possibly update appendix.md. Surely making good test to check rest all scenarios work, and going back to the old one deployment file seems like good solution, if it’s same image I could do some more test tonight. Thanks 🙏

Copy link
Collaborator

@arnaud-tincelin arnaud-tincelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are many things in this PR : features, helm chart, interfaces modifications. Could we split this PR or at lease re-arrange the commits so it is easier to read?

@@ -0,0 +1,8 @@
apiVersion: v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use kustomize instead of a helm release. We can discuss this but I tend to avoid helm releases

Copy link
Collaborator Author

@davidkydd davidkydd May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for confusion - should have cut a new WIP branch and left this PR branch as is before pulling in the helm changes. Consider this PR on hold until we can discuss

@@ -0,0 +1,192 @@
apiVersion: v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference with charts/azure-k8s-periscope/templates/azure-k8s-periscope.yaml ? Why 2 templates ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should have been removed from the PR - the chart template would have been replacing it

@davidkydd davidkydd closed this May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants