Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - support for KubernetesInDocker (kind) clusters #75

Closed
wants to merge 60 commits into from

Conversation

davidkydd
Copy link
Collaborator

@davidkydd davidkydd commented Jun 28, 2021

Reverted previous premature merge (original PR here: #69 )

test image available @ dakyddacr.azurecr.io/periscope/kind:v25

davidkydd and others added 30 commits April 22, 2021 14:01
- support multiple exporters
- refactor main() to better encapsulate logic in sub-funcs
- add multierror package, currently only used when collector.export errors; to prevent one failing exporter preventing the others from running
- removed zip and export mode flag for now
…p level deployment yaml - to avoid changing the one used by downstream tools (vscode / cli)
(cherry picked from commit 910ed97)
add default .gitattributes file and git add --renormalize
# Conflicts:
#	cmd/aks-periscope/aks-periscope.go
#	go.mod
#	go.sum
#	pkg/collector/collector.go
#	pkg/exporter/azureblob_exporter.go
#	pkg/utils/helper.go
…LECTOR_LIST. New temp deployment config for kind, to be soon replaced with Kustomize version
Copy link
Collaborator

@arnaud-tincelin arnaud-tincelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I'm lost in this PR :(

zipAndExport = true
}

if zipAndExport {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is about Kind cluster. Why all these changes in this file? What are we doing here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent refactor to use more smaller funcs with clearly defined purpose in main execution loop.
ZipAndExport step made configurable also

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have 3 PR:

  • configuration of exporters + mechanism to run (or not) collectors / diagnosers
  • PR for zip (not sure what we need to do here)
  • PR to support kind cluster: if others PR are done, this one shall be very small

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZipAndExport step shall not exist. We will remove it asap but I know other tools such as vscode are using it.
This zip step is a duplicate of the single exports per collector / diagnoser

var enabledCollectorNames []string

//TODO try get partners to move from COLLECTOR_LIST to use ENABLED_COLLECTORS instead, for now COLLECTOR_LIST takes precedence if defined
collectorList, found := os.LookupEnv("COLLECTOR_LIST")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just an env variable, we can drop it safely this won't break anything

}

// selectedExporters select the exporters to run
func selectExporters(allExporters map[string]interfaces.Exporter) []interfaces.Exporter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of the PR scope

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of earlier refactor to make which components are run configurable, exporters done in same consistent pattern as collectors and diagnosers. Intended to support Local Machine Exporter scenario also

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand but right now the PR is hard to review and you have to deal with our numerous comments.
I think exporters shall not be treated the same way as collectors and diagnosers: exporters shall be configured by the user (export to blob, to disk, ...) but as we already discussed, I believe collectors and diagnosers should be 'hidden' from the users.

May be before a PR to support Kind cluster we should address these points in a dedicated PR ?

@@ -31,6 +31,12 @@ spec:
name: kubeobjects-config
- configMapRef:
name: nodelogs-config
- configMapRef:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the link with Kind cluster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ability to define which collectors to run in the yaml is intended as a replacement for the COLLECTOR_LIST env var which seems less obvious in what it will do. Diagnosers and Exporters done in same way for consistency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessity to filter which collectors are run was required when adding support for kind clusters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean - its possible we could instead refactor all the collectors to be resilient to any possible errors across any possible environments as you have mentioned previously.

However the ability to define which collectors / diagnosers / exporters to run is still a valuable feature outside of this immediate requirement, and it is easier to do this now than to refactor all the components as described above.


//GetAllContainerLogFilesThatHaveEverRunOnHost gets the list of log files for all containers that have ever run on the host
func (collector *ContainerLogsCollectorContainerD) GetAllContainerLogFilesThatHaveEverRunOnHost() ([]string, error) {
output, err := utils.RunCommandOnHost("ls", containerLogDirectory)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the idea is to use adequate libraries : for kubectl use https://github.com/kubernetes/client-go, for helm use https://github.com/helm/helm/blob/main/pkg/kube/client.go, to access files we can mount volumes and use io / ioutils libraries, etc ...
I'm not saying to remove RunCmdOnHost now but aks-periscope is not aimed to be a wraper to call cli tools. We have libraries to do it properly so let's do it :)
Leave old code as is but for new code, let's use the libraries, especially regarding files

@@ -32,48 +33,8 @@ func NewAzureBlobExporter(creationTime, hostname string) *AzureBlobExporter {
}
}

func createContainerURL() (azblob.ContainerURL, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the exporter changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as below, we needed to make the storage container name generation logic more resilient for additional environments, examples can be seen in tests. Also made exporter consistent with collector / diagnoser pattern and exposed it in config similarly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, cool. I'd suggest to move this whole thing to another PR


// TODO: Hack: for now AKS-Periscope is running as a deamonset so it shall not stop (or the pod will be restarted)
// Revert from https://github.com/Azure/aks-periscope/blob/b98d66a238e942158ef2628a9315b58937ff9c8f/cmd/aks-periscope/aks-periscope.go#L70
select {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡side note: See the comment above, this is needed at its current state. Unless something is changed which fixes this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still here - just moved to end of main.go function (line 83) which is better place for it as it is more obvious that it is (and needs to be) the final command in the program. In main() it is less likely to be refactored to a position where it ends up running NOT at the very end of the program, which would introduce a bug

}

func contains(flagsList []string, flag string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡Key: Not sure what are these changes, and what was the intention of removing this function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"contains" was moved to helpers.go which is a more natural home

lines := strings.Split(output, "\n")
for _, line := range lines {
index := strings.Index(line, "server: ")
if index >= 0 {
fqdn := line[index+len("server: "):]
fqdn = strings.Replace(fqdn, "https://", "", -1)
Copy link
Member

@Tatsinnit Tatsinnit Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐛 Why is this removed? Test for private cluster, this was specific for that. (I think)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with equivalent but more robust code in lines 116-126 which parses fqdn urlString to a url type, and then splits off the host to return. Test cases added to helper_test

@@ -312,6 +360,16 @@ func GetResourceList(kubeCmds []string, separator string) ([]string, error) {
return strings.Split(strings.Trim(resourceList, "\""), separator), nil
}

//contains checks if an array contains a value
func Contains(flagsList []string, flag string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Not able to follow this moving around of code because previously this funct was privately used in the main aks-perisceop.go file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its been moved to helpers.go unchanged and any references updated. There are tonnes of methods in helpers.go which could be considered private methods of aks-periscope.go, determination is based on what the method does - this is definitely a utils kind of helper method so doesn't need to stay in aks-periscope.go

Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thanks for this,I will add bit more latter. I am not able to follow changes in this PR because it seems this PR is more then adding Kind support (as title suggest) and some of the code movement around is something I am still going through and understanding.

cc: @arnaud-tincelin, - for more thoughts on other changes.

🙏 Thank you

Copy link
Collaborator Author

@davidkydd davidkydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I'm lost in this PR :(

It might be easier to review just the raw code itself outside of diff form by checking out the branch the PR is on.

I usually find this easier with larger PR's which change lots of bits of files due to refactoring, e.g. I had to do this recently for your PR #65

zipAndExport = true
}

if zipAndExport {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent refactor to use more smaller funcs with clearly defined purpose in main execution loop.
ZipAndExport step made configurable also

}

// selectedExporters select the exporters to run
func selectExporters(allExporters map[string]interfaces.Exporter) []interfaces.Exporter {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was part of earlier refactor to make which components are run configurable, exporters done in same consistent pattern as collectors and diagnosers. Intended to support Local Machine Exporter scenario also

@@ -31,6 +31,12 @@ spec:
name: kubeobjects-config
- configMapRef:
name: nodelogs-config
- configMapRef:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ability to define which collectors to run in the yaml is intended as a replacement for the COLLECTOR_LIST env var which seems less obvious in what it will do. Diagnosers and Exporters done in same way for consistency

@@ -31,6 +31,12 @@ spec:
name: kubeobjects-config
- configMapRef:
name: nodelogs-config
- configMapRef:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessity to filter which collectors are run was required when adding support for kind clusters


// TODO: Hack: for now AKS-Periscope is running as a deamonset so it shall not stop (or the pod will be restarted)
// Revert from https://github.com/Azure/aks-periscope/blob/b98d66a238e942158ef2628a9315b58937ff9c8f/cmd/aks-periscope/aks-periscope.go#L70
select {}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still here - just moved to end of main.go function (line 83) which is better place for it as it is more obvious that it is (and needs to be) the final command in the program. In main() it is less likely to be refactored to a position where it ends up running NOT at the very end of the program, which would introduce a bug

Comment on lines -41 to -46
containerName := strings.Replace(APIServerFQDN, ".", "-", -1)
containerLen := strings.Index(containerName, "-hcp-")
if containerLen == -1 {
containerLen = maxContainerNameLength
}
containerName = strings.TrimRight(containerName[:containerLen], "-")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only section of the original code which is changed, extracted into new function getStorageContainerName

@@ -32,48 +33,8 @@ func NewAzureBlobExporter(creationTime, hostname string) *AzureBlobExporter {
}
}

func createContainerURL() (azblob.ContainerURL, error) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as below, we needed to make the storage container name generation logic more resilient for additional environments, examples can be seen in tests. Also made exporter consistent with collector / diagnoser pattern and exposed it in config similarly.

pkg/utils/helper.go Outdated Show resolved Hide resolved
lines := strings.Split(output, "\n")
for _, line := range lines {
index := strings.Index(line, "server: ")
if index >= 0 {
fqdn := line[index+len("server: "):]
fqdn = strings.Replace(fqdn, "https://", "", -1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with equivalent but more robust code in lines 116-126 which parses fqdn urlString to a url type, and then splits off the host to return. Test cases added to helper_test

@@ -312,6 +360,16 @@ func GetResourceList(kubeCmds []string, separator string) ([]string, error) {
return strings.Split(strings.Trim(resourceList, "\""), separator), nil
}

//contains checks if an array contains a value
func Contains(flagsList []string, flag string) bool {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its been moved to helpers.go unchanged and any references updated. There are tonnes of methods in helpers.go which could be considered private methods of aks-periscope.go, determination is based on what the method does - this is definitely a utils kind of helper method so doesn't need to stay in aks-periscope.go

@Tatsinnit
Copy link
Member

Tatsinnit commented Jul 6, 2021

💡 Important note: Kubernetes in docker is deprecated now, is the key purpose of this PR will be affected by this?

Thanks,

@davidkydd
Copy link
Collaborator Author

💡 Important note: Kubernetes in docker is deprecated now, is the key purpose of this PR will be affected by this?

Thanks,

Hey, no this PR isn't affected - also its not quite right to say that "Kubernetes in Docker is being deprecated", as its the dockershim container runtime which is being deprecated and replaced with containerd. KIND certainly isn't being deprecated (see banner at the tope of https://kind.sigs.k8s.io/).

As this PR adds a containerd based containerLog collector you could say this PR is even MORE applicable considering this deprecation 😄

Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thank you for this, here are some final observations:

  • This PR is not just Specific k8sindocker support but 3 other things, i.e. containerd, some local machine support along with code move around hence it's little confusing to follow the flow of logic.

    • If we divide this into smaller feature based PR we might be able to understand better.
      • Especially, LocalMachine mechanism, and specifics of k8sindocker support.
  • Might be split into smaller PR: I agree with initial comments of @arnaud-tincelin that it seems like this PR is not just Support for some specific Kind but became more then 3 features in 1. It will be much easy to spilt into smaller more chunks for better understanding. Unless it cannot be separated.

  • For tests I would recommend Gomega which is already used within this project and much better assertion notation.

  • For localmachine work: What is it trying to achieve? how are you defining where to store logs in user machine?

I will defer to @arnaud-tincelin for further comments, suggestions or recommendation.

I guess with test published image, testing all scenarios will be key as usual.

Thank you. 🙏


// GetStorageContainerName get storage container name
func TestParseAPIServerFQDNFromKubeConfig(t *testing.T) {
for _, tt := range parseAPIServerFQDNFromKubeConfigTests {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 what is tt acronym for?

output, err := RunCommandOnHost("cat", "/var/lib/kubelet/kubeconfig")
if err != nil {
result = multierror.Append(result, err)
output, err = RunCommandOnHost("cat", "/etc/kubernetes/kubelet.conf")
Copy link
Member

@Tatsinnit Tatsinnit Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 I would recommend commenting the reason of this config use: /etc/kubernetes/kubelet.conf, especially what specific scenario is fallback consist. (Comment to clarify that this not for the support for wider kind cluster behaviour like GKE, AWS et. al. its specific to fall back 1 scenario).

just some thoughts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tatsinnit, the reason for using /etc/kubernetes/kubelet.conf is to override the hardcoded value /var/lib/kubelet/kubeconfig. For kind and kubeadm distros, we will get a "can't read kubeconfig file" if we try to get the APIServerFQDN from the filepath /var/lib/kubelet/kubeconfig.

Copy link
Member

@Tatsinnit Tatsinnit Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, @sophsoph321, thanks, Indeed, that is why as recommended, I think it is important to leave a comment i.e. add a code level comment for this: commenting the reason within file this config is use, especially when specific code if for k8sindocker / (KubeKDM) kind it is not every kind Kubernetes cluster?

For anyone to look at the code it will make it clear that what all this change was specifically for.

output, err = RunCommandOnHost("cat", "/etc/kubernetes/kubelet.conf")
if err != nil {
result = multierror.Append(result, err)
return "", fmt.Errorf("open kubeconfig file at /etc/kubernetes/kubelet.conf or /var/lib/kubelet/kubeconfig\": %+v", result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 if you extract these 2 config with more apt names, it will be easy to avoid repetition. /etc/kubernetes/kubelet.conf and /var/lib/kubelet/kubeconfig

  • also why do we need escape here: \":?

t.Run(tt.APIServerFQDN, func(t *testing.T) {
APIServerFQDN, err := ParseAPIServerFQDNFromKubeConfig(tt.kubeConfig)
if err != nil {
t.Errorf("utils.TestParseAPIServerFQDNFromKubeConfig(%q) Error: %q, expected %q",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡Please use Gomega, which is part of this project, which will help in built in verbs to test for expect, Not, HaveOccurred etc. more here: https://onsi.github.io/gomega/

@@ -0,0 +1,69 @@
package utils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Is this intentional to only test 1 function, why not other moved functs which are part of this PR?

containerName, _ := getStorageContainerName(tt.apiServerFqdn)

if containerName != tt.containerName {
t.Errorf("TestGetStorageContainerName(%q) => %q, want %q",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡recommendation: Please use Gomega. (its already part of this project)

@@ -0,0 +1,153 @@
package collector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Can we please keep same naming consistency pattern like other files: containerlogs_and_containerd_collector.go

@@ -0,0 +1,69 @@
package utils

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Not able to see private cluster test in helper_test as mentioned here: https://github.com/Azure/aks-periscope/pull/75/files#r662953574

Copy link
Collaborator

@arnaud-tincelin arnaud-tincelin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should split this PR as it's taking too long to review, Tatsat and I are getting lost about the reasons behind several changes and we disagree on some points


//GetAllContainerLogFilesThatHaveEverRunOnHost gets the list of log files for all containers that have ever run on the host
func (collector *ContainerLogsCollectorContainerD) GetAllContainerLogFilesThatHaveEverRunOnHost() ([]string, error) {
output, err := utils.RunCommandOnHost("ls", containerLogDirectory)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason why we mount /var/log:

 volumes:
      - name: varlog
        hostPath:
          path: /var/log

@nicoletaz3
Copy link

Hi Tatsat, Docker is deprecated as a container runtime in Kubernetes, but not KIND which is a tool for running Kubernetes in a lightweight manner.


In reply to: 874466230

@Tatsinnit
Copy link
Member

Tatsinnit commented Jul 8, 2021

Hi Tatsat, Docker is deprecated as a container runtime in Kubernetes, but not KIND which is a tool for running Kubernetes in a lightweight manner.

In reply to: 874466230

Thank you @nicoletaz3 , yep sounds good. Really appreciate your reply, As a archival history I left this question since there is no other design-doc but this PR for this work. 🙏

Regarding this PR: There are bunch of key comments which are about other changes getting checked in as part of this PR and some are not clear, hence Arnaud’s and mine comments above in case you are reviewing this PR please feel free to add comments. Thank you so much.

@davidkydd
Copy link
Collaborator Author

Closing - will be refactored into multiple smaller PRs

@davidkydd davidkydd closed this Jul 12, 2021
@Tatsinnit Tatsinnit mentioned this pull request Jul 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants