Issue with "hcsshim::GetComputeSystems" while using container metric #453

Closed
RamBoddapati opened this issue Dec 31, 2019 · 21 comments

Comments

@RamBoddapati

RamBoddapati commented Dec 31, 2019

Hi Team,
I am having trouble with the container metrics. I packaged "wmi_exporter-0.9.0-amd64.exe" as a container and deployed it to AKS on Windows Server 2019 to monitor my Windows containers, but the exporter reports the error below.

msg="collector container failed after 0.001015s: hcsshim::GetComputeSystems: The specified module could not be found." source="exporter.go:215"

Please let me know if I am missing some configuration.
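
"The specified module could not be found" is the text Windows reports when a DLL fails to load. A minimal sketch for checking this from inside the image or on a host, assuming (the thread does not state this) that hcsshim's GetComputeSystems call is backed by vmcompute.dll. The helper name can_load_dll is hypothetical:

```python
import ctypes
import sys


def can_load_dll(name="vmcompute.dll"):
    """Return True/False if the DLL loads on Windows, or None on other platforms.

    The default DLL name is an assumption about what hcsshim needs,
    not something confirmed in this thread.
    """
    if sys.platform != "win32":
        # ctypes.WinDLL only exists on Windows, so there is nothing to check.
        return None
    try:
        ctypes.WinDLL(name)
    except OSError:
        # Raised when the module (or one of its dependencies) is not found.
        return False
    return True


print(can_load_dll())
```

If this prints False in the environment where the exporter runs, the container collector would be expected to fail in the same way regardless of exporter flags.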

Here is my Dockerfile.

# escape=`
FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]

COPY . C:\wmiexporter\

EXPOSE 5000

ENTRYPOINT ["powershell", `
            "C:\\wmiexporter\\wmi_exporter-0.9.0-amd64.exe", `
            "--collectors.enabled container"]
@RamBoddapati
Author

I see some referenced code that is not part of our code. Is this something we need to add?
Here is the URL:
https://github.com/microsoft/hcsshim/blob/master/internal/hcs/system.go

@carlpett
Collaborator

Hi @RamBoddapati.
I haven't personally had the opportunity to try running it from within a container, so I'm not sure how, or whether, that works.
@sachinmsft, you contributed this collector, do you have any insights?

@RamBoddapati
Author

@sachinmsft, your help is very important to me.

@sachinmsft
Contributor

sachinmsft commented Dec 31, 2019 via email

@RamBoddapati
Author

@sachin, I would like to know whether you will be able to find time to look into this issue. I am not very familiar with Go, so I am hoping for your help with a solution.

@sachinmsft
Contributor

sachinmsft commented Jan 3, 2020 via email

@sachinmsft
Contributor

sachinmsft commented Jan 3, 2020 via email

@RamBoddapati
Author

@sachinmsft, thanks for your quick reply, Sachin.
As I am using an AKS managed Windows node, is there any way to deploy the exporter directly on the AKS Windows node to pull container metrics, instead of running it as a Windows container?

Please help me.

@sachinmsft
Contributor

Can you not get the Windows container insights through this? https://docs.microsoft.com/en-us/azure/azure-monitor/insights/container-insights-analyze

@RamBoddapati
Author

@sachinmsft, no Sachin, I looked at that solution earlier. It is limited to the node level and does not scrape container-level metrics such as network, CPU, memory, and disk I/O. That is why we started looking at open source options for scraping metrics into Grafana through Prometheus.

We have implemented the Prometheus and Grafana solution for Linux containers and are looking to implement it for Windows containers.

Your help is much needed. Thanks for understanding.

@sachinmsft
Contributor

For containers, wmi_exporter only provides CPU, memory and network metrics: https://github.com/martinlindhe/wmi_exporter/blob/master/collector/container.go#L16
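
One way to see which container metrics an exporter instance actually exposes is to scan its Prometheus text output for metric names with a given prefix. A minimal sketch under stated assumptions: the function name metric_names is hypothetical, the prefix wmi_container_ is assumed from the exporter's naming scheme, and the SAMPLE text is illustrative rather than copied from real exporter output:

```python
def metric_names(exposition_text, prefix):
    """Return the sorted metric names in a Prometheus text exposition
    that start with the given prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Illustrative input only; metric names and values here are assumptions.
SAMPLE = """\
# HELP wmi_container_cpu_usage_seconds_total Total run time in seconds
# TYPE wmi_container_cpu_usage_seconds_total counter
wmi_container_cpu_usage_seconds_total{container_id="abc"} 12.5
wmi_container_memory_usage_commit_bytes{container_id="abc"} 1048576
wmi_container_network_receive_bytes_total{container_id="abc",interface="eth0"} 2048
wmi_os_time 1578000000
"""

for name in metric_names(SAMPLE, "wmi_container_"):
    print(name)
```

In practice the text would come from the exporter's /metrics endpoint instead of the SAMPLE string. An empty result would suggest the container collector is disabled or failing.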

And if you want to run wmi_exporter as a DaemonSet, the way you might be running node_exporter, one option is https://github.com/rancher/wins .
Please take a look at https://github.com/rancher/system-charts/blob/dev/charts/rancher-monitoring/v0.0.7/charts/exporter-node-windows/templates/daemonset.yaml

@RamBoddapati
Author

@sachinmsft, thanks Sachin. It looks like it might fit my needs. I will try it and see whether there are any issues. Thanks a lot for your help, it has really made my day. Thanks again to you and Carl.

I will keep in touch and get back to you very soon.

@RamBoddapati
Author

@sachinmsft, please help me with this. I am ending up with the error below.
Error:


FATA[2020-01-06T13:13:22Z] rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing open \\.\pipe\rancher_wins: The system cannot find the file specified."


It looks like an issue with the mount point. Please see the screenshot. I don't understand the logic behind creating a volumeMount and looking for a Windows service "rancher_wins".
Please help me.

image

@sachinmsft
Contributor

@RamBoddapati I am not sure about this error, as I have not run it myself. I came across this tool some time ago and pointed you to it in case it was useful. You should follow up on this at https://github.com/rancher/wins.

@petemounce

I have this issue when I attempt to run wmi_exporter 0.10.2 on Windows Server 2019 GUI edition within GCE, on the host (that is, not within a container).

The host VM does not have Hyper-V (GCE does not support nested virtualisation yet).

time="2020-03-15T15:54:40Z" level=error msg="Err in Getting containers:hcsshim::GetComputeSystems: The specified module could not be found." source="container.go:155"
time="2020-03-15T15:54:40Z" level=error msg="failed collecting ContainerMetricsCollector metrics:<nil> hcsshim::GetComputeSystems: The specified module could not be found." source="container.go:136"
time="2020-03-15T15:54:40Z" level=error msg="collector container failed after 0.004888s: hcsshim::GetComputeSystems: The specified module could not be found." source="exporter.go:207"
time="2020-03-15T15:54:41Z" level=error msg="hcsshim::GetComputeSystems - End Operation - Error" error="hcsshim::GetComputeSystems: The specified module could not be found."
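
When many such lines are mixed into a larger log, it can help to pull out only the collector failures and their causes. A minimal sketch, assuming the key="value" layout shown in the lines above; the function names parse_fields and collector_failure are hypothetical, and escaped quotes inside values are not unescaped:

```python
import re

# key=value pairs where the value is either a quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')
# Shape of the exporter's "collector X failed after Ns: cause" message.
FAILED = re.compile(r"collector (\S+) failed after ([\d.]+)s: (.*)")


def parse_fields(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        # Strip the surrounding quotes, if any; inner escapes are left as-is.
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1]
        fields[key] = raw
    return fields


def collector_failure(line):
    """Return collector name, duration and cause, or None if the line
    is not a 'collector ... failed' message."""
    match = FAILED.match(parse_fields(line).get("msg", ""))
    if match is None:
        return None
    name, seconds, cause = match.groups()
    return {"collector": name, "seconds": float(seconds), "cause": cause}


LINE = ('time="2020-03-15T15:54:40Z" level=error '
        'msg="collector container failed after 0.004888s: '
        'hcsshim::GetComputeSystems: The specified module could not be found." '
        'source="exporter.go:207"')

print(collector_failure(LINE))
```

Lines that do not match the "collector ... failed" shape, such as the "Err in Getting containers" line, return None and can be skipped.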

Docker was installed on the host via these Ansible tasks:

---
# These steps are adapted from the official docker documentation on how to install via powershell:
# https://docs.docker.com/install/windows/docker-ee/#use-a-script-to-install-docker-ee
# The Docker EE licence is included with Windows Server

- name: download docker
  win_get_url:
    url: "{{ install_docker_download_url[ansible_os_family] }}"
    dest: "c:/windows/temp/{{ install_docker_package }}.zip"

- name: extract docker zip
  win_unzip:
    src: "c:/windows/temp/{{ install_docker_package }}.zip"
    dest: "%ProgramFiles%/"
    delete_archive: yes

# see https://blog.airdesk.com/2017/09/windows-containers-feature-.html for more details
- name: enable Windows Containers feature
  win_feature:
    name: containers
    state: present

- name: add docker to path
  win_path:
    elements: 'C:\Program Files\docker'

- name: make a group to allow non-privileged users to use docker
  win_group:
    name: docker-users

- name: add users to docker-users group
  win_group_membership:
    name: docker-users
    members: "{{ install_docker_users }}"
    state: present

- name: make sure docker config location exists
  win_file:
    path: c:/programdata/docker/config
    state: directory

- name: configure the docker daemon
  win_copy:
    src: docker-daemon.json
    dest: c:/programdata/docker/config/daemon.json

# There used to be a reboot step here and a `dockerd --register-service` step, as per
# the official installation instructions. What we found however, is that on Windows 2019
# this step was slow and flaky, and resulted in the docker daemon not starting on the buildkite agents. For
# these reasons we skip the reboot and use nssm.

- name: Install the docker service
  win_nssm:
    name: docker
    # Using Windows formatted paths here, to make sure we don't trip up nssm.
    application: C:/Program Files/docker/dockerd.exe
    stdout_file: "{{ install_docker_logs_path[ansible_os_family] }}/dockerd.log"
    stderr_file: "{{ install_docker_logs_path[ansible_os_family] }}/dockerd.log"

# The win_nssm module does not explicitly describe restart behaviour so we set
# it to auto restart in case of failure here. https://docs.ansible.com/ansible/latest/modules/win_nssm_module.html
- name: Make sure the docker service autorestarts
  win_shell: nssm set docker AppExit Default Restart

The docker-daemon.json config file is:

{
  "group": "docker-users"
}

The install_docker_users variable is an array of non-admin usernames.

@sachinmsft
Contributor

sachinmsft commented Mar 16, 2020 via email

@cloudcafetech

Is there any solution for running this on Kubernetes, similar to node exporter?

@carlpett
Collaborator

@cloudcafetech in #581 I believe the conclusion is that until Windows supports privileged containers this is not possible, and for now you need to run the exporter directly on the host (which unfortunately is not possible on hosted Kubernetes services).

@widdix123

@carlpett - Is this not possible with EKS either?

@carlpett
Collaborator

To the best of my knowledge, yes. This isn't specific to the Kubernetes distros/managed service variants, but rather how Windows containers presently work.
In the case of EKS, you have a somewhat "simple" workaround possible in defining custom workers, where you can then use your own AMIs that include the windows_exporter. For GKE/AKS and similar offerings, this is not possible.

@jsturtevant
Contributor

This can be closed. It has been fixed with #864, which includes some docs. There are also examples on wiring all this up with the rest of the Prometheus stack in https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/windows.md
