IX-Exporter is an HTTP server that exposes Iluvatar GPU node information.
Build the executable binary ix-exporter into the build directory.
$ make build
$ ls build/ix-exporter
build/ix-exporter
Build the image
## build the image with default registry and version
$ make image
...
Successfully built f8e16ea6adb6
Successfully tagged ix-exporter:4.2.0-x86_64
## build the image with a custom registry and version
$ REGISTRY=iluvatar.com/release VERSION=v4.2.1 make image
...
Successfully built f8e16ea6adb6
Successfully tagged iluvatar.com/release/ix-exporter:v4.2.1-x86_64
$ ./ix-exporter --help
NAME:
ix-exporter - Generates Iluvatar coreX metrics in the prometheus format
USAGE:
ix-exporter [global options] command [command options]
COMMANDS:
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--log-level value, -v value Log level, 0-panic, 1-fatal, 2-error, 3-warn, 4-info, 5-debug, 6-trace. (default: 4) [$IX_EXPORTER_LOGLEVEL]
--log-file value, -f value Path of log file. (default: "/tmp/log/ix-exporter.log") [$IX_EXPORTER_LOGFILE]
--enable-kubernetes, -k Enable kubernetes. (default: false) [$IX_EXPORTER_ENABLE_KUBERNETES]
--metrics-config value, -c value Path of metrics config file which contains of all fields. (default: "/etc/ixexporter/metrics.yaml") [$IX_EXPORTER_METRICS_CONFIG]
--remote-ix-hostengine value, -r value Connect to remote ix-hostengine at <HOST>:<PORT>. (e.g. 10.10.2.6:5777) [$IX_REMOTE_HOSTENGINE_INFO]
--ip value Service IP. (default: "0.0.0.0") [$IX_EXPORTER_SERVICE_IP]
--port value, -p value Service port. (default: "32021") [$IX_EXPORTER_SERVICE_PORT]
--help, -h show help
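Each global option in the help text above lists an environment variable in square brackets. The sketch below assumes, based only on that help format, that setting the variable has the same effect as passing the flag. The helper names `exporter_env` and `run_exporter` are hypothetical, and rendering boolean options as "true"/"false" is also an assumption.

```python
import os
import subprocess

# Flag-to-environment-variable mapping copied from the --help output above.
ENV_BY_FLAG = {
    "log-level": "IX_EXPORTER_LOGLEVEL",
    "log-file": "IX_EXPORTER_LOGFILE",
    "enable-kubernetes": "IX_EXPORTER_ENABLE_KUBERNETES",
    "metrics-config": "IX_EXPORTER_METRICS_CONFIG",
    "remote-ix-hostengine": "IX_REMOTE_HOSTENGINE_INFO",
    "ip": "IX_EXPORTER_SERVICE_IP",
    "port": "IX_EXPORTER_SERVICE_PORT",
}


def exporter_env(**options):
    """Translate keyword options such as port=32021 into environment variables."""
    env = {}
    for key, value in options.items():
        flag = key.replace("_", "-")
        if flag not in ENV_BY_FLAG:
            raise KeyError(f"unknown ix-exporter option: {flag}")
        if isinstance(value, bool):
            # Assumption: boolean options accept the strings "true" and "false".
            value = "true" if value else "false"
        env[ENV_BY_FLAG[flag]] = str(value)
    return env


def run_exporter(binary="./build/ix-exporter", **options):
    """Start the exporter with the given options applied through the environment."""
    return subprocess.Popen([binary], env={**os.environ, **exporter_env(**options)})


print(exporter_env(metrics_config="./etc/metrics.yaml", port=32021))
```

Calling `run_exporter(port=32021)` would start the binary with only the port overridden; every other option keeps the default shown in the help text.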
Before running ix-exporter, complete the following preparations:
- Ensure that CoreX is installed.
- Configure your metrics.yaml to enable the metrics to export.
- By default, ix-exporter uses IxDCGM in embedded mode. To connect to a remote ix-hostengine instead, use the -r option.
$ ./build/ix-exporter -c ./etc/metrics.yaml -p 32021
By default, the service is available at http://localhost:32021.
$ curl http://localhost:32021/metrics
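The same request can be made from a script. The following is a minimal sketch that fetches the page and splits each sample line into a metric name, a label dictionary, and a value. The function names `parse_metrics` and `fetch_metrics` are hypothetical, and the parser covers only the plain `name{labels} value` lines that ix-exporter prints: it skips lines carrying an optional timestamp and leaves escaped characters in label values as they are.

```python
import re
from urllib.request import urlopen

# One sample line looks like: metric_name{label="value",...} 123
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Label values may contain commas and spaces, so match on the quotes.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a plain sample line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(url="http://localhost:32021/metrics", timeout=5):
    """Fetch the metrics page from a running exporter and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Offline check against one line of the kind the exporter prints.
sample = ('ix_temperature{container="",gpu="0",name="Iluvatar BI-V100",namespace="",'
          'node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 35')
name, labels, value = parse_metrics(sample)[0]
print(name, labels["gpu"], value)  # ix_temperature 0 35.0
```

With the exporter running, `fetch_metrics()` returns the same tuples for the live page.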
See Deploy IX Exporter in Kubernetes Cluster
$ curl http://localhost:32021/metrics
# HELP ix_ecc_dbe_vol_status The double-bit volatile ecc errors status. if the value is 1, errors occurred, otherwise, no errors.
# TYPE ix_ecc_dbe_vol_status gauge
ix_ecc_dbe_vol_status{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_ecc_dbe_vol_status{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_ecc_dbe_vol_status{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 0
ix_ecc_dbe_vol_status{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 0
# HELP ix_ecc_sbe_vol_status The single-bit volatile ecc errors status. if the value is 1, errors occurred, otherwise, no errors.
# TYPE ix_ecc_sbe_vol_status gauge
ix_ecc_sbe_vol_status{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_ecc_sbe_vol_status{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_ecc_sbe_vol_status{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 0
ix_ecc_sbe_vol_status{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 0
# HELP ix_gpu_utilization The utilization of iluvatar GPU (%).
# TYPE ix_gpu_utilization gauge
ix_gpu_utilization{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_gpu_utilization{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_gpu_utilization{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 100
ix_gpu_utilization{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 100
# HELP ix_mem_clock Mem clock of iluvatar GPU (MHz).
# TYPE ix_mem_clock gauge
ix_mem_clock{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 1200
ix_mem_clock{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 1600
ix_mem_clock{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 1600
ix_mem_clock{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 1600
# HELP ix_mem_free The free physical memory of iluvatar GPU (MiB).
# TYPE ix_mem_free gauge
ix_mem_free{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 32511
ix_mem_free{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 32652
ix_mem_free{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 31870
ix_mem_free{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 31870
# HELP ix_mem_total The total physical memory of iluvatar GPU (MiB).
# TYPE ix_mem_total gauge
ix_mem_total{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 32768
ix_mem_total{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 32768
ix_mem_total{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 32768
ix_mem_total{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 32768
# HELP ix_mem_used The used physical memory of iluvatar GPU (MiB).
# TYPE ix_mem_used gauge
ix_mem_used{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 257
ix_mem_used{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 116
ix_mem_used{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 898
ix_mem_used{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 898
# HELP ix_mem_utilization The memory utilization of iluvatar GPU (%).
# TYPE ix_mem_utilization gauge
ix_mem_utilization{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 1
ix_mem_utilization{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 1
ix_mem_utilization{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 3
ix_mem_utilization{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 3
# HELP ix_pcie_replay_counter The PCIe replay counter.
# TYPE ix_pcie_replay_counter gauge
ix_pcie_replay_counter{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_pcie_replay_counter{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_pcie_replay_counter{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 0
ix_pcie_replay_counter{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 0
# HELP ix_pcie_rx_throughput The PCIe rx (read) data including both header and payload (KB/s).
# TYPE ix_pcie_rx_throughput gauge
ix_pcie_rx_throughput{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_pcie_rx_throughput{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 28
ix_pcie_rx_throughput{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 77433
ix_pcie_rx_throughput{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 74598
# HELP ix_pcie_tx_throughput The PCIe tx (transmit) data including both header and payload (KB/s).
# TYPE ix_pcie_tx_throughput gauge
ix_pcie_tx_throughput{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_pcie_tx_throughput{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 24180
ix_pcie_tx_throughput{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 0
ix_pcie_tx_throughput{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 0
# HELP ix_power_usage The power usage of iluvatar GPU.
# TYPE ix_power_usage gauge
ix_power_usage{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 55
ix_power_usage{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 132
ix_power_usage{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 206
ix_power_usage{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 210
# HELP ix_process_info The process info of iluvatar GPU (MiB).
# TYPE ix_process_info gauge
ix_process_info{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",process_name="",process_pid="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_process_info{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",process_name="",process_pid="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_process_info{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",process_name="./gemm_perf --i 2,3 --d 0 --m 1024 --l 1000 ",process_pid="49685",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 782
ix_process_info{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",process_name="./gemm_perf --i 2,3 --d 0 --m 1024 --l 1000 ",process_pid="49685",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 782
# HELP ix_sm_clock Sm clock of iluvatar GPU (MHz).
# TYPE ix_sm_clock gauge
ix_sm_clock{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 1500
ix_sm_clock{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 1500
ix_sm_clock{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 1600
ix_sm_clock{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 1625
# HELP ix_sm_utilization The utilization of SM (%).
# TYPE ix_sm_utilization gauge
ix_sm_utilization{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_sm_utilization{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 1
ix_sm_utilization{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 1
# HELP ix_temperature The temperature of the iluvatar GPU(C).
# TYPE ix_temperature gauge
ix_temperature{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 35
ix_temperature{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 57
ix_temperature{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 60
ix_temperature{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 59
# HELP ix_xid_errors The Value of the last xid error encountered.
# TYPE ix_xid_errors gauge
ix_xid_errors{container="",gpu="0",name="Iluvatar BI-V100",namespace="",node_name="infra-92",pod="",uuid="GPU-4a8348cb-505c-507f-8df7-ff3c796e3033"} 0
ix_xid_errors{container="",gpu="1",name="Iluvatar MR-V50",namespace="",node_name="infra-92",pod="",uuid="GPU-2421fa19-18cf-47bb-b629-1ae3e642436d"} 0
ix_xid_errors{container="",gpu="2",name="Iluvatar BI-V150S",namespace="",node_name="infra-92",pod="",uuid="GPU-6d2ec5fa-f293-57a3-9f2c-335f78120578"} 0
ix_xid_errors{container="busybox",gpu="3",name="Iluvatar BI-V150S",namespace="default",node_name="infra-92",pod="test-pod",uuid="GPU-7edb0dc9-9291-5e13-9e1c-ad92672bdfec"} 0
Note: If some metric values are not listed for a GPU, that GPU model may not support those metrics.
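In the sample output above, for example, ix_sm_utilization has no line for gpu="0". A small sketch of how such gaps could be detected from a scrape is shown below. The function name `missing_gpus` is hypothetical, and the input is assumed to be (metric name, gpu label) pairs already extracted from the /metrics page.

```python
from collections import defaultdict


def missing_gpus(samples):
    """Return {metric_name: sorted GPU indices that have no sample for it}.

    `samples` is an iterable of (metric_name, gpu_index) pairs, where
    gpu_index is the string value of the "gpu" label.
    """
    gpus_by_metric = defaultdict(set)
    for metric, gpu in samples:
        gpus_by_metric[metric].add(gpu)
    # Every GPU that appears under at least one metric.
    all_gpus = set().union(*gpus_by_metric.values())
    return {
        metric: sorted(all_gpus - seen, key=int)
        for metric, seen in gpus_by_metric.items()
        if all_gpus - seen
    }


# (metric, gpu) pairs taken from the sample output above.
observed = [("ix_sm_clock", g) for g in "0123"] + [("ix_sm_utilization", g) for g in "123"]
print(missing_gpus(observed))  # {'ix_sm_utilization': ['0']}
```

A GPU that reports no metrics at all cannot be detected this way, since the set of known GPUs is built from the scrape itself.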