ACROSS-monitoring-stack

ACROSS-monitoring-stack is a multi-service architecture for handling telemetry data from different network nodes. The system consists of:

Monitoring Stack = Telemetry Data Collector + Telemetry Processing
Telemetry Data Collector:
- Node exporter collector app
- Kafka producer Python microservice
NDT Data Fabric:
- Zookeeper server
- Kafka broker
Telemetry Processing:
- Apache Flink Operator Cluster
Machine Learning Stack:
- Machine Learning Inference Engine

All of these services are deployed in a Kubernetes cluster. All the resources needed to deploy the telemetry system architecture are included in the Kubernetes folder.

Current versions:

Prometheus Node Exporter: 1.8.2
Prometheus Node Exporter Collector application: 1.0.1
Apache Flink: 1.14.4
Scala: 2.12
Java: 11

Input metrics format

The Telemetry Data Collector sends HTTP GET requests to the Node Exporter Collector service through a Python Kafka Producer microservice. The input data for the NDT Data Fabric follows this JSON structure:

{
  "node_exporter": "r1:9100",
  "epoch_timestamp": "1745233635.2172275",
  "experiment_id": "1",
  "interfaces": ["eth2"],
  "flag_debug_params": true,
  "debug_params": {
    "flag_original_metrics": true,
    "max_throughput_mbps": 100,
    "unit": "bytes",
    "polling_interval": 5,
    "test_metrics": false,
    "multiplier": 2000,
    "collector_timestamp": "1745233631.0145035"
  },
  "metrics": [
    {
      "name": "node_network_receive_bytes_total",
      "description": "Network device statistic receive_bytes.",
      "type": "counter",
      "values": [
        {
          "labels": [
            {
              "name": "device",
              "value": "eth2"
            }
          ],
          "value": "2.434326851e+09"
        }
      ]
    },
    {
      "name": "node_network_receive_packets_total",
      "description": "Network device statistic receive_packets.",
      "type": "counter",
      "values": [
        {
          "labels": [
            {
              "name": "device",
              "value": "eth2"
            }
          ],
          "value": "2.040682e+06"
        }
      ]
    }
  ]
}

An example JSON input file is included at input_metrics.
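As a rough illustration of how a consumer might read this structure, the following Python sketch extracts the counter values for one interface from a message. The function name `counters_for_interface` and the trimmed sample message are illustrative and not part of the repository; the sketch assumes only the field layout shown above, where counter values arrive as strings.

```python
import json


def counters_for_interface(message: dict, interface: str) -> dict:
    """Map metric name -> float value for samples labelled with the given device."""
    result = {}
    for metric in message.get("metrics", []):
        for sample in metric.get("values", []):
            # Turn the list of {"name": ..., "value": ...} labels into a plain dict.
            labels = {label["name"]: label["value"] for label in sample.get("labels", [])}
            if labels.get("device") == interface:
                # Values are serialized as strings such as "2.434326851e+09".
                result[metric["name"]] = float(sample["value"])
    return result


# Trimmed copy of the sample message above (descriptions and debug params omitted).
raw = """
{
  "node_exporter": "r1:9100",
  "interfaces": ["eth2"],
  "metrics": [
    {
      "name": "node_network_receive_bytes_total",
      "type": "counter",
      "values": [
        {"labels": [{"name": "device", "value": "eth2"}], "value": "2.434326851e+09"}
      ]
    },
    {
      "name": "node_network_receive_packets_total",
      "type": "counter",
      "values": [
        {"labels": [{"name": "device", "value": "eth2"}], "value": "2.040682e+06"}
      ]
    }
  ]
}
"""

message = json.loads(raw)
print(counters_for_interface(message, "eth2"))
```

Converting to `float` at the edge keeps the rest of a consumer free of string handling; an interface with no matching samples simply yields an empty dict.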

ML input format

The Kubernetes Flink Operator Telemetry System outputs metrics as JSON with the following structure:

{
  "node_exporter": "r1:9100",
  "epoch_timestamp": "1745233728.7595372",
  "experiment_id": "1",
  "interfaces": ["eth2"],
  "flag_debug_params": true,
  "debug_params": {
    "flag_original_metrics": true,
    "max_throughput_mbps": 100,
    "unit": "bytes",
    "polling_interval": 5,
    "test_metrics": false,
    "multiplier": 2000,
    "collector_timestamp": "1745233631.0145035",
    "process_timestamp": "1745233724580"
  },
  "metrics": [
  {
    "name": "node_network_receive_bytes_total",
    "description": "Network device statistic receive_bytes.",
    "type": "counter",
    "values": [
      {
        "value": "2.434367984e+09",
        "labels": [
          {
            "name": "device",
            "value": "eth2"
          }
        ]
      }
    ]
  },
  {
    "name": "node_network_receive_packets_total",
    "description": "Network device statistic receive_packets.",
    "type": "counter",
    "values": [
      {
        "value": "2.040711e+06",
        "labels": [
          {
            "name": "device",
            "value": "eth2"
          }
        ]
      }
    ]
  }
  ],
  "input_ml_metrics": [
    {
      "name": "node_network_receive_packets_total_rate",
      "type": "rate",
      "value": 301.9684264999918
    },
    {
      "name": "node_network_average_received_packet_length",
      "type": "length",
      "value": 1464
    },
    {
      "name": "node_network_router_capacity_occupation",
      "type": "percentage",
      "value": 0.0017683271055839516
    }
  ]
}

An example JSON output file is included at input_ml_metrics.
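This README does not state how the values in input_ml_metrics are derived. As an observation only, the sample values above are consistent with the relation sketched below: the occupation equals the link usage (packet rate times average packet length, in bits per second) expressed as a percentage of max_throughput_mbps and divided by multiplier. The function name `capacity_occupation` and the formula itself are inferred from the sample numbers, not taken from the Flink job, so treat this as a sanity check rather than the actual implementation.

```python
import math


def capacity_occupation(packet_rate, avg_packet_length_bytes, max_throughput_mbps, multiplier):
    """Inferred relation: percentage of link capacity in use, scaled down by the multiplier."""
    bits_per_second = packet_rate * avg_packet_length_bytes * 8
    percent_of_capacity = 100.0 * bits_per_second / (max_throughput_mbps * 1e6)
    return percent_of_capacity / multiplier


# Values copied from the sample message above.
value = capacity_occupation(
    packet_rate=301.9684264999918,
    avg_packet_length_bytes=1464,
    max_throughput_mbps=100,
    multiplier=2000,
)
print(value)
print(math.isclose(value, 0.0017683271055839516, rel_tol=1e-9))
```

Because max_throughput_mbps is 100 in the sample, the percentage and the raw Mbps figure coincide numerically, so the sample alone cannot confirm which of the two the real job computes.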

ML output format

Machine Learning (ML) inference engines are emulated by dummies that generate power consumption metrics based on router type and router occupation. The ML metrics are included in the output JSON file at output_ml_metrics.

{
  "node_exporter": "r1:9100",
  "epoch_timestamp": "1745233815.5825064",
  "experiment_id": "1",
  "interfaces": ["eth2"],
  "flag_debug_params": true,
  "debug_params": {
    "flag_original_metrics": true,
    "max_throughput_mbps": 100,
    "unit": "bytes",
    "polling_interval": 5,
    "test_metrics": false,
    "multiplier": 2000,
    "collector_timestamp": "1745233811.352313",
    "process_timestamp": "1745233811395",
    "ml_timestamp": "1745233803.5055509"
  },
  "metrics": [
  {
    "name": "node_network_receive_bytes_total",
    "description": "Network device statistic receive_bytes.",
    "type": "counter",
    "values": [
      {
        "value": "2.434457428e+09",
        "labels": [
          {
            "name": "device",
            "value": "eth2"
          }
        ]
      }
    ]
  },
  {
    "name": "node_network_receive_packets_total",
    "description": "Network device statistic receive_packets.",
    "type": "counter",
    "values": [
      {
        "value": "2.040774e+06",
        "labels": [
          {
            "name": "device",
            "value": "eth2"
          }
        ]
      }
    ]
  }
  ],
  "input_ml_metrics": [
    {
      "name": "node_network_receive_packets_total_rate",
      "type": "rate",
      "value": 650.8177749595993
    },
    {
      "name": "node_network_average_received_packet_length",
      "type": "length",
      "value": 1464
    },
    {
      "name": "node_network_router_capacity_occupation",
      "type": "percentage",
      "value": 0.003811188890163414
    }
  ],
  "output_ml_metrics": [
    {
      "name": "node_network_power_consumption_wats",
      "type": "power_consumption_wats",
      "value": [698.8288]
    },
    {
      "name": "node_network_power_consumption_variation_rate_occupation",
      "type": "power_consumption_variation_rate",
      "value": [0.0]
    },
    {
      "name": "node_network_power_consumption_variation_rate_packet_length",
      "type": "power_consumption_variation_rate",
      "value": -0.0008
    }
  ]
}
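In the sample above, the `value` field of an ML metric appears both as a one-element list (`[698.8288]`) and as a bare number (`-0.0008`). A consumer may therefore want to normalize both shapes. The following Python sketch shows one way to do that; the function name `ml_outputs` is illustrative and not part of the repository.

```python
def ml_outputs(message: dict) -> dict:
    """Map ML metric name -> list of floats, accepting scalar or list values."""
    outputs = {}
    for metric in message.get("output_ml_metrics", []):
        value = metric["value"]
        # Wrap bare numbers so that callers always receive a list.
        values = value if isinstance(value, list) else [value]
        outputs[metric["name"]] = [float(v) for v in values]
    return outputs


# The output_ml_metrics section copied from the sample message above.
sample = {
    "output_ml_metrics": [
        {
            "name": "node_network_power_consumption_wats",
            "type": "power_consumption_wats",
            "value": [698.8288],
        },
        {
            "name": "node_network_power_consumption_variation_rate_occupation",
            "type": "power_consumption_variation_rate",
            "value": [0.0],
        },
        {
            "name": "node_network_power_consumption_variation_rate_packet_length",
            "type": "power_consumption_variation_rate",
            "value": -0.0008,
        },
    ]
}

print(ml_outputs(sample))
```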

Docker Images

If you want to build your own Docker images, the Dockerfiles for every service in the telemetry system architecture are under Kubernetes/docker. Run the following commands from the repository root:

$ sudo docker build -t flink-operator:latest Kubernetes/docker/flink
$ sudo docker build -t kafka-producer:latest Kubernetes/docker/kafka_producer
$ sudo docker build -t node-exporter-collector:latest Kubernetes/docker/node-exporter-collector
$ sudo docker build -t ml-dummy:latest Kubernetes/docker/ml
$ sudo docker build -t ml_models:latest Kubernetes/docker/ml_models

Configuration Parameters

config.json: This JSON configuration file holds several parameters that you can change manually before running the telemetry system. A JSON schema at config-schema.json can be used to validate the configuration file:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "experiment_id": {
            "type": "string",
            "description": "Experiment unique identifier"
        },
        "flag_original_metrics": {
            "type": "boolean",
            "description": "Flag for showing original node exporter's metrics or not"
        },
        "flag_debug_params": {
            "type": "boolean",
            "description": "Flag for showing debug params or not"
        },
        "max_throughput_mbps": {
            "type": "integer",
            "description": "Maximum bit rate for router's links [Mbps]"
        },
        "polling_interval": {
            "type": "integer",
            "description": "Time between node exporter collector metrics requests [s]"
        },
        "multiplier": {
            "type": "integer",
            "description": "Multiplier applied to packet related metrics values [1:2000]"
        },
        "routers": {
            "type": "array",
            "description": "List of monitored routers",
            "items": {
                "type": "object",
                "patternProperties": {
                    ".+": {
                        "type": "object",
                        "properties": {
                            "collector_url": {
                                "type": "string",
                                "format": "uri",
                                "description": "Node exporter collector endpoint for each router's metrics"
                            },
                            "topic": {
                                "type": "string",
                                "description": "Kafka input topic for each router's metrics"
                            },
                            "interfaces": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "List of router's interfaces monitored in each experiment"
                            },
                            "test_metrics": {
                                "type": "boolean",
                                "description": "Flag for using manual test metrics in each router or not"
                            }
                        },
                        "required": ["collector_url", "topic", "interfaces", "router_type", "test_metrics"]
                    }
                },
                "additionalProperties": false
            }
        }
    },
    "required": [
        "experiment_id",
        "flag_original_metrics",
        "flag_debug_params",
        "max_throughput_mbps",
        "polling_interval",
        "multiplier",
        "routers"
    ]
}
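To make the shape required by the schema concrete, the Python sketch below builds an example configuration and checks it against the required keys using only the standard library. It is a simplified stand-in for real JSON Schema validation, not a replacement for it. The collector URL and topic name are hypothetical placeholders, and `check_config` is an illustrative helper that does not exist in the repository. Note that the schema lists `router_type` as required for each router even though it is not declared under `properties`, so the example includes it.

```python
REQUIRED_TOP_LEVEL = {
    "experiment_id": str,
    "flag_original_metrics": bool,
    "flag_debug_params": bool,
    "max_throughput_mbps": int,
    "polling_interval": int,
    "multiplier": int,
    "routers": list,
}
REQUIRED_ROUTER_KEYS = ["collector_url", "topic", "interfaces", "router_type", "test_metrics"]


def has_type(value, expected) -> bool:
    """isinstance check that does not accept booleans where integers are expected."""
    if expected is int:
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, expected)


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not has_type(config[key], expected):
            problems.append(f"wrong type for {key}")
    routers = config.get("routers")
    if isinstance(routers, list):
        for entry in routers:
            if not isinstance(entry, dict):
                problems.append("router entry is not an object")
                continue
            # Each entry maps a router name to its settings (patternProperties ".+").
            for router_name, settings in entry.items():
                for key in REQUIRED_ROUTER_KEYS:
                    if key not in settings:
                        problems.append(f"router {router_name}: missing key: {key}")
    return problems


example = {
    "experiment_id": "1",
    "flag_original_metrics": True,
    "flag_debug_params": True,
    "max_throughput_mbps": 100,
    "polling_interval": 5,
    "multiplier": 2000,
    "routers": [
        {
            "r1": {
                "collector_url": "http://collector.example/metrics",  # hypothetical
                "topic": "r1-input",  # hypothetical
                "interfaces": ["eth2"],
                "router_type": "huawei",
                "test_metrics": False,
            }
        }
    ],
}

print(check_config(example))
```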

Deployment

There are two deployment scripts for the telemetry system architecture. Both deploy:

- Apache Kafka broker
- Node Exporter Collector
- Kafka Producer microservice
- Flink Operator Cluster
- ML Stack

The scripts are:

- k8s-deploy-ml-models.sh: Deploys Monitoring Stack, NDT Data Fabric and Machine Learning Stack with ML models.
- k8s-deploy-ml-dummy.sh: Deploys Monitoring Stack, NDT Data Fabric and Machine Learning Stack with ML dummy.

The deployment script k8s-deploy-ml-models.sh accepts two optional input parameters that define the router type and the model type used by the Machine Learning Stack (ML Stack).

./k8s-deploy-ml-models.sh <router_type> <model_type>

<router_type>: Router type to use, for example huawei.
<model_type>: Model type to use, for example linear, MLP, polynomial, rf.

If no input parameters are specified, the default values are <router_type>: huawei and <model_type>: linear.

The deployment script k8s-deploy-ml-dummy.sh does not require any input parameters.
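The defaulting behaviour described above can be mirrored in a small Python wrapper, for example when the deployment is triggered from another tool. This is a sketch only: `resolve_deploy_args` and `deploy_command` are hypothetical helpers, and the handling of a single argument (the router type is taken and the model type falls back to its default) is an assumption, since the text only describes the case where no parameters are given.

```python
DEFAULT_ROUTER_TYPE = "huawei"
DEFAULT_MODEL_TYPE = "linear"


def resolve_deploy_args(args):
    """Return (router_type, model_type), filling in the documented defaults."""
    router_type = args[0] if len(args) > 0 else DEFAULT_ROUTER_TYPE
    model_type = args[1] if len(args) > 1 else DEFAULT_MODEL_TYPE
    return router_type, model_type


def deploy_command(args, script="./k8s-deploy-ml-models.sh"):
    """Build the argv list for the deployment script without running it."""
    return [script, *resolve_deploy_args(args)]


print(deploy_command([]))
print(deploy_command(["huawei", "rf"]))
```

The resulting list can be passed to `subprocess.run` from the directory that contains the script.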

ConfigMaps

$ kubectl create configmap config-json --from-file=config/config.json
$ kubectl create configmap test-metrics-configmap $TEST_FILES
$ kubectl create configmap ml-huawei-config --from-file=config/ml-config/ml-huawei-config.txt
$ kubectl create configmap ml-adva-config --from-file=config/ml-config/ml-adva-config.txt
$ kubectl apply -f templates/ml_models/ml_models_configmap.yaml
$ kubectl create configmap ml-inference --from-file=docker/ml_models/ml_inference/inference.py

NDT Data Fabric + Node Exporter Collector

$ kubectl apply -f ./templates/node-exporter-collector.yaml
$ kubectl apply -f ./templates/zookeeper.yaml
$ kubectl apply -f ./templates/kafka.yaml

Flink Operator Cluster

$ kubectl apply -f ./templates/flink-cluster.yaml

Flink Jobs

$ kubectl apply -f "./templates/jobs/flink-job-submitter-${router}.yaml"

Kafka Producer Microservice

$ kubectl apply -f ./templates/kafka-producer.yaml

Machine Learning Stack

$ ./scripts/ml_models/launch_ml_stack.sh "$ROUTER_TYPE" "$MODEL_TYPE"

To switch from ML models Stack to ML dummy Stack and vice versa, you can use the script switch_ml_stack.sh as follows:

$ ./scripts/ml_models/switch_ml_stack.sh ml-model

This switches from the ML dummy Stack to the ML models Stack, using the default router type (huawei) and model type (linear).

$ ./scripts/ml_models/switch_ml_stack.sh ml-model huawei rf

This switches from the ML dummy Stack to the ML models Stack, using the specified router type and model type.

$ ./scripts/ml_models/switch_ml_stack.sh dummy

This switches from the ML models Stack to the ML dummy Stack.

Experiment

To change the telemetry system parameters to perform a new experiment, you need to:

Edit the ConfigMap from which the Telemetry Data Collector reads its configuration parameters:

$ kubectl edit configmap config-json

Restart the Kafka Producer microservice so that it picks up the new configuration parameters:

$ kubectl rollout restart deployment kafka-producer
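If you prefer to prepare the new parameter values before editing the ConfigMap, a small helper can produce the updated JSON from the current one. The Python sketch below is illustrative: `next_experiment_config` is a hypothetical helper, the set of tunable keys is taken from the top-level properties of config-schema.json, and the starting values are copied from the sample messages in this README.

```python
import copy
import json

# Top-level parameters from config-schema.json that can be changed between experiments.
TUNABLE_KEYS = {
    "experiment_id",
    "flag_original_metrics",
    "flag_debug_params",
    "max_throughput_mbps",
    "polling_interval",
    "multiplier",
}


def next_experiment_config(current: dict, **overrides) -> dict:
    """Return a copy of the configuration with the given parameters replaced."""
    unknown = set(overrides) - TUNABLE_KEYS
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    updated = copy.deepcopy(current)  # leave the caller's dict untouched
    updated.update(overrides)
    return updated


current = {
    "experiment_id": "1",
    "flag_original_metrics": True,
    "flag_debug_params": True,
    "max_throughput_mbps": 100,
    "polling_interval": 5,
    "multiplier": 2000,
    "routers": [],
}

updated = next_experiment_config(current, experiment_id="2", polling_interval=10)
print(json.dumps(updated, indent=2))
```

The printed JSON can then be pasted into the ConfigMap during `kubectl edit configmap config-json`, followed by the rollout restart shown above.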
