Combining Quality of Service and System Health Metrics in MAPE-K based ROS Systems through Behavior Trees
- Docker (necessary)
- CUDA (optional if you want to run the segmentation models in the managed subsystem on the GPU)
Download the segmentation models and the ros bags with test data from the a server with the scp command (password acsos
):
scp acsos@135.181.254.26:/mapek-bt/data.zip .
Unzip and place it inside the ros_ws as a ".data" folder.
ros_ws
├── .data
│ ├── chckpoints
│ │ ├── depth.pth
│ │ ├── fusion.pth
│ │ └── rgb.pth
│ └── SynDrone_t01_h50.bag
Run the experiment from the root of the repository with
bash ./ros_ws/evaluation/run_multi_experiment.sh
The logs will be stored in a folder log_dump
next to the ros_ws directory.
If you want to reproduce the results for the experiment excluding the dependency graph, use the exclude-dependency-graph
branch.
To calculate the results in our table, for each table there are scripts in the experiment_setup folder To retrieve our log files you can get them from the same server as above:
scp acsos@135.181.254.26:/mapek-bt/logs.zip .
Unzip this file and place it in the ros_ws folder. You should have the following directory structure:
├── main
│  ├── scenario_empty_activation_rules
│  ├── scenario_empty_full_rules
│  ├── scenario_F1+S1_activation_rules
│  ├── scenario_F1+S1_full_rules
│  ├── scenario_image_degradation_activation_rules
│  ├── scenario_image_degradation_full_rules
│  ├── scenario_message_drop_activation_rules
│  └── scenario_message_drop_full_rules
└── wo_dep_logs
├── scenario_image_degradation_full_rules
└── scenario_message_drop_full_rules
Now you can run the eval scripts to get the results in our paper. For Table 1, this script calculates the results. For Table 2, this script calculates the results. For Table 3, this script calculates the results.
There is a devcontainer file, i.e. you can just open VSCode in the root of this repository and reopen VSCode in the devcontainer. This should do the rest.
In case you want to visualize the Behaviour Tree that will be run, install Groot2 whereever you like. Live visualization is only available in the Pro Version of Groot2 anyway, so we don't bother with that.
Everything important is in the ros_ws. Here you find the following packages:
- bt_mape_k
- engel_interfaces
- managed_subsystem
- python_base_class
This package contains the implementation of the Behaviour Tree. The bt_exectutor is responsible for building the BT and executing it. The implementation of the nodes is in the include directory and in the according src files.
All BTs we develop in this paper can be stored in the bts directory.
This package contains all custom interfaces that we need in order to implement the MAPE-K loop.
If you want to describe a rule in the Behaviour Tree and do not want to condition to be triggered everytime but only on change, you can use the newly created ScriptCodeWithMemory node. You can use it exactly same as the "normal" ScriptCondition implemented in BehaviourTreeCPP. It is just extended with a memory, i.e. it remembers if it was already true/false in the last tick and therefore only gets triggered on change.
This package contains the full managed subsystem.
The camera node can artificially degrade the incoming image from the rosbag by shifting it towards a darker spectrum.
Use the parameter image_degradation
to set the degradiation from 0.0=no degradation
to 1.0=maximum degradation
.
The depth node can artificially degrades the incoming depth by adding noise to simulate unreasonable data at higher altitudes.
Use the parameter depth_degradation
to set the degradiation from 0.0=no degradation
to 1.0=maximum degradation
.
The image enhancement node tries to reverses the image shift by normalizing the image mean and standard deviation to a constant value.
The sensor fusion node synchronizes the data streams from the camera, lidar and enhancement.
Use the parameter topic_camera_input
to set the rgb input to either /rgb_camera
or /rgb_enhanced
.
Use the parameter modality
to set the fusion to 0=fusion
, 1=rgb
or 2=depth
.
The Segmentation node takes the fused data as input and outputs a predicted segmentation.
This package contains all logic that is necessary for running a full experiment.
Use ros2 launch experiment_setup experiment.launch.py
to run the full managed subsystem and all evaluation nodes.
The ros bag needs to be started seperately with ros2 bag play .data/SynDrone_t01_h50.bag --clock
. Make sure to choose the Dockerfile.cuda
in the devcontainer configuration when using a GPU. Note the --clock
flag which is necessary to sync the time of the system with the ros bag time in combination with the use_sim_time=true
flag for all nodes.
This node will compare the ground truth segmentation with the segmentation generated by the system. The results are directly logged via a logger that writes the results into a csv.
This node implements an additional message republishing step for all intermediate data streams. It can be used to strategically drop messages during the experiment execution.
Set the parameter do_drop_<topic_name> = True
to drop a specific topic until the param is set to false again.
This node will create specific failure scenarios based on a yaml file.
See ros_ws/src/experiment_setup/resources/scenarios
for example uses and required keys for each event type.
This package contains the base class that will be used by every node in the managed subsystem. For examples how to use the base class follow the instructions in the next section.
Next there is an explanation of all the ports that are necessary to understand in order to build a behavior tree. First this covers the main behavior tree and second the subtree responsible for a single ROS node in the managed subsystem. In General, all ports with a {=} value, do not have to be configured, they will be filled by the nodes itself.
In this image you can see a simpler version of the main behavior tree .
The InitializeBlackboard node is responsible for filling the blackboard with initial values. This includes variables that are used in the rules for evaluating the state of the managed subsystem but also the dependency graph. All values that should be present from the first tick on are stored in here. This is needed since the BT may tick through all the rules that want to read values from the blackboard before the information from the managed subsystem are present.
The monitoring node works as described in the paper and writes the current time in the ROS system on the blackboard. It receives the messages what to write onto the blackboard from the blackboard setter which sends a list of key value pairs (implemented as parameters) to the monitoring node. The name of the parameter will be the key under which the value of the parameter will be stored in the blackboard.
The subtrees need to be configured when implementing this BT for a new managed subsystem or in case the managed subsystem is modified.
Necessary ports for the subtree:
- component_name: exact name of the ROS node (namespace + name)
- dependency_graph: Is configures in the InitializeBlackboard BT node. Apart from that no configuration needed
- clock: Current time; will be filled by Monitoring node, so no configuration needed
- lc_state (lifecycle_state of this node): Will be filled initially by the InitializeBlackboard node with the value described in here. After that the executeLifecycle node in the respective subtree fills the new lifecycle state as soon as there is a response when activating / deactivating a node
The other ports are used because these are information that we collect in the managed subsystem and use them in the rules we define for our experiments.
The subtree in prinicple can have any design you like with a few "best practices".
- If you want to include the dependency_graph, you will need the PlanningDecorator as parent of the execution node sequence
- If you want to use rules from the DSL, you have to use the AnalysisDecorator (see below)
If you want to define the rules with BT nodes you can use sequences. Therefore, add a ScriptCondition or ScriptConditionMemory in which a condition for a rule is defined. The ScriptConditionWithMemory only triggers when the condition changes. Next add a RegisterAdaptation which will put an adaptation into a queue with potential adaptations that could be executed. Therefore, define an action which can have the values (action_activate, action_deactivate, action_restart, action_redeploy, action_change_communication, action_set_parameter).
Difference between restart and redeploy:
- Restart: Deactivates and activates a node
- Redeploy: Kill a node completely and relaunch it
For the actions action_activate, action_deactivate, action_restart, action_redeploy the fields parameter_name and value can be left empty.
For the other actions the parameter_name as configured for the ROS node and the value of the parameter have to be provided. For robustness, you can also provide the datatype of the parameter value. Possible values are string, bool, double, int.
The only other port you can configure is in the AnalysisDecorator: fails_2_redeploy in the AnalysisDecorator: If a specific adaptation fails for a few times (the amount you specify in this port), the node will be redeployed.
Current approach for designing a domain specific language for rule definition:
BEGIN CONSTS
int const_flight_phase_not_set 0
int const_flight_phase_cruise 1
int const_flight_phase_descent 2
int const_flight_phase_approach 3
int const_flight_phase_final_approach 4
int const_flight_phase_landing 5
string topic_camera_input topic_camera_input
bool const_true true
bool const_false false
int modality_fusion 0
int modality_rgb 1
string enhanced_rgb_topic /rgb_enhanced
string raw_rgb_topic /rgb_camera
double hb_freq_min 5.0
double node_freq_min 5.0
END CONSTS
BEGIN RULES
RULE init_sensor_fusion ON_TRIGGER_CHANGE
COMPONENTS [/managed_subsystem/sensor_fusion]
IF (current_phase == const_flight_phase_cruise)
THEN action_set_parameter modality modality_rgb
RULE activate_base_system ON_EVERY_TRIGGER
COMPONENTS [/managed_subsystem/camera, /managed_subsystem/sensor_fusion, /managed_subsystem/segmentation]
IF (current_phase == const_flight_phase_cruise)
THEN action_activate
RULE activate_base_system ON_EVERY_TRIGGER
COMPONENTS [/managed_subsystem/camera, /managed_subsystem/sensor_fusion, /managed_subsystem/segmentation]
IF (current_phase == const_flight_phase_descent)
THEN action_activate
END RULES
All the variables used in the conditions have to be readable from the blackboard (either blackboard setter sends this values to the BT or you define them via constants)
An example on how to configure the communication for your node is shown in here
The following is a more extensive example showcasing a configuration for each communication possibility supported by the base class.
from python_base_class.node_config import CommunicationTypes
from std_msgs.msg import String
from std_srvs.srv import SetBool
comm_types = [
{
"comm_type": CommunicationTypes.PUBLISHER,
"name": "/example_ns2/example_name",
"msg_type": String,
"time_period": 0.5,
"callback": "publisher_callback",
"always_on": False
},
{
"comm_type": CommunicationTypes.PUBLISHER,
"name": "/example_ns2/publisher_wo_callback",
"msg_type": String,
"always_on": False
},
{
"comm_type": CommunicationTypes.SUBSCRIPTION,
"name": "/example_ns/example_name",
"msg_type": String,
"callback": "subscription_callback",
"always_on": False
},
{
"comm_type": CommunicationTypes.SERVICE,
"name": "/example_ns/service",
"srv_type": SetBool,
"callback": "service_callback",
"always_on": False
},
{
"comm_type": CommunicationTypes.SERVICE_CLIENT,
"name": "/example_ns/service_2",
"srv_type": SetBool,
"always_on": False
},
]
This example dictionary also specifies exactly how a config has to look like. Each key is necessary and will also be validated by the base class.
If there is a typo or a key missing, the base class will throw an error.
If configuring a publisher, you can choose between using a timer or publishing the data in whatever way you prefer as seen in the example above.
In case you need a callback for the communication way, the name of the callback in the configuration has to match exactly the name of the function that you define in your node.
If you want that your communication is always available (ONLY IN RARE CASES!), you can set the always_on
parameter to True.
For parameters there is a central config file that is built like parameter files are usually used in ROS2 (see ROS2 documentation)
Imagine the params.yaml file looks like this:
camera_node:
ros__parameters:
param1: 2.3
topic_test_input: ''
lidar_node:
ros__parameters:
param2: 1
sensor_fusion:
ros__parameters:
topic_camera_input: "camera_raw"
There are two types of parameters:
- describing a value that you can use in your algorithms that should be adaptable, e.g. a confidence threshold in your object detection model. If you adapt a parameter here, the respective class member variable will be changed to the new value, i.e. the name of the parameter has to match the name of the member variable
- describing a communication interface, e.g. which topic to listen to if you want to have camera images. If you adapt this kind of parameter, the respective communication interface will be changed. To which communication interface a parameter refers, is defined in the communication configuration with the key "param".
Currently the difference between these two parameters is detected by the name, i.e. if the substring "topic" is in the parameter name, it will be handled as communication change, otherwise only the class member variable will be changed.
You are implementing the camera_node e.g. like this:
import rclpy
from typing import List
import rclpy.parameter
from sensor_msgs.msg import Image
from python_base_class.engel_base_class import ENGELBaseClass
from managed_subsystem.config.camera_config import comm_types
from ament_index_python.packages import get_package_share_directory
import os
class CameraNode(ENGELBaseClass):
def __init__(
self,
comm_types: List,
node_name: str = "camera_node",
param_file: str = "params.yaml",
):
config_file = os.path.join(
get_package_share_directory("managed_subsystem"), "config", param_file
)
super().__init__(node_name, comm_types, config_file)
self.param1 = None # instantiating a class member variable that will be filled with value by the parameters
self.topic_test_input = None
self.trigger_configure()
self.trigger_activate()
if self.validate_parameters() is False:
self.logger.warn(
f"Not all parameters are initialized correctly. Every parameter in \
the params.yaml file has to be a class member of {self.get_name()}"
)
def publisher_callback(self):
msg = Image()
publisher = self.get_comm_object("/camera_raw")
msg.header.frame_id = "/camera_raw"
msg.header.stamp = self.get_clock().now().to_msg()
publisher.publish(msg)
def subscription_callback(self, msg: Image):
pass
with your communication configuration looking like this:
from python_base_class.node_config import CommunicationTypes
from sensor_msgs.msg import Image
comm_types = [
{
"comm_type": CommunicationTypes.PUBLISHER,
"name": "/camera_raw",
"msg_type": Image,
"time_period": 0.5,
"callback": "publisher_callback",
"always_on": False,
},
{
"comm_type": CommunicationTypes.SUBSCRIPTION,
"name": "/test_img",
"msg_type": Image,
"callback": "subscription_callback",
"always_on": False,
'param': 'topic_test_input',
}
]
The following things have to be considered now:
- the name of the node (specified in your CameraNode class constructor) has to match one entry in the params.yaml file that describes the parameters used for this node.
- there has to be a class member variable with the same name as the parameter
- The handling of the parameters happens in the base class
- At the end of the constructor you can validate the parameters with the self.validate_parameters function. This will log a warning if there is something wrong.