Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep domain id if ROS_DOMAIN_ID is invalid. #689

Merged
merged 3 commits into from
Jun 22, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions rcl/src/rcl/domain_id.c
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

#include "rcl/domain_id.h"

#include <errno.h>
#include <limits.h>

#include "rcutils/get_env.h"
Expand All @@ -40,10 +41,14 @@ rcl_get_default_domain_id(size_t * domain_id)
get_env_error_str);
return RCL_RET_ERROR;
}
if (ros_domain_id) {
if (ros_domain_id && strcmp(ros_domain_id, "") != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can a else {*domain_id = 0u;} be added instead of all the PRs addressing rmw implementations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they don't want to use 0 as the default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they don't want to use 0 as the default?

We used to force that in the past, and a lot of the rmw implementations aren't handling the default correctly (thus the other PRs are needed).
I don't mind if we want to change this.

Copy link
Member

@ivanpauno ivanpauno Jul 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this PR, rcl_node_get_domain_id started to return SIZE_MAX when the environment variable is unset.

The purpose of that API was to return the actual domain id that the node was using (see here), but the "actual" domain id is now erroneous.

IMO, the behavior before this PR was correct, i.e. using 0 as the default domain id if the environment variable is not set.
I really don't see any value in letting the rmw implementation define the actual domain id, and adding a layer in rmw to query the default domain id actually used seems unnecessary.

@hidmic @wjwwood what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the behavior before this PR was correct, i.e. using 0 as the default domain id if the environment variable is not set.

As I've stated elsewhere (here?) I disagree.

I really don't see any value in letting the rmw implementation define the actual domain id, and adding a layer in rmw to query the default domain id actually used seems unnecessary.

I wonder what's the use case for rcl_get_node_domain_id.

After this PR, rcl_node_get_domain_id started to return 0 when the environment variable is unset.

Hmm, how's that possible? I would expect it to be RCL_DEFAULT_DOMAIN_ID w/o further changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this point:

There's absolutely no guarantee that rmw_init_options_init() will use RMW_DEFAULT_DOMAIN_ID as opposed to 42 or 5. Which would then break environment checks.

I think it must use RMW_DEFAULT_DOMAIN_ID. When the domain id is in the init options, that is just a default value, the user may override it. If the user wants the middleware to select the domain id, then RMW_DEFAULT_DOMAIN_ID should be used.

The purpose of rmw_init_options_init() was not to allow the middleware to specify a custom default domain id. Though it could have been that or still could be, though I don't think that makes sense... The user needs to be able to explicitly ask the middleware to pick a domain ID regardless of what is in the init options initially, and that is what the RMW_DEFAULT_DOMAIN_ID is about, from my perspective.


I think that we should remove the domain id from the node (node options, actual_domain_id, etc) and move it into the rmw context object. I think we should keep the RMW_DEFAULT_DOMAIN_ID sentinel value, and it can be set in the rmw init options (we could require middlewares initialize it to that value) and then it will always be resolved in the resulting rmw context object.

The reason for keeping the RMW_DEFAULT_DOMAIN_ID, in my opinion, is to let the domain id being used be configured in the middleware, maybe based on some middleware specific configuration file (similar to our _DEFAULT QoS settings we have) or as @hidmic said based on some dynamic heuristic like a pool of domain id's.

I do not think we need a function to query what the default domain ID would be if we initialized with RMW_DEFAULT_DOMAIN_ID (e.g. rmw_get_default_domain_id()), but I could be convinced otherwise. I think it would be sufficient to say you need to initialize the system in order to find out what the default is.

In rcl, if the environment variable ROS_DOMAIN_ID is set, then it should be honored, always. The only time the middleware would choose a domain id, is if the ROS_DOMAIN_ID is not set and the RMW_DEFAULT_DOMAIN_ID value is used, or if ROS_DOMAIN_ID is somehow set to the RMW_DEFAULT_DOMAIN_ID value.

I think rcl_get_default_domain_id() should be renamed probably, since it does not get the default, instead it gets the value from the environment variable if it is set otherwise it does nothing to the given size_t *.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should remove the domain id from the node (node options, actual_domain_id, etc) and move it into the rmw context object. I think we should keep the RMW_DEFAULT_DOMAIN_ID sentinel value, and it can be set in the rmw init options (we could require middlewares initialize it to that value) and then it will always be resolved in the resulting rmw context object.

👍

The reason for keeping the RMW_DEFAULT_DOMAIN_ID, in my opinion, is to let the domain id being used be configured in the middleware, maybe based on some middleware specific configuration file (similar to our _DEFAULT QoS settings we have) or as @hidmic said based on some dynamic heuristic like a pool of domain id's.

👍

I do not think we need a function to query what the default domain ID would be if we initialized with RMW_DEFAULT_DOMAIN_ID (e.g. rmw_get_default_domain_id()), but I could be convinced otherwise. I think it would be sufficient to say you need to initialize the system in order to find out what the default is.

We currently don't have any way to introspect the domain_id that the rmw implementation set, we would need either:

  • rmw_context_get_domain_id(rmw_context_t * node)
  • rmw_node_get_domain_id(rmw_node_t * node)

the first one makes more sense.
We can call that function after initing the rmw_context in rcl, and store it in the corresponding rcl_context_t structure.

Does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a function in rmw to get that field? Could it not just be a public member of the rmw_context_t object? I mean either would work, I'm just wondering if there's a reason to hide this member internally or not.

Otherwise, lgtm.


One lingering doubt in my mind, should we leave the domain id in the node just so middlewares that choose to do it that way can still map domain id to nodes rather than contexts? I'm not sure it makes sense to do that, but I'm just asking if anyone thinks we'll need that at some point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a function in rmw to get that field? Could it not just be a public member of the rmw_context_t object? I mean either would work, I'm just wondering if there's a reason to hide this member internally or not.

Yeah, a member variable also sounds good.

One lingering doubt in my mind, should we leave the domain id in the node just so middlewares that choose to do it that way can still map domain id to nodes rather than contexts? I'm not sure it makes sense to do that, but I'm just asking if anyone thinks we'll need that at some point?

My mental idea is that you have to create two contexts if you need different domain ids, regardless if the implementation can have a different domain id for each node or not.
Multi domain id applications are strange (mainly domain bridges?), and support for setting the domain id with node granularity doesn't seem to add much value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's settled then!

My mental idea is that you have to create two contexts if you need different domain ids, regardless if the implementation can have a different domain id for each node or not.
Multi domain id applications are strange (mainly domain bridges?), and support for setting the domain id with node granularity doesn't seem to add much value.

I agree. I had my doubts about this at some point in time, but I see now those were unfounded.

unsigned long number = strtoul(ros_domain_id, NULL, 0); // NOLINT(runtime/int)
if (number == ULONG_MAX) {
RCL_SET_ERROR_MSG("failed to interpret ROS_DOMAIN_ID as integral number");
if (number == 0UL && strspn(ros_domain_id, "0") != strlen(ros_domain_id)) {
ivanpauno marked this conversation as resolved.
Show resolved Hide resolved
RCL_SET_ERROR_MSG("ROS_DOMAIN_ID is not an integral number");
return RCL_RET_ERROR;
}
if (number == ULONG_MAX && errno == ERANGE) {
RCL_SET_ERROR_MSG("ROS_DOMAIN_ID is out of range");
return RCL_RET_ERROR;
}
*domain_id = (size_t)number;
Expand Down
22 changes: 19 additions & 3 deletions rcl/test/rcl/test_domain_id.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,31 @@

TEST(TestGetDomainId, test_nominal) {
ASSERT_TRUE(rcutils_set_env("ROS_DOMAIN_ID", "42"));
size_t domain_id = 0u;
size_t domain_id = RCL_DEFAULT_DOMAIN_ID;
EXPECT_EQ(RCL_RET_OK, rcl_get_default_domain_id(&domain_id));
EXPECT_EQ(42u, domain_id);

ASSERT_TRUE(rcutils_set_env("ROS_DOMAIN_ID", ""));
domain_id = RCL_DEFAULT_DOMAIN_ID;
EXPECT_EQ(RCL_RET_OK, rcl_get_default_domain_id(&domain_id));
EXPECT_EQ(RCL_DEFAULT_DOMAIN_ID, domain_id);

ASSERT_TRUE(rcutils_set_env("ROS_DOMAIN_ID", "0000"));
domain_id = RCL_DEFAULT_DOMAIN_ID;
EXPECT_EQ(RCL_RET_OK, rcl_get_default_domain_id(&domain_id));
EXPECT_EQ(0u, domain_id);

ASSERT_TRUE(rcutils_set_env("ROS_DOMAIN_ID", "0 not really"));
domain_id = RCL_DEFAULT_DOMAIN_ID;
EXPECT_EQ(RCL_RET_ERROR, rcl_get_default_domain_id(&domain_id));
rcl_reset_error();
EXPECT_EQ(RCL_DEFAULT_DOMAIN_ID, domain_id);

ASSERT_TRUE(rcutils_set_env("ROS_DOMAIN_ID", "998446744073709551615"));
domain_id = 0u;
domain_id = RCL_DEFAULT_DOMAIN_ID;
EXPECT_EQ(RCL_RET_ERROR, rcl_get_default_domain_id(&domain_id));
rcl_reset_error();
EXPECT_EQ(0u, domain_id);
EXPECT_EQ(RCL_DEFAULT_DOMAIN_ID, domain_id);

EXPECT_EQ(RCL_RET_INVALID_ARGUMENT, rcl_get_default_domain_id(nullptr));
}