Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bazel] rosbag is unable to record custom types #311

Closed
EricCousineau-TRI opened this issue Oct 30, 2023 · 25 comments
Closed

[bazel] rosbag is unable to record custom types #311

EricCousineau-TRI opened this issue Oct 30, 2023 · 25 comments
Assignees

Comments

@EricCousineau-TRI
Copy link
Collaborator

@kunimatsu-tri and I ran into this on Anzu. Kuni and I will work to produce an example for this repository.

@calderpg-tri @IanTheEngineer Can I ask for some time from someone at OSRC to look at this?
It seems specific to how rosbag is attempting to load IDL support at runtime in ways that do not work with the current Bazel runfiles layout.

@kunimatsu-tri

This comment was marked as outdated.

@kunimatsu-tri

This comment was marked as outdated.

@kunimatsu-tri

This comment was marked as outdated.

@EricCousineau-TRI

This comment was marked as outdated.

@EricCousineau-TRI
Copy link
Collaborator Author

Here's an automated reproduction, with what we believe should be the right data declarations not working:
#312

@EricCousineau-TRI
Copy link
Collaborator Author

To narrow the scope initially, I would like to know what we are missing in our Bazel runfiles that should make ros2 bag record work.
Kuni and I looked, but we don't know what exactly it's looking for.

Once we identify what's missing and perhaps a hacky way to fix it (e.g. "just copy these files"), then someone from TRI can make the adjustments to our Bazel mechanics.

@IanTheEngineer
Copy link
Member

Another data point, this relevant test passes when run (either as an executable or test) from drake-ros/ros2_example_bazel_installed/:

$ bazel run ros2_example_apps:custom_message_echo_test

which uses the simple_talker and ros2 topic echo ros2_example_apps_msgs/msg/Status. This is the output:

bazel run ros2_example_apps:custom_message_echo_test
...
INFO: Build option --test_env has changed, discarding analysis cache.
INFO: Analyzed target //ros2_example_apps:custom_message_echo_test (0 packages loaded, 7367 targets configured).
INFO: Found 1 target...
Target //ros2_example_apps:custom_message_echo_test up-to-date:
  bazel-bin/ros2_example_apps/_custom_message_echo_test_shim.py
  bazel-bin/ros2_example_apps/custom_message_echo_test
INFO: Elapsed time: 1.232s, Critical Path: 0.07s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //ros2_example_apps:custom_message_echo_test
-----------------------------------------------------------------------------
stamp:
  sec: 0
  nanosec: 0
status:
  sequence_id: 0
  message: ''
origin: ''
---
[ Done ]

@jwnimmer-tri
Copy link
Contributor

BTW Running under strace --trace=file --follow-forks --output=$(pwd)/log would log which directories / files are being searched / opened. That's usually how I debug programs that are being tricky about loading their resources.

@EricCousineau-TRI
Copy link
Collaborator Author

custom_message_echo_test passes when run under the appropriate target because it has IDL paths exposed (via AMENT_PREFIX_PATH, improved in #305) due to data declarations.
(For Anzu, we actually add IDL data declarations to //tools:ros2 so that things like this work)

The expectation from #312 is that those same data declarations should make ros2 bag record work properly.

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Oct 30, 2023

Looking into this, something funny happening with ament index here :

Typesupport library for ros2_example_apps_msgs does not exist in '/usr/local/google/home/adityapande/.cache/bazel/_bazel_adityapande/ad9a0e8d87dfe19d768804a6f0483cb8/sandbox/linux-sandbox/59/execroot/ros2_example_bazel_installed/bazel-out/k8-fastbuild/bin/ros2_example_apps/custom_message_rosbag_test.runfiles/rosidl_generate_ament_index_entry'.

Tried so far :
Copying/symlinking the correct libraries into rosidl_generate_ament_index_entry/lib, appending to AMENT_PREFIX_PATH did not work. Error :

dlopen error: libros2_example_common_msgs__rosidl_typesupport_cpp.so: cannot open shared object file: No such file or directory, at /tmp/ws/src/ros2/rcutils/src/shared_library.c:99

@kunimatsu-tri
Copy link

For the ros2 topic echo case, how exactly should I add IDL to @ros2//:ros2?

In ros2_example_apps:custom_message_echo_test, it adds IDL to the test process itself, which calls ros2 topic echo but that's not a normal use case.

@EricCousineau-TRI
Copy link
Collaborator Author

@adityapande-1995 Were you able to turn up anything by comparing strace output per Jeremy's suggestion?
Ideally, a comparison should be made between colcon-based workflow and a Bazel-based workflow.

@kunimatsu-tri In Anzu, we have a root-level roll-up target, //:ros_msgs_all_py. Then for //tools:ros2, we add data = ["//:ros_msgs_all_py"].
In this repository, we can do the same - make a wrapped version via //tools:ros2 and add similar data dependency.
It's perhaps not as convenient, as every similar tool needs something like that (e.g. rosidl, rviz2 with plugins, etc.), but it seems sufficient for our Anzu usages.

We can add this any time, but I don't believe I'll have time this week.

@kunimatsu-tri
Copy link

Thanks, Eric. I'll ignore that one for now.

FYI, I tried running ros2 bag record under strace that @jwnimmer-tri mentioned (thank you for this!), and I've got this error:

1194927 openat(AT_FDCWD, "/home/kunimatsu/.cache/bazel/_bazel_kunimatsu/852eecae282a87916242397fbb011613/execroot/ros2_example_bazel_installed/bazel-out/k8-fastbuild/bin/external/ros2/ros2.runfiles/ros2/archive/ros2bag/share/ament_index/resource_index/packages/ros2_example_apps_msgs", O_RDONLY) = -1 ENOENT (No such file or directory)

And this is just one of tons of similar errors.

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Oct 31, 2023

Yeah I did run it. The libs that come from the original ros2 archive live in that path that @kunimatsu-tri mentioned, but in our case it lives in ros2_example_apps, as it is a "new" package that we wrote. I suspect either the shared lib for the message is in the wrong path, or it is looking for it in the wrong path. AMENT_PREFIX_PATH should handle that, but modifying it did not help. Continuing investigation.

@EricCousineau-TRI
Copy link
Collaborator Author

Gotcha. Can you post a GitHub gist of the strace command via #312 and via a colcon build?
I think that would help answer our questions more concretely.

Also, I would assume that a shared library should be loaded according to LD_LIBRARY_PATH, not AMENT_PREFIX_PATH...

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Oct 31, 2023

Yup will do. Its like 40k lines of trace haha. Debug progress so far :

TL;DR :

  1. We have a bug in rosidl_interfaces_group rule most probably. libros2_example_apps_msgs__rosidl_typesupport_cpp.so should depend on libros2_example_apps_msgs__rosidl_typesupport_introspection_cpp.so, but it does not, and hence can't find a symbol called rosidl_typesupport_introspection_cpp__get_message_type_support_handle__ros2_example_apps_msgs__msg__Status. Hence dlopen() fails, and the rosbag recorder crashes.

  2. There are still paths that are messed up, but that isn't a big problem.

Procedure followed :

  1. Start the talker in a terminal using : bazel run //ros2_example_apps:simple_talker
  2. Create a dummy target :
ros_py_binary(
    name = "ap_custom_msg_rosbag",
    srcs = ["test/ap_custom_msg_rosbag.py"],
    data = [
        # Provide both Python and C++ IDL to see if it can get picked up...
        ":ros2_example_apps_msgs_py",
        ":ros2_example_apps_msgs_cc",
        # "@ros2//:rosidl_typesupport_cpp_cc",
        # "@ros2//:rosidl_typesupport_cpp_py",

        # Binaries to run.
        ":simple_talker",
        "@ros2",
    ],
    main = "test/ap_custom_msg_rosbag.py",
    deps = [
        ":ros2_example_apps_msgs_py",
        "@bazel_tools//tools/python/runfiles",
        "@ros2//resources/rmw_isolation:rmw_isolation_py",
    ],
)

and ap_custom_msg_rosbag.py is just

import subprocess
subprocess.run(['/bin/bash'])

Run this in a new terminal to give you a shell access :
bazel run //ros2_example_apps:ap_custom_msg_rosbag

  1. In the shell inside the sandbox, fix the missing libraries first :
mkdir -p ../rosidl_generate_ament_index_entry/lib; 
cp ./ros2_example_apps/*.so ../rosidl_generate_ament_index_entry/lib/;
cp ./ros2_example_common/*.so ../rosidl_generate_ament_index_entry/lib/; 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/ros2_example_common

Then try to record using :
./external/ros2/ros2 bag record --all, which throws the missing symbol error.

Suspects

Looking at

rosidl_typesupport_cc_library(
name = _make_public_name(name, "__rosidl_typesupport_cpp"),
typesupports = typesupports,
group = group or name,
interfaces = interfaces,
includes = [_make_public_label(dep, "_defs") for dep in deps],
deps = [_make_private_label(name, "__rosidl_cpp")] + [
_make_public_label(dep, "_cc")
for dep in deps
],
cc_binary_rule = cc_binary_rule,
**kwargs
)
, I feel that should contain the "typesupport_introspection" library as a dependency. Working on it.

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Oct 31, 2023

Relevant strace logs after copying the correct libs over (after following the Procedure above) :

.
46136 501573 newfstatat(AT_FDCWD, "~/.cache/bazel/_bazel_adityapande/ad9a0e8d87dfe19d768804a6f0483cb8/execroot/ros2_example_bazel_installed/bazel-out/k8-fastbuild/bin/ros2_example_apps/ap_custom_msg_rosbag.runfiles/rosidl_generate_ament_index_e    ntry/lib/libros2_example_apps_msgs__rosidl_typesupport_cpp.so", {st_mode=S_IFREG|0555, st_size=16384, ...}, 0) = 0
  
46137 501573 openat(AT_FDCWD, "~/.cache/bazel/_bazel_adityapande/ad9a0e8d87dfe19d768804a6f0483cb8/execroot/ros2_example_bazel_installed/bazel-out/k8-fastbuild/bin/ros2_example_apps/ap_custom_msg_rosbag.runfiles/rosidl_generate_ament_index_entry    /lib/libros2_example_apps_msgs__rosidl_typesupport_cpp.so", O_RDONLY|O_CLOEXEC) = 10
.
.

So it does find it, working on fixing the missing symbol.

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Oct 31, 2023

Alright I got it working, by force loading the library using LD_PRELOAD. I'll create a draft PR against @EricCousineau-TRI 's branch, but will need to refine the solution.

Inside the bash in the sandbox :

LD_PRELOAD=../rosidl_generate_ament_index_entry/lib/libros2_example_apps_msgs__rosidl_typesupport_introspection_cpp.so ./external/ros2/ros2 bag record --all

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Nov 1, 2023

Having some trouble making the test work here with the hack. For now, if this is a blocker, the following wrapper will work for ros2 bag record. Very very hacky, but the bare minimum to get it running. Working on improving this, or filing tickets :

In ros2_example_bazel_installed/ros2_example_apps/BUILD.bazel :

ros_py_binary(
    name = "temp_rosbag_wrapper",
    srcs = ["test/temp_rosbag_wrapper.py"],
    data = [
        ":ros2_example_apps_msgs_cc",
        "@ros2",
    ],
    main = "test/temp_rosbag_wrapper.py",
)

And in ros2_example_bazel_installed/ros2_example_apps/test/temp_rosbag_wrapper.py

import subprocess

# Resolves "shared lib not found in .....ament_index_entry" error
# It is looking in AMENT_PREFIX_PATH
command = "mkdir -p ../rosidl_generate_ament_index_entry/lib;"
command += "cp -f ros2_example_apps/*.so ../rosidl_generate_ament_index_entry/lib/ ;"
command += "cp -f ros2_example_common/*.so ../rosidl_generate_ament_index_entry/lib/ ;"

command += "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/ros2_example_common ;"

# Force load typesupport introspection
command += "LD_PRELOAD=../rosidl_generate_ament_index_entry/lib/libros2_example_apps_msgs__rosidl_typesupport_introspection_cpp.so"
command += " ./external/ros2/ros2 bag record --all"

subprocess.run(command, shell=True)

Start the recorder using bazel run //ros2_example_apps:temp_rosbag_wrapper

Action items :

  1. Why are the libraries in the wrong place ? Which paths needs to be changed to fix this ?
    The new msg "package" lives in the generated rosidl_generate_ament_index directory, and that needs a lib
    which contain the shared libs.
  2. Why is the type introspection lib not a dependency of the type support library for libros2_example_apps_msgs__rosidl_typesupport_cpp.so ?
  3. Make the test work using the fixes in above points.

@EricCousineau-TRI
Copy link
Collaborator Author

EricCousineau-TRI commented Nov 1, 2023

Thanks!
If you are hacking .runfiles directories, running bazel test will rebuild and discard your hacks.
You should use ./run if you are messing with the guts of the runfiles tree, and/or apply the hacks to the Bazel rules themselves.
EDIT: To clarify, your wrapper script should be good in terms of restoring it to a working state.

That being said, I am still confused why rosbag is searching in ${AMENT_PREFIX_PATH}/lib instead of ${LD_LIBRARY_PATH}.

Can you point out, in source code, where this (broken?) assumption of ${AMENT_PREFIX_PATH}/lib is made?
Or am I misunderstanding?

@adityapande-1995
Copy link
Collaborator

adityapande-1995 commented Nov 2, 2023

My guess here is that ros2 bag record executable assumes that the new message is contained in a ros package, and hence searches for it in AMENT_PREFIX_PATH. We do create an entry for it, just that the shared libraries aren't copied over. The relevant code probably live in rcutils upstream though. Looking into a way to copy over the shared libs gracefully.

On checking the ros2 archive, I see all of them contain the lib and share directories, we're missing a lib when building the custom msgs. This is definitely an inconsistency.

@ahcorde
Copy link
Collaborator

ahcorde commented Nov 15, 2023

@adityapande-1995 and me are working on this solution that we are trying to explore which is the following

def _copy_typesupport_libs_impl(ctx):
    my_path = paths.join(
        ctx.attr.prefix,
        "lib/lib" + ctx.attr.pkgname + ".so",
    )
    temp_path = paths.join(
        ctx.attr.prefix,
        "lib/lib" + ctx.attr.pkgname + ".hello",
    )
    temp_file = ctx.actions.declare_file(temp_path)
    ctx.actions.write(
        output = temp_file,
        content = "",
    )

    runfiles_symlinks = {
      my_path : ctx.file.libdep,
      temp_path : temp_file
    }

    return [
        AmentIndex(prefix = ctx.attr.prefix),
        DefaultInfo(
            runfiles = ctx.runfiles(root_symlinks = runfiles_symlinks),
        ),
    ]

copy_typesupport_libs = rule(
    attrs = dict(
        pkgname = attr.string(mandatory = True),
        prefix = attr.string(default = "rosidl_generate_ament_index_entry"),
        libdep = attr.label(
          mandatory = True,
          allow_single_file = True,
        ),
    ),
    implementation = _copy_typesupport_libs_impl,
    output_to_genfiles = True,
    provides = [AmentIndex],
)

and then added this rule

    copy_typesupport_libs(
        name = name + "_copy_libs",
        pkgname = name, "__rosidl_typesupport_cpp",
        libdep = ":" +_make_public_name(name, "__rosidl_typesupport_cpp"),
        **kwargs
    )

here

The .hello files show up on running bazel build //ros2_example_common:ros2_example_common_msgs_copy_libs but not the .so ones. Not really sure why the .so file is not showing up.

@adityapande-1995
Copy link
Collaborator

Is there a way to get logs or some verbosity on why one step fails and the other doesn't ? Theres no error thrown.

@EricCousineau-TRI
Copy link
Collaborator Author

Fixed by #317

@adityapande-1995
Copy link
Collaborator

Raised an issue upstream : ros2/rosbag2#1517

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants