[ 11 ] Resource temporarily unavailable #1347
The message originates right at the beginning, when the application calls … So we have the situation that RouDi is running (since the socket is present) but not answering - we have to understand why! Here are some questions:
Another possibility is that RouDi was somehow blocked by the blocking policy. I will dig into this and let you know, but it would be very helpful if you could in the meantime provide me some hints by answering the questions above.
I dug around a little: when a blocking publisher is unable to send data it enters a busy loop, which is perfect for latency but horrible for the CPU load. So when the subscriber is much slower than the publisher you should see the CPU load spike to 100% in a system monitor like … Could you implement your system without blocking, by decreasing the publisher frequency and increasing the subscriber queue size?
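For readers who want to try the non-blocking route, a minimal sketch of such a configuration with the typed C++ API could look like the following. The runtime name, service description, and `CounterTopic` payload are made up for illustration; the relevant knobs are `subscriberTooSlowPolicy`, `queueFullPolicy`, and `queueCapacity`:

```cpp
#include "iceoryx_posh/popo/publisher.hpp"
#include "iceoryx_posh/popo/subscriber.hpp"
#include "iceoryx_posh/runtime/posh_runtime.hpp"

#include <cstdint>

struct CounterTopic
{
    uint64_t counter{0U};
};

int main()
{
    iox::runtime::PoshRuntime::initRuntime("iox-nonblocking-demo");

    // Publisher side: drop the oldest sample for a too-slow subscriber instead
    // of blocking (and busy-waiting) until the subscriber catches up.
    iox::popo::PublisherOptions publisherOptions;
    publisherOptions.subscriberTooSlowPolicy = iox::popo::ConsumerTooSlowPolicy::DISCARD_OLDEST_DATA;
    iox::popo::Publisher<CounterTopic> publisher({"Example", "Demo", "Counter"}, publisherOptions);

    // Subscriber side: a generous queue absorbs bursts so dropping rarely happens.
    iox::popo::SubscriberOptions subscriberOptions;
    subscriberOptions.queueCapacity = 256U; // size to the expected burst; capped by the compile-time maximum
    subscriberOptions.queueFullPolicy = iox::popo::QueueFullPolicy::DISCARD_OLDEST_DATA;
    iox::popo::Subscriber<CounterTopic> subscriber({"Example", "Demo", "Counter"}, subscriberOptions);

    // ... publish / take loop as usual ...
    return 0;
}
```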
Thanks for the feedback!
The requirements are that the publisher frequency is fixed, data can't be lost, and consumers must keep up. If we get to a blocking halt, something is wrong and that's a critical error. I wouldn't see the above symptoms in that case, would I?
I think the 20 producers are the cause of your issue. I will implement a smarter waiting mechanism in the next few days to solve this problem once and for all. I will ping you when the PR is out. The …
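For illustration, a simplified adaptive back-off of the kind hinted at here could look like the sketch below. This is not iceoryx's actual implementation, just the general idea: spin first for low latency, then yield, then sleep with a growing interval so a blocked publisher no longer pegs a core at 100%.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>

// Spins briefly (cheapest wake-up), then yields, then sleeps with a growing,
// capped back-off. All thresholds here are arbitrary illustration values.
class SimpleAdaptiveWait
{
  public:
    void wait()
    {
        constexpr uint64_t SPIN_REPETITIONS{10000U};
        constexpr uint64_t YIELD_REPETITIONS{10000U};
        constexpr std::chrono::microseconds MAX_SLEEP{10000};

        if (m_iteration < SPIN_REPETITIONS)
        {
            // busy spin: lowest latency, highest CPU usage
        }
        else if (m_iteration < SPIN_REPETITIONS + YIELD_REPETITIONS)
        {
            std::this_thread::yield();
        }
        else
        {
            // double the sleep each time, up to the cap
            m_sleep = std::min(m_sleep * 2, MAX_SLEEP);
            std::this_thread::sleep_for(m_sleep);
        }
        ++m_iteration;
    }

  private:
    uint64_t m_iteration{0U};
    std::chrono::microseconds m_sleep{10};
};

// usage sketch: while (!queueHasSpace()) { waiter.wait(); }
```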
We experienced the very same issue this morning (10 publishers, 1 subscriber), and nothing (rm -r /tmp/roudi and restarting RouDi, the subscribers, and the publishers) helped, but a reboot solved it.
@elfenpiff many thanks. I'll monitor it.
@niclar Could you please try your setup with the newest master? Your problem should be solved now; if not, please reopen this issue.
@elfenpiff I don't seem to be able to reopen this issue, but we just experienced it again with HEAD 7ef8462. Publisher:
- We had a few "Version mismatch" errors in RouDi earlier today, but that's fixed and shouldn't matter, I reckon.
- No more publishers can join, and introspection also bails.
@niclar It may be possible that some weird high CPU load is present somewhere. Could you please send me the output of …
Another problem could be that you have a lot of applications and you try to start them all at once. Then your system load increases suddenly and RouDi does not get enough CPU time to answer all of your requests. One simple solution could be to start all applications sequentially with a one- or two-second sleep in between. Then all applications should have enough time to register and RouDi should get enough CPU time to handle them. Furthermore, could you please start RouDi with …
@elfenpiff Starting RouDi with the highest priority seems to have remedied the issue. I'll close the issue (and re-open it with debug output if we encounter it again). Thanks for your support.
This issue occurs for us on some nodes but not others, but as soon as it occurs on one, all become non-functional. It seems to happen more so when we autostart the set of nodes at boot, beginning execution after multi-user.target at the default user session target. Manually restarting iox-roudi along with all the nodes sometimes solves the issue, but it is in no way deterministic.

We have also tried adjusting the memory pools, but iox-introspection shows no pool exhausted. We also built iox-roudi with various permutations of the build flags from https://github.com/eclipse-iceoryx/iceoryx/blob/master/doc/website/advanced/configuration-guide.md, which solved some issues with port exhaustion, but no combination made this issue reliably go away. Further, no matter how it was tweaked, warnings about too many chunks being held in parallel persisted.

What do you suspect the root cause of this issue with RouDi and the domain sockets is? I saw a suggestion about changing RouDi's priority, but even if that worked, it strikes me as incredibly non-deterministic, which unfortunately negates some of the key touted advantages of iceoryx.

For context, we run the ROS 2 Nav2 stack along with 4 Intel RealSense cameras, 4 additional cameras, and a few other low-bandwidth nodes. The issue does seem to happen less frequently when not running the Nav2 stack, I suspect because it's quite heavy on pub/sub connections.
@ciandonovan "too many chunks being held in parallel" is a different issue/limit; change IOX_MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY and/or IOX_MAX_CHUNKS_HELD_PER_SUBSCRIBER_SIMULTANEOUSLY respectively.
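For readers unfamiliar with these limits: to my understanding, IOX_MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY caps how many chunks a single publisher may have loaned but not yet published at the same time, and IOX_MAX_CHUNKS_HELD_PER_SUBSCRIBER_SIMULTANEOUSLY is the analogous cap for samples a subscriber has taken but not yet released. The sketch below illustrates the publisher side only; the app name, service description, and Frame type are hypothetical.

```cpp
#include "iceoryx_posh/popo/publisher.hpp"
#include "iceoryx_posh/runtime/posh_runtime.hpp"

#include <cstdint>

struct Frame
{
    uint8_t pixels[1024];
};

int main()
{
    iox::runtime::PoshRuntime::initRuntime("iox-chunk-limit-demo");
    iox::popo::Publisher<Frame> publisher({"Camera", "Front", "Frame"});

    // Between the two loan() calls and the publish() calls below, two chunks
    // are allocated simultaneously by this publisher; both count against the
    // per-publisher limit until they are published (or dropped).
    auto frameA = publisher.loan();
    auto frameB = publisher.loan();
    if (!frameA.has_error() && !frameB.has_error())
    {
        frameA.value()->pixels[0] = 42U;
        frameB.value()->pixels[0] = 43U;
        frameA.value().publish();
        frameB.value().publish();
    }
    return 0;
}
```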
Increasing IOX_MAX_CHUNKS_ALLOCATED_PER_PUBLISHER_SIMULTANEOUSLY from the default 8 to 16 causes publishing and subscribing to silently hang, although, interestingly, listing topics still works fine. Increasing IOX_MAX_CHUNKS_HELD_PER_SUBSCRIBER_SIMULTANEOUSLY from the default 256 to 512 causes a similar issue with … Resetting both to their defaults restores basic functionality. It could be an issue with the RMW.
Hi, we just experienced our first RouDi (v2.0.0) outage. Ubuntu 20.04 LTS, clang 14.
Most of the clients worked, but two of the publishing clients received the output below, and the introspection program did not work; a machine restart was needed to resolve it:
Any pointers as to why? Pub/sub is set up as:
publisherOptions.subscriberTooSlowPolicy = iox::popo::ConsumerTooSlowPolicy::WAIT_FOR_CONSUMER;
subscriberOptions.queueFullPolicy = iox::popo::QueueFullPolicy::BLOCK_PRODUCER;
/mnt/c/src/thirdparty/vcpkg/buildtrees/iceoryx/src/bfd6602e5f-2435b68bfd.clean/iceoryx_hoofs/source/posix_wrapper/unix_domain_socket.cpp:249 { cxx::expected iox::posix::UnixDomainSocket::timedSend(const std::string &, const units::Duration &) const -> iox_sendto } ::: [ 11 ] Resource temporarily unavailable
2022-05-04 06:53:18.758 [ Fatal ]: Timeout registering at RouDi. Is RouDi running?
2022-05-04 06:53:18.759 [ Error ]: ICEORYX error! IPC_INTERFACE__REG_ROUDI_NOT_AVAILABLE
libc++abi: terminating
/Thanks