Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downstream next event tag (DNET), a new signal for more efficient centralized federated execution #349

Open
wants to merge 52 commits into
base: main
Choose a base branch
from

Conversation

byeonggiljun
Copy link
Collaborator

@byeonggiljun byeonggiljun commented Feb 5, 2024

This PR replaces #176 and #337.

Relevant discussion and issue: lf-lang/lingua-franca#1626, #264

A companion PR in lingua-franca: lf-lang/lingua-franca#2400
A companion PR in reactor-ts: lf-lang/reactor-ts#296

In this PR, the new message type, downstream next event tag (DNET), is introduced to reduce unnecessary NET signals. This signal is sent from the RTI to a federate to notify it that NET tags below some threshold are unnecessary, and not needed by its downstream federates.

Suppose the RTI is computing the DNET tag for a federate A where A has one downstream federate B. The RTI
uses the minimum delay between the federates and B's NET value to compute the DNET tag. For example, suppose the delay from A to B is NEVER and A has an event every 10 ms (0, 10 ms, ...). Also, assume B's next event is at (100 ms, 0). The RTI knows B's next earliest event tag by finding the minimum tag of the most recent NET_B and the head of the in-transit message queue of B is (100 ms, 0).

In this case, the RTI can send DNET (100 ms, 0) to A. Based on the DNET signal, if A is not producing outputs, A does not need to send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms) because the RTI cannot send TAG (100 ms, 0) using those NET signals. The RTI can grant TAG (100 ms, 0) to B after receiving NET (110 ms, 0) from A.

Additionally, in this PR, the RTI computes the tag of the signal tag advance grant (TAG) with the earliest incoming message tag (EIMT) to prevent blocking a federate that is eligible to advance its tag. Currently, if a federate has its next event scheduled at (100 ms, 0) and its EIMT is (200 ms, 1), the RTI sends TAG (100 ms, 0). However, this incurs unnecessary NET signals if the federate has an event at the tag (200 ms, 0). So in this PR, the RTI sends TAG (200 ms, 0) (the latest earliest tag of EIMT) so that the federate can execute any potential events with tags earlier than (200 ms, 1) as well as the event at (100 ms, 0).

This PR also contains some refactoring for the data structure used by the RTI to store delays between federates. A two-dimensional matrix stores the delays now.

byeonggiljun and others added 21 commits January 31, 2024 14:14
…ery possible connection

This is a preparation step for the DNET message calculation. This matrix
enables each federate to search min_delays from upstreams as well as to downstreams.

Use a matrix instead of multiple arrays to store minimum delays of every possible connection

This is a preparation step for the DNET message calculation. This matrix
enables each federate to search min_delays from upstreams as well as to downstreams.
@byeonggiljun byeonggiljun added enhancement Enhancement of existing feature feature New feature federated labels Apr 22, 2024
@byeonggiljun byeonggiljun changed the title Draft: Introduce Downstream Next Event Tag message (DNET) to optimize the communication in centralized federated execution A new signal downstream next event tag (DNET) for more efficient centralized federated execution Aug 14, 2024
@byeonggiljun byeonggiljun changed the title A new signal downstream next event tag (DNET) for more efficient centralized federated execution Downstream next event tag (DNET), a new signal for more efficient centralized federated execution Aug 14, 2024
@byeonggiljun byeonggiljun force-pushed the rti-DNET branch 2 times, most recently from 121e21f to 79e2fdb Compare August 15, 2024 18:16
@byeonggiljun byeonggiljun marked this pull request as ready for review August 15, 2024 18:48
Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need more time to review this, but here is a starting point. This is looking very good so far. I have a few suggestions, some of which are just nits and you should feel free to ignore them. I will follow up with a more complete review as soon as I can.

core/federated/RTI/main.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.h Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.h Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.h Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.h Outdated Show resolved Hide resolved
core/tag.c Outdated Show resolved Hide resolved
@edwardalee
Copy link
Contributor

The example given in the PR comment is confusing. It says:

For example, if the delay from A to B is NEVER and B's next event is scheduled at (100 ms, 0), the RTI can send DNET (100 ms, 0) to A; if A sends NET (100 ms, 0) to the RTI, the RTI cannot send TAG (100 ms, 0), and if A sends NET (100 ms, 1), the RTI can grant TAG (100 ms, 0) to B.

What does it mean "B's next event is scheduled at (100 ms, 0)"? I think you mean something like "The most recently received NET from B is (100 ms, 0)"? Also, doesn't this have to be transitive NET?

The rest of the example is irrelevant to this PR. The statement it makes is true without this PR and has always been true. Can you instead give an example that actually uses this PR?

Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the n^2 complexity and memory cost. I've added some suggestions, but I think the all_upstream, all_downstream, and min_delay arrays all have n^2 complexity in the worst case and they seem redundant. I'm suggesting consolidating to just use the min_delay array for all these purposes.

core/federated/RTI/main.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.c Outdated Show resolved Hide resolved
core/federated/RTI/rti_common.h Outdated Show resolved Hide resolved
@byeonggiljun
Copy link
Collaborator Author

Hello, @edwardalee.

What does it mean "B's next event is scheduled at (100 ms, 0)"? I think you mean something like "The most recently received NET from B is (100 ms, 0)"? Also, doesn't this have to be transitive NET?

What I meant was "the minimum tag of the most recent NET_B and the head of the in-transit message queue of B is (100 ms, 0)" as the RTI uses in-transit message queue as well as NET from B to predict B's next earliest event. I'll modify the comment to clarify it.

Also, doesn't this have to be transitive NET?

I don't get what transitive NET is. Could you please elaborate on it?

The rest of the example is irrelevant to this PR. The statement it makes is true without this PR and has always been true. Can you instead give an example that actually uses this PR?

I tried to explain how a DNET signal is used in the example. Could you read it and share your thoughts?

@edwardalee
Copy link
Contributor

OK, the description is much better now. I would suggest a slight enhancement. Instead of:

"Based on the DNET signal, A does not send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms)"

you could say:

"Based on the DNET signal, if A is not producing outputs, A does not need to send any NET signals with tags earlier than or equal to (100 ms, 0) (NET with 10 ms, 20 ms, ..., 100 ms)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement of existing feature feature New feature federated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants