Transport trait definition and in-memory implementation #363

Merged: 27 commits into private-attribution:main, Dec 23, 2022

Conversation

akoshelev (Collaborator):

As we discussed in #352, we need another abstraction on the network layer that is capable of sending messages and delivering events. This change brings the trait definition to the main branch along with an in-memory implementation for it (both @thurstonsand and I were working on it). TestWorld has been migrated to use the transport layer too - hence the size of this change. The reason I want to merge it to main now is that @martinthomson and @andyleiserson are working on the same codebase at the moment, and merging it later will be super painful.

We can focus this review on the Transport trait definition and the InMemoryTransport and Network structs. Other changes are pretty mechanical.

@@ -9,7 +9,7 @@ async fn main() -> Result<(), Error> {
     let mut config = TestWorldConfig::default();
     config.gateway_config.send_buffer_config.items_in_batch = 1;
     config.gateway_config.send_buffer_config.batch_count = 1000;
-    let world = TestWorld::new_with(config);
+    let world = TestWorld::new_with(config).await;

akoshelev (Collaborator, Author):

Lots of files changed just because the TestWorld constructor is async now. It must be async because Transport is capable of serving multiple queries running in parallel (i.e. there must be an event loop somewhere), and Gateway needs to subscribe and wait until Transport acknowledges the request to route query-specific commands to that gateway.
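A minimal, self-contained sketch of that shape (all names here are invented for illustration, not the PR's API): the constructor sends a subscription request and awaits the acknowledgement from the transport's event loop, so it cannot stay synchronous.

use tokio::sync::{mpsc, oneshot};

struct Subscribe {
    ack: oneshot::Sender<()>,
}

struct World {
    to_transport: mpsc::Sender<Subscribe>,
}

async fn new_world(to_transport: mpsc::Sender<Subscribe>) -> World {
    let (ack, acked) = oneshot::channel();
    to_transport.send(Subscribe { ack }).await.unwrap();
    // block construction until the event loop confirms it will route
    // query-specific commands to this subscriber
    acked.await.unwrap();
    World { to_transport }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel(1);
    // stand-in for the transport event loop
    tokio::spawn(async move {
        while let Some(Subscribe { ack }) = rx.recv().await {
            ack.send(()).ok();
        }
    });
    let _world = new_world(tx).await;
}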


let control_handle = tokio::spawn(async move {
    const INTERVAL: Duration = Duration::from_secs(3);

    let mut receive_buf = ReceiveBuffer::default();
    let mut send_buf = SendBuffer::new(config.send_buffer_config);

    let sleep = ::tokio::time::sleep(INTERVAL);
    let mut pending_sends = FuturesUnordered::new();

akoshelev (Collaborator, Author):

send may block if flow control is in place. This allows the event loop to continue serving other traffic while a send is blocked. There will be at most one in-flight send.
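A simplified sketch of that shape (the real loop also handles the flush timer and buffering): new work is pulled from the channel only while `pending_sends` is empty, so a blocked send never stops the loop from serving its other arms, and there is never more than one send in flight.

use futures::stream::{FuturesUnordered, StreamExt};
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(4);
    tokio::spawn(async move {
        for i in 0..3 {
            tx.send(i).await.unwrap();
        }
    });

    let mut pending_sends = FuturesUnordered::new();
    loop {
        tokio::select! {
            // only accept new work when no send is in flight
            Some(item) = rx.recv(), if pending_sends.is_empty() => {
                pending_sends.push(async move {
                    // stand-in for a send that may block on flow control
                    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
                    item
                });
            }
            Some(sent) = pending_sends.next() => {
                println!("sent {sent}");
            }
            else => break,
        }
    }
}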

/// To resolve this identifier into something concrete (Uri, encryption keys, etc.), configuration must be consulted.
#[derive(Debug, Clone, Eq, PartialEq, Hash)]
pub struct HelperIdentity {
    id: u8,

akoshelev (Collaborator, Author):

As we discussed, helper identity needs to be an opaque identifier. Maybe u8 is not the right choice for it; once we shape out the helper config, we may decide to change it. But something lightweight is needed (this struct must be cloned often).

/// if `roles_to_helpers` does not have all 3 roles
pub async fn send(&self, message_chunks: MessageChunks) -> Result<(), Error> {
    let (channel, payload) = message_chunks;
    let destination = self.roles.identity(channel.role);

akoshelev (Collaborator, Author):

This is the key responsibility of this struct: resolving helper identities (something the transport understands) to roles (infra) before sending messages up.
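A tiny sketch of that translation with invented names (the PR's actual types differ): the infra layer addresses peers by Role, the transport addresses them by HelperIdentity, and the network struct owns the mapping between the two.

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct HelperIdentity(u8);

#[derive(Clone, Copy)]
enum Role { H1, H2, H3 }

struct RoleAssignment {
    // indexed by role: [H1, H2, H3]
    helpers: [HelperIdentity; 3],
}

impl RoleAssignment {
    fn identity(&self, role: Role) -> &HelperIdentity {
        &self.helpers[role as usize]
    }
}

fn main() {
    let roles = RoleAssignment {
        helpers: [HelperIdentity(1), HelperIdentity(2), HelperIdentity(3)],
    };
    // the destination the transport understands, resolved from the infra-level role
    assert_eq!(*roles.identity(Role::H1), HelperIdentity(1));
    assert_eq!(*roles.identity(Role::H2), HelperIdentity(2));
    assert_eq!(*roles.identity(Role::H3), HelperIdentity(3));
}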


/// Network interface for components that require communication.
#[async_trait]
pub trait Network: Sync {

akoshelev (Collaborator, Author):

Everything in this file was moved from the original network module and is deprecated now.

/// make it a no-op.
#[cfg(not(all(test, feature = "shuttle")))]
#[pin_project::pin_project]
pub struct Timer {

akoshelev (Collaborator, Author):

I realized that the shuttle tests are broken again after I added the tokio sleep, because it requires the Tokio runtime. So I had to add a wrapper to make it work in both environments.
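A hedged sketch of the wrapper idea (the PR's actual Timer pin-projects a tokio Sleep, so its shape differs): outside shuttle the timer delegates to tokio::time::sleep; under shuttle, where there is no Tokio runtime, it becomes a future that never completes, effectively making the periodic arm a no-op.

use std::time::Duration;

#[cfg(not(all(test, feature = "shuttle")))]
pub struct Timer {
    interval: Duration,
}

#[cfg(not(all(test, feature = "shuttle")))]
impl Timer {
    pub fn new(interval: Duration) -> Self {
        Self { interval }
    }
    pub async fn wait(&self) {
        tokio::time::sleep(self.interval).await;
    }
}

#[cfg(all(test, feature = "shuttle"))]
pub struct Timer;

#[cfg(all(test, feature = "shuttle"))]
impl Timer {
    pub fn new(_interval: Duration) -> Self {
        Self
    }
    pub async fn wait(&self) {
        // never fires under shuttle, so no runtime is required
        std::future::pending::<()>().await;
    }
}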

@@ -29,20 +29,11 @@ pub mod peer {
     #[derive(Clone, Debug)]
     #[cfg_attr(feature = "enable-serde", derive(serde::Deserialize))]
     pub struct Config {
-        #[cfg_attr(feature = "enable-serde", serde(deserialize_with = "uri_from_str"))]
+        #[cfg_attr(feature = "enable-serde", serde(with = "crate::uri"))]

akoshelev (Collaborator, Author):

URI struct serialization is needed in more than one place
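The module path crate::uri comes from the diff above; the body below is only a guess at what such a shared serde module can look like (serde's with attribute on a Deserialize-only struct just needs a deserialize function in the named module).

pub mod uri {
    use http::Uri;
    use serde::{de::Error, Deserialize, Deserializer};

    pub fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Uri, D::Error> {
        // deserialize the URI from its string form and surface parse failures as serde errors
        let s = String::deserialize(deserializer)?;
        s.parse().map_err(D::Error::custom)
    }
}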

@akoshelev marked this pull request as ready for review on December 23, 2022 01:56

@martinthomson (Member) left a comment:

LGTM

src/test_fixture/transport/network.rs: resolved review thread (outdated)
loop {
    ::tokio::select! {
        Some(command) = rx.recv() => {
            match command {

martinthomson (Member):

Once we get this deep, it might pay to have a command dispatch function.


akoshelev (Collaborator, Author):

Very likely yes. I would probably postpone that change till I implement query management commands - I'd have a clearer picture of how it should look.
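For illustration, the kind of extraction being suggested might look like this (the command variants are invented, not the PR's TransportCommand):

enum SwitchCommand {
    Data(u64, Vec<u8>),
    Shutdown,
}

struct Switch;

impl Switch {
    // returns false when the event loop should stop
    async fn dispatch(&mut self, command: SwitchCommand) -> bool {
        match command {
            SwitchCommand::Data(step, payload) => {
                println!("routing {} bytes for step {step}", payload.len());
                true
            }
            SwitchCommand::Shutdown => false,
        }
    }
}

The select! arm then shrinks to something like `Some(command) = rx.recv() => if !switch.dispatch(command).await { break; }`.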

/// Starts listening to the incoming messages in a separate task. Can only be called once
/// and only when it is in the `Idle` state.
pub fn listen(&mut self) {
    let State::Idle(mut rx, peers) = std::mem::replace(&mut self.state, State::Preparing) else {

martinthomson (Member):

I'm a fan of having this be impossible; that is, you have a type that manages preparation with a consuming function that does the preparation and returns the final, functional entity.

That pattern tends to bubble upwards, but I would be inclined to let it.
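A minimal sketch of the consuming-constructor idea with invented types: preparation lives in one type and listen consumes it, so "listen called twice" or "listen in the wrong state" stops compiling instead of panicking at runtime.

use tokio::sync::mpsc;

struct IdleSwitch {
    rx: mpsc::Receiver<String>,
}

struct RunningSwitch {
    handle: tokio::task::JoinHandle<()>,
}

impl IdleSwitch {
    // consuming `self` makes re-listening a compile-time error rather than a runtime panic
    fn listen(self) -> RunningSwitch {
        let mut rx = self.rx;
        let handle = tokio::spawn(async move {
            while let Some(msg) = rx.recv().await {
                println!("got {msg}");
            }
        });
        RunningSwitch { handle }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(4);
    let running = IdleSwitch { rx }.listen();
    tx.send("hello".to_string()).await.unwrap();
    drop(tx);                      // closing the channel ends the loop
    running.handle.await.unwrap(); // wait for the task to finish
}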


akoshelev (Collaborator, Author):

100% agree, it is me being lazy again


impl Switch {
    pub fn new(id: HelperIdentity) -> Self {
        let (tx, rx) = mpsc::channel(1);

martinthomson (Member):

Some amount of buffering here is likely to be very useful.
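For instance (the capacity below is illustrative, not a recommendation from this PR):

// a bound of 1 forces strict ping-pong between sender and receiver;
// a modestly larger bound amortizes wakeups without unbounded memory growth
let (tx, rx) = mpsc::channel(32);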


@thurstonsand left a comment:

Looks good, and I like the changes you made to my code.

#[must_use]
pub fn role(&self, id: &HelperIdentity) -> Role {
    for (idx, item) in self.helper_roles.iter().enumerate() {
        if item == id {

thurstonsand:

Maybe you could use .find here? Easier to read than an early return.

Alternatively, would it be worth it to flip the map (HelperIdentity -> Role) and store that for lookups, instead of running a short loop every time?


akoshelev (Collaborator, Author):

I am not sure I can make .find work because it returns the value, but I need an index here. itertools can do it, but I am still hesitant to bring it in :(

Wrt the reverse map - this lookup is basically free, the whole array fits inside a single L1 cache line.

thurstonsand:

yeah, that's fine. minor detail
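For reference, std's Iterator::position does return an index without itertools; a tiny standalone sketch (values are stand-ins, not the PR's types):

fn main() {
    // stand-ins for HelperIdentity values
    let helper_roles = [10u8, 20, 30];
    let id = 20u8;
    let idx = helper_roles.iter().position(|item| *item == id).unwrap();
    assert_eq!(idx, 1);
}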

src/helpers/network.rs: three resolved review threads (two outdated)
#[cfg(any(test, feature = "test-fixture"))]
impl From<tokio::sync::mpsc::error::SendError<TransportCommand>> for Error {
    fn from(value: tokio::sync::mpsc::error::SendError<TransportCommand>) -> Self {
        Self::SendFailed {

thurstonsand:

Yeah, this is easier than what I had.


akoshelev (Collaborator, Author):

Not sure if it is going to be good enough for the long term - I'd like to see the destination in the error message. But good enough for now, I think.

src/helpers/transport/mod.rs: resolved review thread (outdated)
/// Query/step data received from a helper peer.
/// TODO: this is really bad for performance, once we have channel per step all the way
/// from gateway to network, this definition should be (QueryId, Step, Stream<Item = Vec<u8>>) instead
StepData(QueryId, Step, Vec<u8>),

thurstonsand:

so how will QPL respond to commands?


akoshelev (Collaborator, Author):

I am thinking that we would need something at the TransportCommand layer, but not sure. I decided to keep it out of scope to make this review easier

// during runtime shutdown. Other schedulers (ahem shuttle) may not do that and what
// happens is 3 switch tasks remain blocked awaiting messages from each other. In this
// case a deadlock is detected. Hence this code just tries to explicitly close the switch
// but because async drop is not a thing yet, we must hot loop here to drive it to completion

thurstonsand:

nice...


akoshelev (Collaborator, Author):

I switched to using weak references and it looks much better now - I don't need this hack anymore.

I think the semantics are correct - once the test world goes out of scope, it invalidates all transport references.
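A small sketch of that ownership shape with invented names: the test world keeps the only strong reference, handles hold a Weak and upgrade on use, so dropping the world invalidates every outstanding handle.

use std::sync::{Arc, Weak};

struct InMemoryEndpoint; // stand-in for the real transport internals

struct TestWorldOwned {
    transport: Arc<InMemoryEndpoint>,
}

#[derive(Clone)]
struct TransportHandle {
    inner: Weak<InMemoryEndpoint>,
}

impl TransportHandle {
    fn send(&self) -> Result<(), &'static str> {
        // upgrade fails once the owning world (and its Arc) has been dropped
        let _endpoint = self.inner.upgrade().ok_or("transport has shut down")?;
        Ok(())
    }
}

fn main() {
    let world = TestWorldOwned { transport: Arc::new(InMemoryEndpoint) };
    let handle = TransportHandle { inner: Arc::downgrade(&world.transport) };
    assert!(handle.send().is_ok());
    drop(world);
    // dropping the world invalidated every outstanding handle
    assert!(handle.send().is_err());
}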

@akoshelev (Collaborator, Author) left a comment:

Thanks @thurstonsand and @martinthomson for the prompt review!

@thurstonsand left a comment:

yeah way easier to understand

src/test_fixture/transport/network.rs: resolved review thread (outdated)
@akoshelev merged commit 82f7a48 into private-attribution:main on Dec 23, 2022