Conversation


@okdistribute okdistribute commented Sep 30, 2020

This is a v1 prototype of Mapeo Web where each mapeo sync instance is coordinated by a single node.js HTTP server.

  1. Server loads all the project keys it is able to replicate.
  2. Server starts listening for HTTP requests on HOST/project/DISCOVERY_KEY/sync
  3. Client pings /project/DISCOVERY_KEY/sync
    1. Server starts a new Mapeo Core subprocess (this could be a Lambda function or otherwise at some point)
    2. Server returns PORT to client
  4. Client starts replicating with HOST and PORT using mapeo.sync.replicateNetwork over a raw TCP socket (rough client sketch below)
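A minimal sketch of the client side of this flow (the JSON response shape and the exact mapeo.sync.replicateNetwork signature are assumptions for illustration):

const net = require('net')
const http = require('http')

// HOST, DISCOVERY_KEY and mapeo are assumed to be in scope
http.get(`http://${HOST}/project/${DISCOVERY_KEY}/sync`, (res) => {
  let body = ''
  res.on('data', (chunk) => (body += chunk))
  res.on('end', () => {
    const { port } = JSON.parse(body) // server replies once the subprocess is listening
    const socket = net.connect(port, HOST)
    mapeo.sync.replicateNetwork(socket)
  })
})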

TODO

  • If a new Mapeo instance fails to get its own lock on the database, that means a stale Node.js process is still running and holding the lock. Ensure the old lock/pid is killed and try to start Mapeo again. (If we are using Lambda functions, is this a moot point?)
  • Write best practices for clients; they should retry with exponential backoff in case the server is restarting or had a recoverable error (see the sketch below)
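A sketch of that client-side retry policy (the delays and attempt count are illustrative, not a spec):

async function connectWithBackoff(connect, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect() // e.g. open the TCP socket and start sync
    } catch (err) {
      const delay = Math.min(1000 * 2 ** attempt, 30000) // 1s, 2s, 4s... capped at 30s
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
  throw new Error('server did not recover after retries')
}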


gmaclennan commented Oct 1, 2020

Great start on this @okdistribute. I am a bit nervous about the approach of pinging to get a port and then connecting, because it opens up several possible race conditions: the server may not yet be listening, or may have stopped listening, when the client actually connects, and it's not clear what guarantees there are after the ping that the port will remain available.

I wrote some ideas about how to use http for this in the google doc. Another option using pure tcp (this would require the sync protocol to always start with the discovery key, like the DAT protocol):

  1. TCP server running on a fixed port + (if needed) discovery swarm announcing all discovery keys
  2. Client starts sync with server hostname + port.
  3. Server reads first message of incoming TCP socket to get discovery key (e.g. with buffer-peek-stream)
  4. Server gets project key from discovery key, creates mapeo instance.
  5. Server passes socket to mapeo.sync.onConnection()
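A minimal sketch of steps 3-5, assuming a 32-byte discovery key at the start of the stream (getMapeoInstance is a hypothetical lookup helper and FIXED_PORT a placeholder):

const net = require('net')
const peek = require('buffer-peek-stream')

net.createServer((socket) => {
  // read the first 32 bytes without consuming them from the stream
  peek(socket, 32, (err, discoveryKey, stream) => {
    if (err) return socket.destroy()
    const mapeo = getMapeoInstance(discoveryKey) // hypothetical: map key -> instance
    mapeo.sync.onConnection(stream)
  })
}).listen(FIXED_PORT)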

This would avoid the coordination between the HTTP GET and opening and closing ports, and the risks of race conditions.

I like this approach with direct TCP connections because it appears to be the most efficient; however, it does come with a few caveats vs. HTTP + websockets:

  1. We need to implement our own mechanism in the TCP protocol for returning error codes to the client
  2. If we want to add some kind of authentication we would need to implement that ourselves (although we could likely use the discovery key / project key as auth, so this is likely not an issue)
  3. The "web-facing" server would always need to be hosted and managed by us, even if we then hand off sync processes to something else
  4. We wouldn't be able to use AWS Lambda or similar, since most only expose an HTTP interface.

I think all of those caveats are surmountable, and I'm currently on the fence about how the pros and cons weigh up, because the direct TCP approach has less overhead and likely works better over flaky connections. Would love to hear your and @noffle's reflections on this.


okdistribute commented Oct 1, 2020

@gmaclennan The race condition is avoided by waiting for the HTTP request to return until the tcp server is listening, in subprocess.on('message', cb).
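Roughly, on the server side (a sketch; the worker script name and message shape are assumptions):

const { fork } = require('child_process')

// inside the /project/DISCOVERY_KEY/sync request handler
const subprocess = fork('./mapeo-sync-instance.js', [discoveryKey]) // hypothetical worker
subprocess.on('message', (msg) => {
  // only answer the HTTP request once the subprocess's TCP server is listening
  if (msg.type === 'listening') res.end(JSON.stringify({ port: msg.port }))
})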

I think we should use direct TCP connections unless we find an issue with direct TCP on remote mobile/sat connections in the field.

If you do decide you want to go with HTTP/ws, the mapeo core onConnection function expects a TCP socket as a parameter, so you would have to ensure there is a wrapper for an HTTP/websocket that behaves the same as a TCP socket or further refactor out each stage in the peer connection state machine to make it not so reliant on TCP socket behaviors.


gmaclennan commented Oct 1, 2020

> @gmaclennan The race condition is avoided by waiting for the HTTP request to return until the tcp server is listening, in subprocess.on('message', cb).

We need to be clear about the guarantees - how long after the http request will the tcp server continue listening?

> If you do decide you want to go with HTTP/ws, the mapeo core onConnection function expects a TCP socket as a parameter, so you would have to ensure there is a wrapper for an HTTP/websocket that behaves the same as a TCP socket or further refactor out each stage in the peer connection state machine to make it not so reliant on TCP socket behaviors.

I believe onConnection just expects a duplex stream. I don't see anything in the code that uses anything other than that, so a duplex stream from a websocket connection would be fine. But yes, over http has caveats too.
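For example, with the websocket-stream package (a sketch, assuming onConnection really does accept any duplex stream):

const websocket = require('websocket-stream')

websocket.createServer({ server: httpServer }, (stream) => {
  // stream is a duplex stream wrapping the websocket connection
  mapeo.sync.onConnection(stream)
})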

@okdistribute

> > @gmaclennan The race condition is avoided by waiting for the HTTP request to return until the tcp server is listening, in subprocess.on('message', cb).

> We need to be clear about the guarantees - how long after the http request will the tcp server continue listening?

With this code, it'll continue listening indefinitely. I figure we will want to have a garbage collection process after X minutes without an active synchronization.
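A sketch of that garbage collection (the idle threshold and where touch() gets called are assumptions):

let idleTimer
function touch() {
  // call on every sync event; kills the subprocess after X minutes idle
  clearTimeout(idleTimer)
  idleTimer = setTimeout(() => subprocess.kill(), X_MINUTES * 60 * 1000)
}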

> > If you do decide you want to go with HTTP/ws, the mapeo core onConnection function expects a TCP socket as a parameter, so you would have to ensure there is a wrapper for an HTTP/websocket that behaves the same as a TCP socket or further refactor out each stage in the peer connection state machine to make it not so reliant on TCP socket behaviors.

> I believe onConnection just expects a duplex stream. I don't see anything in the code that uses anything other than that, so a duplex stream from a websocket connection would be fine. But yes, over http has caveats too.

We have no tests, either automated or in the field, using this mapeo core code over http/ws, so I think it would require extra time and care to go that route and ensure there are no unknown unknowns.


okdistribute commented Oct 1, 2020

The lifecycle here will need to be


| Discovery | -> | Encryption | -> | Handshake | -> | Replication |


Discovery is the process by which a peer learns which other peers it is able to connect to. Right now, discovery is MDNS only. We need a way to discover the host and port of the Digital Democracy Mapeo Web server.

The simplest (centralized) way to do this is to hardcode a single server and port (in this case, let's say https://web.mapeo.app:443 or tcp:192.1.1.4:4838). This web server knows the host and port where many Mapeo Core instances can be found. This is very similar to the DNS discovery mechanism in discovery-swarm. (In case this is being read as documentation for Mapeo Web: I strongly advise against using discovery-swarm in production for over-the-internet use cases. There have been many documented issues and it has since been deprecated.)
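As a sketch of that lookup on the client (the URL and response shape are assumptions, not a spec):

const https = require('https')

https.get(`https://web.mapeo.app/project/${discoveryKey}`, (res) => {
  let body = ''
  res.on('data', (chunk) => (body += chunk))
  res.on('end', () => {
    const { host, port } = JSON.parse(body) // where this project's sync instance lives
    // the client then connects to host:port and replicates as usual
  })
})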

Having a DNS hostname is better in case the IP needs to change for some reason. It is much easier to maintain long term for client versions that may be long-lasting in the field.

Having a fixed port (e.g., 443 for HTTPS) is nice for similar reasons. It also adds extra security, which is great.

When we have NOISE in the new mapeo protocol this is less important, as we can then have a more secure distributed discovery mechanism (e.g., Kademlia) 👍 (NOISE was designed as a TLS-like handshake protocol for p2p connections).

For these reasons I think HTTPS is a good bet for the discovery phase, until we have the new Mapeo protocol and can think about more sophisticated methods.

> The "web-facing" server would always need to be hosted and managed by us, even if we then hand off sync processes to something else

Correct. As long as we use a centralized discovery mechanism, we will have to maintain it. This discovery mechanism creates reliance on a single server to tell the Mapeo client where they will find the peer they're looking for. This causes long-term maintenance concerns where we will need to maintain that particular host and port for as long as that mapeo client is in use.

This can be avoided by using a distributed discovery mechanism, like a Kademlia DHT, which allows you to specify a list of bootstrap servers. However, at least one of these servers will need to be maintained long-term. We can depend upon a larger community for this, but clients that have this hardcoded list will always need at least one of those servers to be online in order to find peers. Sadly, we're probably stuck maintaining something, even if it's just a discovery service.

This can be mitigated in the future by allowing users to specify their own discovery & backup services in their mapeo project configuration, and by making those easy to host yourself. I think this is critical for user agency and should be considered early on in the product roadmap for mapeo web & project configuration pages; these things are often thought about too late, when money runs out or otherwise, and then users are stuck.

> Server reads first message of incoming TCP socket to get discovery key (e.g. with buffer-peek-stream)
> Server gets project key from discovery key, creates mapeo instance.

To know what host/port to give to the client, the server needs to know which discovery key the client is looking for (this is necessarily the case with all discovery services: (M)DNS, DHT, or otherwise). Reading the discovery key from the first message of the incoming TCP socket is one option. It would mean keeping state on the server side about which sockets correspond to which clients; Mapeo Web would then need to spin up the correct instance and start replicating over the already-opened tcp socket. It also means that spinning off mapeo core instances into Lambda or other one-off compute services would be more complex: either the discovery server must have access to the mapeo core instance on the same machine, or it has to pipe that information from another process or over another network. I don't think this will be very efficient in practice, and it adds extra complexity to the discovery service.

When the discovery server is asked to simply route clients to the correct TCP host/port, we separate concerns between routing requests and maintaining mapeo core sync services. It gives the discovery service the flexibility to change how it is implemented over time, whether through a DHT or otherwise. It also gives us the ability to route incoming clients to a list of third-party host/ports (rather than just one that we maintain). This opens the door to other organizations running long-lived self-hosted mapeo web storage instances, and is forward-compatible with peer-to-peer network replication (2+ syncs at once), which hypercore already supports.

> If we want to add some kind of authentication we would need to implement that ourselves (although we could likely use the discovery key / project key as auth, so this is likely not an issue)

Encryption for multifeed already does this, transporting the discovery key of the project in the first hypercore message sent over the TCP connection, called a Feed message. This prevents replication (of both media and db) if the discovery keys do not match. Authentication in the Mapeo protocol (both old and new) is then enforced using the project key, which is shared out of band. The media data is not currently encrypted, but this will be fixed in the new protocol.

> We need to implement our own mechanism in the TCP protocol for returning error codes to the client

If there is a failure in starting up a mapeo core instance, this can first be handled in the client as a retry with exponential backoff. If multiple retries fail to connect to the remote server, the client can safely assume that the server was unable to start, and can be told to try again later. The server should send any errors to a notification service so we are aware of the issue and can respond. Detailed error codes would be extra state to handle on both the server and client side, and probably wouldn't tell the client much that we couldn't pick up faster from a server-side notification service; otherwise we would just receive the same notifications twice, from the client and the server. Any client-side-only errors can be sent via bugsnag, which is already implemented. Therefore, I do not think this would be worth our time to implement.

@gmaclennan

Thanks for all the info @okdistribute. I'm not sure I explained well what I meant in #3 (comment) about sync without the discovery step. I created a quick bit of example code (just using hypercore, but with Mapeo it would be the same principle) of a server listening on a single TCP port which is used for all sync requests: https://github.com/gmaclennan/hypercore-sync-server-test


okdistribute commented Oct 2, 2020

@gmaclennan yes this is what I thought you were talking about.

How would you scale this?

My concern from above:

> spinning off mapeo core instances into Lambda or other one-off compute services would be more complex. That would require that the mapeo web service have access to the mapeo core instance on the same machine. If not, it will have to pipe that information from another process or over another network.

If scaling past a single machine or a single node process (memory etc) is not an issue anymore, then a single tcp server would work fine.

I do remember that @noffle said in practice we might have to restart the process periodically if it's all in one node process (due to the memory leak in multifeed).

@gmaclennan

@okdistribute my goal with this was thinking through an infrastructure model that would scale from a single server and process to multiple processes/servers. One reason I first suggested http is that this is a problem that has already been solved: how to manage and scale multiple http (over tcp) connections and run code according to parameters sent in the first chunk of the tcp socket (i.e. the http headers). I recognize the network overhead of http and websockets, so with the tcp code I was suggesting a similar model to how you would set up an http server, with separation of concerns: one process exposed to the internet via a single port, which manages connections and proxies them to a process running either on the same machine or somewhere else. This is how nginx and HAProxy work, and in serverless setups like AWS Lambda the API Gateway manages the http connection and proxies it to the appropriate lambda function.

The tcp server in the code I shared does not need to run any Mapeo instances: it can be a small app that is responsible for accepting connections and connecting to a function or process that can carry out the sync:

  1. Client makes TCP connection to server
  2. Server reads key at start of stream, confirms whether server supports syncing that key
  3. Server starts a process to handle incoming socket, or connects to an existing one.
  4. Sync happens over that same TCP connection.

The connection in (3) can happen in a few ways (a rough sketch follows the list):

  1. If sync is to run in the same process, then you can just pipe the sync stream to the incoming socket
  2. If you start a new process, you can pass the socket to the process.
  3. If you run on a different server, you can create a TCP connection to that server and then pipe the incoming socket to the outgoing one.
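A rough sketch of the three variants (the mode flag, worker script, and backend host/port are illustrative assumptions):

const net = require('net')
const { fork } = require('child_process')

net.createServer((incoming) => {
  if (mode === 'in-process') {
    mapeo.sync.onConnection(incoming) // 1: sync in the same process
  } else if (mode === 'child-process') {
    const child = fork('./sync-worker.js') // hypothetical worker script
    child.send('connection', incoming) // 2: Node can transfer socket handles over IPC
  } else {
    const backend = net.connect(BACKEND_PORT, BACKEND_HOST)
    incoming.pipe(backend).pipe(incoming) // 3: proxy to a sync server elsewhere
  }
}).listen(PUBLIC_PORT)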

My thinking behind this approach is having a single place where mapeo instance lifecycles are managed: there is a single port exposed to the internet where clients connect to sync, and the server proxies these to a sync instance. This could initially be a function or process running on the same server, but if we wanted to scale we could either start other VPSes, or create a duplex stream over a websocket to a serverless function (it would need to be over http, I think, since I'm not aware of serverless functions that support direct TCP connections).

The issue I remain concerned about with the approach in this PR is responsibility for managing the lifecycle of mapeo instances / sync processes. There is no central point of coordination, since the client makes a direct connection to the sync process, and I'm not sure how its lifecycle would be controlled.

I'm also slightly concerned about proxies, either on the client side or the server, which might block direct TCP connections and non-standard ports. I think we need more information about this:

  1. Mobile networks, which often proxy outgoing connections and can limit ports
  2. Corporate LANs (convention centers etc)
  3. Serverless providers who have infrastructure behind HTTP reverse proxies e.g. Heroku etc.

"raw" tcp carries similar risks to the original approach I outlined with duplex over an http connection: a proxy between the client and the server may not allow the connection through. The nice thing about websockets is there is much broader support (finally, although it took some time). Websockets over http would also allow us, in the future, to run the sync process "on the edge" with something like Cloudflare Workers or AWS Lambda Edge, e.g. run the process in the CDN.

I do lean towards "raw" tcp being the way to go at this time, but with the code I was thinking through how we could proxy those connections in the future to back-end sync services that might require a websocket connection, e.g. running sync on Lambda:

const net = require('net')
const peek = require('buffer-peek-stream')
const websocket = require('websocket-stream')
net.createServer((conn) => {
  peek(conn, 32, (err, key, outputStream) => { // peek the discovery key without consuming it
    if (err) return conn.destroy()
    const ws = websocket(`ws://mapeo.lambdaserver.com/${key.toString('hex')}`)
    ws.pipe(outputStream).pipe(ws)
  })
})


okdistribute commented Oct 2, 2020

@gmaclennan yeah, this is what I was saying about 'If not, it will have to pipe that information from another process or over another network.'

I'd like to go back to our discussion about high-level requirements. There are tradeoffs in all of these possibilities, and we still don't have a sense of how many projects we're willing to support in which time frames, which I think should guide the implementation.

Here's an example project plan that would be useful for this work. I'd expect Gregor would need to sign off on this, but perhaps other people from the programme team would need to be involved.

v1

Months 1-12
* 10 projects
* Many mobile clients may not be able to be updated.
* All or most desktop clients could be updated if necessary.
* 5GB per project
* 10 synchronizations/week per project = 100 synchronizations per week
* Up to 4 simultaneous synchronizations at any given time.
* No peer to peer connections; all connections must be proxied through Mapeo Web

v2

Months 12-24
* 100 projects
* All of the applications will be updated to Mapeo Protocol v9
* 10GB per project
* 10 synchronizations/week per project = 1,000 synchronizations per week
* Up to 40 simultaneous synchronizations at any given time.
* Peer to peer connections possible on internet (wifi)
* Most Mobile (3g/Sat) connections will need to be proxied through Mapeo Web
* Self-hosting possible

v3

Months 25-36
* 250 projects
etc...

I think it would be wise to try to minimize the amount of code that needs to be changed in the clients over v1, v2, v3 of Mapeo Web.

I'm fine with using websockets for the Mapeo Web sync rather than tcp.

I'd say if you're married to coupling discovery, handshake, and replication all on a single server, it would be much simpler to use websockets (e.g., ws://web.mapeo.app/discovery_key) than to hack our own TCP server over a single connection, which may be more complex to maintain. If we couple discovery and replication using something like the custom routing tcp server you wrote above, it will be incompatible with existing mapeo core sync: it would require changes to both client and server for sending and receiving the first 32-byte payload discovery key. And if we want to enable direct p2p connections, this code will need to be re-decoupled in all clients at some point.
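As a sketch of that websocket variant on the client (assuming the sync API accepts any duplex stream, per the discussion above):

const websocket = require('websocket-stream')

// websocket-stream returns a duplex stream over the ws connection
const stream = websocket(`ws://web.mapeo.app/${discoveryKey}`)
mapeo.sync.replicateNetwork(stream) // assumption: replicateNetwork accepts a duplex stream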

Decoupling discovery and replication the way I've done in this PR would enable routing clients to other clients by changing the discovery mechanism on the server, without having to upgrade clients at all! Remote mobile clients would get this for free, without upgrading Mapeo, far into the future, as long as they can still reach the mapeo web discovery server.

Understanding the roadmap of these features, especially p2p connections and self hosted instances, will help us understand the value of decoupling discovery and replication.

