Abstract

This specification explains WAKU-SYNC which enables the synchronization of messages between nodes storing sets of 14/WAKU2-MESSAGE

Specification

Waku Sync consists of two libp2p protocols: reconciliation and transfer. The Reconciliation protocol finds differences in sets of messages. The Transfer protocol is used to exchange the differences found with other peers. The end goal being that peers have the same set of messages.

Terminology

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC2119.

Reconciliation

Libp2p Protocol identifier: /vac/waku/reconciliation/1.0.0

The protocol finds differences between two peers by comparing fingerprints of ranges of message IDs. Ranges are encoded into payloads, exchanged between the peers and when the range fingerprints are different, split into smaller (sub)ranges. This process repeats until ranges include a small number of messages. At this point lists of message IDs are sent for comparison instead of fingerprints over entire ranges of messages.

Overview

The reconciliation protocol follows the following heuristic:

The requestor chooses a time range to sync.
The range is encoded into a payload and sent.
The requestee receives the payload and decodes it.
The range is processed and, if a difference with the local range is detected, a set of subranges are produced.
The new ranges are encoded and sent.
This process repeats while differences found are sent to the transfer protocol.
The synchronization ends when all ranges have been processed and no differences are left.

Message IDs

Message IDs MUST be composed of the timestamp and the hash of the 14/WAKU2-MESSAGE.

The timestamp MUST be the time of creation and the hash MUST follow the deterministic message hashing specification

This way the message IDs can always be totally ordered, first chronologically according to the timestamp and then disambiguated based on the hash lexical order in cases where the timestamp is the same.

Range Bounds

A range MUST consist of two IDs, the first bound is inclusive the second bound exclusive. The first bound MUST be strictly smaller than the second one.

Range Fingerprinting

The fingerprint of a range MUST be the XOR operation applied to the hash of all message IDs included in that range.

Range Type

Every range MUST have one of the following types; fingerprint, skip or item set.

Fingerprint type contains a fingerprint.
Skip type contains nothing and is used to signal already processed ranges.
Item set type contains message IDs and a resolved boolean.

Item sets are an optimization, sending multiple IDs instead of recursing further reduce the number of round-trips.

Range Processing

Ranges have to be processed differently according to their types.

Fingerprint ranges MUST be compared.
- Equal ranges MUST become skip ranges.
- Unequal ranges MUST be split into smaller fingerprint or item set ranges based on a implementation specific threshold.
Unresolved item set ranges MUST be compared, differences sent to the transfer protocol and marked resolved.
Resolved item set ranges MUST be compared, differences sent to the transfer protocol and become skip ranges.
Skip ranges MUST be merged with other consecutive skip ranges.

In the case where only skip ranges remains, the synchronization is done.

Delta Encoding

Ranges and timestamps MUST be delta encoded as follows for efficient transmission.

All ranges to be transmitted MUST be ordered and only upper bounds used.

Inclusive lower bounds can be omitted because they are always the same as the exclusive upper bounds of the previous range or zero.

To achieve this, it MAY be needed to add skip ranges.

For example, a skip range can be added with an exclusive upper bound equal to the first range lower bound. This way the receiving peer knows to ignore the range from zero to the start of the sync time window.

Every ID's timestamps after the first MUST be noted as the difference from the previous one. If the timestamp is the same, zero MUST be used and the hash MUST be added. The added hash MUST be truncated up to and including the first differentiating byte.

Timestamp	Hash	Timestamp (encoded)	Hash (encoded)
1000	0x4a8a769a...	1000	-
1002	0x351c5e86...	2	-
1002	0x3560d9c4...	0	0x3560
1003	0xbeabef25...	1	-

Varints

All varints MUST be little-endian base 128 variable length integers (LEB128) and minimally encoded.

Payload encoding

The wire level payload MUST be encoded as follow.

The & denote concatenation.

Refer to RELAY-SHARDING RFC for cluster and shard specification.

varint bytes of the node's cluster ID &
varint bytes of the node's number of shard supported &
varint bytes for each shard index supported &
varint bytes of the delta encoded timestamp &
if timestamp is zero, 1 byte for the hash bytes length & the hash bytes &
1 byte, the range type &
either
- 32 bytes fingerprint or
- varint bytes of the item set length & bytes of every items or
- if skip range, do nothing
repeat steps 4 to 7 for all ranges.

Transfer Protocol

Libp2p Protocol identifier: /vac/waku/transfer/1.0.0

The transfer protocol SHOULD send messages as soon as a difference is found via reconciliation. It MUST only accept messages from peers the node is reconciliating with. New message IDs MUST be added to the reconciliation protocol. The payload sent MUST follow the wire specification below.

Wire specification

syntax = "proto3";

package waku.sync.transfer.v1;

import "waku/message/v1/message.proto";

message WakuMessageAndTopic {
  // Full message content and associated pubsub_topic as value
  optional waku.message.v1.WakuMessage message = 1;
  optional string pubsub_topic = 2;
}

Implementation

The flexibility of the protocol implies that much is left to the implementers. What will follow is NOT part of the specification. This section was created to inform implementations.

Cluster & Shards

To prevent nodes from synchronizing messages from shard they don't support, cluster and shards information has been added to each payload. On reception, if two peers don't share the same set of shards the sync is aborted.

Parameters

Two useful parameters to add to your implementation are partitioning count and the item set threshold.

The partitioning count is the number of time a range is split. A higher value reduces round trips at the cost of computing more fingerprints.

The item set threshold determines when item sets are sent instead of fingerprints. A higher value sends more items which means higher chance of duplicates but reduces the amount of round trips overall.

Storage

The storage implementation should reflect the context. Most messages that will be added will be recent and removed messages will be older ones. When differences are found some messages will have to be inserted randomly. It is expected to be a less likely case than time based insertion and removal. Last but not least it must be optimized for fingerprinting as it is the most often used operation.

Sync Interval

Ad-hoc syncing can be useful in some cases but continuous periodic sync minimize the differences in messages stored across the network. Syncing early and often is the best strategy. The default used in Nwaku is 5 minutes interval between sync with a range of 1 hour.

Sync Window

By default we offset the sync window by 20 seconds in the past. The actual start of the sync range is T-01:00:20 and the end T-00:00:20 in most cases. This is to handle the inherent jitters of GossipSub. In other words, it is the amount of time needed to confirm if a message is missing or not.

Peer Choice

Wrong peering strategies can lead to inadvertently segregating peers and reduce sampling diversity. Nwaku randomly select peers to sync with for simplicity and robustness.

More sophisticated strategies may be implemented in future.

Attack Vectors

Nodes using WAKU-SYNC are fully trusted. Message hashes are assumed to be of valid messages received via Waku Relay or Light push.

Further refinements to the protocol are planned to reduce the trust level required to operate. Notably by verifying messages RLN proof at reception.

Copyright

Copyright and related rights waived via CC0.

References

RBSR
Negentropy Explainer
Master Thesis

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync.md

sync.md

Abstract

Specification

Terminology

Reconciliation

Overview

Message IDs

Range Bounds

Range Fingerprinting

Range Type

Range Processing

Delta Encoding

Varints

Payload encoding

Transfer Protocol

Wire specification

Implementation

Cluster & Shards

Parameters

Storage

Sync Interval

Sync Window

Peer Choice

Attack Vectors

Copyright

References

Files

sync.md

Latest commit

History

sync.md

File metadata and controls

Abstract

Specification

Terminology

Reconciliation

Overview

Message IDs

Range Bounds

Range Fingerprinting

Range Type

Range Processing

Delta Encoding

Varints

Payload encoding

Transfer Protocol

Wire specification

Implementation

Cluster & Shards

Parameters

Storage

Sync Interval

Sync Window

Peer Choice

Attack Vectors

Copyright

References