Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Messages] Preserve original message time #254

Closed

Conversation

odesenfans
Copy link
Contributor

@odesenfans odesenfans commented May 6, 2022

Fixed an issue where the CCN would overwrite the original "time"
field of the message by the time at which the message is confirmed
on-chain. This issue could result in inversions of order between
messages if two consecutive messages arrive in the opposite order
on the node in charge of confirming messages.

Example: A user sends two messages at time T and T+1.
If these messages are processed in a different order because of
a different routing on the network, the node in charge of pushing
data on-chain could receive the message sent at T+1 before
receiving the message T. This CCN could then end up publishing
the message sent at T+1 in on-chain block B and the message
sent at T in block B+1. On reception of the TX, the CCNs would
then overwrite the time field with the timestamp of the block,
inverting the order intended by the user.

Added a migration script to reset the message time field using
the content.time field. The migration script does not update
forgotten messages as their content is null. The extra effort
to update these messages is not worth it given that it requires
to parse the whole chain data history for a few tens of messages.

@hoh
Copy link
Member

hoh commented May 9, 2022

Can you detail the inversion of order and its consequences ?

@odesenfans
Copy link
Contributor Author

@hoh done, updated the description.

@moshemalawach
Copy link
Member

Mhh... The idea here is that the user specified time can't be trusted for message ordering. While the block inclusion order is in essence set in stone.
Changing this behavior would open a message reordering attack vector (adding new messages later that change history).

@odesenfans
Copy link
Contributor Author

I thought about that, but as long as the time field is signed, it just represents the ordering desires of the user. The user can then mess with his own data, but not much besides that (at the moment).

IMO we should keep track of both times: the time represents the order desired by the user, and the confirmation time can be used to detect malicious intent (ex: a message with a time field way in the past appearing in a recent block). This solution at least avoids the introduction of a nonce/sequence ID, which can be a bit impractical.

@moshemalawach
Copy link
Member

The issue is that we use that time in queries ordering... Also across multiple users.
Less malign: a user changes the time to get his post always on top of a twitter-like tool
More malign: a user adds a node creation message in the middle of the history, and the staking tool gets the whole history wrong

@odesenfans
Copy link
Contributor Author

odesenfans commented May 10, 2022 via email

Fixed an issue where the CCN would overwrite the original "time"
field of the message by the time at which the message is confirmed
on-chain. This issue could result in inversions of order between
messages if two consecutive messages arrive in the opposite order
on the node in charge of confirming messages.

Added a migration script to reset the message time field using
the content.time field. The migration script does not update
forgotten messages as their content is null. The extra effort
to update these messages is not worth it given that it requires
to parse the whole chain data history for a few tens of messages.
@odesenfans odesenfans force-pushed the od-preserve-on-chain-message-times branch from 44a6c60 to a450a14 Compare May 10, 2022 19:07
@odesenfans
Copy link
Contributor Author

So, here's my plan on the topic.

  • This PR basically wishes to preserve the user message time, even for forgotten messages. I'm sure that we will need to preserve both the user time and the network reception time for a message. The goal for this PR is to replace the time field by a rx_time field representing the time at which the message enters the network, and an original_time field which is equal to content.time. The sole purpose of original_time is, as mentioned, to preserve the content time even if the message is forgotten. Renaming the time field might be problematic, so I'm okay with keeping time and just adding a field for the original message time. Or we decide that we don't care about the original time of forgotten messages, and I can simplify this PR quite a bit by just using content.time.
  • After this, a second PR will modify the aggregates code to use the right time field to avoid ordering issues. Multiple senders? -> rx_time. Single address? -> original_time.
  • Finally, a third PR will add the confirmed/unconfirmed flag to aggregates to make sure that we don't have ordering issues linked to confirmation (which are pretty much unavoidable without a sequence ID).

This is not high-priority at the moment though, so I see it more as a background discussion. What do you think @moshemalawach @hoh?

@hoh
Copy link
Member

hoh commented May 13, 2022

Mhh... The idea here is that the user specified time can't be trusted for message ordering. While the block inclusion order is in essence set in stone.

Shouldn't we use the timestamp of the chain block instead of the timestamp of when the message was received by the node in that case ? That is not only set in stone but difficult to lie on. I am especially thinking about when multiple CCNs will write on chains.

This would give:

  • user time (message.content.time)
  • node received time (message.time)
  • chains write times (message.confirmations[:].timestamp).

When time may not be falsified, we should use min(message.confirmations[trusted_chains].timestamp). We may want to write this on the document for faster/easier processing.

Regarding the user creation time for forgotten messages, I think we should not save it. Deleted content is deleted content, and we have other timestamps we can use when needed.

@hoh
Copy link
Member

hoh commented May 13, 2022

After discussion with @odesenfans , the idea would be to:

  • depreciate the field message.time
  • add a field reception_time that contains the timestamp when the node received the message
  • add the timestamp of chain blocks in the confirmation data (future messages + migration)
  • add a field confirmation_time as a mutable cache of the minimum timestamp from confirmations

@odesenfans
Copy link
Contributor Author

Regarding the implementation, I think it will make more sense if we merge reception_time and confirmation_time. Why?

  • We don't really care about the reception time on the node once a message is confirmed, this value will be different across nodes anyway. For chain data, we'll set this value to the time when we process the transaction.
  • Having to coalesce both values in all pipelines is gonna be a pain, and probably a performance hit.

@hoh
Copy link
Member

hoh commented May 13, 2022

We may not want to expose the reception_time, but would it not be useful internally to fetch the last 30 messages received by a node (before confirmation) and other similar queries ?

@odesenfans
Copy link
Contributor Author

Replaced by #267.

@odesenfans odesenfans closed this May 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants