Skip to content

Conversation

@dshulyak
Copy link
Contributor

@dshulyak dshulyak commented Oct 3, 2025

implementation of the #2091

@dshulyak
Copy link
Contributor Author

dshulyak commented Oct 3, 2025

this doesn't include integration code (e.g discovery, dataplane and raptorcast changes) and i still need to add more docs to the code itself

@dshulyak dshulyak force-pushed the dmitry/wireauth branch 2 times, most recently from f088da4 to 070fb7c Compare October 7, 2025 07:05

pub fn generate_cookie(cookie_secret: &[u8; 32], nonce: u64, remote_addr: &SocketAddr) -> [u8; 16] {
let mut address_bytes = [0u8; 16];
match remote_addr.ip() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remote port is not used. According to the spec: The initiator address is a concatenation of the message source IP address and source port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i forgot about port, will add

}
}

#[derive(Clone, Debug, Zeroize)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no ZeroizeOnDrop. Is it intentional? It could be useful for tmp_x variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i am using Zeroizing for chaining_key and hash output. i think there was no need to zeroize every tmp variable, will recheck

.expect("ephemeral private key must be set"),
&msg.ephemeral_public,
)?;
let temp = keyed_hash!(state.chaining_key.as_ref(), ecdh_ee.as_ref());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ephemeral private key of the initiator's state can be erased here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess i can .take() it from the state, but the state itself is dropped (and private key is erased at that point) soon after exiting this function

Ok(result) => {
result.map(|(timer, r)| (timer, None, r.rekey, Some(r.terminated)))
}
Err(_) => None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to ignore an error here and at similar places below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tick() shouldn't be fallible, i will remove Result from every such function in a follow up

)
.map_err(CookieError::CookieDecryptionFailed)?;

Ok(reply.encrypted_cookie)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is worth adding a comment that this function decrypts the cookie and returns its value. No comments, and a returned reply.encrypted_cookie is misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add

Ok((terminated, rekey))
}

pub fn handle_rekey_timer(&mut self) -> RekeyEvent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called handle_rekey_timer, but nothing is handled here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i will refactor this


let mut session = Initiator {
handshake_state,
common,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be called session state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean whole struct, e.g rename CommonSessionData to SessionState and all related variables?
i can rename, seems like a better choice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct:
common -> session_state
CommonSessionData -> SessionState

The Noise protocol framework uses CipherState, SymmetricState, HandshakeState, you can follow that naming for other relevant types: SessionState, ProtocolState, ...


use crate::error::{ProtocolErrorContext, Result};

pub struct Cookies {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no zeroization for secrets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think that it accomplishes anything. those secrets can be erased only when authentication protocol is terminated (e.g program closed for example).
and it doesn't matter if they are stolen when protocol is no longer working

refresh_duration: Duration,
) -> Self {
let mut cookie_secret = [0u8; 32];
rng.fill_bytes(&mut cookie_secret);
Copy link
Contributor

@dnkolegov-ar dnkolegov-ar Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the responder is under load, sends a handshake response with a cookie, and then immediately crashes? The initiator is an honest node, and it will send the new handshake initiation with the cookie that can't be verified after reloading. Will the reloaded node accept that handshake after timeout?

P.S. It is easier to ask that explicitly than to go through the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if responder is under load after crash then such message will have invalid mac2 and will be rejected.
if responder is not under load then invlalid mac2 will be ignored and initiation will pass.

do you think it is worth it to persist cookie secret to handle this case? to me it seems not worth it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if the responder is still under load after its crash, the honest initiator will not be able to establish the connection until some timer is triggered. In WireGuard, it seems to me, this would be Reject-After-Time × 3 seconds.
I am trying to understand which timeout defines that, and in how many seconds the honest peer will be able to establish a connection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initiator will retry if it doesn't get reply in session_timeout with some jitter

pub ip_rate_limit_window: Duration,
pub max_requests_per_ip: usize,
pub ip_history_capacity: usize,
pub psk: [u8; 32],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add zeroization for Config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it makes sense to erase psk, as it meant to be long term secret

pub rekey_deadline: Option<Duration>,
pub session_timeout_deadline: Option<Duration>,
pub max_session_duration_deadline: Option<Duration>,
pub stored_cookie: Option<[u8; 16]>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add zeroization for the cookie secret?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it only allows attacker to bypass dos-filter.
it doesn't cost much to wrap this in Zerozing, but practically it seemed to me unnecessary

use crate::{Initiator, Responder, Transport};
use monad_wireauth_session::{Config, SessionIndex};

pub struct API<C: Context> {
Copy link
Contributor

@dnkolegov-ar dnkolegov-ar Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In WireGuard:

after an initiator receives a handshake response message (section 5.4.3), 
the responder cannot send transport data messages (section 5.4.6)
until it has received the first transport data message from the initiator.
And, further, transport data messages encrypted using the previous secure session might be in transit after a new secure session has been created.
For these reasons, WireGuard keeps in memory the current secure session,
the previous secure session, and the next secure session for the case of an unconfirmed session 

Could you point me to the code where this logic is implemented? Or explain why this is not implemented and why the protocol does not need that functionality.

I think it should be in

fn decrypt_data_packet(&mut self, packet: &mut [u8], remote_addr: SocketAddr) -> Result<bool> {
, but I do not see the entire workflow, only the mechanism that ensures that the initiator must send some data to the responder before the responder starts sending data to the initiator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after an initiator receives a handshake response message (section 5.4.3), 
the responder cannot send transport data messages (section 5.4.6)
until it has received the first transport data message from the initiator.

implemented by having a separate type for Responder, that type can't send messages, but can accept them (decrypt method), and on first successfully decrypted message it transitions to Transport type (establish method)

And, further, transport data messages encrypted using the previous secure session might be in transit after a new secure session has been created.
For these reasons, WireGuard keeps in memory the current secure session,
the previous secure session, and the next secure session for the case of an unconfirmed session 

in api crate (state.rs) i track sessions with different state separately, sessions in Transport state can't be replaced by unconfirmed sessions, once Transport is created there is a call handle_established that evicts older sessions. that covers current secure session (Transport type) and separate unconfirmed session. so initiator or responder will never evict established session until new session transitions to Transport state

there is currently an expectation that both sides will initiate connect before sending messages, and i track 1 of each initiated and accepted instead of previous secure session. it addresses a problem that older session may get evicted during rekey while there are inflight messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this last part is not very intuitive, i will check how to transition to simply storing 2 last established sessions instead

initiating_sessions: HashMap::new(),
responding_sessions: HashMap::new(),
transport_sessions: HashMap::new(),
last_open_session_by_public_key: HashMap::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does open mean? Who opened it? Can we use initiator/responder-based word?
Does Open mean not yet established?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be established or transport last_transport_session_by_public_key


self.transport_sessions.insert(session_id, transport);

terminated_sessions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not obvious without comments why the function called handle_established returns terminated_sessions also in

let terminated_sessions =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are replaced sessions, i will take a look if it can be simplified


let mut initiator = self
.state
.remove_initiator(&response.receiver_index.get().into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the State removed before the response is authenticated/validated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i forgot to update this method to use get -> validate -> remove , similarly to decrypt_data_packet

response: &mut HandshakeResponse,
remote_addr: SocketAddr,
) -> Result<()> {
if !self.check_under_load(remote_addr, response.sender_index.get(), response)? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In WireGuard, verify MAC1 first — always. Check the MAC2 if the host is under load.
Here, first check is whether we are under load, and then check MAC1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will update to run mac1 check before this, the same way it is done in accept_handshake_init

}

let duration_since_start = self.context.duration_since_start();
let local_index = self.state.allocate_session_index();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allocate and insert a new index here. What if we allocate it and then fail on validate_init? My understanding is that the error is returned, but handle_terminate is not called, leaving the index permanently marked as “allocated.”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a mistake

@dnkolegov-ar
Copy link
Contributor

@dshulyak When can I review the fixes for the findings? I think it makes sense to do one more round of the security review after you fix all current findings/comments.

@dshulyak
Copy link
Contributor Author

@dnkolegov-ar i will push changes by wed 29th

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants