Skip to content

Define NodeAlias struct and Display impl #1544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 16, 2022

Conversation

jkczyz
Copy link
Contributor

@jkczyz jkczyz commented Jun 14, 2022

Provide a wrapper struct for 32-byte node aliases, which implements Display for printing. Support the UTF-8 character encoding, but replace control characters and terminate at the first null character. Fall back to ASCII if the byte sequence is an invalid encoding.

This is being upstreamed from the sample node. But also adds support for invalid UTF-8 and replaces rather than filters control characters.

tnull
tnull previously approved these changes Jun 15, 2022
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a tiny formatting nit.

Also thought whether we want to use this in ChannelManager and introduce more checks for user-defined aliases via using a constructor (e.g., enforcing a non-empty alias), but that's probably too opinionated and it's best to keep things simple and local here.

@@ -923,6 +923,47 @@ impl_writeable_tlv_based!(NodeAnnouncementInfo, {
(10, addresses, vec_type),
});

/// A user-defined name for a node, which may be used when displaying the node in a graph.
///
/// Node aliases provide a potential avenue for injection attacks. See [security considerations].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should specify this, its not like its complicated - the node alias is set by random third party nodes, care needs to be taken with any processing of it period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, missed this

let alias = match core::str::from_utf8(bytes) {
Ok(alias) => {
alias.chars()
.map(|c| if c.is_control() { control_symbol } else { c })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worth mapping to replacement and such? Maybe its easier to just run to the first non-printable character and move on, it'd save us a chunk of code and be much easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, it's all pretty straight forward while the suggestion would result in surprising behavior, IMHO.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that surprising? You had garbage in the node alias, garbage being ignored seems fine. I'm not a huge fan of the amount of code here trying to make sure we print as much of the alias as we can even though the alias author shoved garbage in. That's not really our problem, we don't need to bend over backwards for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... the node in question isn't the user. Why should our users suffer for their bad behavior? If a node's alias contains a tab or newline, it would be surprising if we show a truncated or empty string, depending on its position.

I don't really understand the pushback here. This is pretty standard behavior and is a trivial amount of code. The alternative proposals are to truncate, as you initially suggested, or filter as you are now suggesting, which the only difference is swapping map for filter with a slightly simpler lambda. If you just don't like using a replacement character, that's fine, but I don't think the amount of code here is really a valid argument.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... the node in question isn't the user. Why should our users suffer for their bad behavior? If a node's alias contains a tab or newline, it would be surprising if we show a truncated or empty string, depending on its position.

Right, I guess I don't see it as punishing the user. I see it as punishing the node who shoved garbage in their alias. Eg a tab may break someone printing node aliases in a tab-delimited file, so it's totally reasonable to just remove the crap the node tried to shove into the next column.

I don't really understand the pushback here. This is pretty standard behavior and is a trivial amount of code.

Yea, I think it's mostly all the iterators and allocating a string. Writing is usually slow/allocating anyway, but it's annoying to allocate just to filter. More generally, I find a bunch of iterators and chaining to be really hard to read, especially when there's a limit applied based on an index calculated elsewhere. I'm probably special here so not gonna hold this up anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I guess I don't see it as punishing the user. I see it as punishing the node who shoved garbage in their alias. Eg a tab may break someone printing node aliases in a tab-delimited file, so it's totally reasonable to just remove the crap the node tried to shove into the next column.

Let's say confusing the user then. 😉 e.g., When comparing to what a node explorer may show, a malicious node could be similarly aliased with but with some control characters thrown in. Probably no real security concern, but such a node could attempt to masquerade as another and would at very least be confusing.

Yea, I think it's mostly all the iterators and allocating a string. Writing is usually slow/allocating anyway, but it's annoying to allocate just to filter. More generally, I find a bunch of iterators and chaining to be really hard to read, especially when there's a limit applied based on an index calculated elsewhere. I'm probably special here so not gonna hold this up anymore.

FWIW, the old sanitize_string from the sample was allocating a string. Not sure if we can get around that with Formatter's interface. I guess we could write char-by-char using char::encode_utf8, though not sure how much more readable that would be.

Copy link
Contributor Author

@jkczyz jkczyz Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, without any allocations:

diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 0369a87c..1e3be5dd 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -935,20 +935,24 @@ impl fmt::Display for NodeAlias {
                let control_symbol = core::char::REPLACEMENT_CHARACTER;
                let first_null = self.0.iter().position(|b| *b == 0).unwrap_or(self.0.len());
                let bytes = self.0.split_at(first_null).0;
-               let alias = match core::str::from_utf8(bytes) {
+               match core::str::from_utf8(bytes) {
                        Ok(alias) => {
-                               alias.chars()
-                                       .map(|c| if c.is_control() { control_symbol } else { c })
-                                       .collect::<String>()
+                               for c in alias.chars() {
+                                       let mut bytes = [0u8; 4];
+                                       let c = if !c.is_control() { c } else { control_symbol };
+                                       f.write_str(c.encode_utf8(&mut bytes))?;
+                               }
                        },
                        Err(_) => {
-                               bytes.iter().map(|b| *b as char)
+                               for c in bytes.iter().map(|b| *b as char) {
                                        // Display printable ASCII characters
-                                       .map(|c| if c >= '\x20' && c <= '\x7e' { c } else { control_symbol })
-                                       .collect::<String>()
+                                       let mut bytes = [0u8; 4];
+                                       let c = if c >= '\x20' && c <= '\x7e' { c } else { control_symbol };
+                                       f.write_str(c.encode_utf8(&mut bytes))?;
+                               }
                        },
                };
-               f.write_str(&alias)
+               Ok(())
        }
 }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a malicious node could be similarly aliased with but with some control characters thrown in. Probably no real security concern, but such a node could attempt to masquerade as another and would at very least be confusing.

Nodes can always alias with others? There's nothing stopping a node from putting whatever they want?

FWIW, the old sanitize_string from the sample was allocating a string.

Yea, I was more okay with it in the sample, which tends to be gratuitous with the allocations in a few places....it is just a sample, after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, true... guess that example doesn't make much sense. Pushed a version without allocations.

@codecov-commenter
Copy link

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #1544 (be40a82) into main (22dc964) will increase coverage by 0.75%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1544      +/-   ##
==========================================
+ Coverage   90.94%   91.70%   +0.75%     
==========================================
  Files          80       80              
  Lines       43469    47899    +4430     
  Branches    43469    47899    +4430     
==========================================
+ Hits        39533    43925    +4392     
- Misses       3936     3974      +38     
Impacted Files Coverage Δ
lightning/src/routing/gossip.rs 94.08% <100.00%> (+2.39%) ⬆️
lightning/src/ln/features.rs 99.28% <0.00%> (-0.19%) ⬇️
lightning/src/ln/functional_tests.rs 97.13% <0.00%> (-0.03%) ⬇️
lightning/src/ln/payment_tests.rs 99.31% <0.00%> (+0.07%) ⬆️
lightning/src/routing/router.rs 93.27% <0.00%> (+0.80%) ⬆️
lightning/src/chain/onchaintx.rs 94.90% <0.00%> (+0.92%) ⬆️
lightning/src/ln/functional_test_utils.rs 96.71% <0.00%> (+1.48%) ⬆️
lightning/src/ln/channelmanager.rs 87.79% <0.00%> (+3.46%) ⬆️
lightning/src/ln/channel.rs 92.06% <0.00%> (+3.48%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22dc964...be40a82. Read the comment docs.

tnull
tnull previously approved these changes Jun 15, 2022
@TheBlueMatt
Copy link
Collaborator

LGTM, feel free to squash.

Provide a wrapper struct for 32-byte node aliases, which implements
Display for printing. Support the UTF-8 character encoding, but replace
control characters and terminate at the first null character. Fall back
to ASCII if the byte sequence is an invalid encoding.
@jkczyz jkczyz force-pushed the 2022-06-node-alias branch from be40a82 to 21aff6f Compare June 15, 2022 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants