Skip to content

discovery+lnmock: add gossip state machine fuzz tests#10605

Open
NishantBansal2003 wants to merge 1 commit intolightningnetwork:masterfrom
NishantBansal2003:fuzz-gossip
Open

discovery+lnmock: add gossip state machine fuzz tests#10605
NishantBansal2003 wants to merge 1 commit intolightningnetwork:masterfrom
NishantBansal2003:fuzz-gossip

Conversation

@NishantBansal2003
Copy link
Contributor

This commit adds fuzz tests for the gossip state machine. The fuzzer provides inputs, and based on those inputs, I simulate various random gossiper states. These states include connecting and disconnecting peers, sending node announcements, channel announcements, channel updates, announcement signatures, query messages, reply messages, gossip timestamp range messages, updating the block height, and triggering historical sync. Both valid and malformed messages are passed to help uncover any unexpected behavior caused by malicious nodes.

Currently, the execution rate is around ~20 execs/sec, which is not ideal. However, considering the large number of possible inputs and state transitions being exercised, this is expected. To discover more bugs, we likely need to run heavy, continuous fuzzing over an extended period.

I would also appreciate reviewers dedicating some CPU cycles to this to help ensure that the fuzz tests themselves are stable. After running the fuzzer for several days, I have accumulated more than 3,500 corpus entries. With this corpus, we are achieving over 60% coverage (in set mode) of the discovery package, which I believe can be improved further with continued fuzzing.

Also, this PR depends on #10589. So, some code semantics may need to change depending on how that PR evolves, but I believe this is ready for review.

corpus:
corpus.tar.gz

coverage:
coverage.html

cc: @morehouse

Signed-off-by: Nishant Bansal <nishant.bansal.282003@gmail.com>
@gemini-code-assist
Copy link

Summary of Changes

Hello @NishantBansal2003, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing suite by adding fuzz tests for the gossip state machine. The new tests aim to improve the robustness and reliability of the gossiper by simulating a wide array of network interactions, including both valid and malformed messages. This proactive approach helps identify edge cases and potential vulnerabilities that might not be caught by traditional unit tests, ultimately contributing to a more stable and secure Lightning Network implementation.

Highlights

  • Gossip State Machine Fuzz Tests: Introduced comprehensive fuzz tests for the AuthenticatedGossiper's state machine, simulating various network conditions and message types to uncover unexpected behavior.
  • Extensive State Simulation: The fuzzer simulates connecting/disconnecting peers, sending node/channel announcements, channel updates, announcement signatures, query/reply messages, gossip timestamp ranges, updating block height, and triggering historical sync.
  • Malformed Message Handling: Includes logic to conditionally malform lnwire messages, allowing the fuzzer to test the gossiper's robustness against invalid or malicious inputs.
  • Mocking Infrastructure Enhancements: Added noopTicker and noopChainView mocks, and enhanced lnmock.MockChain to allow dynamic block height returns, improving the isolation and control of the fuzzing environment.
  • Validation Semaphore Adjustment: Increased the buffer size of the error channel for validation semaphores in the gossiper, likely to prevent deadlocks or blocking during concurrent announcement processing.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • discovery/fuzz_test.go
    • Added new file containing the core fuzz testing logic for the gossip state machine.
    • Implemented fuzzNetwork harness to manage mock peers, gossiper, chain, and notifier.
    • Defined fuzzState enum to represent various gossip protocol actions.
    • Included helper functions for parsing private keys, checking data availability, selecting peers, and connecting/disconnecting peers.
    • Developed maybeMalformMessage function to introduce controlled corruption into lnwire messages for robustness testing.
    • Added functions to simulate sending remote node announcements, channel announcements, channel updates, announcement signatures, and various query/reply messages.
    • Implemented functions to update block height and trigger historical sync within the fuzzing environment.
    • Introduced waitForValidationSemaphore to ensure serial processing of announcements and prevent race conditions during fuzzing.
  • discovery/gossiper.go
    • Increased the buffer size of the err channel in processRemoteAnn from 1 to 2.
  • discovery/mock_test.go
    • Added noopTicker to provide a no-operation implementation of the ticker.Ticker interface for mocking purposes.
    • Added noopChainView to provide a no-operation implementation of the chainview.FilteredChainView interface for mocking purposes.
    • Imported time, graphdb, chainview, and ticker packages.
  • lnmock/chain.go
    • Modified MockChain.GetBestBlock to dynamically evaluate its return value if it's a function, allowing for variable block height simulation during fuzzing.
Activity
  • The fuzzer has been run for several days, accumulating over 3,500 corpus entries.
  • Achieved over 60% coverage (in set mode) of the discovery package through fuzzing, with potential for further improvement.
  • The current execution rate is approximately 20 execs/sec, which is considered expected given the complexity of the state transitions being exercised.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR introduces a comprehensive fuzz test for the gossip state machine, which is a great addition for improving the robustness of the discovery package. The test covers a wide range of states and messages, including malformed ones, which is crucial for uncovering edge cases and potential vulnerabilities. The changes to the mock infrastructure to support dynamic behavior are also a good improvement for testability. I've found a few minor issues related to typos in function names and comments that should be addressed to improve code clarity and maintainability.

return binary.BigEndian.Uint32(data)
}

// getUint16 extracts a uint32 from a byte slice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for getUint16 incorrectly states that it extracts a uint32. According to the repository style guide (line 17), every function must be commented with its purpose. An incorrect comment can be misleading.

Suggested change
// getUint16 extracts a uint32 from a byte slice.
// getUint16 extracts a uint16 from a byte slice.
References
  1. Every function must be commented with its purpose and assumptions. (link)

return binary.BigEndian.Uint16(data)
}

// getInt64 extracts a non-negative int32 from a byte slice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for getInt32 incorrectly refers to the function as getInt64. The repository style guide (line 18) states that function comments must begin with the function name.

Suggested change
// getInt64 extracts a non-negative int32 from a byte slice.
// getInt32 extracts a non-negative int32 from a byte slice.
References
  1. Function comments must begin with the function name. (link)

Comment on lines +1274 to +1277
// udpateBlockHeight updates the best known block height in the fuzz network.
// The new height is selected from the fuzz data and is guaranteed to be
// monotonically increasing.
func (fn *fuzzNetwork) udpateBlockHeight(offset int) int {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's a typo in the function name udpateBlockHeight; it should be updateBlockHeight. This typo is also present in the fuzzState enum (line 106) and in the runGossipStateMachine function (line 1394). For consistency and readability, please correct the function name here and in all its usages. This aligns with the style guide's emphasis on readable code (line 6).

Suggested change
// udpateBlockHeight updates the best known block height in the fuzz network.
// The new height is selected from the fuzz data and is guaranteed to be
// monotonically increasing.
func (fn *fuzzNetwork) udpateBlockHeight(offset int) int {
// updateBlockHeight updates the best known block height in the fuzz network.
// The new height is selected from the fuzz data and is guaranteed to be
// monotonically increasing.
func (fn *fuzzNetwork) updateBlockHeight(offset int) int {
References
  1. Readable code is the most important requirement for any commit created. Typographical errors in function names hinder readability. (link)

@Roasbeef Roasbeef added discovery Peer and route discovery / whisper protocol related issues/PRs testing Improvements/modifications to the test suite labels Feb 24, 2026
Copy link
Collaborator

@morehouse morehouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK, nice fuzz target!

gossiper *AuthenticatedGossiper
chain *lnmock.MockChain
notifier *mockNotifier
blockHeight *int32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is blockHeight a pointer?

}

// createGossiper creates and starts a gossiper for fuzz testing.
func createGossiper(t *testing.T, blockHeight *int32,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is blockHeight a pointer?

Comment on lines +49 to +50
// total number of fuzz state actions.
numFuzzStates = 13
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make this the last element to the enum below, it will automatically count the number of actions.

Comment on lines +36 to +46
var (
testRBytes, _ = hex.DecodeString("8ce2bc69281ce27da07e6683571319d18e" +
"949ddfa2965fb6caa1bf0314f882d7")
testSBytes, _ = hex.DecodeString("299105481d63e0f4bc2a88121167221b67" +
"00d72a0ead154c03be696a292d24ae")
testRScalar = new(btcec.ModNScalar)
testSScalar = new(btcec.ModNScalar)
_ = testRScalar.SetByteSlice(testRBytes)
_ = testSScalar.SetByteSlice(testSBytes)
testSig = ecdsa.NewSignature(testRScalar, testSScalar)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only global value we actually use from here is testSig, correct?

We should move this entire testSig calculation into createGossiper where it is used and add a comment explaining where it was copied from and why.


// runGossipStateMachine executes the gossip state machine with fuzz input data.
func (fn *fuzzNetwork) runGossipStateMachine() {
fn.t.Helper()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we should remove all of the t.Helper() calls in this file.

Getting the full stack trace is very helpful for debugging failures, and t.Helper() hides stack frames.


filter, ok := malformedMsg.(*lnwire.GossipTimestampRange)
require.True(fn.t, ok)
_ = syncer.ApplyGossipFilter(fn.t.Context(), filter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, could we hang here if syncerSema is saturated from previous GossipTimestampRange messages?

Comment on lines +438 to +444
// NodeID
if canMutate(33) {
var nodeID [33]byte
copy(nodeID[:], fn.data[cursor:cursor+33])
out.NodeID = nodeID
cursor += 33
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: a lot of the if canMutate(X) { ... } blocks in maybeMalformMessage are identical across message types. Helper functions could be more readable and less code.

return
default:
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it spin-waits, which likely wastes CPU. Perhaps we should sleep for 1-10ms each loop iteration.

Comment on lines +905 to +907
scid := lnwire.NewShortChanIDFromInt(getUint64(
fn.data[offset+2 : offset+10],
))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A random scid will ~never match an actual channel ID the gossiper knows about.

Perhaps we should allow reusing a previously announced scid most of the time, or at least create a TODO for it.

Comment on lines +541 to +550
fn.chain.On("GetBlockHash", int64(scid.BlockHeight)).
Return(nil, fmt.Errorf("block not found")).
Once()
fn.chain.On("GetBlock", tmock.Anything).
Return(nil, fmt.Errorf("block not found")).
Once()
fn.chain.On(
"GetUtxo", tmock.Anything, tmock.Anything,
scid.BlockHeight, tmock.Anything,
).Return(nil, fmt.Errorf("utxo not found")).Once()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the scid somehow matches an existing valid one? Will returning errors here cause problems?

Also I worry about registering all 3 of these calls -- are we sure they all eventually get consumed? I could imagine a scenario where GetBlockHash returns an error, so GetBlock and GetUtxo are never called. If that happens, then we'll have unconsumed mock calls that could incorrectly poison future messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

discovery Peer and route discovery / whisper protocol related issues/PRs testing Improvements/modifications to the test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants