veronica-m-ef (Contributor) commented Jan 9, 2026

Changes

This PR introduces a significant refactor of the eBPF memory architecture and performance tracking logic to improve scalability and data accuracy.

1. Protocol-Specific Map Architecture (Split-Maps)

To reduce the kernel memory footprint, the monolithic FlowStats struct was split into a base map and two optional extension maps:

  • FLOW_STATS (Base): Reduced from 176 bytes to 120 bytes. Contains Ethernet/IP layer data common to all flows.
  • TCP_STATS (Extension): 56-byte map allocated only for TCP flows. Stores handshake timings, state, and transaction performance metrics.
  • ICMP_STATS (Extension): 4-byte map allocated only for ICMP flows. Stores type and code metadata.

Impact: Saves ~37% of kernel memory for a typical traffic mix (40% TCP, 50% UDP, 10% ICMP).
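
For reviewers who want to sanity-check the sizing, here is a minimal sketch of the per-flow arithmetic. It assumes one FLOW_STATS entry per flow plus at most one extension entry, and it counts value bytes only (no keys, no map or allocator overhead). The function names are hypothetical and are not part of this PR.

```rust
// Value sizes quoted in this PR description (bytes).
pub const MONOLITHIC_BYTES: f64 = 176.0; // old FlowStats
pub const BASE_BYTES: f64 = 120.0; // new FLOW_STATS
pub const TCP_EXT_BYTES: f64 = 56.0; // TCP_STATS
pub const ICMP_EXT_BYTES: f64 = 4.0; // ICMP_STATS

/// Average value bytes per flow for a traffic mix.
/// `tcp_share` and `icmp_share` are fractions in [0, 1]; the remainder
/// (for example UDP) only pays for the base entry.
pub fn avg_bytes_per_flow(tcp_share: f64, icmp_share: f64) -> f64 {
    BASE_BYTES + tcp_share * TCP_EXT_BYTES + icmp_share * ICMP_EXT_BYTES
}

/// Fraction of value bytes saved relative to the monolithic layout.
pub fn savings_fraction(tcp_share: f64, icmp_share: f64) -> f64 {
    1.0 - avg_bytes_per_flow(tcp_share, icmp_share) / MONOLITHIC_BYTES
}
```

With the 40% TCP / 10% ICMP mix, `avg_bytes_per_flow(0.4, 0.1)` comes to about 142.8 bytes, which is a saving of roughly 19% under this value-only model. The ~37% figure quoted above therefore presumably rests on additional factors not modelled here (map sizing or per-entry overhead, for instance); that is a guess, not something stated in the PR.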

2. Direction-Agnostic Latency Logic

Upgraded TCP transaction timing to track timestamps for both directions independently. This allows for accurate latency and jitter calculation regardless of which side (client or server) initiates a data transaction, and correctly handles "Late Start" scenarios where monitoring begins mid-flow.
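
To illustrate the idea (not the actual eBPF code), here is a minimal sketch of per-direction timing. `Direction`, `TxnTimer` and `on_data` are hypothetical names, and jitter is simplified to the absolute difference between consecutive latency samples, which may differ from what the PR computes.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Forward = 0,
    Reverse = 1,
}

impl Direction {
    fn opposite(self) -> Direction {
        match self {
            Direction::Forward => Direction::Reverse,
            Direction::Reverse => Direction::Forward,
        }
    }
}

#[derive(Debug, Default)]
pub struct TxnTimer {
    /// Timestamp (ns) of the first unanswered data packet per direction.
    pending_ns: [Option<u64>; 2],
    last_latency_ns: Option<u64>,
    /// |latest latency - previous latency|, 0 until two samples exist.
    pub jitter_ns: u64,
}

impl TxnTimer {
    /// Record a data packet seen at `now_ns` travelling in `dir`.
    /// Returns a latency sample if this packet answers pending data
    /// from the opposite direction.
    pub fn on_data(&mut self, dir: Direction, now_ns: u64) -> Option<u64> {
        let opp = dir.opposite() as usize;
        let sample = self.pending_ns[opp]
            .take()
            .map(|start| now_ns.saturating_sub(start));

        match sample {
            Some(latency) => {
                if let Some(prev) = self.last_latency_ns {
                    self.jitter_ns = latency.abs_diff(prev);
                }
                self.last_latency_ns = Some(latency);
            }
            None => {
                // Nothing to answer: this packet opens a transaction.
                // Keep the earliest timestamp if one is already pending.
                let me = dir as usize;
                if self.pending_ns[me].is_none() {
                    self.pending_ns[me] = Some(now_ns);
                }
            }
        }
        sample
    }
}
```

Because neither direction is privileged, a flow first observed mid-stream simply produces no sample until a packet is answered from the other side. This sketch does not handle multi-segment responses, which would need extra state.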

3. Userspace "Join" Logic & Multi-Map Cleanup

  • Updated record_flow to conditionally look up (join) the protocol extension maps, based on the protocol recorded in the base entry.
  • Refactored EbpfFlowGuard, timeout_and_remove_flow, and orphan_scanner_task with a macro-based approach to ensure keys are wiped from all eBPF tables simultaneously, preventing kernel memory leaks.
  • Fixed naming collisions in OpenTelemetry attributes by distinguishing between Forward and Reverse IP metadata (e.g., flow.ip.ttl vs flow.reverse.ip.ttl).
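
The join and the multi-map cleanup described above can be sketched roughly as follows. This is an illustration only: `std::collections::HashMap` stands in for the eBPF maps, and `FlowKey`, the struct fields, `join_flow`, `remove_flow` and the `remove_from_all_maps!` macro are hypothetical, not the actual mermin code.

```rust
use std::collections::HashMap;

pub const IPPROTO_ICMP: u8 = 1;
pub const IPPROTO_TCP: u8 = 6;
pub const IPPROTO_ICMPV6: u8 = 58;

pub type FlowKey = u64; // stand-in for the real flow key

#[derive(Clone, Debug, PartialEq)]
pub struct FlowStats { pub protocol: u8, pub packets: u64 }
#[derive(Clone, Debug, PartialEq)]
pub struct TcpStats { pub handshake_latency_ns: u64 }
#[derive(Clone, Debug, PartialEq)]
pub struct IcmpStats { pub icmp_type: u8, pub icmp_code: u8 }

/// Userspace stand-ins for FLOW_STATS, TCP_STATS and ICMP_STATS.
#[derive(Default)]
pub struct Maps {
    pub flow_stats: HashMap<FlowKey, FlowStats>,
    pub tcp_stats: HashMap<FlowKey, TcpStats>,
    pub icmp_stats: HashMap<FlowKey, IcmpStats>,
}

#[derive(Debug, PartialEq)]
pub struct FlowRecord {
    pub base: FlowStats,
    pub tcp: Option<TcpStats>,
    pub icmp: Option<IcmpStats>,
}

/// Look up the base entry, then join only the extension that the
/// base protocol calls for.
pub fn join_flow(maps: &Maps, key: FlowKey) -> Option<FlowRecord> {
    let base = maps.flow_stats.get(&key)?.clone();
    let tcp = if base.protocol == IPPROTO_TCP {
        maps.tcp_stats.get(&key).cloned()
    } else {
        None
    };
    let icmp = if matches!(base.protocol, IPPROTO_ICMP | IPPROTO_ICMPV6) {
        maps.icmp_stats.get(&key).cloned()
    } else {
        None
    };
    Some(FlowRecord { base, tcp, icmp })
}

/// Remove `key` from every listed map, so that adding a new extension
/// map only requires adding its field name to the call sites.
macro_rules! remove_from_all_maps {
    ($maps:expr, $key:expr, $($field:ident),+ $(,)?) => {{
        $( let _ = $maps.$field.remove($key); )+
    }};
}

pub fn remove_flow(maps: &mut Maps, key: FlowKey) {
    remove_from_all_maps!(maps, &key, flow_stats, tcp_stats, icmp_stats);
}
```

Listing the maps in one macro invocation keeps the cleanup paths in step with each other; a stale extension entry left behind after its base entry is removed is the kind of leak the refactor aims to prevent.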

Fixes ENG-372

Type of change

  • Bug fix
  • New feature
  • Breaking change (eBPF map schema version bumped to v2)
  • Documentation
  • Refactor

Testing

  • Unit Tests: Updated mermin-common tests to verify the new 120-byte, 56-byte, and 4-byte memory layouts.
  • Integration Tests: Fixed "IO Safety violation" crashes in test runners by preventing the drop of zeroed eBPF map handles using std::mem::forget on leaked Arcs.
  • E2E Manual Testing:
    • Verified TCP latency/jitter and state transitions using local loopback and cross-pod traffic.
    • Verified ICMP split-map capture using kubectl debug with ping -4 to confirm Echo Request/Reply (8/0) mapping.
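
For context on the integration-test fix, here is a minimal sketch of why `std::mem::forget` avoids the crash: forgetting a value skips its `Drop` implementation, so a handle built from zeroed memory never tries to release a resource it does not own. `FakeMapHandle`, `leak_handle` and the drop counter are hypothetical stand-ins for the real eBPF map handles used in the tests.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts how many times a FakeMapHandle has been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a map handle whose Drop would close its descriptor.
pub struct FakeMapHandle {
    pub fd: i32,
}

impl Drop for FakeMapHandle {
    fn drop(&mut self) {
        // A real handle would close `self.fd` here; for a zeroed handle
        // that means closing a descriptor it never owned.
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Intentionally leak the handle so Drop never runs.
pub fn leak_handle(handle: Arc<FakeMapHandle>) {
    std::mem::forget(handle);
}

pub fn drop_count() -> usize {
    DROPS.load(Ordering::SeqCst)
}
```

The trade-off is a deliberate leak of the `Arc` allocation, which is usually acceptable in short-lived test processes.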

Proof it works

TCP Span with Split-Map Join & Performance Metrics

Kind         : Client
Attributes:
     ->  flow.community_id: String(Owned("1:1DcPbOHop8MKMtzGlsL8x0A1zY0="))
     ->  flow.connection.state: String(Static("established"))
     ->  flow.tcp.handshake.latency: I64(53750)
     ->  flow.tcp.rndtrip.latency: I64(590029)
     ->  flow.tcp.rndtrip.jitter: I64(210813)
     ->  source.k8s.pod.name: String(Owned("mermin-vdfd4"))

ICMPv6 Span showing successful Join of ICMP_STATS

Attributes:
     ->  network.transport: String(Static("icmpv6"))
     ->  flow.icmp.type.id: I64(136)
     ->  flow.icmp.type.name: String(Owned("neighbor_advertisement"))
     ->  flow.reverse.ip.ttl: I64(0) // Naming collision resolved
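
The type-to-name mapping seen in this span (136 → neighbor_advertisement) and the Echo Request/Reply (8/0) check from the manual testing can be sketched as a small lookup. The function name and the exact name strings other than neighbor_advertisement are assumptions; the numeric values are the standard IANA protocol and ICMP/ICMPv6 type numbers.

```rust
/// Map (IP protocol number, ICMP type) to a snake_case name.
/// Only a handful of common types are listed; anything else is None.
pub fn icmp_type_name(protocol: u8, icmp_type: u8) -> Option<&'static str> {
    match (protocol, icmp_type) {
        // ICMPv4 (protocol 1)
        (1, 0) => Some("echo_reply"),
        (1, 8) => Some("echo_request"),
        // ICMPv6 (protocol 58)
        (58, 128) => Some("echo_request"),
        (58, 129) => Some("echo_reply"),
        (58, 135) => Some("neighbor_solicitation"),
        (58, 136) => Some("neighbor_advertisement"),
        _ => None,
    }
}
```

Keying on the protocol as well as the type matters because ICMPv4 and ICMPv6 use different type numbers for the same message kind.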

Checklist

  • I've tested my changes
  • I've updated relevant documentation
  • My code follows the project's style (run cargo fmt and cargo clippy)
  • All tests pass

veronica-m-ef self-assigned this Jan 9, 2026
veronica-m-ef changed the title from "protocol specific maps" to "feat protocol specific maps" Jan 9, 2026
veronica-m-ef changed the title from "feat protocol specific maps" to "feat: flowstats protocol-specific maps" Jan 9, 2026
veronica-m-ef reopened this Jan 9, 2026
veronica-m-ef requested review from ElastiJAM, reggieross and svencowart and removed request for ElastiJAM and svencowart January 12, 2026 12:55
veronica-m-ef (Contributor, Author) commented

This PR depends on 370 being merged first, since 370 contains the dashboards used to confirm that this refactor has not altered the metrics.
