
Fix IP Allocation Bug: Reserved Range Not Detected #2657

Open
neddp wants to merge 5 commits into main from fix_ip_allocation_from_reserved_range_in_dynamic_networks

neddp commented Jan 29, 2026

Fix IP Allocation Bug: Non-Deterministic CIDR Deduplication

What is this change about?

This PR fixes a bug in the IP allocation algorithm where IPs from reserved CIDR ranges were incorrectly allocated, causing CPI failures with "Address is in subnet's reserved address range" errors.

Root Cause: The deduplication logic that merges overlapping CIDR blocks (e.g., individual /32 IPs contained within a /30 block) relied on Ruby Set iteration order, which is non-deterministic. Depending on the iteration order, the algorithm would sometimes fail to deduplicate properly, keeping individual /32 IPs even though they were fully contained within a larger reserved CIDR block.

The Fix: Sort IP addresses by [ip.to_i, ip.prefix] before performing deduplication. This ensures deterministic, correct behavior regardless of Set iteration order.
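
A minimal sketch of the sort-then-deduplicate idea, written against Ruby's standard-library IPAddr rather than the Director's own IP classes. The helper names covers? and deduplicate_cidrs are hypothetical and do not come from this PR; only the sort key [ip.to_i, ip.prefix] is taken from the description above. The single-pass sweep is one possible way to deduplicate once the input is sorted, not necessarily what the PR does.

```ruby
require "ipaddr"
require "set"

# Hypothetical helper: true when `outer` fully contains `inner`.
# Compares integer range bounds so the result does not depend on
# how IPAddr#include? treats CIDR arguments in a given Ruby version.
def covers?(outer, inner)
  o = outer.to_range
  i = inner.to_range
  o.begin.to_i <= i.begin.to_i && i.end.to_i <= o.end.to_i
end

# Hypothetical helper: drop every block that is contained in another block.
# Sorting by [start address, prefix length] places a containing block ahead
# of everything it contains, so a single pass that compares against the last
# kept block is sufficient (CIDR blocks are either nested or disjoint).
def deduplicate_cidrs(blocks)
  sorted = blocks.sort_by { |ip| [ip.to_i, ip.prefix] }
  sorted.each_with_object([]) do |ip, kept|
    kept << ip unless kept.last && covers?(kept.last, ip)
  end
end

# The entries from the example, inserted in the "bad" order.
blocked = Set.new(
  %w[10.0.11.34/32 10.0.11.35/32 10.0.11.32/32 10.0.11.33/32 10.0.11.32/30]
    .map { |cidr| IPAddr.new(cidr) }
)

result = deduplicate_cidrs(blocked).map { |ip| "#{ip}/#{ip.prefix}" }
puts result.inspect # only the /30 block remains
```

Because the input is sorted first, the result is the same whatever order the Set yields its elements in.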

Please provide contextual information.

Production Failure Example:

  • Error: CPI error 'Bosh::Clouds::CloudError' with message 'Failed to create network interface for nic_group 'credhub_network': Address is in subnet's reserved address range'

Example Network Configuration:

  • Subnet: 10.0.11.32/27
  • Reserved range: 10.0.11.32 - 10.0.11.35 (10.0.11.32/30)
  • Static range: 10.0.11.36 - 10.0.11.40
  • Database state: Had individual /32 entries (10.0.11.32/32, .33/32, .34/32, .35/32) from previous failed allocation attempts
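
The address arithmetic behind this configuration can be checked with the standard-library IPAddr class. This is only an illustrative sketch; the variable names are made up for the example and nothing here is taken from the Director code.

```ruby
require "ipaddr"

subnet   = IPAddr.new("10.0.11.32/27")
reserved = IPAddr.new("10.0.11.32/30")

# First and last address covered by each block.
subnet_bounds   = [subnet.to_range.begin.to_s, subnet.to_range.end.to_s]
reserved_bounds = [reserved.to_range.begin.to_s, reserved.to_range.end.to_s]

# The stale /32 database entries from the example.
db_entries = %w[10.0.11.32 10.0.11.33 10.0.11.34 10.0.11.35].map { |a| IPAddr.new(a) }

# Integer bounds of the reserved block, so the checks below are plain
# integer comparisons.
reserved_ints = (reserved.to_range.begin.to_i..reserved.to_range.end.to_i)
all_inside = db_entries.all? { |ip| reserved_ints.cover?(ip.to_i) }

# The static range 10.0.11.36 - 10.0.11.40 should not touch the reserved block.
static_ints = (IPAddr.new("10.0.11.36").to_i..IPAddr.new("10.0.11.40").to_i)
static_overlaps = static_ints.begin <= reserved_ints.end &&
                  reserved_ints.begin <= static_ints.end

puts "subnet:   #{subnet_bounds.join(' - ')}"   # 10.0.11.32 - 10.0.11.63
puts "reserved: #{reserved_bounds.join(' - ')}" # 10.0.11.32 - 10.0.11.35
puts "all /32 entries inside reserved block: #{all_inside}"
puts "static range overlaps reserved block:  #{static_overlaps}"
```

All four database entries fall inside the /30 block, so a correct deduplication should collapse them into that single block.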

Technical Details:
The old deduplication code:

addresses_we_cant_allocate.reject! do |ip|
  addresses_we_cant_allocate.any? do |other_ip|
    includes = other_ip.include?(ip)
    includes && other_ip.prefix < ip.prefix
  end
end

Example of Buggy Set Iteration Order:

Set contains: [10.0.11.32/32, 10.0.11.33/32, 10.0.11.34/32, 10.0.11.35/32, 10.0.11.32/30]

Scenario 1 - Bad Order (Bug Manifests):
Set iterates in order: 10.0.11.34/32, 10.0.11.35/32, 10.0.11.32/32, 10.0.11.33/32, 10.0.11.32/30

  1. Processing 10.0.11.34/32: "Does ANY other IP contain me with smaller prefix?"
  2. Checks 10.0.11.35/32 - No (same prefix)
  3. Checks 10.0.11.32/32 - No (same prefix)
  4. Checks 10.0.11.33/32 - No (same prefix)
  5. Checks 10.0.11.32/30 - YES, but this happens AFTER the decision to keep/reject
  6. Result: 10.0.11.34/32 is KEPT
  7. Algorithm later allocates .34 → CPI error!

Scenario 2 - Good Order (Works by Chance):
Set iterates in order: 10.0.11.32/30, 10.0.11.32/32, 10.0.11.33/32, 10.0.11.34/32, 10.0.11.35/32

  1. Processing 10.0.11.34/32: "Does ANY other IP contain me with smaller prefix?"
  2. Checks 10.0.11.32/30 first - YES (contains .34/32 with prefix 30 < 32)
  3. Result: 10.0.11.34/32 is REJECTED
  4. Algorithm correctly skips entire reserved range

The bug depends entirely on whether Set#any? happens to iterate the /30 block before or after checking the /32 entry. Ruby Sets do not guarantee iteration order.
Reference: Ruby documentation for class Set

What tests have you run against this PR?

Unit Tests:

  • All existing unit tests in ip_repo_spec.rb pass.
  • Added 6 new comprehensive edge case tests covering:
    1. Database with individual /32 IPs + reserved range with /30 block
    2. Database with /32 blocks that should be deduplicated with /30 reserved range
    3. Overlapping CIDR blocks with different prefix sizes
    4. Reserved range as /28 block with database /32s
    5. Multiple overlapping reserved ranges
    6. Nested CIDR blocks requiring deduplication to outermost block
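
As an illustration of the last case (nested blocks collapsing to the outermost one), the sketch below runs a stand-in deduplication over every ordering of a small input and checks that the outcome never changes. The function outermost_blocks is hypothetical, uses the standard-library IPAddr, and is not the code or the specs added in ip_repo_spec.rb.

```ruby
require "ipaddr"

# Hypothetical stand-in for the deduplication step (not the Director's code):
# sort by [start address, prefix length], then keep a block only when it
# extends past the end of the previously kept block.
def outermost_blocks(cidrs)
  kept = []
  cidrs.map { |c| IPAddr.new(c) }.sort_by { |ip| [ip.to_i, ip.prefix] }.each do |ip|
    last_end = kept.empty? ? -1 : kept.last.to_range.end.to_i
    kept << ip if ip.to_range.end.to_i > last_end
  end
  kept.map { |ip| "#{ip}/#{ip.prefix}" }
end

# A /28 containing a /30, which contains a /32; plus one /32 inside the /28
# and one /32 outside it.
input = %w[10.0.11.34/32 10.0.11.32/30 10.0.11.48/32 10.0.11.32/28 10.0.11.40/32]

# Try all 120 orderings and collect the distinct results.
distinct_results = input.permutation.map { |order| outermost_blocks(order) }.uniq

puts distinct_results.inspect
```

A single distinct result across all orderings is what deterministic behavior looks like here: only 10.0.11.32/28 and the unrelated 10.0.11.48/32 survive.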

Validation in a Development Environment:

  • Verified that the code changes work as expected in a development environment

How should this change be described in bosh release notes?

Fixed: IP allocation no longer fails with "Address is in subnet's reserved address range" errors. The algorithm now correctly deduplicates overlapping CIDR blocks (e.g., individual /32 IPs within a /30 reserved range) by sorting IPs before deduplication, ensuring deterministic behavior regardless of internal Set iteration order. This fixes intermittent failures where IPs from reserved ranges were incorrectly allocated.

Does this PR introduce a breaking change?

No. This is a bug fix that makes the IP allocation algorithm work correctly and deterministically.

neddp marked this pull request as ready for review January 29, 2026 11:34
neddp requested a review from fmoehler January 29, 2026 11:49
aramprice requested review from a team and s4heid and removed request for a team January 29, 2026 16:03
aramprice moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Jan 29, 2026
neddp changed the title from "Fix IP Allocation Bug: Non-Deterministic CIDR Deduplication" to "Fix IP Allocation Bug: Reserved Range Not Detected" Feb 6, 2026