Skip to content

Conversation

@MattKobayashi
Copy link
Contributor

@MattKobayashi MattKobayashi commented Jul 25, 2025

Change summary

This PR adds dhclient hooks to the image that re-configure static routes configured with dhcp-interface when an DHCP state change occurs on one of those interfaces. The code is inspired by (and largely identical to) the existing dhclient hooks for IPsec interfaces. This resolves the issue reported in T3680 of static routes with dhcp-interface as the next hop not being added to or removed from FRRouting when DHCP is renewed on an interface. The other issues mentioned in T3680 persist.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes)
  • Migration from an old Vyatta component to vyos-1x, please link to related PR inside obsoleted component
  • Other (please describe):

Related Task(s)

https://vyos.dev/T3680

Related PR(s)

How to test / Smoketest result

 INFO - Executing VyOS smoketests
DEBUG - vyos@vyos:~$ /usr/bin/vyos-smoketest
DEBUG - /usr/bin/vyos-smoketest
DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_protocols_static.py
DEBUG - test_01_static (__main__.TestProtocolsStatic.test_01_static) ... ok
DEBUG - test_02_static_table (__main__.TestProtocolsStatic.test_02_static_table) ... ok
DEBUG - test_03_static_vrf (__main__.TestProtocolsStatic.test_03_static_vrf) ... ok
DEBUG - test_04_static_multicast (__main__.TestProtocolsStatic.test_04_static_multicast) ... ok
DEBUG - test_05_dhcp_default_route (__main__.TestProtocolsStatic.test_05_dhcp_default_route) ... ok
DEBUG - test_06_dhcp_default_route_for_vrf (__main__.TestProtocolsStatic.test_06_dhcp_default_route_for_vrf) ... ok
DEBUG - test_07_dhcp_interface_static_routes (__main__.TestProtocolsStatic.test_07_dhcp_interface_static_routes) ... ok
DEBUG -
DEBUG - ----------------------------------------------------------------------
DEBUG - Ran 7 tests in 183.038s
DEBUG -
DEBUG - OK

Checklist:

  • I have read the CONTRIBUTING document
  • I have linked this PR to one or more Phabricator Task(s)
  • I have run the components SMOKETESTS if applicable
  • My commit headlines contain a valid Task id
  • My change requires a change to the documentation
  • I have updated the documentation accordingly

@github-actions
Copy link

github-actions bot commented Jul 25, 2025

👍
No issues in PR Title / Commit Title

@MattKobayashi MattKobayashi force-pushed the T3680 branch 25 times, most recently from 10bfcfa to 302bb7c Compare July 27, 2025 08:31
@MattKobayashi MattKobayashi marked this pull request as ready for review July 27, 2025 08:51
@sever-sever sever-sever requested a review from Copilot July 27, 2025 15:45
@MattKobayashi
Copy link
Contributor Author

@sever-sever My apologies, I've been busy with other projects. I believe I've resolved the smoketest issue, one is running now.

@sever-sever
Copy link
Member

sever-sever commented Oct 3, 2025

@hedrok could you check and approve this PR if all works fine?
Until we find/implement/check another solution, I'm happy with this one.
Plus, it will be easier to backport.

Copy link
Member

@sever-sever sever-sever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a similar logic for ipsec-dhclient-hook
I don't expect any harm here.

@sever-sever sever-sever added the bp/circinus Create automatic backport for circinus label Oct 3, 2025
@hedrok
Copy link
Contributor

hedrok commented Oct 3, 2025

Thanks for your work and patience.
I've tested these changes and all three issues in https://vyos.dev/T3680 are still present.

I've tried simple topology
vyos-server [VyOS]
vyos-client [VyOS]
vyos-server.eth0 -- vyos-client.eth0

With configurations:

vyos-server:

    configure
    set system host-name vyos-server

    set interfaces ethernet eth0 address '10.1.1.1/24'
    set interfaces ethernet eth0 description 'LAN'

    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 option default-router '10.1.1.1'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 option name-server '10.1.1.1'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 option domain-name 'vyos'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 lease '86400'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 range 0 start '10.1.1.9'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 range 0 stop '10.1.1.254'
    set service dhcp-server shared-network-name LAN subnet 10.1.1.0/24 subnet-id '1'
    
    set service dns forwarding cache-size '0'
    set service dns forwarding listen-address '10.1.1.1'
    set service dns forwarding allow-from '10.1.1.0/24'

    set interfaces ethernet eth1 address '10.1.2.1/24'
    commit comment "vyos-server"
    save

vyos-client

    configure
    set system host-name vyos-client
    set interfaces ethernet eth0 address dhcp
    set protocols static route 10.1.2.0/24 dhcp-interface 'eth0'
    commit
    save

With such configuration I can still reproduce all 3 issues, to reproduce 2nd one I've deleted vyos-server and created new one with 10.147.1 instead of 10.1.1, but client after that still has

ip route 10.1.2.0/24 10.1.1.1 eth0
ip route 10.1.4.0/24 10.1.1.1 eth0

In /etc/frr/frr.conf.

I think the reason is that FRRender (python/build/lib/vyos/frrender.py) caches previous config and regenerates FRR configuration only if something has changed in VyOS configuration, so when you call /usr/libexec/vyos/conf_mode/protocols_static.py in dhclient hook FRR configuration is not regenerated. (I haven't checked this theory).

Do you check your changes on VyOS-rolling release?

Do I understand correctly that this PR should fix the issue

  1. Renewing the DHCP lease after changing the default-router option or renumbering the network doesn't update the dhcp-interface next-hop in FRR.

?

If not, please describe what exactly this PR fixes so that I can check that, thanks again.

@MattKobayashi
Copy link
Contributor Author

MattKobayashi commented Oct 17, 2025

Hi @hedrok,

With such configuration I can still reproduce all 3 issues, to reproduce 2nd one I've deleted vyos-server and created new one with 10.147.1 instead of 10.1.1

Did you renew DHCP on the client interface after changing the server subnet? DHCP won't instantly renew under such conditions, you have to trigger it with renew dhcp interface eth0 for the change to be picked up (or wait for expiry / 2 per RFC 2131 4.4.5).

I think the reason is that FRRender (python/build/lib/vyos/frrender.py) caches previous config and regenerates FRR configuration only if something has changed in VyOS configuration, so when you call /usr/libexec/vyos/conf_mode/protocols_static.py in dhclient hook FRR configuration is not regenerated. (I haven't checked this theory).

I can't identify any logic in protocols_static.py that caches the previous FRR configuration, so I don't believe this theory holds.

Do I understand correctly that this PR should fix the issue

  1. Renewing the DHCP lease after changing the default-router option or renumbering the network doesn't update the dhcp-interface next-hop in FRR.

Your understanding is correct. For total clarity:

  1. Setting the interface to DHCP and adding a dhcp-interface static route in the same commit doesn't add the route to FRR.

This can't be easily solved as the DHCP hook gets called before the DHCP interface list at /tmp/static_dhcp_interfaces is created. A second DHCP renewal after the commit should catch this, but I don't know of an easy way to catch it during the commit.

  1. Rebooting after changing the default-router option or renumbering the network leaves the old dhcp-interface next-hop in FRR.

This issue will also occur as a result of the DHCP hook being called before the interface list file is generated (same problem as 1).

@MattKobayashi
Copy link
Contributor Author

@hedrok I've done some work on a new approach that I believe may solve the entire original issue. Please take a look and give me your thoughts: MattKobayashi/vyos-1x@current...MattKobayashi:vyos-1x:T3680-dev

@hedrok
Copy link
Contributor

hedrok commented Oct 17, 2025

I've almost finalized an update to your previous solution: 856a88e

At a glance my variant updates vyos-configd/FRRender so that it takes into account that dhcp interfaces could have changed and yours forces protocols_static to apply FRR configuration without vyos-configd.

I like my option more as if other places (not only protocols_static) require updates according to DHCP update it will be trivial to update my solution so that everything is done through vyos-configd in one go.
Plus it is just harder to predict what happens when most of FRR changes go through vyos-configd and in this specific case are applied directly in protocols_static.

But anyway my plan is:

  1. Finalize my solution. (after this I'll write comment here so that you can take a look and test my variant if you have time).
  2. Review your update more thougroughly.
  3. Write here comparison, pros and cons of both options so that other reviewers decide which is better.

@hedrok
Copy link
Contributor

hedrok commented Oct 17, 2025

I've finalized and tested my solution, it has two commits:

Fixes all three subissues.

Now I'll review your new changes, I'll write back on Monday latest. If you have a chance to take a look and test my commits, it would be great.

For review convienience I've created PR to your branch in your repo: MattKobayashi#1
But please write overall comments here so that there is one place with whole discussion.

@hedrok
Copy link
Contributor

hedrok commented Oct 17, 2025

High-level review of suggested changes:

  1. This is important update, we need not to loose this even if my solution is taken:
-    if [ "$old_ip_address" == "$new_ip_address" ]; then
+    if [ "$old_ip_address" == "$new_ip_address" ] && [ "$old_routers" == "$new_routers" ]; then

We can even leave just

-    if [ "$old_ip_address" == "$new_ip_address" ]; then
+    if [ "$old_routers" == "$new_routers" ]; then

As IP address of the interface is not important for update.

  1. Test: for more realistic testing you can use virtual-ethernet to lauch dhcp server and client on same machine. You can see example here https://github.com/vyos/vyos-1x/pull/4783/files#diff-0ae71b5cc4b1cf4fa2b5112c8c75309271752e98769b349ce3d3d9a252dbd08a (still under review). Personally I am ok with mocking, but other maintainers prefer more realistic tests and in this particular case I've updated function that gets router IP so that it uses another file.

  2. File with interfaces vs reading config

Personally I prefer previous version: reading whole config is relatively costly operation + in such case when adding dhcp somewhere in configuration we'll have to remember to update hooks, which can lead to bugs in the future.

  1. Force apply: as I've written above, it seems risky to me. I would prefer using vyos-configd.

So if you are ok with my version, you can merge my PR, add your change in 1 with "$old_routers" == "$new_routers" as separate commit and I hope we'll be able to merge this PR soon. If not - please write details.

Thanks for your time.

@github-actions
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions
Copy link

Conflicts have been resolved. A maintainer will review the pull request shortly.

@MattKobayashi
Copy link
Contributor Author

Hi @hedrok, your changes seem like a more robust solution to me, so I've merged your PR and added a commit to keep the comparison of routers instead of IP addresses. Hopefully this can be merged soon!

@hedrok
Copy link
Contributor

hedrok commented Oct 21, 2025

Only test

2025-10-21T00:52:52.9344085Z DEBUG - test_pppoe_client (__main__.PPPoEInterfaceTest.test_pppoe_client) ... FAIL

fails which is known issue in FRR.

@MattKobayashi thanks for fixing my linting issues - sorry you had to do that.

MattKobayashi and others added 2 commits October 21, 2025 19:57
Solves the problem that vyos-configs in FRRender caches configuation and
DHCP changes are ignored.

* Add src/helpers/vyos-request-configd-update.py that requests vyos-configd
  to update FRR configuration.
* Make dhclient hooks use it instead of calling protocols_static.py
* Make FRRender cache not only configuration but also DHCP gateways so
  that is any of them changes, FRR configuration is updated
@github-actions
Copy link

CI integration ❌ failed!

Details

CI logs

  • CLI Smoketests (no interfaces) 👍 passed
  • CLI Smoketests VPP 👍 passed
  • CLI Smoketests (interfaces only) ❌ failed
  • Config tests 👍 passed
  • Config tests VPP 👍 passed
  • RAID1 tests 👍 passed
  • TPM tests 👍 passed

@hedrok
Copy link
Contributor

hedrok commented Oct 22, 2025

I've rebased branch, squashed most commits.

Rechecked everything, fixes all issues in https://vyos.dev/T5811 and https://vyos.dev/T3680:

  1. Simultanious dhcp address + dhcp route: OK
  2. After reboot, DHCP server down, then up: OK
  3. DHCP gateway network change (prefix changes): OK
  4. DHCP gateway only change (IP and network stay same): OK

CI for test_interfaces_cli failed only for known failure:

2025-10-22T09:34:00.8294552Z DEBUG - Running Testcase: /usr/libexec/vyos/tests/smoke/cli/test_interfaces_pppoe.py
...
2025-10-22T09:34:20.1811897Z DEBUG - test_pppoe_client (__main__.PPPoEInterfaceTest.test_pppoe_client) ... FAIL

@sever-sever please re-review and merge.

@sever-sever sever-sever requested a review from Copilot October 22, 2025 11:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

socket.send_string(request)

message = socket.recv()
print(f"Received reply {request} [ {message} ]")
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name in the f-string is incorrect. It should print {message} in the context of 'Received reply', but the variable used is {request} which was already sent. This should likely be printing information about the message received, not re-printing the request.

Suggested change
print(f"Received reply {request} [ {message} ]")
print(f"Received reply {message}")

Copilot uses AI. Check for mistakes.
# - FAIL: DHCP failed, remove routes
# - RELEASE: lease released, remove routes
# - STOP: dhclient stopped, remove routes
if [ "$reason" == "PREINIT" ] || [ "$reason" == "EXPIRE" ] || [ "$reason" == "FAIL" ] || [ "$reason" == "RELEASE" ] || [ "$reason" == "STOP" ]; then
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bash scripts, use = instead of == for POSIX compliance in string comparisons within [ ] brackets, or use [[ ]] with ==. While == works in [ ] on many systems, it's not POSIX standard and using = is more portable.

Suggested change
if [ "$reason" == "PREINIT" ] || [ "$reason" == "EXPIRE" ] || [ "$reason" == "FAIL" ] || [ "$reason" == "RELEASE" ] || [ "$reason" == "STOP" ]; then
if [ "$reason" = "PREINIT" ] || [ "$reason" = "EXPIRE" ] || [ "$reason" = "FAIL" ] || [ "$reason" = "RELEASE" ] || [ "$reason" = "STOP" ]; then

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +31
if [ "$reason" == "RENEW" ] || [ "$reason" == "REBIND" ]; then
if [ "$old_routers" == "$new_routers" ]; then
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bash scripts, use = and != instead of == and != for POSIX compliance in string comparisons within [ ] brackets, or use [[ ]]. While these operators work in [ ] on many systems, using = is more portable for POSIX compliance.

Suggested change
if [ "$reason" == "RENEW" ] || [ "$reason" == "REBIND" ]; then
if [ "$old_routers" == "$new_routers" ]; then
if [ "$reason" = "RENEW" ] || [ "$reason" = "REBIND" ]; then
if [ "$old_routers" = "$new_routers" ]; then

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +31
if [ "$reason" == "RENEW" ] || [ "$reason" == "REBIND" ]; then
if [ "$old_routers" == "$new_routers" ]; then
Copy link

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bash scripts, use = and != instead of == and != for POSIX compliance in string comparisons within [ ] brackets, or use [[ ]]. While these operators work in [ ] on many systems, using = is more portable for POSIX compliance.

Suggested change
if [ "$reason" == "RENEW" ] || [ "$reason" == "REBIND" ]; then
if [ "$old_routers" == "$new_routers" ]; then
if [ "$reason" = "RENEW" ] || [ "$reason" = "REBIND" ]; then
if [ "$old_routers" = "$new_routers" ]; then

Copilot uses AI. Check for mistakes.
@hedrok
Copy link
Contributor

hedrok commented Oct 22, 2025

As I'm not an author of PR, I can't resolve comments from CoPilot, so I'll reply here.
All issues can be ignored in my opinion.

print(f"Received reply {request} [ {message} ]")

Yes, maybe it is extra information, but it is just copied from official zmq documentation and I see no harm in it: https://zeromq.org/get-started/?language=python#
Plus in such way you see in one line request and message.

As for using = or [[ in bash - I didn't check, maybe it would be better, but we should change it in all scripts then, as in most of VyOS dhclient hooks comparisons are same as here.

@sever-sever sever-sever merged commit c284938 into vyos:current Oct 22, 2025
15 of 16 checks passed
@vyosbot vyosbot added mirror-initiated This PR initiated for mirror sync workflow mirror-completed and removed mirror-initiated This PR initiated for mirror sync workflow labels Oct 22, 2025
@hedrok
Copy link
Contributor

hedrok commented Oct 22, 2025

@MattKobayashi thanks again, merged at last :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bp/circinus Create automatic backport for circinus current mirror-completed rebase

Development

Successfully merging this pull request may close these issues.

6 participants