Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YANG operational-data fragmentation #6371

Closed

Conversation

rwestphal
Copy link
Member

This PR extends the northbound layer with infrastructure for asynchronous YANG-modeled operational data retrieval. This should solve the problem of blocking the main pthread of the daemons for too long while fetching bulk state data (e.g. millions of routes).

The nb_oper_data_iterate() API function was modified to accept a starting offset, a maximum number of elements to iterate over and outputs where the iteration stopped. The outputted offset can be feeded back to the input to resume the iteration from where it stopped. It's up to the NB client whether to fetch and buffer all data before delivering it to the external client, or demand the external client to be in charge of resuming the iteration as necessary. In any case, canceling the iteration before it finishes should be possible (small chunk sizes are better for more responsiveness).

The challenge with asynchronous iteration is that data can be modified while it's being iterated over. In particular, the iteration might stop at a given YANG list entry, then once the iteration resumes that list entry might not exist anymore. This means the pointer returned by the latest get_next() callback is unsafe to use once the iteration is resumed. To deal with this problem, the NB should use the lookup_entry() callback as a form of synchronization to ensure the list entry referent to the base offset of the data iteration still exists. Further, a new exact_match parameter was added to that callback so that it can find the next available entry in the list in case the one we're looking for doesn't exist anymore. All instances of the lookup_entry() callback were updated to take that new parameter into account. Commit f88fb89 explains how that can be done (e.g. use RB_FIND for exact matches and RB_NFIND for non-exact matches when iterating over an rb-tree).

Here's an example of all callbacks called to fetch this data:

One limitation of the current implementation is that the data iteration can't stop in the middle of a leaf-list or keyless list. This is because the NB doesn't provide the get_keys() and lookup_entry() callbacks for such nodes, so there's no way to perform the required synchronization step once the iteration is resumed. In the future it should be possible to work around this limitation by implementing pseudo-keys that are internal to the FRR implementation and not visible to the external world. The ConfD API uses a similar strategy to deal with this problem.

As of now, none of the existing NB clients (CLI, gRPC, ConfD and Sysrepo) can leverage this new infrastructure since none of them have async capabilities (#6368 should resolve this problem for gRPC). In spite of that, the CLI "show yang operational-data" command was extended with the max-elements (1-1000000) [repeat] parameters for testing purposes (the data iteration can be stopped and resumed several times, but the command still runs synchronously). These new parameters should be removed once the CLI gets async capabilities (in that case a global knob to enable fragmentation at a system level should be better).

Copy link

@polychaeta polychaeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution to FRR!

Click for style suggestions

To apply these suggestions:

curl -s https://gist.githubusercontent.com/polychaeta/0f55fb6e1dfacf3ac6d17f3fed360b1b/raw/4b88986d99b403733352457a41120a7fde768332/cr_6371_1588947839.diff | git apply

diff --git a/lib/northbound.c b/lib/northbound.c
index b8b70f03e..91d04f5c5 100644
--- a/lib/northbound.c
+++ b/lib/northbound.c
@@ -954,8 +954,8 @@ int nb_callback_get_keys(const struct nb_node *nb_node, const void *list_entry,
 			strlcpy(kbuf, "(error)", sizeof(kbuf));
 
 		zlog_debug(
-		       "northbound callback (get_keys): node [%s] list_entry [%p] -> %s",
-		       nb_node->xpath, list_entry, kbuf);
+			"northbound callback (get_keys): node [%s] list_entry [%p] -> %s",
+			nb_node->xpath, list_entry, kbuf);
 	}
 
 	return ret;
diff --git a/lib/northbound_cli.c b/lib/northbound_cli.c
index 79e84712a..4a658a14a 100644
--- a/lib/northbound_cli.c
+++ b/lib/northbound_cli.c
@@ -1289,26 +1289,25 @@ error:
 	return NB_ITER_ABORT;
 }
 
-DEFPY (show_yang_operational_data,
-       show_yang_operational_data_cmd,
-       "show yang operational-data XPATH$xpath\
+DEFPY(show_yang_operational_data, show_yang_operational_data_cmd,
+      "show yang operational-data XPATH$xpath\
          [{\
 	   format <json$json|xml$xml>\
 	   |translate WORD$translator_family\
 	   |max-elements (1-1000000)$max_elements [repeat$repeat]\
 	 }]",
-       SHOW_STR
-       "YANG information\n"
-       "Show YANG operational data\n"
-       "XPath expression specifying the YANG data path\n"
-       "Set the output format\n"
-       "JavaScript Object Notation\n"
-       "Extensible Markup Language\n"
-       "Translate operational data\n"
-       "YANG module translator\n"
-       "Maximum number of elements to fetch at once\n"
-       "Maximum number of elements to fetch at once\n"
-       "Fetch all data using batches of the provided size\n")
+      SHOW_STR
+      "YANG information\n"
+      "Show YANG operational data\n"
+      "XPath expression specifying the YANG data path\n"
+      "Set the output format\n"
+      "JavaScript Object Notation\n"
+      "Extensible Markup Language\n"
+      "Translate operational data\n"
+      "YANG module translator\n"
+      "Maximum number of elements to fetch at once\n"
+      "Maximum number of elements to fetch at once\n"
+      "Fetch all data using batches of the provided size\n")
 {
 	struct nb_oper_data_iter_input iter_input = {};
 	struct nb_oper_data_iter_output iter_output = {};
diff --git a/vtysh/vtysh.c b/vtysh/vtysh.c
index 14877258d..dc35685b5 100644
--- a/vtysh/vtysh.c
+++ b/vtysh/vtysh.c
@@ -2544,27 +2544,25 @@ DEFUN (vtysh_show_error_code,
 }
 
 /* Northbound. */
-DEFUN (show_yang_operational_data,
-       show_yang_operational_data_cmd,
-       "show yang operational-data XPATH\
+DEFUN(show_yang_operational_data, show_yang_operational_data_cmd,
+      "show yang operational-data XPATH\
          [{\
 	   format <json|xml>\
 	   |translate WORD\
 	   |max-elements (1-1000000) [repeat]\
 	 }]" DAEMONS_LIST,
-       SHOW_STR
-       "YANG information\n"
-       "Show YANG operational data\n"
-       "XPath expression specifying the YANG data path\n"
-       "Set the output format\n"
-       "JavaScript Object Notation\n"
-       "Extensible Markup Language\n"
-       "Translate operational data\n"
-       "YANG module translator\n"
-       "Maximum number of elements to fetch at once\n"
-       "Maximum number of elements to fetch at once\n"
-       "Fetch all data using batches of the provided size\n"
-       DAEMONS_STR)
+      SHOW_STR
+      "YANG information\n"
+      "Show YANG operational data\n"
+      "XPath expression specifying the YANG data path\n"
+      "Set the output format\n"
+      "JavaScript Object Notation\n"
+      "Extensible Markup Language\n"
+      "Translate operational data\n"
+      "YANG module translator\n"
+      "Maximum number of elements to fetch at once\n"
+      "Maximum number of elements to fetch at once\n"
+      "Fetch all data using batches of the provided size\n" DAEMONS_STR)
 {
 	int idx_protocol = argc - 1;
 	char *fcmd = argv_concat(argv, argc - 1, 0);

If you are a new contributor to FRR, please see our contributing guidelines.

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented May 8, 2020

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12226/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f1-7024/if.c:959:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f2-7024/if.c:959:
103c103
< #1050: FILE: /tmp/f1-7024/if.c:1050:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12226/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200508-20-gf88fb89e4-0 (missing) -> 7.4-dev-20200508-20-gf88fb89e4-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200508-20-gf88fb89e4-0 (missing) -> 7.4-dev-20200508-20-gf88fb89e4-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200508-20-gf88fb89e4-0 (missing) -> 7.4-dev-20200508-20-gf88fb89e4-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200508-20-gf88fb89e4-0 (missing) -> 7.4-dev-20200508-20-gf88fb89e4-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.4-dev-20200508-20-gf88fb89e4-0 (missing) -> 7.4-dev-20200508-20-gf88fb89e4-0~deb10u1

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 8, 2020

Outdated results 💚

Basic BGPD CI results: SUCCESS, 0 tests failed

_ _
Result SUCCESS git merge/6371 f88fb89
Date 05/08/2020
Start 11:37:01
Finish 12:02:47
Run-Time 25:46
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-05-08-11:37:01.txt
Log autoscript-2020-05-08-11:37:56.log.bz2
Memory 493 491 427

For details, please contact louberger

Copy link

@polychaeta polychaeta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution to FRR!

Click for style suggestions

To apply these suggestions:

curl -s https://gist.githubusercontent.com/polychaeta/82dfcb3f7a334c553042b312cda9531d/raw/e078c02533d4a8434c28cc6956008d16d6408f7f/cr_6371_1589872160.diff | git apply

diff --git a/lib/northbound_cli.c b/lib/northbound_cli.c
index f9c242bce..f0504d854 100644
--- a/lib/northbound_cli.c
+++ b/lib/northbound_cli.c
@@ -1290,26 +1290,25 @@ error:
 	return NB_ITER_ABORT;
 }
 
-DEFPY (show_yang_operational_data,
-       show_yang_operational_data_cmd,
-       "show yang operational-data XPATH$xpath\
+DEFPY(show_yang_operational_data, show_yang_operational_data_cmd,
+      "show yang operational-data XPATH$xpath\
          [{\
 	   format <json$json|xml$xml>\
 	   |translate WORD$translator_family\
 	   |max-elements (1-1000000)$max_elements [repeat$repeat]\
 	 }]",
-       SHOW_STR
-       "YANG information\n"
-       "Show YANG operational data\n"
-       "XPath expression specifying the YANG data path\n"
-       "Set the output format\n"
-       "JavaScript Object Notation\n"
-       "Extensible Markup Language\n"
-       "Translate operational data\n"
-       "YANG module translator\n"
-       "Maximum number of elements to fetch at once\n"
-       "Maximum number of elements to fetch at once\n"
-       "Fetch all data using batches of the provided size\n")
+      SHOW_STR
+      "YANG information\n"
+      "Show YANG operational data\n"
+      "XPath expression specifying the YANG data path\n"
+      "Set the output format\n"
+      "JavaScript Object Notation\n"
+      "Extensible Markup Language\n"
+      "Translate operational data\n"
+      "YANG module translator\n"
+      "Maximum number of elements to fetch at once\n"
+      "Maximum number of elements to fetch at once\n"
+      "Fetch all data using batches of the provided size\n")
 {
 	struct nb_oper_data_iter_input iter_input = {};
 	struct nb_oper_data_iter_output iter_output = {};
diff --git a/vtysh/vtysh.c b/vtysh/vtysh.c
index 14877258d..dc35685b5 100644
--- a/vtysh/vtysh.c
+++ b/vtysh/vtysh.c
@@ -2544,27 +2544,25 @@ DEFUN (vtysh_show_error_code,
 }
 
 /* Northbound. */
-DEFUN (show_yang_operational_data,
-       show_yang_operational_data_cmd,
-       "show yang operational-data XPATH\
+DEFUN(show_yang_operational_data, show_yang_operational_data_cmd,
+      "show yang operational-data XPATH\
          [{\
 	   format <json|xml>\
 	   |translate WORD\
 	   |max-elements (1-1000000) [repeat]\
 	 }]" DAEMONS_LIST,
-       SHOW_STR
-       "YANG information\n"
-       "Show YANG operational data\n"
-       "XPath expression specifying the YANG data path\n"
-       "Set the output format\n"
-       "JavaScript Object Notation\n"
-       "Extensible Markup Language\n"
-       "Translate operational data\n"
-       "YANG module translator\n"
-       "Maximum number of elements to fetch at once\n"
-       "Maximum number of elements to fetch at once\n"
-       "Fetch all data using batches of the provided size\n"
-       DAEMONS_STR)
+      SHOW_STR
+      "YANG information\n"
+      "Show YANG operational data\n"
+      "XPath expression specifying the YANG data path\n"
+      "Set the output format\n"
+      "JavaScript Object Notation\n"
+      "Extensible Markup Language\n"
+      "Translate operational data\n"
+      "YANG module translator\n"
+      "Maximum number of elements to fetch at once\n"
+      "Maximum number of elements to fetch at once\n"
+      "Fetch all data using batches of the provided size\n" DAEMONS_STR)
 {
 	int idx_protocol = argc - 1;
 	char *fcmd = argv_concat(argv, argc - 1, 0);

If you are a new contributor to FRR, please see our contributing guidelines.

@rwestphal
Copy link
Member Author

Just updated this PR with several bug fixes, optimizations and code simplification. I'm quite satisfied with how powerful and robust the nb_oper_data_iterate() API function became at this point.

I still need to integrate the new fragmentation support with the gRPC plugin as requested in the last NB meeting. That should be my next step on this PR. I'm keeping the draft status since this is still not ready.

@LabN-CI
Copy link
Collaborator

LabN-CI commented May 19, 2020

Outdated results 💚

Basic BGPD CI results: SUCCESS, 0 tests failed

_ _
Result SUCCESS git merge/6371 d109788
Date 05/19/2020
Start 03:35:46
Finish 04:01:38
Run-Time 25:52
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-05-19-03:35:46.txt
Log autoscript-2020-05-19-03:36:46.log.bz2
Memory 500 490 424

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented May 19, 2020

Continuous Integration Result: FAILED

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topo tests part 2 on Ubuntu 18.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TP2U1804AMD64-12334/test

Topology Tests failed for Topo tests part 2 on Ubuntu 18.04 amd64:

2020-05-19 07:47:54,097 ERROR: assert failed at "test_bgp_gr_functionality_topo1/test_BGP_GR_TC_51_p1": Testcase test_BGP_GR_TC_51_p1 : Failed 
   Error No bgp route found in rib of router r1..
assert 'No bgp route found in rib of router r1..' is True
r1: Daemon bgpd not running

From frr r1 bgpd log file:

2020/05/19 07:45:28 BGP: vty[??]@> enable
2020/05/19 07:45:28 BGP: vty[??]@# configure terminal
2020/05/19 07:45:28 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r1/zebra.log

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do show logging

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do show logging

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do write terminal

2020/05/19 07:46:25 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.2 in vrf default
2020/05/19 07:46:25 BGP: bgp_update_receive: rcvd End-of-RIB for IPv6 Unicast from fd00::2 in vrf default
2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# configure terminal
2020/05/19 07:46:25 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r1/zebra.log


2020-05-19 08:10:59,793 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:-1 



2020-05-19 08:10:59,912 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:65536 



2020-05-19 08:11:00,029 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:4294967296 



2020-05-19 08:11:00,148 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:-1:1 



2020-05-19 08:12:55,788 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 2: % Command incomplete[4]: bgp large-community-list standard Test1 permit  



2020-05-19 08:18:08,546 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 08:18:10,586 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 08:18:12,625 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/TP2U1804AMD64/ErrorLog/log_topotests.txt

Topo tests part 2 on Ubuntu 16.04 i386: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TP2U1604I386-12334/test

Topology Tests failed for Topo tests part 2 on Ubuntu 16.04 i386:

2020-05-19 09:48:38,831 ERROR: assert failed at "test_bgp_gr_functionality_topo1/test_BGP_GR_TC_50_p1": Testcase test_BGP_GR_TC_50_p1 : Failed 
   Routes are still present 
   Error No bgp route found in rib of router r2..
assert 'No bgp route found in rib of router r2..' is True
r2: Daemon bgpd not running

From frr r2 bgpd log file:
2020/05/19 09:47:34 BGP: vty[??]@> enable
2020/05/19 09:47:34 BGP: vty[??]@# do show logging

2020/05/19 09:47:34 BGP: vty[??]@> enable
2020/05/19 09:47:34 BGP: vty[??]@# do show logging

2020/05/19 09:47:34 BGP: bgp_update_receive: rcvd End-of-RIB for IPv6 Unicast from fd00::1 in vrf default
2020/05/19 09:47:34 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.1 in vrf default
2020/05/19 09:47:35 BGP: vty[??]@> enable
2020/05/19 09:47:35 BGP: vty[??]@# do write terminal

2020/05/19 09:47:35 BGP: vty[??]@> enable
2020/05/19 09:47:35 BGP: vty[??]@# configure terminal
2020/05/19 09:47:35 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r2/zebra.log

2020/05/19 09:48:09 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.1 in vrf default
2020/05/19 09:48:10 BGP: [EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: fd00::1
2020/05/19 09:48:11 BGP: vty[??]@> enable
2020/05/19 09:48:11 BGP: vty[??]@# do write memory


2020-05-19 10:11:30,745 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:-1 



2020-05-19 10:11:30,883 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:65536 



2020-05-19 10:11:31,019 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:4294967296 



2020-05-19 10:11:31,155 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:-1:1 



2020-05-19 10:13:28,503 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 2: % Command incomplete[4]: bgp large-community-list standard Test1 permit  



2020-05-19 10:19:01,652 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 10:19:03,702 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 10:19:05,748 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/TP2U1604I386/ErrorLog/log_topotests.txt

Successful on other platforms/tests
  • Addresssanitizer topotests part 2
  • Topo tests part 0 on Ubuntu 16.04 i386
  • Topo tests part 1 on Ubuntu 18.04 amd64
  • Topo tests part 1 on Ubuntu 16.04 i386
  • CentOS 7 rpm pkg check
  • Addresssanitizer topotests part 0
  • Debian 8 deb pkg check
  • Debian 10 deb pkg check
  • Debian 9 deb pkg check
  • Topo tests part 0 on Ubuntu 16.04 amd64
  • Topo tests part 2 on Ubuntu 16.04 amd64
  • IPv4 protocols on Ubuntu 18.04
  • IPv4 ldp protocol on Ubuntu 18.04
  • Topo tests part 1 on Ubuntu 16.04 amd64
  • Static analyzer (clang)
  • Fedora 29 rpm pkg check
  • Ubuntu 20.04 deb pkg check
  • Topo tests part 0 on Ubuntu 18.04 amd64
  • IPv6 protocols on Ubuntu 18.04
  • Addresssanitizer topotests part 1
  • Ubuntu 18.04 deb pkg check
  • Ubuntu 16.04 deb pkg check

Warnings Generated during build:

Checkout code: Successful with additional warnings
Topo tests part 2 on Ubuntu 18.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TP2U1804AMD64-12334/test

Topology Tests failed for Topo tests part 2 on Ubuntu 18.04 amd64:

2020-05-19 07:47:54,097 ERROR: assert failed at "test_bgp_gr_functionality_topo1/test_BGP_GR_TC_51_p1": Testcase test_BGP_GR_TC_51_p1 : Failed 
   Error No bgp route found in rib of router r1..
assert 'No bgp route found in rib of router r1..' is True
r1: Daemon bgpd not running

From frr r1 bgpd log file:

2020/05/19 07:45:28 BGP: vty[??]@> enable
2020/05/19 07:45:28 BGP: vty[??]@# configure terminal
2020/05/19 07:45:28 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r1/zebra.log

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do show logging

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do show logging

2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# do write terminal

2020/05/19 07:46:25 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.2 in vrf default
2020/05/19 07:46:25 BGP: bgp_update_receive: rcvd End-of-RIB for IPv6 Unicast from fd00::2 in vrf default
2020/05/19 07:46:25 BGP: vty[??]@> enable
2020/05/19 07:46:25 BGP: vty[??]@# configure terminal
2020/05/19 07:46:25 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r1/zebra.log


2020-05-19 08:10:59,793 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:-1 



2020-05-19 08:10:59,912 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:65536 



2020-05-19 08:11:00,029 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:4294967296 



2020-05-19 08:11:00,148 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:-1:1 



2020-05-19 08:12:55,788 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1804AMD64/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 2: % Command incomplete[4]: bgp large-community-list standard Test1 permit  



2020-05-19 08:18:08,546 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 08:18:10,586 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 08:18:12,625 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/TP2U1804AMD64/ErrorLog/log_topotests.txt

Topo tests part 2 on Ubuntu 16.04 i386: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TP2U1604I386-12334/test

Topology Tests failed for Topo tests part 2 on Ubuntu 16.04 i386:

2020-05-19 09:48:38,831 ERROR: assert failed at "test_bgp_gr_functionality_topo1/test_BGP_GR_TC_50_p1": Testcase test_BGP_GR_TC_50_p1 : Failed 
   Routes are still present 
   Error No bgp route found in rib of router r2..
assert 'No bgp route found in rib of router r2..' is True
r2: Daemon bgpd not running

From frr r2 bgpd log file:
2020/05/19 09:47:34 BGP: vty[??]@> enable
2020/05/19 09:47:34 BGP: vty[??]@# do show logging

2020/05/19 09:47:34 BGP: vty[??]@> enable
2020/05/19 09:47:34 BGP: vty[??]@# do show logging

2020/05/19 09:47:34 BGP: bgp_update_receive: rcvd End-of-RIB for IPv6 Unicast from fd00::1 in vrf default
2020/05/19 09:47:34 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.1 in vrf default
2020/05/19 09:47:35 BGP: vty[??]@> enable
2020/05/19 09:47:35 BGP: vty[??]@# do write terminal

2020/05/19 09:47:35 BGP: vty[??]@> enable
2020/05/19 09:47:35 BGP: vty[??]@# configure terminal
2020/05/19 09:47:35 BGP: vty[??]@(config)# no log file /tmp/topotests/test_bgp_gr_functionality_topo1/r2/zebra.log

2020/05/19 09:48:09 BGP: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 192.168.0.1 in vrf default
2020/05/19 09:48:10 BGP: [EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: fd00::1
2020/05/19 09:48:11 BGP: vty[??]@> enable
2020/05/19 09:48:11 BGP: vty[??]@# do write memory


2020-05-19 10:11:30,745 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:-1 



2020-05-19 10:11:30,883 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp community-list standard ANY permit 0:65536 



2020-05-19 10:11:31,019 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:4294967296 



2020-05-19 10:11:31,155 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: % Malformed community-list value
line 2: Failure to communicate[13] to bgpd, line: bgp large-community-list standard ANY permit 0:-1:1 



2020-05-19 10:13:28,503 ERROR: Traceback (most recent call last):
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 1649, in create_bgp_community_lists
    tgen, router, config_data, "bgp_community_list", build=build
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 238, in create_common_configuration
    load_config_to_router(tgen, router)
  File "/root/bamboo-agent-home/xml-data/build-dir/FRR-FRRPULLREQ-TP2U1604I386/topotests/lib/common_config.py", line 504, in load_config_to_router
    raise InvalidCLIError("%s" % output)
InvalidCLIError: line 2: % Command incomplete[4]: bgp large-community-list standard Test1 permit  



2020-05-19 10:19:01,652 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 10:19:03,702 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

2020-05-19 10:19:05,748 ERROR: BGPd StdErr Log:neighbor_nexthop_self_cmd[neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?
no_neighbor_nexthop_self_cmd[no neighbor <A.B.C.D|X:X::X:X|WORD> next-hop-self]:
	node 38 (bgp evpn) already has this command installed.
	duplicate install_element call?

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/TP2U1604I386/ErrorLog/log_topotests.txt

Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f1-22381/if.c:959:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f2-22381/if.c:959:
103c103
< #1051: FILE: /tmp/f1-22381/if.c:1051:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented May 19, 2020

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f1-22381/if.c:959:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#959: FILE: /tmp/f2-22381/if.c:959:
103c103
< #1051: FILE: /tmp/f1-22381/if.c:1051:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12334/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200516-40-gd109788f1-0 (missing) -> 7.5-dev-20200516-40-gd109788f1-0~deb10u1

@mjstapp
Copy link
Contributor

mjstapp commented May 19, 2020

A couple of questions:

How does the client know when the operation is complete? that is, if the server-side (frr-side) is sending back chunks of information in response to a single incoming client request, how does the client know when the server is done?

Can a client issue any other requests while one of these long-running server-side operations is running? If a management system that's acting as an frr northbound client supports multiple clients of its own, I'm wondering how that would work.

@qlyoung qlyoung self-requested a review May 19, 2020 15:48
@rwestphal rwestphal force-pushed the yang-fragmentation branch from d109788 to 9c1fb6a Compare May 22, 2020 16:11
@rwestphal
Copy link
Member Author

Pushed another update including the following:

  • Rebase on top of master;
  • Bug fixes;
  • gRPC integration.

On the last item, the Get() RPC was extended with two new input parameters ("offset" and "chunk_size") and one new output parameter ("offset"). The way it's supposed to be used is simple: if data fragmentation is desired, Get() should be called inside of a loop where the input offset is taken from the outputted offset of the previous call (except in the case of the first call, where no start offset should be given). Once Get() doesn't return any offset, it means all data was already retrieved.

Here's an example ruby script that does that: https://gist.github.com/rwestphal/cf6b2a23138fc0d2011a8ee8f2e7fc0f

This implementation puts the external client in charge of requesting the data chunks one by one. This might be desirable in some cases, but the API is not as simple as it could be. Another possibility would be to have a single Get() call resulting in a stream of data chunks. For that to happen, the gRPC plugin would need to control the data iteration internally. That would need to be done asynchronously to ensure the main pthread can run other events between the iteration rounds (like cooperative multitasking). A small amount of state (an offset string) would be kept for each Get() request, until the request finishes or is canceled (e.g. the client can close the connection or specify a timeout timer). Fortunately we don't need to choose between one or the other approach, both can be implemented using either different RPCs or the same Get() RPC with different parameters.

Regarding the data chunks, they include all path and index information of the ancestor nodes to ensure they can be parsed correctly. To combine the data chunks it isn't as simple as combining the JSON objects using any standard JSON library. The management client needs to have some YANG intelligence and access to the FRR YANG modules in order to know how to combine the data chunks in the right places. It needs to know, for example, which JSON objects (or XML elements) correspond to YANG lists and which fields are their keys. There are lots of YANG SDKs out there that can be used for this purpose.

One important thing to say is that, the smaller the chunk size, the more overhead in terms of northbound callbacks called to fetch the data (more synchronization required) and encapsulation (all ancestor nodes need to be included in each chunk). Chunk sizes shouldn't be too small to avoid excessive overhead, nor too big to avoid blocking the main pthread for too long.

On performance, as of now the "show yang operational-data" command is as much as four time slower compared to using the counterpart "show ... json" commands to obtain similar JSON state data. The reason for this is that we're using the lyd_new_path() API from libyang to create YANG data trees, but that function is just too expensive. The code needs to be refactored to use lyd_new() and lyd_new_leaf() instead (I already wrote about this on #4082). Some work can also be done to refactor the get_elem() callback to avoid memory allocations (a terrible early design mistake). This of course is work for future PRs :D

A couple of questions:

How does the client know when the operation is complete? that is, if the server-side (frr-side) is sending back chunks of information in response to a single incoming client request, how does the client know when the server is done?

@mjstapp the northbound plugin will know the iteration is done when nb_oper_data_iterate() returns NB_ITER_FINISH, then it's up to the implementer of the northbound plugin how to communicate that information to the external client (above I explained how I did that for the gRPC plugin).

Can a client issue any other requests while one of these long-running server-side operations is running? If a management system that's acting as an frr northbound client supports multiple clients of its own, I'm wondering how that would work.

@mjstapp Yes, this shouldn't be a problem. The YANG modeled state-data is generated on demand using the northbound callbacks and there's nothing in the current design preventing multiple requests from being served concurrently (not in parallel though as we don't have the required infrastructure for that).

@rwestphal rwestphal marked this pull request as ready for review May 22, 2020 16:34
@LabN-CI
Copy link
Collaborator

LabN-CI commented May 22, 2020

Outdated results 💚

Basic BGPD CI results: SUCCESS, 0 tests failed

_ _
Result SUCCESS git merge/6371 9c1fb6a
Date 05/22/2020
Start 12:15:48
Finish 12:41:43
Run-Time 25:55
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-05-22-12:15:48.txt
Log autoscript-2020-05-22-12:16:43.log.bz2
Memory 499 499 426

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented May 22, 2020

Continuous Integration Result: FAILED

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Failed

Topo tests part 0 on Ubuntu 18.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 18.04 amd64:

*** defaultIntf: warning: r1 has no interfaces
2020-05-22 16:53:36,983 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:53:39,058 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:53:45,228 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:55:47,224 ERROR: 'router_json_cmp' failed after 77.44 seconds
2020-05-22 16:55:47,226 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt3" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 2 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt5-1"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOU1804/ErrorLog/log_topotests.txt

Topo tests part 0 on Ubuntu 16.04 i386: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 i386:

*** defaultIntf: warning: r1 has no interfaces
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
2020-05-22 18:52:02,741 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:04,807 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:10,995 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:54:15,216 ERROR: 'router_json_cmp' failed after 79.41 seconds
2020-05-22 18:54:15,219 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt3" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 2 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt5-1"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOI386/ErrorLog/log_topotests.txt

Topo tests part 0 on Ubuntu 16.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1604-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 amd64:

*** defaultIntf: warning: r1 has no interfaces
2020-05-22 18:51:51,880 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:51:53,934 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:00,078 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:54:02,806 ERROR: 'router_json_cmp' failed after 78.05 seconds
2020-05-22 18:54:02,809 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt2" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::6/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::6/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16061
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16061
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::6/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16061
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::6/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1
  	
  
  > $->2001:db8:1000::5/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::5/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                17051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::5/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16051
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 1 with the following errors: 
  	
  		> $->2001:db8:1000::5/128[0]->nexthops[1]->labels: d1 has Array of length 0 but in d2 it is of length 1
  	
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 20,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOU1604/ErrorLog/log_topotests.txt

Successful on other platforms/tests
  • Addresssanitizer topotests part 1
  • Topo tests part 2 on Ubuntu 16.04 amd64
  • CentOS 7 rpm pkg check
  • Ubuntu 16.04 deb pkg check
  • Topo tests part 2 on Ubuntu 16.04 i386
  • Topo tests part 1 on Ubuntu 16.04 i386
  • IPv6 protocols on Ubuntu 18.04
  • Debian 10 deb pkg check
  • Topo tests part 2 on Ubuntu 18.04 amd64
  • Topo tests part 1 on Ubuntu 16.04 amd64
  • Fedora 29 rpm pkg check
  • Addresssanitizer topotests part 2
  • Debian 8 deb pkg check
  • IPv4 protocols on Ubuntu 18.04
  • Ubuntu 20.04 deb pkg check
  • IPv4 ldp protocol on Ubuntu 18.04
  • Debian 9 deb pkg check
  • Addresssanitizer topotests part 0
  • Topo tests part 1 on Ubuntu 18.04 amd64
  • Static analyzer (clang)
  • Ubuntu 18.04 deb pkg check

Warnings Generated during build:

Checkout code: Successful with additional warnings
Topo tests part 0 on Ubuntu 18.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 18.04 amd64:

*** defaultIntf: warning: r1 has no interfaces
2020-05-22 16:53:36,983 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:53:39,058 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:53:45,228 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 16:55:47,224 ERROR: 'router_json_cmp' failed after 77.44 seconds
2020-05-22 16:55:47,226 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt3" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 2 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt5-1"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOU1804/ErrorLog/log_topotests.txt

Topo tests part 0 on Ubuntu 16.04 i386: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 i386:

*** defaultIntf: warning: r1 has no interfaces
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
2020-05-22 18:52:02,741 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:04,807 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:10,995 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:54:15,216 ERROR: 'router_json_cmp' failed after 79.41 seconds
2020-05-22 18:54:15,219 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt3" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt5-1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 2 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt5-1"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOI386/ErrorLog/log_topotests.txt

Topo tests part 0 on Ubuntu 16.04 amd64: Failed (click for details)

Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1604-12372/test

Topology Tests failed for Topo tests part 0 on Ubuntu 16.04 amd64:

*** defaultIntf: warning: r1 has no interfaces
2020-05-22 18:51:51,880 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:51:53,934 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:52:00,078 ERROR: PIMd StdErr Log:% No Path to RP address specified: 192.168.100.1

2020-05-22 18:54:02,806 ERROR: 'router_json_cmp' failed after 78.05 seconds
2020-05-22 18:54:02,809 ERROR: assert failed at "test_isis_sr_topo1/test_rib_ipv6_step1": "rt2" JSON output mismatches the expected result
assert Generated JSON diff error report:
  
  > $->2001:db8:1000::6/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::6/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16061
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16061
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::6/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16061
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::6/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1
  	
  
  > $->2001:db8:1000::5/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 30,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::5/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                17051
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-sw1"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::5/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16051
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 1 with the following errors: 
  	
  		> $->2001:db8:1000::5/128[0]->nexthops[1]->labels: d1 has Array of length 0 but in d2 it is of length 1
  	
  
  > $->2001:db8:1000::4/128: d2 has the following element at index 0 which is not present in d1: 
  
  	{
  	    "distance": 115,
  	    "destSelected": true,
  	    "protocol": "isis",
  	    "metric": 20,
  	    "selected": true,
  	    "installed": true,
  	    "prefix": "2001:db8:1000::4/128",
  	    "nexthops": [
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-1"
  	        },
  	        {
  	            "fib": true,
  	            "active": true,
  	            "labels": [
  	                16041
  	            ],
  	            "afi": "ipv6",
  	            "interfaceName": "eth-rt4-2"
  	        }
  	    ]
  	}
  
  	Closest match in d1 is at index 0 with the following errors: 
  
  	> $->2001:db8:1000::4/128[0]->nexthops: d2 has the following element at index 1 which is not present in d1: 
  	
  		{
  		    "fib": true,
  		    "active": true,
  		    "labels": [
  		        16041
  		    ],
  		    "afi": "ipv6",
  		    "interfaceName": "eth-rt4-2"
  		}
  	
  		Closest match in d1 is at index 0 with the following errors: 
  	
  		> $->2001:db8:1000::4/128[0]->nexthops[0]->labels: d1 has Array of length 0 but in d2 it is of length 1

see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/TOPOU1604/ErrorLog/log_topotests.txt

Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#962: FILE: /tmp/f1-5035/if.c:962:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#962: FILE: /tmp/f2-5035/if.c:962:
103c103
< #1054: FILE: /tmp/f1-5035/if.c:1054:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-12372/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200522-12-g9c1fb6a28-0 (missing) -> 7.5-dev-20200522-12-g9c1fb6a28-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200522-12-g9c1fb6a28-0 (missing) -> 7.5-dev-20200522-12-g9c1fb6a28-0~deb10u1
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200522-12-g9c1fb6a28-0 (missing) -> 7.5-dev-20200522-12-g9c1fb6a28-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200522-12-g9c1fb6a28-0 (missing) -> 7.5-dev-20200522-12-g9c1fb6a28-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200522-12-g9c1fb6a28-0 (missing) -> 7.5-dev-20200522-12-g9c1fb6a28-0~deb10u1

@Spantik Spantik self-requested a review May 25, 2020 07:46
@mjstapp
Copy link
Contributor

mjstapp commented May 25, 2020

This implementation puts the external client in charge of requesting the data chunks one by one.

So ... the description of the current approach sounds just like a restful rpc - which is what I proposed originally. the client has to understand how to start an iteration, what to do with the context transferred in each reply, and how to interpret the end of the iteration. but we sort of have complexity on both sides: the client has to do the same things it would have to do if the operation were just specified clearly as an rpc, but at the same time there's extra complexity on the server (frr) side - and in each plugin, it sounds like. wouldn't it be clearer just to specify this as a restful rpc from the beginning, and simplify what the frr and plugin components have to do?

@LabN-CI
Copy link
Collaborator

LabN-CI commented Aug 31, 2020

Outdated results 💚

Basic BGPD CI results: SUCCESS, 0 tests failed

_ _
Result SUCCESS git merge/6371 fac75eb
Date 08/31/2020
Start 13:30:48
Finish 13:56:37
Run-Time 25:49
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-08-31-13:30:48.txt
Log autoscript-2020-08-31-13:31:47.log.bz2
Memory 477 488 425

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Aug 31, 2020

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13935/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#975: FILE: /tmp/f1-2175/if.c:975:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#975: FILE: /tmp/f2-2175/if.c:975:
98c98
< #1067: FILE: /tmp/f1-2175/if.c:1067:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13935/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200831-13-gfac75eb98-0 (missing) -> 7.5-dev-20200831-13-gfac75eb98-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200831-13-gfac75eb98-0 (missing) -> 7.5-dev-20200831-13-gfac75eb98-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200831-13-gfac75eb98-0 (missing) -> 7.5-dev-20200831-13-gfac75eb98-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200831-13-gfac75eb98-0 (missing) -> 7.5-dev-20200831-13-gfac75eb98-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200831-13-gfac75eb98-0 (missing) -> 7.5-dev-20200831-13-gfac75eb98-0~deb10u1

lib/northbound.c Outdated
int slen = 0;

for (uint8_t n = 0; n < MIN(keys->num, list->keys_size); n++)
slen += snprintf(buf + slen, size - slen, "[%s='%s']",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this size check is not correct. For example, suppose the buffer is of size 1 and our first key to print ends up being 10 characters. slen = snprintf(buf + 0, 1 - 0, "xxxxxxxxxx") so now slen == 10 because the output was truncated. In the next iteration we again compute size - slen, but size is an unsigned integer type of >= rank to int so slen is converted to size_t before the subtraction takes effect, leaving us with (size_t)1 - (size_t)10 which underflows into a very large positive number.

Normally this is where I say "this is why we should use strlcat" but given the performance characteristics under these circumstances let's just fix the manual concatenation.

Truncation should also be indicated via return code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, using snprintf() that way is very convenient but it's not safe. In this particular case I think it'd be very hard to overflow a BUFSIZ-sized buffer with YANG list keys, but I've changed the code nonetheless to follow best practices.

but given the performance characteristics under these circumstances let's just fix the manual concatenation.

This isn't performance sensitive code :) This function is only called when debug northbound callbacks state is enabled, a command that should only be used by FRR developers.

lib/northbound.c Outdated

if (input->max_elements != 0
&& output->num_elements >= input->max_elements
&& !CHECK_FLAG(nb_node->flags, F_NB_NODE_INSIDE_KEYLESS_LIST)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a gating check so might be slightly more efficient - and more clear - to put it as the first condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (;;) {
ret = nb_oper_data_iterate(&iter_input, &iter_output);
if (!repeat || ret != NB_ITER_STOP)
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused here. It seems like it should be ret != NB_ITER_CONTINUE or ret == NB_ITER_STOP for our loop-break condition? If it's CONTINUE don't we want to proceed to update the offset path with where we stopped, as below, and continue looping?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB_ITER_STOP is returned when nb_oper_data_iterate() has more data to iterate over but had to stop due to the provided maximum chunk size. In the case of the other return values (NB_ITER_FINISH and NB_ITER_ABORT) there's no need to do another iteration round.

The NB_ITER_CONTINUE return value is only used internally and can't be returned by nb_oper_data_iterate(). I've fixed the documentation comment of that function to make that clear.

If you prefer, I can also rename NB_ITER_STOP to NB_ITER_SUSPEND as "suspend" implies that the iteration needs to be resumed at one point, while "stop" doesn't give any hint in that direction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qlyoung just pushed another update including the s/NB_ITER_STOP/NB_ITER_SUSPEND/ rename as per our conversation.

}

"received RPC Get(type: %u, encoding: %u, with_defaults: %u, path: %s)",
type, encoding, with_defaults,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we print offset path as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std::string path = tag->request.path();
// Request: string offset = 5;
std::string input_offset = tag->request.offset();
// Request: uint32 chunk_size = 6;;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

tag->async_responder.WriteAndFinish(
response, grpc::WriteOptions(),
grpc::Status::OK, tag);
tag->state = FINISH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we always finish because the definition of a "complete" return value is now a single chunk, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Long term I think we should have two separate RPCs to fetch state data:

  • Get(xpath, input_offset, chunk_size) which will be an unary RPC that returns a single chunk of data;
  • GetStream(xpath, chunk_size) which will return a stream of data chunks (generated asynchronously to avoid blocking the main pthread for too long).

For now Get() is a streaming RPC but in practice it only returns one response per request. This is something I couldn't address on this PR due to time constraints.

return node;
} else {
if (IPV4_ADDR_CMP(&peer->addr, &address) >= 0)
return node;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scary but I guess I can live with it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bothers you here precisely? I think the args->exact_match handling in the lookup_entry() callbacks is a fair price to pay for operational-data fragmentation support. The commit message of b8379b3 explains why this is necessary.

If you're concerned about the O(n) time complexity of iterating over the linked list until finding the next item, then yeah, that's not cool. This should however be easy to address by changing rip->peer_list to be an RB-tree (something that can be done later).

@LabN-CI
Copy link
Collaborator

LabN-CI commented Sep 1, 2020

Outdated results 💚

Basic BGPD CI results: SUCCESS, 0 tests failed

_ _
Result SUCCESS git merge/6371 c555fdf
Date 09/01/2020
Start 00:25:48
Finish 00:51:42
Run-Time 25:54
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-09-01-00:25:48.txt
Log autoscript-2020-09-01-00:26:45.log.bz2
Memory 465 483 427

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

NetDEF-CI commented Sep 1, 2020

Continuous Integration Result: SUCCESSFUL

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13939/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Warnings Generated during build:

Checkout code: Successful with additional warnings
Report for if.c | 5 issues
===============================================
< WARNING: strncat() is error-prone; please use strlcat() if possible#975: FILE: /tmp/f1-25695/if.c:975:
---
> WARNING: strncat() is error-prone; please use strlcat() if possible#975: FILE: /tmp/f2-25695/if.c:975:
98c98
< #1067: FILE: /tmp/f1-25695/if.c:1067:

Warnings Generated during build:

Debian 10 amd64 build: Successful with additional warnings

Debian Package lintian failed for Debian 10 amd64 build:
(see full package build log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13939/artifact/DEB10BUILD/ErrorLog/log_lintian.txt)

W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr source: pkg-js-tools-test-is-missing
W: frr source: newer-standards-version 4.4.1 (current is 4.3.0)
W: frr-rpki-rtrlib: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200901-13-gc555fdf51-0 (missing) -> 7.5-dev-20200901-13-gc555fdf51-0~deb10u1
W: frr-doc: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200901-13-gc555fdf51-0 (missing) -> 7.5-dev-20200901-13-gc555fdf51-0~deb10u1
W: frr-pythontools: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200901-13-gc555fdf51-0 (missing) -> 7.5-dev-20200901-13-gc555fdf51-0~deb10u1
W: frr-snmp: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200901-13-gc555fdf51-0 (missing) -> 7.5-dev-20200901-13-gc555fdf51-0~deb10u1
W: frr: changelog-file-missing-explicit-entry 6.0-2 -> 7.5-dev-20200901-13-gc555fdf51-0 (missing) -> 7.5-dev-20200901-13-gc555fdf51-0~deb10u1

Log all input and output parameters passed to those callbacks. The
extra verbosity is very valuable to understand how the NB iterates
over YANG-modeled operational data (and troubleshoot possible
problems).

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The nb_oper_data_iterate() API function has lots of input parameters,
most of which need to be forwarded to different internal functions.

Introduce a new structure to group all these input parameters in
a single place. This greatly simplifies the signature of several
operational-data iteration functions, and makes it easier to add
new input parameters whenever necessary (e.g. for the upcoming YANG
fragmentation work).

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This commit extends the northbound layer with infrastructure for
asynchronous YANG-modeled operational data retrieval. This should
solve the problem of blocking the main pthread of the daemons for
too long while fetching bulk state data (e.g. millions of routes).

The nb_oper_data_iterate() API function was modified to accept
a starting offset, a maximum number of elements to iterate over
and outputs where the iteration stopped. The outputted offset can
be feeded back to the input to resume the iteration from where
it stopped. It's up to the NB client whether to fetch and buffer
all data before delivering it to the external client, or demand
the external client to be in charge of resuming the iteration
as necessary. In any case, canceling the iteration before it
finishes should be possible (small chunk sizes are better for more
responsiveness).

The challenge with asynchronous iteration is that data can be
modified while it's being iterated over. In particular, the iteration
might stop at a given YANG list entry, then once the iteration
resumes that list entry might not exist anymore. This means the
pointer returned by the latest get_next() callback is unsafe to use
once the iteration is resumed. To deal with this problem, the NB
should use the lookup_entry() callback as a form of synchronization
to ensure the list entry referent to the base offset of the data
iteration still exists. Further, a new 'exact_match' parameter was
added to that callback so that it can find the next available entry
in the list in case the one we're looking for doesn't exist anymore.

One limitation of the current implementation is that the data
iteration can't stop in the middle of a leaf-list or keyless
list. This is because the NB doesn't provide the get_keys() and
lookup_entry() callbacks for such nodes, so there's no way to perform
the required synchronization step once the iteration is resumed. In
the future it should be possible to work around this limitation by
implementing pseudo-keys that are internal to the FRR implementation
and not visible to the external world. The ConfD API uses a similar
strategy to deal with this problem.

As of now, none of the existing NB clients (CLI, gRPC, ConfD and
Sysrepo) can leverage this new infrastructure since none of them
have async capabilities. In spite of that, the CLI "show yang
operational-data" command was extended with the "max-elements
(1-1000000) [repeat]" parameters for testing purposes (the data
iteration can be stopped and resumed several times, but the command
still runs synchronously). These new parameters should be removed
once the CLI gets async capabilities (in that case a global knob
to enable fragmentation at a system level should be better).

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Move these callbacks to where they belong.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is a follow-up to the previous commit. Using the
nb_oper_data_iterate() API function it's now possible to specify a
base offset of where the iteration should start. Since a northbound
client can perform this iteration asynchronously, retrieving data
in small chunks, it's possible that an iteration might stop in the
middle of a list entry. When the iteration is resumed, this list
entry might not exist anymore. In this case, the lookup_entry()
callback would fail and consequently the data iteration would be
aborted. This shouldn't be handled as an error, however, since data
is allowed to change while it's been iterated over.

To address this issue, introduce a new "exact_match" (boolean)
parameter to the lookup_entry() callback. This parameter will be
set to 'false' only when starting a data iteration from a base
offset, otherwise it's set to 'true'. When it's set to 'false',
the northbound callbacks should return the first list entry greater
than or equal to the provided search keys. This means that, if
the requested list entry was removed, the next one in the list is
returned instead and the iteration continues.

Here's a summary of how this can be done:
* When iterating over rb-trees, use RB_FIND for exact matches and
  RB_NFIND for non-exact matches;
* When iterating over routing tables, use route_node_lookup()
  for exact matches and route_table_get_next() for non-exact matches;
* For linked-lists, use == or >= logic for exact and non-exact
  matches when comparing elements (this assumes the list is ordered,
  otherwise it isn't an appropriate data-structure for storing
  YANG-modeled state data);
* For hash tables there's no easy way to perform a lookup with >=
  semantics, since hash tables store elements out of order by
  definition. In this case, the best thing to do is to switch to
  a different data structure that allow efficient in-order traversal
  (e.g. rb-trees). The lookup_entry() callbacks from bfdd and vrrpd
  fall into this category and weren't updated for this reason.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Return error messages in addition to error codes whenever something
goes wrong while fetching state data. This allows the northbound
plugins to give precise error information to the external northbound
clients, enhancing the user experience and automation friendliness.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Change the flags passed to lyd_print_mem() to avoid printing
implicitly created default nodes that are not part of the user
request.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
* Remove parameters that are not used for anything (leftovers from
  previous code);
* Make nb_oper_data_lookup_list_entry() cache the pointers of
  all lists present in the data path requested by the user. Then
  modify nb_oper_data_iterate_with_offset() to obtain the
  required list pointers from this cache instead of calling
  nb_oper_data_lookup_list_entry() again, which is a relatively
  expensive function.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
For simplicity, unify the two data offset parameters (base and node)
into a single parameter. This change makes the oper_data_iterate()
API function easier to use in the northbound plugins.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Allowing multiple data paths to be requested in a single RPC doesn't
add any value since one can just call the Get() RPC multiple times
using different data paths if necessary. For that reason, remove
the "repeated" modifier from the "path" input parameter to simplify
the code.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The Get() RPC was extended with two new input parameters ("offset"
and "chunk_size") and one new output parameter ("offset") in order
to leverage the new data chunking capabilities provided by the
nb_oper_data_iterate() API.

The way it's supposed to be used is very simple: if data
fragmentation is desired, Get() should be called inside of a loop
where the input offset is taken from the outputted offset of the
previous call (except in the case of the first call, where no start
offset should be given). Once Get() doesn't return any offset,
it means all data was already retrieved.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
* Combine nb_oper_data_iterate_without_offset() and
  nb_oper_data_iterate_with_offset() into a single function to avoid
  code duplication;
* Fix the node offset check to handle augmentations and other corner
  cases;
* Fix checks of when the iteration needs to stop.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
NB_ITER_SUSPEND implies that the iteration needs to be resumed at
one point, while NB_ITER_STOP doesn't give any hint in that direction.

Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
@LabN-CI
Copy link
Collaborator

LabN-CI commented Sep 1, 2020

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
_ _
Result SUCCESS git merge/6371 85ef060
Date 09/01/2020
Start 15:51:57
Finish 16:17:52
Run-Time 25:55
Total 1815
Pass 1815
Fail 0
Valgrind-Errors 0
Valgrind-Loss 0
Details vncregress-2020-09-01-15:51:57.txt
Log autoscript-2020-09-01-15:52:51.log.bz2
Memory 495 490 424

For details, please contact louberger

@NetDEF-CI
Copy link
Collaborator

Continuous Integration Result: FAILED

Test incomplete. See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-13969/

This is a comment from an automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source / Pull Request: Successful

Building Stage: Successful

Basic Tests: Incomplete

Topo tests part 2 on Ubuntu 18.04 amd64: Incomplete (check logs for details)
Topo tests part 0 on Ubuntu 18.04 amd64: Failed (click for details) Topo tests part 0 on Ubuntu 18.04 amd64: No useful log found
Successful on other platforms/tests
  • Topo tests part 1 on Ubuntu 16.04 i386
  • Ubuntu 16.04 deb pkg check
  • Topo tests part 0 on Ubuntu 16.04 i386
  • Debian 10 deb pkg check
  • Topo tests part 2 on Ubuntu 16.04 amd64
  • Fedora 29 rpm pkg check
  • Debian 9 deb pkg check
  • Topo tests part 0 on Ubuntu 18.04 arm8
  • Static analyzer (clang)
  • Topo tests part 2 on Ubuntu 16.04 i386
  • Ubuntu 18.04 deb pkg check
  • Topo tests part 1 on Ubuntu 18.04 amd64
  • Ubuntu 20.04 deb pkg check
  • Addresssanitizer topotests part 1
  • CentOS 7 rpm pkg check
  • Addresssanitizer topotests part 0
  • IPv4 ldp protocol on Ubuntu 18.04
  • Debian 8 deb pkg check
  • Addresssanitizer topotests part 2
  • Topo tests part 0 on Ubuntu 16.04 amd64
  • IPv4 protocols on Ubuntu 18.04
  • Topo tests part 1 on Ubuntu 18.04 arm8
  • Topo tests part 1 on Ubuntu 16.04 amd64
  • IPv6 protocols on Ubuntu 18.04
  • Topo tests part 2 on Ubuntu 18.04 arm8

@donaldsharp
Copy link
Member

@polychaeta autoclose in 4 weeks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants