Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clixon does not switch to chunked framing after NETCONF 1.1 is negotiated #314

Closed
brunorijsman opened this issue Mar 14, 2022 · 18 comments
Closed

Comments

@brunorijsman
Copy link

brunorijsman commented Mar 14, 2022

Clixon NETCONF does not interoperate with ncclient or with the MG-Soft NETCONF browser.

After the initial hello exchange, Clixon stops responding to RPC calls.

This is because Clixon announces capability "urn:ietf:params:netconf:base:1.1" which causes the clients to start using version 1.1 of NETCONF over SSH as specified in RFC 6242.

RFC 6242 section 4.1 "Framing protocol" specifies that instead of using "]]>]]>" as a frame delimiter, chunked encoding is used: "If the :base:1.1 capability is advertised by both peers, the chunked framing mechanism (see Section 4.2) is used for the remainder of the NETCONF session."

Clixon advertises that supports version 1.1, but Clixon is hard-coded to always send and receive ]]>]]> frame delimiters, even if version 1.1 has been negotiated. See for example function netconf_input_cb and add_postamble (always called by netconf_output_encap).

Ncclient and the MG-Soft browser start sending chunked encoding as soon as version 1.1 has been negotiated, which Clixon does not recognize and hence Clixon does not respond to the client messages.

I added some debugging to function netconf_input.

static int
netconf_input_cb(int   s, 
		 void *arg)
{
...
	for (i=0; i<len; i++){
	    if (buf[i] == 0)
		continue; /* Skip NULL chars (eg from terminals) */
	    clicon_debug(2, "[DEBUG] received '%c' (%d)", buf[i], buf[i]); /* <<< */
	    cprintf(cb, "%c", buf[i]);
	    if (detect_endtag("]]>]]>",
			      buf[i],
			      &xml_state)) {
...
}

This confirms that both ncclient and the MG-Soft NETCONF brower start using chunked framing (I see ##) which Clixon does not recognize.

Note: for the MG-Soft NETCONF browser, the work-around is to force MG-Soft to use NETCONF 1.0 (this is an option in the connect screen)

Note for ncclient the work-around is to specify the Alcatel-Lucent device (it only supports NETCONF 1.0):

manager.connect(host="127.0.0.1",
                port=830,
                username="XXX",
                password="XXX",
                hostkey_verify=False,
                device_params={'name':'alu'})
@olofhagsand
Copy link
Member

See also #50

@olofhagsand
Copy link
Member

Workaround, patch, and rebuild

diff --git a/lib/src/clixon_netconf_lib.c b/lib/src/clixon_netconf_lib.c
index c8b14a35..1967084f 100644
--- a/lib/src/clixon_netconf_lib.c
+++ b/lib/src/clixon_netconf_lib.c
@@ -1695,10 +1695,6 @@ netconf_hello_server(clicon_handle h,
 
     cprintf(cb, "<hello xmlns=\"%s\" message-id=\"%u\">", NETCONF_BASE_NAMESPACE, 42);
     cprintf(cb, "<capabilities>");
-    /* Each peer MUST send at least the base NETCONF capability, "urn:ietf:params:netconf:base:1.1" 
-     * RFC 6241 Sec 8.1
-     */
-    cprintf(cb, "<capability>%s</capability>", NETCONF_BASE_CAPABILITY_1_1);
     /* A peer MAY include capabilities for previous NETCONF versions, to indicate
        that it supports multiple protocol versions. */
     cprintf(cb, "<capability>%s</capability>", NETCONF_BASE_CAPABILITY_1_0);

@brunorijsman
Copy link
Author

Thanks for the workaround!

@olofhagsand
Copy link
Member

olofhagsand commented Mar 15, 2022

If possible, please report on interoperability of using 1.0 capability with ncclient / MG-soft. May consider pushing the fix ^to master #50 is fixed.

@brunorijsman
Copy link
Author

I have tested the patch, and it appears to work. Ncclient can now connect without specifying the Alcatel-Lucent device. MG-Soft NETCONF browser can now connect using NETCONF version auto-select (NETCONF version 1.1 does not work, but that is expected given the patch). I also did a few test "get" operations and those work fine too. Once again, thanks for fast response and for the great work on Clixon - I am just getting familiar with it now and I am really liking what I have seen so far. Kudos.

olofhagsand added a commit that referenced this issue Mar 17, 2022
…nnounce 1.0

  * See [Clixon does not switch to chunked framing after NETCONF 1.1 is negotiated](#314)
  * To enable Netconf 1.1, set `NETCONF_1_1_ANNOUNCE`
@olofhagsand
Copy link
Member

Thanks!
I got ncclient working as well so I can do interop.
Patch added disables default 1.1 announcing.
Plan to add chunked framing to next release 5.7, will revisit this issue if so.
Close since verified ^

olofhagsand added a commit that referenced this issue Mar 28, 2022
…f 1.1.

  * First hello is 1.0 EOM framing, then successing rpc is chunked framing
  * See
    * [Netconf framing](#50), and
    * [Clixon does not switch to chunked framing after NETCONF 1.1 is negotiated](#314)
* C:
  * Moved netconf framing code from netconf application to clixon lib
* Test:
  * New expecteof_netconf and adjusted other expect scripts to handle NETCONF 1.1 framing
@olofhagsand
Copy link
Member

NETCONF 1.1 framing is now implemented, and has (briefly) been verified by a simple ncclient command.
Can you please try it out?

@brunorijsman
Copy link
Author

brunorijsman commented Mar 28, 2022 via email

@olofhagsand
Copy link
Member

The master branch

@brunorijsman
Copy link
Author

brunorijsman commented Mar 31, 2022 via email

@brunorijsman
Copy link
Author

Also, this code looks strange:

    if (clicon_option_int(h, "CLICON_NETCONF_BASE_CAPABILITY") > 0){
	/* Each peer MUST send at least the base NETCONF capability, "urn:ietf:params:netconf:base:1.1" 
	 * RFC 6241 Sec 8.1
	 */
	if (clicon_option_int(h, "CLICON_NETCONF_BASE_CAPABILITY") > 0) /* RFC 6241 */
	    cprintf(cb, "<capability>%s</capability>", NETCONF_BASE_CAPABILITY_1_1);
    }

Why are there two nested if statements (one with curly braces and one without) checking for the exact same condition?

@brunorijsman
Copy link
Author

I tried adding the following line to my .xml clixon config file:

<CLICON_NETCONF_BASE_CAPABILITY>1</CLICON_NETCONF_BASE_CAPABILITY>

But I got

Mar 31 14:46:08: application unknown-element Failed to find YANG spec of XML node: CLICON_NETCONF_BASE_CAPABILITY with parent: clixon-config in namespace: http://clicon.org/config <bad-element>CLICON_NETCONF_BASE_CAPABILITY</bad-element>

@olofhagsand
Copy link
Member

The nested if:s is wrong, thanks, but it doesnt change the beheavior.
The hello message from Clixon should include

<capability>urn:ietf:params:netconf:base:1.1</capability>

if the following is configured (or actually left out since 1 is default):

<CLICON_NETCONF_BASE_CAPABILITY>1</CLICON_NETCONF_BASE_CAPABILITY>

This is the hello message I get with the default example:

<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42">
   <capabilities>
     <capability>urn:ietf:params:netconf:base:1.1</capability>    <---------
     <capability>urn:ietf:params:netconf:base:1.0</capability>
     <capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
     <capability>urn:ietf:params:netconf:capability:validate:1.1</capability>
     <capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
     <capability>urn:ietf:params:netconf:capability:xpath:1.0</capability>
     <capability>urn:ietf:params:netconf:capability:notification:1.0</capability>
   </capabilities>
   <session-id>1</session-id>
</hello>

Note the 1.1 capability announced. There seems to be something wrong with your setup?

@olofhagsand
Copy link
Member

Note this is announced by the clixon_netconf client. Ensure it is rebuilt

@brunorijsman
Copy link
Author

brunorijsman commented Apr 1, 2022

I figured out why my environment was "weird".

I did a "git pull" to get the greatest and latest Clixon code.

Then I did a "make" and a "sudo make install" to recompile and re-install Clixon.

This built a new version of the clixon libraries (version 5.7) but the version-number-less links for the libraries still pointed to the old 5.6 version.

-rw-r--r-- 1 root root   14576 Apr  1 10:29 libclixon_netconf.so.5.6
-rw-r--r-- 1 root root   14576 Mar 17 10:41 libclixon_netconf.so.5.7
lrwxrwxrwx 1 root root      23 Apr  1 10:29 libclixon_restconf.so -> libclixon_restconf.so.5
lrwxrwxrwx 1 root root      25 Apr  1 10:29 libclixon_restconf.so.5 -> libclixon_restconf.so.5.6

So, my plugin code was linking with the old unmodified 5.6 library.

I got around this problem by deleting all libraries and rebuilding.

We probably need some changes to the Clixon makefiles for the install target to avoid this problem.

After I worked around the problem, I am now able to connect both the MG-Soft NETCONF client and the ncclient to Clixon using NETCONF 1.1.

The capabilities are advertised as expected in the open messages, and chunking framing appears to work without any problems.

Server hello:

<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="42">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
    <capability>urn:ietf:params:netconf:base:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:candidate:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:validate:1.1</capability>
    <capability>urn:ietf:params:netconf:capability:startup:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:xpath:1.0</capability>
    <capability>urn:ietf:params:netconf:capability:notification:1.0</capability>
  </capabilities>
  <session-id>4</session-id>
</hello>

Client hello (I forced MG-Soft to use NETCONF 1.1.):

<?xml version="1.0" encoding="utf-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
  </capabilities>
</hello>

However, I did notice that the client does a get-config, the server responds with namespace "urn:ietf:params:xml:ns:netconf:base:1.0" (I am not sure if that is expected or not, but I suspect not)

However, I did notice that the client does a get-config, the client (i.e. MG-Soft) uses namespace "urn:ietf:params:xml:ns:netconf:base:1.0", which looks weird to me, considering 1.1 was negotiated in the hello.

Client get-config request:

<?xml version="1.0" encoding="utf-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="7">
  <get-config>
    <source>
      <running/>
    </source>
    <filter type="subtree">
      <intf:interfaces xmlns:intf="http://remoteautonomy.com/yang-schemas/interfaces"/>
    </filter>
  </get-config>
</rpc>

Server get-config response:

<?xml version="1.0" encoding="utf-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="7">
  <data>
    <interfaces xmlns="http://remoteautonomy.com/yang-schemas/interfaces">
      <interface>
        <name>eth0</name>
        <ipv4-address>1.1.1.1</ipv4-address>
      </interface>
    </interfaces>
  </data>
</rpc-reply>

@olofhagsand
Copy link
Member

I think that is OK, the namespace is separate from the capability.
More specifically, namespace "urn:ietf:params:xml:ns:netconf:base:1.0" is used for both capabilities:
NETCONF 1.0: urn:ietf:params:netconf:base:1.0
NETCONF 1.1: urn:ietf:params:netconf:base:1.1
The changed capabilty affects other things, like the chunked framing.
Does this mean there is no other issue?
If so, please close it.

@olofhagsand
Copy link
Member

So, my plugin code was linking with the old unmodified 5.6 library.
I got around this problem by deleting all libraries and rebuilding.
We probably need some changes to the Clixon makefiles for the install target to avoid this problem.

Please open a separate issue regarding the build system

@brunorijsman
Copy link
Author

Closed the issue. I verified that Clixon now correctly uses chunked framing when NETCONF 1.1 is negotiated. It can now establish a NETCONF 1.1 session with both MG-Soft NETCONF browser and ncclient. Will open a separate issue for the issue with library version numbering after a rebuild. -> did not gather enough data to accurately describe what happened exactly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants