Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: lwm2m: Add support for blockwise GET and FETCH #61858

Merged
merged 8 commits into from
Sep 27, 2023

Conversation

SeppoTakalo
Copy link
Collaborator

@SeppoTakalo SeppoTakalo commented Aug 24, 2023

This PR contains chain of changes to allow CoAP blockwise support for GET and FETCH messages using mechanisms introduced in #54761.

However, the PR mentioned was only working for LwM2M SEND and registration messages. Both which are initiated by LwM2M client and are CoAP queries (CoAP CON).

To support GET and FETCH (Composite READ) I need to add mechanism not to free the response packet until last block is send out, or any other query starts.

I also found a bug in SenML-CBOR content format that did not parse Composite-GET path correctly.

Opaque payload format is now separated onto its own, and not considered as a part of Plain-Text format. This allows follow up development where we can support stateless Blockwise-GET into single binary resources without keeping the full content in the memory. I did not implement this yet, is it would require API changes into the read-callback.

Also, it is notable that I use CoAP Token to find an ongoing blockwise-transfer. This seem to work OK with AVSystem Coiote server. For Leshan this requires COAP.BLOCKWISE_REUSE_TOKEN=true option to work.
There seem to be discussion regarding Leshan/Californium choice of using different token:

Working GET query using the same Token against Coiote:

Screenshot from 2023-08-24 17-16-18

@SeppoTakalo
Copy link
Collaborator Author

@LukasWoodtli please review also. You wrote the initial blockwise-send implementation.

@SeppoTakalo
Copy link
Collaborator Author

For those interested: Here is a Wireshark capture of blockwise GET (LwM2M read), Blockwise FETCH (LwM2M composite read), and blockwise POST (LwM2M send)

Screenshot from 2023-08-25 14-03-33
2023-08-25 Leshan blockwise working.zip

For the testing, I modified BinaryApp container to have 5 kB of buffer in resource 19/0/0/0.

@rlubos
Copy link
Contributor

rlubos commented Aug 25, 2023

Also, it is notable that I use CoAP Token to find an ongoing blockwise-transfer. This seem to work OK with AVSystem Coiote server. For Leshan this requires COAP.BLOCKWISE_REUSE_TOKEN=true option to work.

After doing some reading (thanks for the wireshark log) it seems to me that indeed there is some gap between Block transfer RFC and FETCH/PATCH RFC.

RFC7959 indeed says:

(Note that the choice of token 0xfc in this example is arbitrary;
tokens are just shown in this example to illustrate that the requests
for additional blocks cannot make use of the token of the Observation
relationship. As a general comment on tokens, there is no other
mention of tokens in this document, as block-wise transfers handle
tokens like any other CoAP exchange. As usual, the client is free to
choose tokens for each exchange as it likes.)

Which to my understanding leaves the option to change the token for each exchange. This may work fine for block transfers of individual resource, as that URI is retransmitted in each request, so one could identify the block context based on this information. But for FETCH? I see in the logs that the payload (carrying information about requested resources) is only sent in the initial block request. But for the consecutive blocks, we only have CoAP header, and w/o URI options there is no other way to identify the request other than the token.

If GET comes with different Token, we could generate a new content for it, but how do you ensure that content is the same than from previous query? In my opinion they have introduced a bug into Californium.

If it was only about the "freshness" of the content, I guess this could be handled by the Etag option. But with FETCH I think it's actually worse than you describe - if you get a request for consecutive block with a different token, there is literally no way to tell what the content should be (i. e. what resources to include in the reply).

Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

subsys/net/lib/lwm2m/lwm2m_rw_opaque.c Outdated Show resolved Hide resolved
subsys/net/lib/lwm2m/lwm2m_rw_opaque.c Outdated Show resolved Hide resolved
return -ENOENT;
}

if (msg->path.level < 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this check anyway, wouldn't lwm2m_engine_get_create_res_inst() fail if the path level was lower than LWM2M_PATH_LEVEL_RESOURCE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy&paste from existing rw_plaintext.c. I don't know the logic behind it, so I did not want to touch it.

subsys/net/lib/lwm2m/lwm2m_obj_binaryappdata.c Outdated Show resolved Hide resolved
@LukasWoodtli
Copy link

Also, it is notable that I use CoAP Token to find an ongoing blockwise-transfer. This seem to work OK with AVSystem Coiote server. For Leshan this requires COAP.BLOCKWISE_REUSE_TOKEN=true option to work.

This has confused me for a while now. It seems that, using the token to match blocks that belong to the same message, is not intended by CoAP. But there doesn't seem to be a commonly accepted solution for that.

Copy link

@LukasWoodtli LukasWoodtli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

/* Check for block transfer */
block_opt = coap_get_option_int(msg->in.in_cpkt, COAP_OPTION_BLOCK1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use coap_get_block1_option here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block of code is not written by me. Just moved from https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/lib/lwm2m/lwm2m_message_handling.c#L2253C1-L2312

So I did not refactor it.

}

/* Handle blockwise 1 (Part 2): Append BLOCK1 option / free context */
if (block_ctx) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about error checking? r could be overwritten by coap_append_block1_option(msg->out.out_cpkt, &block_ctx->ctx); further down without being checked for an error.

subsys/net/lib/lwm2m/lwm2m_message_handling.c Outdated Show resolved Hide resolved
@@ -2667,11 +2745,23 @@ void lwm2m_udp_receive(struct lwm2m_ctx *client_ctx, uint8_t *buf, uint16_t buf_
lwm2m_reset_message(msg, true);
}

clear_ongoing_block_tx();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are multiple parallel block-wise transfers in progress. Aren't they all cleared here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional.
I don't believe any server would do parallel block-wise transfers with a single client.

I cannot be sure when the server is finished with the blockwise GET because I cannot keep any timeout. Server is supposed to retry. So I drop all ongoing block transfers when I receive a request that is not part of any ongoing operation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bugs me whether we should differentiate between client-initiated BLOCK1 transfers (LwM2M Send) and server-initiated BLOCK2 transfers (LwM2M Get/Fetch). I think there is no ambiguity between those two. And right now it seems that we'd abort an ongoing block Send on reception of any (even non-block) request/response from the server. And conversely, the application can hijack any ongoing BLOCK2 transfer when calling lwm2m_send_cb().

OFC we can agree on such a limitation, though it doesn't seem too difficult to avoid that.

@boaks
Copy link

boaks commented Sep 6, 2023

In my opinion they have introduced a bug into Californium.

No, obvious an implementation, which requires to reuse tokens in a blockwise transfer is not compliant.
Not sure, if this PR is fixing this requirement or rather manifest the not compliant token reuse.
Anyway, it's your decision.

@chrysn
Copy link

chrysn commented Sep 6, 2023

using the token to match blocks that belong to the same message, is not intended by CoAP

It is not. Requests containing blocks of the same operation are identified by having the same set of options (except obviously the block options), not their tokens.

If this relies on the tokens to be common, it will only work with selected CoAP libraries that make this particular choice in their requests (such as Californium with the peculiar COAP.BLOCKWISE_REUSE_TOKEN=true option), but will not work with general CoAP libraries.

There is ongoing discussion in the CoRE working group (in its public meetings, possibly on its mailing list and now also in core-wg/corrclar#29) about enhancing things in block-wise, but to my knowledge there is no good reason to use this behavior -- moreover, all participants I've spoken to in those disucssions were clear that relying on the token is not how it should be done, some being as explicit as stating that "if people start relying on Californium producing identical tokens, they are effectively forking the CoAP ecosystem".

I'm not too deeply familiar with LwM2M, and only used Zephyr once or twice, so please cut me some slack in understanding the full changes here. But please, if you think you need identical tokens, let's discuss why you need them (here or over in core-wg/corrclar#29), and let's work out how this is done in a way that it works with all CoAP peers.

@boaks
Copy link

boaks commented Sep 6, 2023

To be clear: The intention of

COAP.BLOCKWISE_REUSE_TOKEN=true

is not to manifest that wrong usage. It's about providing a work-around for the meantime until an implementation complies to RFC7959.

@SeppoTakalo SeppoTakalo force-pushed the lwm2m_opaque_get branch 2 times, most recently from 0ab2699 to e06b06e Compare September 8, 2023 11:29
@SeppoTakalo
Copy link
Collaborator Author

Rebased the implementation.

I also refactored the find_ongoin_block_tx() so that it does not use tokens anymore. In fact, it does no use anything anymore. This is because as discussed here and various other discussion chains, tokens should not be used to match BLOCK queries to ongoing transactions. For that purposes RFC9175 Request-Tag should be used, but clients don't seem to widely use those. So there is no compliant 100% sure and standard compliant method to support concurrent transactions and I therefore decided not to bother supporting that.
In reality, LwM2M servers don't probably even try using multiple transactions so this should be OK.

Just to be on a safe side, I added a support for using Etag option so a client can notice if the content have changed between blocks. As a Etag, I just use sys_hash32() of the payload.

@chrysn
Copy link

chrysn commented Sep 8, 2023

Thanks for the update, both on the token and the ETag side.

For that purposes RFC9175 Request-Tag should be used.

To clarify: To a server, Request-Tag is not special. If the client used option 65316 (from the experimental range, and has the same relevant bits set) instead of Request-Tag, everything Request-Tag is there for must keep working just the same.
And it's OK if the server rarely sees a Request-Tag, clients only need to set one (or use some other option) if they intend to do concurrent transactions (or have some other reasons).

The ETag is good and necessary even without (intentionally) concurrent requests.

@SeppoTakalo
Copy link
Collaborator Author

Thanks @chrysn and @boaks for pointing out the obvious misunderstanding of using the CoAP token. This way we keep different open source implementations to interoperate.

if (ret < 0) {
return ret;
}
sys_slist_append(&ongoing_block_sends, &msg->node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're effectively pushing the same sys_snode_t msg->node to two lists, ongoing_block_sends here and pending_sends there. This doesn't seem correct.
Since we only support a single outgoing BLOCK2 transfer, perhaps just store the pointer (exclusively for BLOCK2 transfers)?

@@ -2667,11 +2745,23 @@ void lwm2m_udp_receive(struct lwm2m_ctx *client_ctx, uint8_t *buf, uint16_t buf_
lwm2m_reset_message(msg, true);
}

clear_ongoing_block_tx();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bugs me whether we should differentiate between client-initiated BLOCK1 transfers (LwM2M Send) and server-initiated BLOCK2 transfers (LwM2M Get/Fetch). I think there is no ambiguity between those two. And right now it seems that we'd abort an ongoing block Send on reception of any (even non-block) request/response from the server. And conversely, the application can hijack any ongoing BLOCK2 transfer when calling lwm2m_send_cb().

OFC we can agree on such a limitation, though it doesn't seem too difficult to avoid that.

Partial content format support is required to have a proper support
for content format OPAQUE instead of threading it as a part of
plain text format.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Opaque content format is not part of clear-text, so it
should be separated into its own file.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
In reality, single-write is the only operation that handles
BLOCK1 operations when receiving paylod.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
lwm2m_udp_receive() is only called with same function
pointer, so there no need to carry that in the parameter.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Allow blockwise-send buffers to be used with GET and FETCH
queries as well.
When outgoing packet is split into multiple blocks, don't free
it when first block is send. Keep it in memory until some other requests
come.

Following queries to next block are matched using CoAP token.
However, this required Leshan to use COAP.BLOCKWISE_REUSE_TOKEN=true
option from Californium.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Composite read was incorrectly trying to parse CoAP packet
instead of payload of the packet.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
To protect the integrity of outgoing block-wise transfers, append
Etag option that allows client to see if the received block is
generated from same content as it is expecting.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
Previously each piece of LwM2M SEND was
using token length of zero. I think this was unintentional.

Signed-off-by: Seppo Takalo <seppo.takalo@nordicsemi.no>
@SeppoTakalo
Copy link
Collaborator Author

@rlubos I refactored the PR so that BLOCK2 sends are the only ones that I keep the pointer for. And there is only one ongoing Block2 at a time.

When we receive any incoming request, we clear the previous block2 transfer. There is no timeout for this. As long as the server keeps asking for blocks, we keep sending.

Block1 transfers (LwM2M SEND and registration) are handled differently. This is the code that Lukas did earlier. CoAP timeouts or ACK on the final block will clear all the state. This is not kept in the same list as Block2. Ending of Block1 does not anymore cause clearing of Block2 transfers.

Now all block2 functions are clearly named find_ongoing_block2_tx(), clear_ongoing_block2_tx() and handle_ongoing_block2_tx() so its easy to separate from Block1 functions.

Now it looks like Block1 and Block2 can happen same time
Screenshot from 2023-09-21 16-39-24

They don't collide, because on is request(CON) and another is ACK.

Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, no more concerns from my side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants