Add configuration to disable reuse of tokens on blockwise transfer. #2088
Conversation
Improve protection from delay attacks, if no other means, maybe on application level, are available. Signed-off-by: Achim Kraus <achim.kraus@cloudcoap.net>
This PR changes the behavior of Californium within RFC7959. Californium was using the same token for blockwise transfers in order to ease traceability; attacks-on-coap shows the downside of this. Even if, for ACKs (piggy-backed responses), Californium also uses the MID, using different tokens is the most common practice. If someone detects trouble with it, it's possible to switch back using "COAP.BLOCKWISE_REUSE_TOKEN=true" |
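As a reference, here is a minimal sketch of switching back to the old behavior programmatically. The textual key "COAP.BLOCKWISE_REUSE_TOKEN" is taken from the PR description; the constant name `CoapConfig.BLOCKWISE_REUSE_TOKEN` and the exact `Configuration` calls are assumptions about how the new definition is exposed in Californium 3.x, not verified against this PR.

```java
// Sketch, assuming the new boolean definition is exposed as
// CoapConfig.BLOCKWISE_REUSE_TOKEN (properties key "COAP.BLOCKWISE_REUSE_TOKEN").
import org.eclipse.californium.core.CoapServer;
import org.eclipse.californium.core.config.CoapConfig;
import org.eclipse.californium.elements.config.Configuration;

public class ReuseTokenConfig {

    public static void main(String[] args) {
        CoapConfig.register();
        Configuration config = Configuration.createStandardWithoutFile();
        // equivalent to "COAP.BLOCKWISE_REUSE_TOKEN=true" in Californium3.properties
        config.set(CoapConfig.BLOCKWISE_REUSE_TOKEN, true);
        CoapServer server = new CoapServer(config);
        server.start();
    }
}
```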
I am not a CoAP expert, but I think this change has introduced problems, without fixing the potential replay attack.

In summary: I claim that this change broke blockwise transfers that might require you to maintain a state. I think this would be the case if the targeted resource might change between queries. If the token is not kept the same, you cannot ensure the integrity of the content, as you cannot distinguish between queries. The ETag option would ensure that you have received the same representation of the content, but you cannot really use that in the GET query, as it is generated by the server.

Reasoning

I believe this change is meant to fix the potential Response Delay and Mismatch Attack described on the Attacks on CoAP page. However, I think the approach to fix the issue is wrong. As explained in the link above, the attack requires that the attacker can guess the re-use of the token in order to replay. To fix the issue, you should use a better random source to generate tokens, which I believe is not an issue here. But this change did not do anything with token generation. So if there was an issue, it is still there. (I'm not suggesting there was a problem.)

In blockwise transfer, if the server is implemented in a stateless manner, this approach works. However, each query for the next block now has no relationship to the previous query. They are completely independent. So even if the server is implemented to maintain a state, it cannot work anymore. I'll explain why this is a problem in a bit.

In RFC7959: Section 3.4 there are a few examples of the usage of tokens. If you follow the example after the "Retrieval of remaining blocks" note in the first sequence diagram, you notice that the GET queries of the remaining blocks keep using the same token. It is not very clear there in the RFC, but my assumption is that as long as you are requesting pieces of the same body, you should retain the same token. If you use the same token for the whole blockwise transmission, it does not expose you to the replay attack, because that token exists only once. You only query one block once using the same token. Then the next block uses the same token, but a different block number. If your token algorithm works correctly, you are never going to use the same token again when requesting the same block of the same resource. If you do replay a packet during the blockwise transaction, the requesting client will just treat that block as a duplicate.

I have an example in mind of why this token rotation is problematic during a blockwise transfer. Let's imagine that we have a resource called "/camera". Every time you do a GET request for it, it gives you a still image. But every request gives a new still image from a live feed. So how can you do a blockwise transfer of an image if you cannot tell which requests belong together? Every block would be a block from a new image, not from the one that started the transaction. So this token rotation only works with stateless servers. It breaks things if there is state involved.

The problem I'm facing is LwM2M related. I'm using the Leshan server, which uses this library, and after an update things started to break. In LwM2M, you could send GET requests that might result in more than one resource being packed into a payload. If, for example, TLV or SenML JSON/CBOR are used, the payload might contain lots of resources. This means that when the first request arrives, the payload is formed and kept in memory until all of it is sent. Now if we do this statelessly, it means that on every request we form a new payload, split it, and send one block of it.
How can you ensure that all the blocks are from the same payload? For example, if one of the resources is a "timestamp" that changes value every second (or ms), then every time you form the payload, it will be different from the last time. The last problem, which clearly broke Leshan, is that after this change, LwM2M Composite-Read does not work anymore. In Composite-Read, the payload of the GET request contains a list of resources that should be put into the response payload. In the next query, Leshan is not sending the payload anymore, just the indicator to get block N=1. But if your token changes, there is no indication of where your request should be targeted. |
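To make the "/camera" scenario above concrete, here is a small, hypothetical Californium resource whose representation changes on every GET. It is only a sketch of the described problem (the class and payload are illustrative, not Leshan or Zephyr code): the server derives an ETag from each generated payload, so a client can at least detect that the blocks of one transfer came from different representations.

```java
// Sketch: a resource that returns a new "still image" for every GET.
// The ETag is derived from the payload so a changed representation is detectable.
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

import org.eclipse.californium.core.CoapResource;
import org.eclipse.californium.core.coap.CoAP.ResponseCode;
import org.eclipse.californium.core.server.resources.CoapExchange;

public class CameraResource extends CoapResource {

    public CameraResource() {
        super("camera");
    }

    @Override
    public void handleGET(CoapExchange exchange) {
        // every request produces a new representation (hypothetical payload)
        byte[] image = ("frame-" + System.nanoTime()).getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(image);
        long value = crc.getValue();
        byte[] etag = new byte[] {
                (byte) (value >>> 24), (byte) (value >>> 16),
                (byte) (value >>> 8), (byte) value };
        exchange.setETag(etag);
        exchange.respond(ResponseCode.CONTENT, image);
    }
}
```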
Maybe just as a very first piece of information:
I will try to provide an answer for the points in your issue, but I will need some time. |
OK, that brings us back ... as I already wrote, over the years a lot of approaches have been implemented, but they change the trade-offs chosen in RFC7959. The same approaches have also been applied to RFC7641, when some want to use CON notifies to achieve a "value stream without gaps". In sum: those may all be very nice ideas, but the place to discuss them is the IETF core mailing-list. Alternatively, we added such stuff with a configuration flag, e.g. BLOCKWISE_STRICT_BLOCK1_OPTION and BLOCKWISE_STRICT_BLOCK2_OPTION. To your claim: RFC7959 - 2.4 Using the Block2 Option, bottom of the page:
Basically, that means: if the resource is changing, the etag is changing. Old transfers are detected by the changed etag and are canceled, and the new representation of the resource becomes available for download. That also affects RFC7641. Alternative definitions would cause the server to allocate (much?) more memory. If I remember well, it's now a couple of years since Californium (mis)used the etag for "multiple concurrently". We stopped with that (mis)use. There is no configuration value available to enable that again. |
This shows not a GET block2 transfer; instead it shows an observe & GET. The GET uses token 0xfb, and so does the first response (etag 6f00f38e). The next response/notify (etag 6f00f392) also uses that token 0xfb, as defined by RFC7641. But the follow-up requests for that notify then use a different token 0xfc. Finally, the RFC mentions:
If the client is free to choose tokens, a server can't bind state to it! |
Just to ensure the token is the root cause, could you please test that with Did you already open an issue in Leshan (I haven't found one)? Maybe changing the default value specifically for the lwm2m application helps.
That's not defined by the token, that's defined by the client's identity and the coap-options, mainly the uri and uri-query. There is a second PR #2161, which also adds the message code to that. |
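As an illustration of that statement, here is a hypothetical key class (illustrative names only, not Californium's actual internals) that correlates follow-up block requests by the client's identity and the relevant options rather than by the token.

```java
// Sketch: correlate blockwise follow-up requests by client identity + options.
// PR #2161 would additionally add the message code to such a key.
import java.util.Objects;

public final class BlockwiseKey {

    private final String peerIdentity; // e.g. address or DTLS principal
    private final String uriPath;
    private final String uriQuery;

    public BlockwiseKey(String peerIdentity, String uriPath, String uriQuery) {
        this.peerIdentity = peerIdentity;
        this.uriPath = uriPath;
        this.uriQuery = uriQuery;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BlockwiseKey)) {
            return false;
        }
        BlockwiseKey key = (BlockwiseKey) other;
        return peerIdentity.equals(key.peerIdentity)
                && uriPath.equals(key.uriPath)
                && uriQuery.equals(key.uriQuery);
    }

    @Override
    public int hashCode() {
        return Objects.hash(peerIdentity, uriPath, uriQuery);
    }
}
```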
@SeppoTakalo I confirm this change in behavior breaks Zephyr LWM2M clients and Leshan. I solved it 5 months ago by configuring BLOCKWISE_REUSE_TOKEN to true on my servers. I should have opened a Zephyr ticket or sent a Discord message |
I'm wondering. As I wrote in the leshan issue: Do you remember the case when you observed the error? |
I just took a quick look at this, so I could have missed a lot. But concerning:
Note that Composite-Read is a FETCH on So there is maybe a potential issue here.
And for more fun, FETCH can also be used with observe. (see : OpenMobileAlliance/OMA_LwM2M_for_Developers#528) |
Yep, some of those questions are already pending on IETF Constrained Application Protocol (CoAP): Corrections and Clarifications. If you like, I will create a PR for leshan to change the default for the token reuse in order to keep the old behavior until the IETF Core clarifies the usage of the token. |
FYI openthread is not affected: openthread/openthread#7976 |
My guess: we found some issues, mainly with FETCH and blockwise, which could take a long time to be solved. So I guess the questions are:
(changing the default behavior in Leshan or not depends a lot on the answers to the questions above ☝️, so we can see that after 🙂 ) |
I'm aware of those corrclar issues. FMPOV, if the client is defined to be free to choose the token, then no server state can be related to it. If something in a new RFC seems to be undefined, I would rather wait for their clarification before manifesting that a chosen token is required in order to have some of the new stuff working. I think it doesn't help if that is done in advance of the answer. I will try to contact Jon from libcoap in order to get his opinion, maybe also Olaf (libcoap). |
I have been alerted to the existence of this discussion. |
If RFC 9175 solves the issue, I think LwM2M needs to consider that. |
FMPOV, the question is who needs it and who will implement it. The misuse of the token, caused by the behavior of Californium before 3.8, is already implemented. |
Before 3.8, Californium was reusing the token. (This is not mandatory, but I guess it is not forbidden either.) It seems some implementations decided to use that token to identify the "block exchange".
But since Californium 3.8, the user can choose whether to reuse the token or not, using And visibly this breaks some devices. So the question is not really about "implementing our own protocol". So maybe before deciding whether we should change the default behavior or not, we should identify the real problem with |
I would prefer to move the discussion to a separate issue to make it more transparent for others.
Yes, but it should always be clear that a client is free to do so. If a coap-server implementation breaks when different tokens are used, it's not a compliant implementation.
FMPOV, it breaks only non-compliant devices. Keeping non-compliant devices running would make it hard to change anything. What I usually try is to add a configuration to switch back to the old behavior, as here.
Agreed. Unfortunately, I'm familiar with neither FETCH nor the Zephyr LwM2M client.
Californium sends the payload of the FETCH only for the first request. For the non-block1 case it would not be too hard to change that, but it will increase the data volume. So, if it goes in that direction, I guess we need the next configuration flag ;-): As the discussion send payload in block2 request already pointed out, there is a conflict between "stateless FETCH" and "reducing the data volume". And with block1 it gets even more complicated. So, is the Zephyr implementation really a stateless one? Are the users there really willing to spend more data? Because if the Zephyr implementation isn't stateless, nor the users willing to spend more data, this will not be a solution either. |
I understand that too.
FMPOV, this is what we need to clarify. If this is true, there is nothing to change in Californium OR Leshan.
I agree
Some thought about it, from Leshan Project : https://github.com/eclipse-leshan/leshan/wiki/How-Leshan-should-behave-with-Non-Compliant-Implementations-%3F |
On 28. Aug 2023, at 11:14, Simon ***@***.***> wrote:
It seems some implementations decided to use that token to identify the "block exchange".
The problem is that this deviant behavior is not only not interoperable, it creates an incentive for other implementations to cater to this behavior (which is essentially the discussion we are having here for Californium), effectively forking the CoAP protocol.
There are good reasons to limit token re-use (see Section 4.2 of RFC 9175), so clients need to be free to implement appropriate strategies; nurturing implementations that get in the way of that is not a good idea.
On the contrary, to avoid turning CoAP into soup, we should deliberately *not* cater to such implementations, to create pressure so that the deviant behavior gets fixed there.
Grüße, Carsten
|
If that is a block2 blockwise transfer, RFC7959 already states that this is not supported. As I cited above: RFC7959 - 2.4 Using the Block2 Option, bottom of the page:
So, if the ReadComposites are addressing the same resource, then concurrent operation is not supported. The term |
I have some doubt: maybe this could be interpreted as URI + payload. So, reading RFC8132 §2. FETCH Method a bit more, I guess that "same resource" probably also means same URI (but that's not crystal clear to me), e.g.:
|
The term is from RFC7959 and that's the base for the implementation decision. RFC8132 came later and was never really considered. We mainly added just the message codes and some methods to override the fetch operation. FMPOV, it's still in discussion. Once that discussion has concluded, the implementation will be adapted. |
I think we have 5 underlying activities here which are causing some confusion when overlapped.

Tokens

It appears that the (non-compliant) server is effectively including the Token in its 'cache-key' to determine which response data to send back when there are concurrent requests. I think it is dangerous for a server to do (assume) that, as we have seen with a client that varies the Token. Having the same token to get all of the body for the multiple Block2 payloads can be useful for diagnosing Block2 issues and is valid. libcoap does this with "Base Token + left shifted block NUM" to keep the tokens unique across a multiple-Block2 transfer.

Request-Tag RFC9175

With this sent in a request (different between each new request to a resource), as it is a part of the server's 'cache-key', concurrent responses are supported and the server can differentiate which resource response is required, rather than the unsafe method of using the Token as part of the 'cache-key'. There are discussions about whether Request-Tag should be sent in all requests or not. core-wg/corrclar#28

FETCH RFC8132 Block2 only

This adds an extra level of complexity when looking at how to handle Block2 responses compared to GET, as the data payload of the request dictates what the server responds with. Two effectively concurrent requests to the same resource with different data payloads which give 2 different responses start to get messy when Block2 responses are needed. There is debate over whether the data payload should be included with each FETCH request for the next Block2 response. core-wg/corrclar#27

FETCH RFC8132 Block1 and Block2

The main debate here is whether the entire Block1 sequence needs to be repeated when requesting the next Block2 from the server. core-wg/corrclar#28

ETag

If ETag is used in a response and the response changes over time, then the ETag needs to be different whenever the response changes. The client can then detect the change and process the new response accordingly (if the ETag changes during a Block2 transfer, it is likely that the client will need to re-request the information to get the new updated body). ETag cannot be used in a request as a differentiator between concurrent responses as it is likely to solicit a 2.03 response instead of a 2.05 response. Request-Tag may fail here when a client tries to retrieve remaining blocks of a Block2 transfer accessing data that has now changed, unless the server has cached the previous data. |
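A sketch of the "Base Token + left shifted block NUM" idea mentioned above, written in Java for illustration (libcoap itself is C, and its actual encoding may differ): every block of one Block2 transfer gets a distinct token that is still traceable to the same base token.

```java
// Sketch: per-block tokens derived from a random base token plus the block number.
import java.security.SecureRandom;

public final class BlockTokens {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Random 4-byte base token, one per blockwise transfer. */
    public static byte[] newBaseToken() {
        byte[] token = new byte[4];
        RANDOM.nextBytes(token);
        return token;
    }

    /** Derive the token for block number {@code num} from a 4-byte base token. */
    public static byte[] tokenForBlock(byte[] baseToken, int num) {
        // 8-byte token: block number in the high 4 bytes ("left shifted"),
        // the random base token in the low 4 bytes
        byte[] token = new byte[8];
        token[0] = (byte) (num >>> 24);
        token[1] = (byte) (num >>> 16);
        token[2] = (byte) (num >>> 8);
        token[3] = (byte) num;
        System.arraycopy(baseToken, 0, token, 4, 4);
        return token;
    }
}
```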
@mrdeep1 thx for that clarification. |
PFC 2252 = RFC 7252? |
Usage of RFC9175: Are you still interested? Could you provide the info about the Zephyr implementation? |
Usage of RFC9175: It solves the concurrent access issue. |
Indeed, if we want to enable clients to do a FETCH without sending the entire request body again for every block, the server needs to keep state, corrclar 27/28. (This gets ridiculous when Block1 is used with Block2, but still is an issue for Block2 only.) Blockwise tries to be open both to stateless servers and servers that want to keep state; so if the client behavior needs to depend on which of these is the case, we'd need additional signaling. |
Just to be sure, you mean it is ridiculous to re-send the whole payload using block1 for each block2 request, right? (core-wg/corrclar#28) |
This additional signaling should ideally be part of RFC8132, right? (I mean, we should not need an additional RFC?) |
For stateful servers, subsequent FETCH requests for the next Block2 do not need the FETCH data, as the server can build a cache-key that contains the cacheable options and points to the appropriate data that needs a Block2 slice returned. BUT, if there is a chance that there are multiple concurrent FETCHes to the same resource, but with different FETCH data, then the request will need a Request-Tag per different FETCH data so that the server can include this in the cache-key to differentiate which data-set should be used for getting the appropriate slice.

For non-stateful servers, the entire FETCH request needs to be repeated so the server can re-generate the data and then send back the appropriate slice of the data based on the block size and requested block number. This includes any Block1 set of transfers handling the FETCH data. The server will need to maintain some sort of state during the Block1 transfers to be able to assemble the entire FETCH data (note that if there are Block1 transfers and there is any chance of concurrent FETCH, then you must use Request-Tag to differentiate between the discrete FETCHes so the server can correctly assemble the Block1s during the transitional assembly phase).

Note that if Observe is being used, the server will need to maintain something to be able to generate any unsolicited responses, and that if the client wants to explicitly de-register the Observe, this has to be the original register FETCH request (including data), with just the original Observe option updated to de-register. |
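A sketch, under the stateful-server assumption described above, of what such a cache could look like (illustrative names only, not Californium's or libcoap's API): the assembled FETCH body is stored per (peer, URI, Request-Tag) and follow-up Block2 requests are served by slicing it.

```java
// Sketch: stateful Block2 serving for FETCH, keyed by peer + URI + Request-Tag.
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public final class FetchBlock2Cache {

    /** Cache-key: peer identity, URI and the (possibly empty) Request-Tag. */
    public static final class CacheKey {
        private final String peer;
        private final String uriPath;
        private final String requestTag; // hex of the Request-Tag option, "" if absent

        public CacheKey(String peer, String uriPath, String requestTag) {
            this.peer = peer;
            this.uriPath = uriPath;
            this.requestTag = requestTag;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof CacheKey)) {
                return false;
            }
            CacheKey key = (CacheKey) other;
            return peer.equals(key.peer) && uriPath.equals(key.uriPath)
                    && requestTag.equals(key.requestTag);
        }

        @Override
        public int hashCode() {
            return Objects.hash(peer, uriPath, requestTag);
        }
    }

    private final Map<CacheKey, byte[]> bodies = new ConcurrentHashMap<>();

    /** Store the body assembled from the initial FETCH (and its Block1 transfer). */
    public void store(CacheKey key, byte[] assembledBody) {
        bodies.put(key, assembledBody);
    }

    /** Return the Block2 slice, or null if no state is kept (client must repeat the FETCH). */
    public byte[] slice(CacheKey key, int blockNum, int blockSize) {
        byte[] body = bodies.get(key);
        if (body == null) {
            return null;
        }
        int from = blockNum * blockSize;
        if (from >= body.length) {
            return new byte[0];
        }
        return Arrays.copyOfRange(body, from, Math.min(from + blockSize, body.length));
    }
}
```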
Agreed. The client has no way of knowing whether the server is stateless or not unless there is some sort of out-of-band knowledge, or the server signals (new/bis RFC) its capabilities. |
Well, RFC 8132 is published, so unless we can find the signaling in there or in another published RFC, we'll need to do something. I'll bring this up in today's CoRE WG Interim meeting. |
I don't want to dogpile on here (as cabo already made the good points, and I hope that this option stays off by default, so that breakage is not silent), but I'd like to weigh in on the Zephyr side, providing more evidence points that their implementation does break when interacting with other implementations. I failed to find a reference to Zephyr's issue tracker, where the reliance on tokens in block-wise is discussed. Could you give me a pointer? |
In the same way that Extended Tokens (RFC8974 Section 2.2 Discovering Support) and Q-Block (RFC9177 Section 4.1 Properties of the Q-Block1 and Q-Block2 Options) test for functionality support in the server, I guess we could suggest something like (for stateless FETCH support) on the client
|
That sounds like a good possible way. |
As Request-Tag has been thrown around here as a solution to what apparently was done with token matching before: A Request-Tag will only ever distinguish two requests. If two requests don't match in their block key (eg. have a different Uri-Path), the request tags will not magically make them match. |
Yep.
Everything else, details of RFC 8132 or the contribution of RFC9175 to a solution of this issue, is for me out of the scope of discussing the re-use of tokens in a blockwise transfer. If that stuff requires attention, then I think a separate issue (as #2168) makes that work easier. Thanks to all who contributed their knowledge, experience and opinions. |
@boaks, about LWM2M (not Zephyr), I tried to answer over the various discussions we had about it, but I can try to summarize here:
If you want to know more, there is not too much to read about it in LWM2M specification :
If you don't know so much about LWM2M, you should at least try to understand:
For people who know a bit about LWM2M, this looks like: You send a FETCH on
Then you can get an answer containing LWM2M node values in the payload, e.g. using ct=SENML-JSON:
In this example, we only ask for a LWM2M Single Resource, but you can ask for an object, an object instance, any kind of resource, a resource instance or even the whole LWM2M object tree using `"n":"/"`. About Request-Tag, I just see that since LWM2M v1.1.x, the specification says:
|
Sure, my point is more: |
For information, the related Zephyr PR is here |
I have been away for a week, so I did not find time to comment. Sorry about that. Thank you for all the valuable information here. It definitely looks like this falls into the not-so-properly-defined behaviour of CoAP. However, I still don't think its use is properly defined, and I'm mostly considering the LwM2M use case here, where the response payload might be generated, and each time it is generated it might be different. So in most cases, I need to maintain a state so I can ensure the integrity of the payload.
So clearly whoever initiates a GET query must already know that the response might be a block-wise transfer and append the Request-Tag option. Or (worst case and against the recommendation) append a Request-Tag to all queries. Otherwise, if you send a normal GET and the server splits it into multiple blocks and sends BLOCK N=0, then what do you do for the next block query? Do you generate a Request-Tag or not? If you generate a Request-Tag, this is actually a new transfer and you need to start at block N=0. When does Californium add the Request-Tag to queries? Or is it implemented at all? As I'm one of the contributors to Zephyr's LwM2M client, I can alter the behavior here. But going through this long discussion has not really helped. You seem to have concluded that re-using the token is wrong, but what should I then use to match a following GET query to a previous one, if I need to maintain a state? |
To accurately determine which specific body of data you want to get from the server, the request has to (at a minimum) include the Uri-Path and Request-Tag options, so that the server can use the Request-Tag to differentiate requests to the same URI (resource) and use the Block2 option to get the appropriate slice. When to add the Request-Tag option to a request or not (or a way of signalling that a Request-Tag is required) is currently a subject of debate. Certainly if the request has Block1 or Block2 options, then a Request-Tag option needs to be added. Otherwise, currently, you have to send a Request-Tag with every request, or re-request using Request-Tag on detecting that a response has used Block2. |
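A sketch of the client-side rule described above, using illustrative names (this is not Californium API; as noted further below, Californium itself does not implement Request-Tag). The option number 292 is the one registered for Request-Tag by RFC 9175.

```java
// Sketch: decide when an outgoing request should carry a Request-Tag.
import java.security.SecureRandom;

public final class RequestTagPolicy {

    /** Request-Tag option number registered by RFC 9175. */
    public static final int REQUEST_TAG_OPTION = 292;

    private static final SecureRandom RANDOM = new SecureRandom();

    /**
     * Always attach a Request-Tag when Block1 or Block2 options are present;
     * optionally attach one to every request "just in case" the response
     * turns out to be Block2-sized.
     *
     * @return a fresh tag value, or null if none should be sent
     */
    public static byte[] requestTagFor(boolean hasBlock1, boolean hasBlock2,
            boolean alwaysSend) {
        if (hasBlock1 || hasBlock2 || alwaysSend) {
            byte[] tag = new byte[2]; // short tag, new value per new request body
            RANDOM.nextBytes(tag);
            return tag;
        }
        // no Request-Tag; re-request with one if the response uses Block2
        return null;
    }
}
```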
I'm still not sure why it is considered that a lwm2m-server uses concurrent GET/FETCH operations (Composite-Read) for a single lwm2m-client. From my point of view, there is not that much benefit in that. If the server finishes the blockwise transfer before starting the next, the current RFC 7959 works pretty well. Using ETAG also ensures that, if the resource changes during the transfer, the client gets notified about that. |
My understanding,
If no If (In all cases, FETCH doesn't work with a stateless implementation with the current state of the RFCs.) So unless I missed something, Request-Tag will not solve your issue unless you really want to support concurrent block transfers.
Not implemented, and it seems not to be planned: #2174 |
This seems to be the simple choice, and the choice made by libcoap: "For libcoap, I took the decision that by default, Request-Tag is sent with every request (even if Block1 is not being used and Block2 was not defined) "just in case" there is a Block2 sized response. The client application can however disable the CoAP stack doing this, only sending (done by CoAP stack) Request-Tag if Block1 or Block2 were defined." But I agree this sounds like it is not recommended by the RFC:
Even if that should work, because:
I guess this is another possible working option, but I can understand that at first sight it could be considered not ideal. |
It's not implemented. For me, concurrent blockwise transfers have no general benefit. At least for now, Californium is unfortunately not an AI that implements the stuff by importing RFCs on its own ;-). So it requires some contribution. In years past I had a paid job for doing so; for about a year now this is no longer the case. I currently try to limit the free time invested in open source to 4h a week. That's eaten up by answering questions and providing bugfixes. |
(Trying to only add what has not been said.) All the CoAP server can do to protect the integrity of the payload is to set an ETag (e.g. a hash of the full body). It is the CoAP client that must then verify that all the ETags match, and will thus verify the integrity. Unless there are concurrent requests (in which case the client will set a Request-Tag), the server that receives a GET or FETCH request for Block2:1/-/.. (i.e. a non-initial one) does not know which "instance" of the larger request this comes from. But it doesn't have to: it will just pick the latest one it has (for the given client and the given set of non-block options), trusting that the client really doesn't do concurrent requests. If the client did try concurrent requests, it hopefully at least fails when checking the ETags -- there's only so much the server can do here. How long the server keeps that "instance" around will depend on its capabilities, but at any rate, the lookup criteria are the client's address and the relevant options. (7959 had no precise definition of "relevant" options; 9175 calls the criterion being "matchable".) |
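A small illustrative sketch (a hypothetical helper, not a real library API) of the client-side check described above: collect the blocks of one Block2 transfer and verify that every block carried the same ETag; a mismatch means the representation changed mid-transfer and the body must be re-requested.

```java
// Sketch: reassemble a Block2 body while verifying a consistent ETag.
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public final class Block2Reassembler {

    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private boolean first = true;
    private byte[] etag;

    /**
     * Add the payload of one Block2 response.
     *
     * @return false if the ETag differs from earlier blocks; the caller should
     *         discard the partial body and restart the transfer.
     */
    public boolean addBlock(byte[] blockPayload, byte[] blockEtag) {
        if (first) {
            etag = blockEtag == null ? null : blockEtag.clone();
            first = false;
        } else if (!Arrays.equals(etag, blockEtag)) {
            return false; // representation changed mid-transfer
        }
        body.write(blockPayload, 0, blockPayload.length);
        return true;
    }

    /** The body assembled so far. */
    public byte[] body() {
        return body.toByteArray();
    }
}
```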