-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
drivers, at86rf2xx: cleanup #7276
Conversation
the problem is prematurely merged #7275 (IMHO) |
add simplification of |
then #7115 was |
on that I agree :> |
If the same bugfix can be done with 17 lines at the same time, that's not a style discussion anymore.
@smlng should've posted that he's onto another solution. |
sure, and I'm the one to blame here. |
I wasn't even aware there was an alternative and my priority was to get master working again. |
OK, sorry didn't want to be rude ... I was just looking into #7275 and thought, mhm maybe there is a simpler solution - but looking again it was already merged - no time for a comment left. |
Sorry, that was indeed important. |
That's why the on hardware testrunner is so important :) |
I'm still not sure how these two functions relate. This seems to chop of functionality, so please explain, what you did exactly and why? |
in #7115 @DipSwitch changed the signature of This PR enhances the fix of #7275 and also simplifies |
drivers/at86rf2xx/at86rf2xx_getset.c
Outdated
/* Wait for any state transitions to complete before forcing TRX_OFF */ | ||
do { | ||
old_state = at86rf2xx_get_status(dev); | ||
} while (old_state == AT86RF2XX_STATE_IN_PROGRESS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Can you remove the newline before this removal, too.
drivers/at86rf2xx/at86rf2xx.c
Outdated
} while (state == AT86RF2XX_STATE_BUSY_RX_AACK || | ||
state == AT86RF2XX_STATE_BUSY_TX_ARET); | ||
|
||
uint8_t state = at86rf2xx_set_state(dev, AT86RF2XX_STATE_TX_ARET_ON); | ||
if (state != AT86RF2XX_STATE_TX_ARET_ON) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the change to |
that's why I changed the title of this PR to "cleanup" ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor cosmetic changes.
drivers/at86rf2xx/at86rf2xx.c
Outdated
} while (state == AT86RF2XX_STATE_BUSY_RX_AACK || | ||
state == AT86RF2XX_STATE_BUSY_TX_ARET); | ||
|
||
uint8_t state = at86rf2xx_set_state(dev, AT86RF2XX_STATE_TX_ARET_ON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you leave the variable definition at the top of the function, please?
drivers/at86rf2xx/at86rf2xx_getset.c
Outdated
/* Wait for any state transitions to complete before forcing TRX_OFF */ | ||
do { | ||
old_state = at86rf2xx_get_status(dev); | ||
} while (old_state == AT86RF2XX_STATE_IN_PROGRESS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Can you remove the newline before this removal, too.
651b50d
to
0b940d8
Compare
comments addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully last batch of comments
@@ -18,7 +19,7 @@ | |||
* @author Thomas Eichinger <thomas.eichinger@fu-berlin.de> | |||
* @author Joakim Nohlgård <joakim.nohlgard@eistec.se> | |||
* @author Hauke Petersen <hauke.petersen@fu-berlin.de> | |||
* | |||
* @author Sebastian Meiling <s@mlng.net> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one as well. Feel free to remove
@@ -103,7 +102,7 @@ static int _send(netdev_t *netdev, const struct iovec *vector, unsigned count) | |||
at86rf2xx_tx_prepare(dev); | |||
|
|||
/* load packet data into FIFO */ | |||
for (unsigned i = 0; i < count; i++, ptr++) { | |||
for (unsigned i = 0; i < count; ++i, ++ptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the same effect, but okay :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha it seems faster though ... brains are stupid :P
@@ -19,6 +19,7 @@ | |||
* @author Hauke Petersen <hauke.petersen@fu-berlin.de> | |||
* @author Kaspar Schleiser <kaspar@schleiser.de> | |||
* @author Kévin Roussel <Kevin.Roussel@inria.fr> | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change
* @{ | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated changes?
@kaspar030 Murdock complains about:
|
Not anymore, see #7953. |
I guess I need ☕️ - overlooking so many things here ... btw. last comments addressed |
Do it like we do. Drink more 🍺 instead of ☕ :P |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
Please squash |
- update documentation - adapt constness - fix code alignment and stlye
64a0f07
to
13639c1
Compare
can anybody make some tests, too, before merging this - it seems to work, but would be great to have someone verify this, too |
Tested on
all worked as expected => go |
at86rf2xx: revert regression introduced in #7276
Sorry for the some what late reply... @smlng
I wanted to clarify that changing the signature was actually part of the fix. There was a race condition between the get_state call and the set state call. The action (state changing) by itself was not atomic. So the driver state was able to change between the get and set call. But now I'm thinking about it, the windows closed from 20 instructions to 2 or so. Not sure and would need to reread the code completely. The problem I tried to fix had something to do that multiple threads and interrupts wanting to change the driver state. (This bug is because of that depended on MCU speed, transmission speeds and response times) It can only be tested by... actually testing all possible implementations ever going to be made. Because the checking logic was splited, By triggering the RX interrupt, it can enter the receive state. Where this is canceled because old_state will not be updated before changing the state. (This explains the possible packet loss you were experiencing) This can probably only be fixed by a mutex around the the main state (sleep, idle, tx, RX) since interrupts cannot be disable around the while loop.. another solution might be: every interrupt queues a message WITOUTH any processing. And all the logic in the IRQ handler is moved to the thread loop of the driver... Sorry for not finding the time to clarify this when it was requested. There where some other things needing more attention. |
Async / multi-threaded problems are quite hard to explain ... Ever ... Haha This part is back in the thread previously wanting to change the state. |
It is possibly also the reason that it only affected very few people. My MCU was running at 36MHz and was using the AT86RT212B (sub-gig) when doing full speed/throughput tests where 3 nodes responded to the same multicast message send by the master. Where the master waited for all the responses within a given windows. I had options to change the general wait time and response times. All 4 devices where connected through our AT command interface. (Self made, not the RIOT-OS based one) The devices where connected to a PC with an USB to serial converter to the same application. The application did the actual control of the master and the slave and checking if everything got from one end to another. The more slave nodes that where added to the test, the faster one dead-locked, the second one less fast, etc. etc. etc. the failure rate looked quadratic. After my fix I was able to send over maybe 20 million packets where there was first a possibility one of the slave nodes dead-locking between 0 and 4 million packets ... With the other information I've just read you all found. I think in my case the MCU always hanged in a wait lock before allowing the state change. However due to another node trying to run the exact same function (claiming air time) at the exact same time (one backing off by the CSMA/CA) some waired stuff was happening... "Best explanation ever +1" |
When using the queue implementation it may probably only be queued once. Because I thing every status change is an interrupt. However reading it once actually handles all interrupts. (So any queued interrupts still in the queue are false and must be flushed). Or add a pointer to some counter in the data pointer (assuming adding the same message twice to the queue is prohibited by the queue) Than there is also another possible dead-lock there. Because the read of the status takes time. If during the read the interrupt is triggered. Has it been read or is there a new one? The only solution would be to read the status while you actually know there is no new status. If the read has finished and there is not a message queued your save to go... |
I hope however I'm wrong here because that's is really not a good solution... :D Say the EXTI IRQ priority is lower than the SPI priority.. the MCU runs slow. The SPI interrupt is never left during transfer. The IRQ is received after the complete transfer. If something else has a higher priority. The data could already be received but the SPI IRQ is pending the EXTI is received assuming it is a high priority.. (here I think it's possible you loss the ability to know which one came first or last) I think if you play long enough with these terms one will give a possible missed EXTI irq as well.. so the double read is required... Even though one could technically already have been performed. Due to none real time interrupt receiving. This could be guarded by fixing EXTI as priority 0/1 system timer as priority 0 data irqs lower than 3 or whatever... |
I would strongly suggest to first fix this bug before trying to fix bugs in other code using the driver. Possible problems with the TCP stack could just as easily lay here... Since TCP for example would ACK right away. If you have a line (bus) topology where one node is 2 hops away and one 3 hops away, and both would receive a packet after each other first the 3th node than the 2nd this bug might dead-lock the 2nd node because it would try to handle 3 to 4 packets at the same time (during routing to the 3th node, receiving it's own packet, and sending it's own ACK and receiving the ACK of the 3th node). |
[originally - but now not anymore -] much simpler fix than #7275 , while still using the redefined
at86rf2xx_set_state(...)
function of #7115.[edit] plus avoiding the redundant
do {} while
...