LMIC asserts after too long trying to join #1

Closed
terrillmoore opened this issue Apr 1, 2017 · 3 comments

Comments

@terrillmoore
Member

If you run the lmic code for long enough while configured for US915 without being able to join, it will assert:
FAILURE {somepath}/src/lmic/radio.c:452

Analysis of the code shows that it's failing because radio.c thinks it's been asked to transmit on FSK, and things are not set up for FSK (since there's no FSK here).

The join code continually lowers the data rate on every join failure.

I speculate that lowerDR() (from lorabase.h) is getting fooled by the data-rate lowering code into choosing an invalid/untested path, due to something from the conditional compiles.

@terrillmoore
Member Author

terrillmoore commented Apr 1, 2017

The problem appears to be that the lmic code increments LMIC.txCnt, and doesn't clear it until there's a success. This is OK, except for the following line in the US915 code:

s1_t dr = DR_SF7 - ++LMIC.txCnt;
if( dr < DR_SF10 ) {
    dr = DR_SF10;
    failed = 1; // All DR exhausted - signal failed
}

Since LMIC.txCnt is not reset during a join, and is therefore growing indefinitely, sooner or later the subtraction overflows the signed 8-bit result (s1_t) -- and indeed, the failure happens after 128 prints of "EV_JOIN_FAILED". Furthermore, this explains the fairly random choice of CRs and sporadic prints of EV_JOIN_FAILED after the first one.

The solution appears to be to duplicate some of the logic from initJoinLoop() in onJoinFailed(), at least in the us915 version:

  1. set LMIC.adrTxPow to 20
  2. setDrJoin(DRCHG_SET, DR_SF7);

And also reset LMIC.txCnt to zero.

@terrillmoore
Member Author

According to LoRaWAN Regional Specs for US915 (page 12, section 2.2.2, line 28):

If using the over-the-air activation procedure, the end-device should broadcast the JoinReq message alternatively on a random 125 kHz channel amongst the 64 channels defined using DR0 and a random 500 kHz channel amongst the 8 channels defined using DR4. The end device should change channel for every transmission.

The LMIC code doesn't do this. It lowers the data rate from SF7/125 (DR3) to SF10/125 (DR0) on each channel change, then declares a failure after getting to DR0 without a successful join. The "Join failed" indication is an artifact of this code; there's no requirement in the spec to report a join failure to the application.

So in fact, we should change the code to always use DR_SF10 for joining, and that will take out the logic that is decrementing the data rate and causing the overflow. Since join doesn't change adrTxPow, there's no need to reset it.

@terrillmoore
Member Author

Fixed on master by fc9494c. Originally fixed by 18a05f1.
