Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/msp430fxyz: improved SPI driver speed #5886

Merged
merged 3 commits into from
Oct 17, 2016

Conversation

haukepetersen
Copy link
Contributor

From the discussion in #4780 we heard, that the SPI driver speed for the MSP430 platform performs poorly when writing a burst of data (e.g. when transferring network packets to the radio). This PR should improve the situation quite a bit.

Changes:

  • improved the transfer function when only writing data to a device
  • added the transfer_byte() (convenience) function to the periph_common module while making the transfer_bytes() function local to the driver. This will significantly reduce the number of function calls while transferring data...

@PeterKietzmann would you mind to test this and benchmark this against your existing numbers? Thx

@haukepetersen haukepetersen added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: MSP Platform: This PR/issue effects MSP-based platforms Impact: major The PR changes a significant part of the code base. It should be reviewed carefully labels Sep 28, 2016
@haukepetersen haukepetersen added this to the Release 2016.10 milestone Sep 28, 2016

return length;
}

int spi_transfer_byte(spi_t dev, char out, char *in)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you'll have to remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups...

@haukepetersen
Copy link
Contributor Author

fixed and squashed the occurrence of the old transfer_byte() function

@PeterKietzmann
Copy link
Member

Theoretically I agree with you about this adoption. Practically it appears that the cc2420 on the z1 node does not work with this PR. As a test I took to "old" SPI implementation and added a spi_fast_write method which I only use for writing the complete frame to the FIFO. That function is implemented as proposed in #4780 and everything works. Unfortunately this PR influences device drivers. I'm still about to figure out weather this is a problem of SPI or the device driver.

@haukepetersen
Copy link
Contributor Author

But could you hook up your scope and check out the output signal/timings with this PR, to make sure the driver is not broken in general but that the male functioning has something to do with timings?!

@haukepetersen
Copy link
Contributor Author

I don't have a Z1 here available right now, so I can't really test anything here. But by looking at the code again, the only thing that has changed should be, that we leave out a function call for each byte transferred, though we should have tighter inter-byte timings. Looking at the CC2420 datasheet, an SPI clock up to 20MHz should be fine (and we are currently at 5MHz, right?), so seems that the radio should actually work fine under the new conditions. Wonder what we are doing wrong here.

@PeterKietzmann
Copy link
Member

The SPI driver itself works like this, I checked it. However, the cc2420 driver (@4Mhz) has some problems here and there. There are some SPI accesses which are not successful when using NULL as output parameter. I think I identified most (still one issue open), but need to dig deeper into the question why. Luckily, writing the frame to the FIFO works with high speed feature (which usually consumes most of the time).
However, I fear that this change might introduce further device driver problems. Especially when thinking about adapting other SPI drivers like this. So again the counter question: How about we introduce an optional spi_write_fast function the the API?
I will report back once I proceeded with debugging the cc2420 driver using this PR. Hopefully everything will clear up.

@haukepetersen
Copy link
Contributor Author

I still have some trouble to understand, why the cc2420 driver has problems with the new code. When calling the transfer_bytes function with out := NULL, the adapted SPI driver is actually transferring a logical 0, while the current master is doing exactly the same...

However, I fear that this change might introduce further device driver problems.

Not sure... We need to figure out, why exactly the cc2420 is not behaving correctly. And if we open up new problems, it IMHO points more in the direction of bad driver implementation than to a broken SPI driver. And if so, we should face the problems with other drivers and fix them by the root instead of building upon a non-optimal SPI driver below to obfuscate those problems. But first step first, we need to understand whats going wrong here.

How about we introduce an optional spi_write_fast function the the API?

State now, I am absolutely against it - feels to me like building a quick-fix workaround into a main RIOT API: not good...

@jnohlgard
Copy link
Member

I am against the idea of creating two separate API calls for the same functionality. It should be possible to rewrite the loops for efficiency to take different code paths depending on whether the input is NULL or not and depending on the desired bus clock vs. the core clock to try to speed up the transfer.
Size-wise it should be the same or smaller compared to having two separate functions in the API.

@kaspar030
Copy link
Contributor

+1

-----Original Message-----
From: "Joakim Nohlgård" notifications@github.com
To: RIOT-OS/RIOT RIOT@noreply.github.com
Cc: Kaspar Schleiser kaspar@schleiser.de, Comment comment@noreply.github.com
Sent: Sa., 01 Okt. 2016 10:40
Subject: Re: [RIOT-OS/RIOT] cpu/msp430fxyz: improved SPI driver (write) speed (#5886)

I am against the idea of creating two separate API calls for the same functionality. It should be possible to rewrite the loops for efficiency to take different code paths depending on whether the input is NULL or not and depending on the desired bus clock vs. the core clock to try to speed up the transfer.
Size-wise it should be the same or smaller compared to having two separate functions in the API.

You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#5886 (comment)

@PeterKietzmann
Copy link
Member

@gebart that's what this PR does, isn't it? I still need to get this PR running with the cc2420 radio.

for (unsigned i = 0; i < length; i++) {
while (!(SPI_IF & SPI_IE_TX_BIT)) {}
SPI_DEV->TXBUF = (uint8_t)out[i];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the following lines here:

while(SPI_DEV->STAT & 1){};
SPI_DEV->RXBUF;

The first line waits for any ongoing transmission by reading the UCBUSY bit in the status register (compare datasheet sec. 16.4.5). The second line simply reads the receive buffer which will clear the UCxRXIFG bit. In master mode that bit indicates that data has been received. I tell you, without this read operation, things can get messy :/ ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haukepetersen you wrote the msp_430-regs file. Maybe you want to add a macro for the UCBUSY bit in the status register?

@haukepetersen
Copy link
Contributor Author

done.

@PeterKietzmann: could you do me one more favor? Could you connect your logic analyzer and post the output for
a) transfer_bytes(dev, buf, NULL, 10);
b) transfer_bytes(dev, buf, buf2, 10);
with buf and buf2 being some random 10 byte data buffer?

I would be interested in how much performance difference we encounter between the two code paths in the current form. But as said, I don't have a z1 here anymore, so I can't do it myself... Thx

@PeterKietzmann
Copy link
Member

@haukepetersen my logic analyser currently strikes, that's why I used an oscilloscope on the clock line. Case a) has an inter-byte-delay of ~360 ns (yes nano seconds) whereas b) has about 4,8 us. Is that enough for you?
My proposed change does not work for telosb :(. I need to figure out why before merging...

@haukepetersen
Copy link
Contributor Author

thanks!

An inter-byte difference of 4us, wow! So I guess this short-path makes sense here. Will play around with this on some other platforms as well... Thx again and post your progress on the telosb :-)

@PeterKietzmann
Copy link
Member

In the periph_conf.h of different msp430 based boards, SPI_IE_TX_BIT and SPI_IE_RX_BIT are defined. I think we should rename them to SPI_IF_TX_BIT and SPI_IF_RX_BIT as these were part of the "Interrupt Flag Register" (compare msp430xf2xx family user guid page 510 section 19.3.13. Same applies for older msp430xf1xx MCUs)

while (!(SPI_IF & SPI_IE_TX_BIT)) {}
SPI_DEV->TXBUF = (uint8_t)out[i];
while (SPI_DEV->STAT & USCI_SPI_STAT_UCBUSY) {}
SPI_DEV->RXBUF;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both lines should go below the closing brace in line 227 (just be called once, after the transmission of multiple bytes). However, we will need to adapt this passage for the older msp430 revisions so we can do it then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we need here (below the closing brace in line 227) is:

#ifndef SPI_USE_USCI
while (!(SPI_IF & SPI_IE_TX_BIT) || !(SPI_IF & SPI_IE_RX_BIT)) {}
SPI_DEV->RXBUF;
#else
while (SPI_DEV->STAT & USCI_SPI_STAT_UCBUSY) {}
SPI_DEV->RXBUF;
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use while (!(SPI_IF & SPI_IE_TX_BIT) || !(SPI_IF & SPI_IE_RX_BIT)) {} for both variants?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this (and simply waiting for !(SPI_IF & SPI_IE_RX_BIT)) does not work on z1 nodes with msp430xf2xx. I can't fully explain it but I've tested it several times.

}
return 1;
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haukepetersen I'd like to improve "simple" receiving as well

    else if (!out) {
        for (unsigned i = 0; i < length; i++) {
            SPI_DEV->TXBUF = 0;
            while (!(SPI_IF & SPI_IE_RX_BIT)) {}
            in[i] = (char)SPI_DEV->RXBUF;
        }
    }

@haukepetersen
Copy link
Contributor Author

  • took the read RX buf part out of the loop for the 'fast send' -> @PeterKietzmann simplified your proposal slightly, does this work?
  • added a 'fast receive' path

@PeterKietzmann
Copy link
Member

fast receive is ok

@PeterKietzmann PeterKietzmann changed the title cpu/msp430fxyz: improved SPI driver (write) speed cpu/msp430fxyz: improved SPI driver speed Oct 12, 2016
else {
char tmp;
for (unsigned i = 0; i < length; i++) {
tmp = (out) ? out[i] : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the !out case handled above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. will adapt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I confused this PRs. Can't adapt this one but #5921 later. However, your comment makes sense and test succeeded.

in[i] = (char)SPI_DEV->RXBUF;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It leads to the same assembly, but I'd prefer a little more C-like code:

unsigned n = length;

while(n--) {
    SPI_DEV->TXBUF = *out++;
}
  • makes the for loops less verbose
  • saves the compiler from doing all the dereferences (in[i] = x ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I actually tend to like the newby-array-notation better. I think it is easier to read for non-professional-pointer-magicians (or am I the only one?), and the effects on the compiler are neglectable...

@haukepetersen
Copy link
Contributor Author

fixed spi_transfer_ and rebased.

@kaspar030 I won't fix the array-style notation in the loop.

@haukepetersen haukepetersen added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 17, 2016
@haukepetersen
Copy link
Contributor Author

also was so free to squash...

@haukepetersen haukepetersen added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 17, 2016
@PeterKietzmann
Copy link
Member

I prefer the "old" notations as well, in terms of readability. However, I won't have NACKed Kaspars proposal. Will check this PR in the afternoon

SPI_DEV->TXBUF = (uint8_t)out[i];
}
while (!(SPI_IF & SPI_IE_RX_BIT)) {}
SPI_DEV->RXBUF;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the following here, even though it's not the prettiest solution. I've read the manual several times and tested different configurations on (and between) z1 and telosb motes so I'm pretty convinced this is the right thing to do:

#ifndef SPI_USE_USCI
        while (!(SPI_IF & SPI_IE_TX_BIT) || !(SPI_IF & SPI_IE_RX_BIT)) {}
        SPI_DEV->RXBUF;
#else
        while (SPI_DEV->STAT & 1) {}
        SPI_DEV->RXBUF;
#endif

IIRC you already introduced a macro for checking bit no 1 (BUSY bit) in the status register for devices with SPI_USE_USCI but this was lost I guess.

@PeterKietzmann
Copy link
Member

ACK once addressed. If we consider this PR for the release can we probably include #5921 as well?!

@haukepetersen
Copy link
Contributor Author

fixed and squashed right in.

while (!(SPI_IF & SPI_IE_TX_BIT)) {}
SPI_DEV->TXBUF = (uint8_t)out[i];
}
while (!(SPI_IF & SPI_IE_RX_BIT)) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @haukepetersen I was to imprecise in my previous comment. This line can go out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@PeterKietzmann PeterKietzmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK and go

@PeterKietzmann PeterKietzmann merged commit ed67143 into RIOT-OS:master Oct 17, 2016
@haukepetersen haukepetersen deleted the opt_msp_spi branch October 19, 2016 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Impact: major The PR changes a significant part of the code base. It should be reviewed carefully Platform: MSP Platform: This PR/issue effects MSP-based platforms Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants