Skip to content

Conversation

maribu
Copy link
Member

@maribu maribu commented Apr 24, 2019

Contribution description

Previously the compiler was allowed to reorder access to the interrupt control registers in regard to memory access not marked as volatile (at least some people - most notably some compiler developers - read the C standard this way). In practise this did not happen as irq_disable(), irq_restore(), irq_enable() are part of a separate compilation unit: Calls to external functions unknown to the compiler are treated as if they were memory barriers. But if link time optimization (LTO) is enabled, this no longer would work: The compiler could inline the code accessing the interrupt control registers and reorder the memory accesses wrapped in irq_disable() and irq_restore() outside of their protection.

This commit adds the "memory" clobber to the inline assembly accessing the interrupt control registers. This makes those accesses explicit compiler memory barriers. The machine code generated without LTO enabled should not differ in any way by this commit. But the use of irq_*() should now be safe with LTO.

Testing procedure

Compile some application for an ARMv7 (no CortexM) board, e.g. the MSB-A2. Without LTO no differences before and after this PR should be present, with this PR they could be present.

E.g. here is the output of elf_diff generated for the binaries in examples/default generated by:

Issues/PRs references

Related to #11438, as this implements the barrier claimed to be present in the doc there


Update 1: Updated binary diffs

Previously the compiler was allowed to reorder access to the interrupt control
registers in regard to memory access not marked as `volatile` (at least some
people - most notably some compiler developers - read the C standard this way).
In practise this did not happen as irq_disable(), irq_restore(), irq_enable()
are part of a separate compilation unit: Calls to external functions unknown to
the compiler are treated as if they were memory barriers. But if link time
optimization (LTO) is enabled, this no longer would work: The compiler could
inline the code accessing the interrupt control registers and reorder the memory
accesses wrapped in irq_disable() and irq_restore() outside of their protection.

This commit adds the "memory" clobber to the inline assembly accessing the
interrupt control registers. This makes those accesses explicit compiler memory
barriers. The machine code generated without LTO enabled should not differ in
any way by this commit. But the use of irq_*() should now be safe with LTO.
@maribu maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: ARM Platform: This PR/issue effects ARM-based platforms labels Apr 24, 2019
@maribu maribu requested a review from kaspar030 April 24, 2019 15:22
@maribu
Copy link
Member Author

maribu commented Apr 26, 2019

The binary diffs I created were incorrect: I did not build within the same branch name, but the branch name is printed in the boot up message. And I also used LTO=yes instead of LTO=1. The bin diffs are now fixed accordingly.

Surprisingly, the builds without LTO now differ in size (+4 B .text), while those with LTO have the same size.

@maribu
Copy link
Member Author

maribu commented May 23, 2019

@gschorcht: Can you check if the ESP8266 and the ESP32 implementation of irq_disable(), irq_enable()andirq_restore()` is not affected by the same bug?

The Xtensa is out-of-order, so a compiler only barrier would not be sufficient there. But maybe the RSIL instruction does implicitly work as a full barrier? (I found no information on that. In other code I only found an rsync issued after wsr which your code in irq_restore() does as well.) If no additional barrier instructions need to be issued in the inline assembly, I still think that the inline assembly should have memory as clobber, so that the compiler does not move memory accesses across irq_disable() and irq_restore().

(Note that an external function is not effectively a barrier, as the compiler can move memory access across theses calls once that external function is no longer a black box. That is the case with LTO enabled, see #11438 (comment))

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@gschorcht
Copy link
Contributor

@maribu I will try to take a look, but it will take some time since I have to understand the problem first. I'm quite busy by my regular business at the moment.

@kaspar030 kaspar030 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label May 23, 2019
@maribu
Copy link
Member Author

maribu commented May 23, 2019

@gschorcht: I do lack the knowledge about the Xtensa ISA, but from the other code I so I would guess is that only memory needs to be added to the clobbers. If that is true, issues will not arise unless LTO is enabled, which by default is disabled.

So from my point of view this not urgent (even though I think it is important to eventually check/fix it).

@gschorcht
Copy link
Contributor

@maribu I am not sure whether any problem could occur with the following code in irq_disable/irq_enable:

    uint32_t _saved_intlevel;
    __asm__ volatile ("rsil %0, 0" : "=a" (_saved_intlevel));
    _saved_intlevel &= 0xf;
    return _saved_intlevel;

Local variables are hold in registers and the compiler produces the following instructions for the code above:

    rsil   a2, 0
    extui  a2, a2, 0, 4
    retw.n

The return value is stored in a2.

Xtensa Instruction Set Architecture (ISA) Reference Manual says:
RSIL first reads the PS Special Register, writes this value to address register at, and then sets PS.INTLEVEL to a constant in the range 0..15 encoded in the instruction word. ... The instruction following the RSIL is guaranteed to be executed at the new interrupt level specified in PS.INTLEVEL, therefore it is not necessary to insert one of the SYNC instructions to force the interrupt level change to take effect.

rsil is a single instruction and the old interrupt level is stored in register a2 in any case before the new interrupt level is set. Even if a context switch would happen between rsil and the following extui to extract the lowest four bits, it shouldn't be a problem.

The code produced for irq_restore

    __asm__ volatile ("wsr %0, ps; rsync" :: "a" (state));

is of course

   wsr.ps  a2
   rsync
   ret.n

where a2 contains the new interrupt level as parameter.

Xtensa Instruction Set Architecture (ISA) Reference Manual says:
RSYNC waits for all previously fetched WSR.* instructions to be performed before interpreting the register fields of the next instruction.

@maribu
Copy link
Member Author

maribu commented May 24, 2019

@gschorcht: Perfect, so only the memory clobbers are needed there to make sure the compiler does not move memory accesses wrappen in irq_disable() / irq_restore() outside of it with LTO enabled.

@gschorcht
Copy link
Contributor

@gschorcht: Perfect, so only the memory clobbers are needed there to make sure the compiler does not move memory accesses wrappen in irq_disable() / irq_restore() outside of it with LTO enabled.

Sorry to ask again, I'm still not sure whether I understand the problem. If I understood you correclty, you are afraid that LTO might inline the code of these functions and reorder the memory access arround. Right? If so, do you think that this could happen even if the functions are present in ELF binaries?

One other question arose to me related to the topic. GCC for Xtensa knows an option -mserialize-volatile. When this option is enabled, GCC inserts MEMW instructions before volatile memory references to guarantee sequential consistency. This option is disabled in GCC for ESPs by default. Do you think it makes sense to enable them?

@kaspar030
Copy link
Contributor

-mserialize-volatile

Didn't know about that, thanks for the hint!
It seems it is default anyways: (from "man gcc")

       -mserialize-volatile
       -mno-serialize-volatile
           When this option is enabled, GCC inserts "MEMW" instructions before "volatile" memory references to guarantee sequential
           consistency.  The default is -mserialize-volatile.  Use -mno-serialize-volatile to omit the "MEMW" instructions.

@kaspar030
Copy link
Contributor

Anyhow, this PR is good to go.

@kaspar030 kaspar030 merged commit 35d43cc into RIOT-OS:master May 24, 2019
@gschorcht
Copy link
Contributor

@kaspar030 > It seems it is default anyways: (from "man gcc")

Not for ESPs:

xtensa-lx106-elf-gcc -Q --help=target
The following options are target specific:
...
  -mno-serialize-volatile     		[enabled]

@maribu
Copy link
Member Author

maribu commented May 24, 2019

If I understood you correclty, you are afraid that LTO might inline the code of these functions and reorder the memory access arround.

Exactly. The experiment in #11438 (comment) shows, that GCC does this for AVR.

If so, do you think that this could happen even if the functions are present in ELF binaries?

Yes :-/

One other question arose to me related to the topic. GCC for Xtensa knows an option -mserialize-volatile. When this option is enabled, GCC inserts MEMW instructions before volatile memory references to guarantee sequential consistency. This option is disabled in GCC for ESPs by default. Do you think it makes sense to enable them?

There have been some heated discussions lately on whether volatile can be used for synchronization. As far as I know there is pretty much a consensus: E.g. the Linux Kernel Developers explicitly forbid this use of volatile, e.g. Intel considers volatile almost useless for multi-threaded programming, and the C standard guys have added an standard atomic library since C11, as they felt previous versions of C did not provide the required means to do synchronization.

My personal point of view is to stick with that consensus and do not use volatile for means of synchronization. As far as I know irq_disable() and irq_restore() are either used explicitly (in low level code) or indirectly through higher level APIs such as mutex (in high level code). Therefore, -mno-serialize-volatile should be safe. It would have a performance cost to enable -serialize-volatile and would slightly increase code size.

@maribu maribu deleted the arm7_barriers branch November 4, 2019 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: ARM Platform: This PR/issue effects ARM-based platforms Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants