-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Linux/ARM: Fix +3000 bus errors of unit-test in case of O2/O3 levels #6379
Linux/ARM: Fix +3000 bus errors of unit-test in case of O2/O3 levels #6379
Conversation
Below is unit-test result of CoreCLR. Before applying the patch: The number of unit-test failures is 3886 (-O2 and -O3 make same result in release build mode)
After applying the patch: The number of unit-test failures is 47 (such as debug build mode)
|
@jkotas , Could you review and merge this PR to enable the release build mode + O3 (without O1) on the Linux/ARM board? And, I have been working the next PR to keep the existing /CC: @janvorli , @benpye, @myungjoo , @lemmaa , @manu-silicon , @parjong |
@@ -201,7 +201,13 @@ extern "C" { | |||
|
|||
#else // _MSC_VER | |||
|
|||
#if defined (_M_ARM) && !defined (DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use _ARM_
instead of _M_ARM
.
And why do you define it differently for DEBUG and non-DEBUG? It seems it would not hurt anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ARM instead of _M_ARM.
Basically, I though that the valid macro is _M_ARM. I will try to change the macro from _M_ARM to _ARM.
And why do you define it differently for DEBUG and non-DEBUG?
I have defined it for DEBUG (-O0) and non-DEBUG (for Checked/-O2 and Release/-O3).
It seems it would not hurt anything.
I can not guarantee it would not hut anything because I did have enough evaluation. I would like to remove "!defined (DEBUG)" flag after evaluating the unit-test and the aging-test enough as a next step if we need.
617c2d1
to
3a0d6f3
Compare
The PR is Updated with squash command. |
@leemgs ah, I see now, the |
@janvorli , Yes. I know that. However, I had got the And, some approaches could not fix all bus errors (e.g. approximately 1200 failures / 3000 failures) |
Most of the codebase uses macros/methods like |
@jkotas , According to the above CI report, the Windows environment generates
|
Windows build is using Microsoft Visual Studio 2015 compilers. |
@leemgs the problem with Windows build here is that the UNALIGNED_ARM is not defined. The pal.h is never included on Windows, since there is no PAL. |
e46c94e
to
c55959e
Compare
@sjsinju, @prajwal-aithal , Could you check the issue of |
a8774de
to
9d25291
Compare
@@ -757,13 +757,26 @@ inline | |||
signed __int16 getI2LittleEndian(const BYTE * ptr) | |||
{ return * (UNALIGNED signed __int16 *)ptr; } | |||
|
|||
#if defined(_ARM_) && defined(__linux__) && !defined (DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use the GET_UNALIGNED_VAL32 and GET_UNALIGNED_VAL64 in these functions instead of this change as @jkotas has suggested before? It would make this change cleaner without any need for ifdefs.
Also, please modify all of the functions, not just the ones that were causing problems now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use the GET_UNALIGNED_VAL32 and GET_UNALIGNED_VAL64 in
these functions instead of this change as @jkotas has suggested before? It would make
this change cleaner without any need for ifdefs.
Yes. I have modified the exiting PR. I will again send the PR as soon as I verify the effect of new PR with the CoreCLR's unit-test.
9d25291
to
32d64c5
Compare
I have sent new PR using the |
@@ -98,8 +98,9 @@ inline void SwapGuid(GUID *pGuid) | |||
#define VALPTR(x) VAL32(x) | |||
#endif | |||
|
|||
#if defined(ALIGN_ACCESS) && !defined(_MSC_VER) | |||
|
|||
#if defined(_TARGET_ARM_) && !defined(_MSC_VER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janvorli , @jkotas Could give me a hint about the defined(ALIGN_ACCESS)
? Is ALIGN_ACCESS
coming from ./src/inc/stdmacros.h#L131 ? I could not know the related history why this statement was created in the past, because it was made by 'dotnet-bot'.
^ef1e2ab (dotnet-bot 2015-01-30 14:14:42 -0800 103 #if defined(ALIGN_ACCESS) && !def
^ef1e2ab (dotnet-bot 2015-01-30 14:14:42 -0800 104) #ifdef __cplusplus
^ef1e2ab (dotnet-bot 2015-01-30 14:14:42 -0800 105) extern "C++" {
When I checked a real ALIGN_ACCESS
definition at build time, Actually, the ALIGN_ACCESS
(in ./src/pal/inc/pal_endian.h) macro was not defined in case of cross-compilation of CoreCLR for Linux/ARM. So, I have modified the existing if defined(ALIGN_ACCESS) && !defined(_MSC_VER)
statement to aware aggressive optimization levels (e.g., O2/O3) in Linux/ARM environment. Could you review the related lines that I modified?
@sjsinju, Thanks. The ARM CI is running normally. When this PR will be merged, Please, Append the |
LGTM modulo comment |
32d64c5
to
2f4ed8f
Compare
I re-submitted new PR (w/ squash) after applying the review of @jkotas. Below is evaluation result of the unit-test.
|
2f4ed8f
to
941231c
Compare
**PROBLEM** This patch is to resolve +3000 bus errors that are generated whenever we use the aggressive optimization levels of clang (issue #5844 ). When we enable the -O3 optimization level of the clang version(from clang 3.5 to clang 3.9(snapshot)), we have always got the lots of bus errors from the coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g., "misaligned memory access") with /proc/cpu/alignment facility in kernel space. Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems but the performance of the application will be degraded. .source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment **VERSION 4** . Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without any need for ifdefs. **VERSION 3** .Apply this PR on only Linux/ARM for different system environment (Windows). Here is .NET CI Report on Windows: Compile Error error C2146: syntax error: missing ';' before identifier '__unaligned_int32' (compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp) [D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit\ crossgen\clrjit_crossgen.vcxproj] Indication 1 **VERSION 2** .Add UNALIGNED_ARM macro for handling ARM core specific optimization levels. .Add RISC-based ARM core handling into the existing infra-structure of the platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM. **VERSION 1** Basically, RISC-based ARM architecture requires aligned access with 4byte reads. In order to support aggressive optimization levels such as O2/O3, let's use attribute keyword of aligned(1) instead of using memcpy(2) in into a properly aligned buffer or the packing attribute. **BACKGROUND** According to ARM information center(infocenter.arm.com), By default, the ARM compiler expects normal C and C++ pointers to point to an aligned word in memory. A type qualifier __packed is provided to enable unaligned pointer access. If you want to define a pointer to a word that can be at any address (that is, that can be at a non-natural alignment), you must specify this using the __packed qualifier when defining the pointer: __packed int *pi; // pointer to unaligned int However, clang/llvm does not support the __packed qualifier such as __attribute__((packed)) or __attribute__((packed, aligned(4))) In -O0 (debug build) the innermost block is emitted as the following assembly, which works properly: ldr r1, [r0, dotnet#24] ldr r2, [r0, dotnet#20] In -O3 (release build) however the compiler realizes these fields are adjacent and generates this assembly: ldrdeq r2, r3, [r0, dotnet#20] Unfortunately, vldr/ldrdb instructions always generate an alignment fault (in practice). It seems that clang uses ldrb instruction although GCC uses ldr instruction because armv7 supports unaligned ldr instruction. Note: If some arm architectures (e.g., Linux/ARM Emulator) does not support unaligned ldr, this issue is not generated with aggressive optimization levels (e.g., -O2 and -O3). * Case study: How does the ARM Compiler support unaligned accesses? http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html * Case study: Indicating unaligned access to Clang for ARM compatibility http://stackoverflow.com/questions/9185811/indicating-unaligned-access-to-clang-for-arm-compatibility * Case study: Chromium source for UnalignedLoad32() on ARM https://github.com/nwjs/chromium.src/blob/nw15/third_party/cld/base/basictypes.h#L302 Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
941231c
to
37dc935
Compare
👍 |
@dotnet-bot test Windows_NT x86 ryujit Checked Build and Test please |
@jkotas , @janvorli Could you merge this PR to fix the bus errors of CoreCLR with Release Build +
|
…otnet/coreclr#6379) **PROBLEM** This patch is to resolve +3000 bus errors that are generated whenever we use the aggressive optimization levels of clang (issue dotnet/coreclr#5844 ). When we enable the -O3 optimization level of the clang version(from clang 3.5 to clang 3.9(snapshot)), we have always got the lots of bus errors from the coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g., "misaligned memory access") with /proc/cpu/alignment facility in kernel space. Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems but the performance of the application will be degraded. .source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment **VERSION 4** . Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without any need for ifdefs. **VERSION 3** .Apply this PR on only Linux/ARM for different system environment (Windows). Here is .NET CI Report on Windows: Compile Error error C2146: syntax error: missing ';' before identifier '__unaligned_int32' (compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp) [D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit\ crossgen\clrjit_crossgen.vcxproj] Indication 1 **VERSION 2** .Add UNALIGNED_ARM macro for handling ARM core specific optimization levels. .Add RISC-based ARM core handling into the existing infra-structure of the platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM. **VERSION 1** Basically, RISC-based ARM architecture requires aligned access with 4byte reads. In order to support aggressive optimization levels such as O2/O3, let's use attribute keyword of aligned(1) instead of using memcpy(2) in into a properly aligned buffer or the packing attribute. **BACKGROUND** According to ARM information center(infocenter.arm.com), By default, the ARM compiler expects normal C and C++ pointers to point to an aligned word in memory. A type qualifier __packed is provided to enable unaligned pointer access. If you want to define a pointer to a word that can be at any address (that is, that can be at a non-natural alignment), you must specify this using the __packed qualifier when defining the pointer: __packed int *pi; // pointer to unaligned int However, clang/llvm does not support the __packed qualifier such as __attribute__((packed)) or __attribute__((packed, aligned(4))) In -O0 (debug build) the innermost block is emitted as the following assembly, which works properly: ldr r1, [r0, dotnet/coreclr#24] ldr r2, [r0, dotnet/coreclr#20] In -O3 (release build) however the compiler realizes these fields are adjacent and generates this assembly: ldrdeq r2, r3, [r0, dotnet/coreclr#20] Unfortunately, vldr/ldrdb instructions always generate an alignment fault (in practice). It seems that clang uses ldrb instruction although GCC uses ldr instruction because armv7 supports unaligned ldr instruction. Note: If some arm architectures (e.g., Linux/ARM Emulator) does not support unaligned ldr, this issue is not generated with aggressive optimization levels (e.g., -O2 and -O3). * Case study: How does the ARM Compiler support unaligned accesses? http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html * Case study: Indicating unaligned access to Clang for ARM compatibility http://stackoverflow.com/questions/9185811/indicating-unaligned-access-to-clang-for-arm-compatibility * Case study: Chromium source for UnalignedLoad32() on ARM https://github.com/nwjs/chromium.src/blob/nw15/third_party/cld/base/basictypes.h#L302 Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Commit migrated from dotnet/coreclr@561b64d
PROBLEM
This patch is to resolve +3000 bus errors that are reproduced whenever we use
the aggressive optimization levels of clang (issue #5844 ).
When we enable the -O3 optimization level of the clang version(from clang 3.5
to clang 3.9(snapshot)), we have always got the lots of bus errors from the
coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g.,
"misaligned memory access") with /proc/cpu/alignment facility in kernel space.
Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems
but the performance of the application will be degraded.
.source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment
VERSION 4
. Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without
any need for ifdefs.
VERSION 3
.Apply this PR on only Linux/ARM for different system environment (Windows).
Here is .NET CI Report on Windows: Compile Error
error C2146: syntax error: missing ';' before identifier '__unaligned_int32'
(compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp)
[D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit
crossgen\clrjit_crossgen.vcxproj] Indication 1
VERSION 2
.Add UNALIGNED_ARM macro for handling ARM core specific optimisation levels.
.Add RISC-based ARM core handling into the existing infra-structure of the
platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM.
VERSION 1
Basically, RISC-based ARM architecture requires aligned access with 4byte reads.
In order to support aggressive optimization levels such as O2/O3, let's use
attribute keyword of aligned(1) instead of using memcpy(2) in into
a properly aligned buffer or the packing attribute.
Signed-off-by: Geunsik Lim geunsik.lim@samsung.com