Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Linux/ARM: Fix +3000 bus errors of unit-test in case of O2/O3 levels #6379

Merged

Conversation

leemgs
Copy link

@leemgs leemgs commented Jul 21, 2016

PROBLEM
This patch is to resolve +3000 bus errors that are reproduced whenever we use
the aggressive optimization levels of clang (issue #5844 ).

When we enable the -O3 optimization level of the clang version(from clang 3.5
to clang 3.9(snapshot)), we have always got the lots of bus errors from the
coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g.,
"misaligned memory access") with /proc/cpu/alignment facility in kernel space.
Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems
but the performance of the application will be degraded.
.source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment

VERSION 4
. Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without
any need for ifdefs.

VERSION 3
.Apply this PR on only Linux/ARM for different system environment (Windows).

Here is .NET CI Report on Windows: Compile Error
error C2146: syntax error: missing ';' before identifier '__unaligned_int32'
(compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp)
[D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit
crossgen\clrjit_crossgen.vcxproj] Indication 1

VERSION 2
.Add UNALIGNED_ARM macro for handling ARM core specific optimisation levels.
.Add RISC-based ARM core handling into the existing infra-structure of the
platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM.

VERSION 1
Basically, RISC-based ARM architecture requires aligned access with 4byte reads.
In order to support aggressive optimization levels such as O2/O3, let's use
attribute keyword of aligned(1) instead of using memcpy(2) in into
a properly aligned buffer or the packing attribute.

Signed-off-by: Geunsik Lim geunsik.lim@samsung.com

@leemgs
Copy link
Author

leemgs commented Jul 21, 2016

Below is unit-test result of CoreCLR.

Before applying the patch:

The number of unit-test failures is 3886 (-O2 and -O3 make same result in release build mode)

=======================
     Test Results (Release + O3)
=======================
# CoreCLR Bin Dir  : /unit-test/bin.coreclr.release.20160627.1.O3/Product/Linux.arm.Release
# Tests Discovered : 9873
# Passed           : 5626
# Failed           : 3886 (commit: 20160627-52570683, OOM: 1, manual kill commands: 7 )
# Skipped          : 361
=======================
102 minutes and 10 seconds taken to run CoreCLR tests.

real    103m44.120s
user    256m19.410s
sys 13m29.340s
root@target:/unit-test/coreclr.git/tests#

After applying the patch:

The number of unit-test failures is 47 (such as debug build mode)

=======================
     Test Results (Release + O3 ; with patch ; But, needs refactoring)
=======================
# CoreCLR Bin Dir  : /unit-test/bin.coreclr.release.20160627.1.O3.attribute.1.refacto1/Product/Linux.arm.Release
# Tests Discovered : 9873
# Passed           : 9465
# Failed           : 47 (commit: 20160627-52570683, OOM: 2,  manual kill commands: 0 ) <=== N O W ! ! !
# Skipped          : 361
=======================
117 minutes and 42 seconds taken to run CoreCLR tests.

real    119m15.377s
user    145m21.515s
sys 17m23.065s

@leemgs
Copy link
Author

leemgs commented Jul 21, 2016

@jkotas , Could you review and merge this PR to enable the release build mode + O3 (without O1) on the Linux/ARM board?

And, I have been working the next PR to keep the existing UNALIGNED macro even though this PR can resolve this issue #5844. Finally, I hope that we can successfully enable -Oz (for size) and -Ofast(for speed) for deployment of the Linux/ARM based devices.

/CC: @janvorli , @benpye, @myungjoo , @lemmaa , @manu-silicon , @parjong

@@ -201,7 +201,13 @@ extern "C" {

#else // _MSC_VER

#if defined (_M_ARM) && !defined (DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use _ARM_ instead of _M_ARM.
And why do you define it differently for DEBUG and non-DEBUG? It seems it would not hurt anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ARM instead of _M_ARM.

Basically, I though that the valid macro is _M_ARM. I will try to change the macro from _M_ARM to _ARM.

And why do you define it differently for DEBUG and non-DEBUG?

I have defined it for DEBUG (-O0) and non-DEBUG (for Checked/-O2 and Release/-O3).

It seems it would not hurt anything.

I can not guarantee it would not hut anything because I did have enough evaluation. I would like to remove "!defined (DEBUG)" flag after evaluating the unit-test and the aging-test enough as a next step if we need.

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from 617c2d1 to 3a0d6f3 Compare July 21, 2016 10:36
@leemgs
Copy link
Author

leemgs commented Jul 21, 2016

Basically, I though that the valid macro is _M_ARM. I will try to change the macro from _M_ARM to _ARM.

The PR is Updated with squash command.

@janvorli
Copy link
Member

@leemgs ah, I see now, the __attribute__((aligned(1))) cannot be used in cast, just in typedefs.
But can you please try if defining UNALIGNED to __unaligned like we do on windows and see if it would work too? That would be the simplest overall the code base. I've tried to do it this way on x64 ubuntu and the compiler didn't complain.

@leemgs
Copy link
Author

leemgs commented Jul 21, 2016

That would be the simplest overall the code base.
I've tried to do it this way on x64 ubuntu and the compiler didn't complain.

@janvorli , Yes. I know that. However, I had got the error: unterminated conditional directive while building CoreCLR in case of Linux/ARM architecture. As I shared previously, I had evaluated the effect of the below approaches.

And, some approaches could not fix all bus errors (e.g. approximately 1200 failures / 3000 failures)

@jkotas
Copy link
Member

jkotas commented Jul 21, 2016

Most of the codebase uses macros/methods like GET_UNALIGNED_VAL16 to read unaligned values. It may be best to switch these two places in the JIT to use this macro as well to fix this problem - instead of trying to tweak the definition of UNALIGNED.

@leemgs
Copy link
Author

leemgs commented Jul 21, 2016

@jkotas , According to the above CI report, the Windows environment generates Compile Error. Could you share me the version of clang/llvm that CI system uses on Windows OS to handle this case?

@jkotas
Copy link
Member

jkotas commented Jul 21, 2016

Windows build is using Microsoft Visual Studio 2015 compilers.

@janvorli
Copy link
Member

@leemgs the problem with Windows build here is that the UNALIGNED_ARM is not defined. The pal.h is never included on Windows, since there is no PAL.

@leemgs
Copy link
Author

leemgs commented Jul 22, 2016

@jkotas , @janvorli Thank you for sharing the information and hint about the CI build error. I will check it to resolve the build error of CI.

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from e46c94e to c55959e Compare July 22, 2016 03:34
@leemgs
Copy link
Author

leemgs commented Jul 22, 2016

Linux ARM Emulator Cross Debug Build
umount: /opt/linux-arm-emulator-root/dev: device is busy.

@sjsinju, @prajwal-aithal , Could you check the issue of Linux ARM Emulator Cross ***? This issue results from the incorrect operation of mount/unmount.

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch 2 times, most recently from a8774de to 9d25291 Compare July 22, 2016 05:13
@leemgs
Copy link
Author

leemgs commented Jul 22, 2016

I have executed squash to make one patch because all CI tests are passed.
@jkotas , @janvorli Could you review and merge the updated PR to fix the unit-test failures of CoreCLR in Release Build + O3 (by default)?

@@ -757,13 +757,26 @@ inline
signed __int16 getI2LittleEndian(const BYTE * ptr)
{ return * (UNALIGNED signed __int16 *)ptr; }

#if defined(_ARM_) && defined(__linux__) && !defined (DEBUG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use the GET_UNALIGNED_VAL32 and GET_UNALIGNED_VAL64 in these functions instead of this change as @jkotas has suggested before? It would make this change cleaner without any need for ifdefs.
Also, please modify all of the functions, not just the ones that were causing problems now.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use the GET_UNALIGNED_VAL32 and GET_UNALIGNED_VAL64 in
these functions instead of this change as @jkotas has suggested before? It would make
this change cleaner without any need for ifdefs.

Yes. I have modified the exiting PR. I will again send the PR as soon as I verify the effect of new PR with the CoreCLR's unit-test.

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from 9d25291 to 32d64c5 Compare July 25, 2016 09:07
@leemgs
Copy link
Author

leemgs commented Jul 25, 2016

I have sent new PR using the GET_UNALIGNED_VALxxx. The evaluation result of unit-test will be appeared shortly.

@@ -98,8 +98,9 @@ inline void SwapGuid(GUID *pGuid)
#define VALPTR(x) VAL32(x)
#endif

#if defined(ALIGN_ACCESS) && !defined(_MSC_VER)

#if defined(_TARGET_ARM_) && !defined(_MSC_VER)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli , @jkotas Could give me a hint about the defined(ALIGN_ACCESS) ? Is ALIGN_ACCESS coming from ./src/inc/stdmacros.h#L131 ? I could not know the related history why this statement was created in the past, because it was made by 'dotnet-bot'.

^ef1e2ab (dotnet-bot        2015-01-30 14:14:42 -0800 103 #if defined(ALIGN_ACCESS) && !def
^ef1e2ab (dotnet-bot        2015-01-30 14:14:42 -0800 104) #ifdef __cplusplus
^ef1e2ab (dotnet-bot        2015-01-30 14:14:42 -0800 105) extern "C++" {

When I checked a real ALIGN_ACCESS definition at build time, Actually, the ALIGN_ACCESS (in ./src/pal/inc/pal_endian.h) macro was not defined in case of cross-compilation of CoreCLR for Linux/ARM. So, I have modified the existing if defined(ALIGN_ACCESS) && !defined(_MSC_VER) statement to aware aggressive optimization levels (e.g., O2/O3) in Linux/ARM environment. Could you review the related lines that I modified?

@leemgs
Copy link
Author

leemgs commented Jul 25, 2016

@sjsinju, @prajwal-aithal , Could you check the issue of Linux ARM Emulator Cross ***? This issue results from the incorrect operation of mount/unmount.

@sjsinju, Thanks. The ARM CI is running normally. When this PR will be merged, Please, Append the ./common/AboveStackLimit/AboveStackLimit.exe unit-test into https://github.com/dotnet/coreclr/blob/master/tests/testsRunningInsideARM.txt to check a bus error.
/CC: @jyoungyun

@jkotas
Copy link
Member

jkotas commented Jul 25, 2016

LGTM modulo comment

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from 32d64c5 to 2f4ed8f Compare July 25, 2016 22:13
@leemgs
Copy link
Author

leemgs commented Jul 25, 2016

I re-submitted new PR (w/ squash) after applying the review of @jkotas. Below is evaluation result of the unit-test.

  • CoreCLR commit 5257068
  • On Raspberry Pi2 Board
  • Build Mode: Release build + O3 (by default)
  • Before the PR: (OOM: 1, manual kill commands: 7 )
=======================
     Test Results 
=======================
# CoreCLR Bin Dir  : /unit-test/bin.coreclr.release.O3/Product/Linux.arm.Release
# Tests Discovered : 9873
# Passed           : 5626
# Failed           : 3886 
# Skipped          : 361
=======================
102 minutes and 10 seconds taken to run CoreCLR tests.

real    103m44.120s
user    256m19.410s
sys 13m29.340s
root@target:/unit-test/coreclr.git/tests# 
  • After the PR: (OOM: 2, manual kill commands: 0 )
=======================
     Test Results
=======================
# CoreCLR Bin Dir  : /unit-test/bin.coreclr.release.O3.attribute.1.refacto1.2/Product/Linux.arm.Release
# Tests Discovered : 9873
# Passed           : 9465
# Failed           : 47
# Skipped          : 361
=======================
117 minutes and 48 seconds taken to run CoreCLR tests.

real    119m22.713s
user    145m51.545s
sys 17m8.125s

@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from 2f4ed8f to 941231c Compare July 25, 2016 22:34
**PROBLEM**
This patch is to resolve +3000 bus errors that are generated whenever we use
the aggressive optimization levels of clang (issue #5844 ).

When we enable the -O3 optimization level of the clang version(from clang 3.5
to clang 3.9(snapshot)), we have always got the lots of bus errors from the
coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g.,
"misaligned memory access") with /proc/cpu/alignment facility in kernel space.
Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems
but the performance of the application will be degraded.
.source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment

**VERSION 4**
. Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without
  any need for ifdefs.

**VERSION 3**
.Apply this PR on only Linux/ARM for different system environment (Windows).

Here is .NET CI Report on Windows: Compile Error
error C2146: syntax error: missing ';' before identifier '__unaligned_int32'
(compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp)
[D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit\
crossgen\clrjit_crossgen.vcxproj] Indication 1

**VERSION 2**
.Add UNALIGNED_ARM macro for handling ARM core specific optimization levels.
.Add RISC-based ARM core handling into the existing infra-structure of the
platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM.

**VERSION 1**
Basically, RISC-based ARM architecture requires aligned access with 4byte reads.
In order to support aggressive optimization levels such as O2/O3, let's use
attribute keyword of aligned(1) instead of using memcpy(2) in into
a properly aligned buffer or the packing attribute.

**BACKGROUND**
According to ARM information center(infocenter.arm.com), By default,
the ARM compiler expects normal C and C++ pointers to point
to an aligned word in memory. A type qualifier __packed is provided to
enable unaligned pointer access. If you want to define a pointer to a word
that can be at any address (that is, that can be at a non-natural alignment),
you must specify this using the __packed qualifier when defining the pointer:
__packed int *pi; // pointer to unaligned int

However, clang/llvm does not support the __packed qualifier such as
__attribute__((packed)) or __attribute__((packed, aligned(4)))

In -O0 (debug build) the innermost block is emitted as the following assembly,
which works properly:
 ldr r1, [r0, dotnet#24]
 ldr r2, [r0, dotnet#20]

In -O3 (release build) however the compiler realizes these fields are adjacent
and generates this assembly:
 ldrdeq  r2, r3, [r0, dotnet#20]
Unfortunately, vldr/ldrdb instructions always generate an alignment fault
(in practice). It seems that clang uses ldrb instruction although GCC uses
ldr instruction because armv7 supports unaligned ldr instruction.

Note: If some arm architectures (e.g., Linux/ARM Emulator) does not support
unaligned ldr, this issue is not generated with aggressive optimization
levels (e.g., -O2 and -O3).

* Case study: How does the ARM Compiler support unaligned accesses?
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html

* Case study: Indicating unaligned access to Clang for ARM compatibility
  http://stackoverflow.com/questions/9185811/indicating-unaligned-access-to-clang-for-arm-compatibility

* Case study: Chromium source for UnalignedLoad32() on ARM
  https://github.com/nwjs/chromium.src/blob/nw15/third_party/cld/base/basictypes.h#L302

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
@leemgs leemgs force-pushed the upstream-bus-err-attribute-aligned1-refact1 branch from 941231c to 37dc935 Compare July 25, 2016 22:37
@jkotas
Copy link
Member

jkotas commented Jul 25, 2016

👍

@leemgs
Copy link
Author

leemgs commented Jul 26, 2016

16:41:26 Setting status of 37dc935
to FAILURE with url http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/
job/x86_ryujit_checked_windows_nt_prtest/2421/ and message: 'Build finished. '
16:41:26 Using context: Windows_NT x86 ryujit Checked Build and Test
16:41:26 [WS-CLEANUP] Deleting project workspace...[WS-CLEANUP] done
16:41:27 Finished: FAILURE

@dotnet-bot test Windows_NT x86 ryujit Checked Build and Test please

@leemgs
Copy link
Author

leemgs commented Jul 26, 2016

@jkotas , @janvorli Could you merge this PR to fix the bus errors of CoreCLR with Release Build + -O3 (by default)? Below is the experimental result of CorCLR unitTest with the latest commit number. As a next step, We will go to the -Oz (for size) and -Ofast (for speed) to deploy the embedded devices based on CoreCLR.

  • Commit Number: c358f77 (Jul-25-2016)
  • On Samsung ARM ChromeBook
  • OS: Ubuntu/ARM 14.04 32bit
  • UnitTest Result After the PR: (OOM: 2, manual kill commands: 1 )
=======================
     Test Results
=======================
# CoreCLR Bin Dir  : /unit-test/bin.coreclr.release.O3.054d133a.20160725.1/Product/Linux.arm.Release
# Tests Discovered : 9873
# Passed           : 9484
# Failed           : 28
# Skipped          : 361
=======================
164 minutes and 14 seconds taken to run CoreCLR tests.

real    165m24.788s
user    153m41.410s
sys 56m11.885s
root@target:/unit-test/coreclr.git/tests# 

@jkotas jkotas merged commit 561b64d into dotnet:master Jul 26, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…otnet/coreclr#6379)

**PROBLEM**
This patch is to resolve +3000 bus errors that are generated whenever we use
the aggressive optimization levels of clang (issue dotnet/coreclr#5844 ).

When we enable the -O3 optimization level of the clang version(from clang 3.5
to clang 3.9(snapshot)), we have always got the lots of bus errors from the
coreCLR's unit tests. Actually, we can easily monitor SIGBUS signals (e.g.,
"misaligned memory access") with /proc/cpu/alignment facility in kernel space.
Using "echo 2 > /proc/cpu/alignment" makes Linux kernel fixes the problems
but the performance of the application will be degraded.
.source: http://lxr.free-electrons.com/source/Documentation/arm/mem_alignment

**VERSION 4**
. Use 'GET_UNALIGNED_VALXXX' macros in the CoreClr infra-structure without
  any need for ifdefs.

**VERSION 3**
.Apply this PR on only Linux/ARM for different system environment (Windows).

Here is .NET CI Report on Windows: Compile Error
error C2146: syntax error: missing ';' before identifier '__unaligned_int32'
(compiling source file D:\j\workspace\checked_windo---f6dc6fe4\src\jit\alloc.cpp)
[D:\j\workspace\checked_windo---f6dc6fe4\bin\obj\Windows_NT.x64.Checked\src\jit\
crossgen\clrjit_crossgen.vcxproj] Indication 1

**VERSION 2**
.Add UNALIGNED_ARM macro for handling ARM core specific optimization levels.
.Add RISC-based ARM core handling into the existing infra-structure of the
platform adaptation layer (PAL) for aggressive optimization cases on Linux/ARM.

**VERSION 1**
Basically, RISC-based ARM architecture requires aligned access with 4byte reads.
In order to support aggressive optimization levels such as O2/O3, let's use
attribute keyword of aligned(1) instead of using memcpy(2) in into
a properly aligned buffer or the packing attribute.

**BACKGROUND**
According to ARM information center(infocenter.arm.com), By default,
the ARM compiler expects normal C and C++ pointers to point
to an aligned word in memory. A type qualifier __packed is provided to
enable unaligned pointer access. If you want to define a pointer to a word
that can be at any address (that is, that can be at a non-natural alignment),
you must specify this using the __packed qualifier when defining the pointer:
__packed int *pi; // pointer to unaligned int

However, clang/llvm does not support the __packed qualifier such as
__attribute__((packed)) or __attribute__((packed, aligned(4)))

In -O0 (debug build) the innermost block is emitted as the following assembly,
which works properly:
 ldr r1, [r0, dotnet/coreclr#24]
 ldr r2, [r0, dotnet/coreclr#20]

In -O3 (release build) however the compiler realizes these fields are adjacent
and generates this assembly:
 ldrdeq  r2, r3, [r0, dotnet/coreclr#20]
Unfortunately, vldr/ldrdb instructions always generate an alignment fault
(in practice). It seems that clang uses ldrb instruction although GCC uses
ldr instruction because armv7 supports unaligned ldr instruction.

Note: If some arm architectures (e.g., Linux/ARM Emulator) does not support
unaligned ldr, this issue is not generated with aggressive optimization
levels (e.g., -O2 and -O3).

* Case study: How does the ARM Compiler support unaligned accesses?
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html

* Case study: Indicating unaligned access to Clang for ARM compatibility
  http://stackoverflow.com/questions/9185811/indicating-unaligned-access-to-clang-for-arm-compatibility

* Case study: Chromium source for UnalignedLoad32() on ARM
  https://github.com/nwjs/chromium.src/blob/nw15/third_party/cld/base/basictypes.h#L302

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>

Commit migrated from dotnet/coreclr@561b64d
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants