Power Manager fixes from savvas1#16
Merged
sakindia123 merged 13 commits intosakindia123:masterfrom Jan 22, 2013
Merged
Conversation
added 13 commits
January 14, 2013 15:21
sakindia123
added a commit
that referenced
this pull request
Jan 22, 2013
Power Manager fixes from savvas1
sakindia123
pushed a commit
that referenced
this pull request
Oct 19, 2013
…optimizations
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
For instance in the following function:
void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
waiter->magic = waiter;
INIT_LIST_HEAD(&waiter->list);
}
compiled as:
800554d0 <debug_mutex_lock_common>:
800554d0: e92d4008 push {r3, lr}
800554d4: e1a00001 mov r0, r1
800554d8: e3a02010 mov r2, #16 ; 0x10
800554dc: e3a01011 mov r1, #17 ; 0x11
800554e0: eb04426e bl 80165ea0 <memset>
800554e4: e1a03000 mov r3, r0
800554e8: e583000c str r0, [r3, #12]
800554ec: e5830000 str r0, [r3]
800554f0: e5830004 str r0, [r3, #4]
800554f4: e8bd8008 pop {r3, pc}
GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
save r8:
- str lr, [sp, #-4]!
+ stmfd sp!, {r8, lr}
and restore r8 on both exit paths:
- ldmeqfd sp!, {pc} @ Now <64 bytes to go.
+ ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go.
(...)
tst r2, #16
stmneia ip!, {r1, r3, r8, lr}
- ldr lr, [sp], #4
+ ldmfd sp!, {r8, lr}
Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:
save r8:
- stmfd sp!, {r4-r7, lr}
+ stmfd sp!, {r4-r8, lr}
and restore r8 on both exit paths:
bgt 3b
- ldmeqfd sp!, {r4-r7, pc}
+ ldmeqfd sp!, {r4-r8, pc}
(...)
tst r2, #16
stmneia ip!, {r4-r7}
- ldmfd sp!, {r4-r7, lr}
+ ldmfd sp!, {r4-r8, lr}
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
sachinthomaspj
pushed a commit
to sachinthomaspj/android_kernel_htc_pico
that referenced
this pull request
Sep 30, 2014
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.
As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.
Thanks to Nicolas Pitre for the excellent analysis:
Test case:
int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }
With the current ARM version:
foo:
ldrb r3, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
ldrb r1, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
ldrb r2, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
mov r3, r3, asl sakindia123#16 @ tmp154, MEM[(const u8 *)x_1(D) + 2B],
ldrb r0, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
orr r3, r3, r1, asl sakindia123#8 @, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
orr r3, r3, r2 @ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
orr r0, r3, r0, asl #24 @,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
bx lr @
bar:
stmfd sp!, {r4, r5, r6, r7} @,
mov r2, #0 @ tmp184,
ldrb r5, [r0, #6] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
ldrb r4, [r0, #5] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
ldrb ip, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
ldrb r1, [r0, #4] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
mov r5, r5, asl sakindia123#16 @ tmp175, MEM[(const u8 *)x_1(D) + 6B],
ldrb r7, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
orr r5, r5, r4, asl sakindia123#8 @, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
ldrb r6, [r0, sakindia123#7] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
orr r5, r5, r1 @ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
ldrb r4, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
mov ip, ip, asl sakindia123#16 @ tmp188, MEM[(const u8 *)x_1(D) + 2B],
ldrb r1, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
orr ip, ip, r7, asl sakindia123#8 @, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
orr r3, r5, r6, asl #24 @,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
orr ip, ip, r4 @ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
orr ip, ip, r1, asl #24 @, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
mov r1, r3 @,
orr r0, r2, ip @ tmp171, tmp184, tmp194
ldmfd sp!, {r4, r5, r6, r7}
bx lr
In both cases the code is slightly suboptimal. One may wonder why
wasting r2 with the constant 0 in the second case for example. And all
the mov's could be folded in subsequent orr's, etc.
Now with the asm-generic version:
foo:
ldr r0, [r0, #0] @ unaligned @,* x
bx lr @
bar:
mov r3, r0 @ x, x
ldr r0, [r0, #0] @ unaligned @,* x
ldr r1, [r3, #4] @ unaligned @,
bx lr @
This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access. This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:
long long bar (long long *x) {return *x; }
then the resulting code is:
bar:
ldmia r0, {r0, r1} @ x,,
bx lr @
So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.
Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation. Let's see with an
ARMv5 target:
foo:
ldrb r3, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
ldrb r1, [r0, #1] @ zero_extendqisi2 @ tmp140,
ldrb r2, [r0, #2] @ zero_extendqisi2 @ tmp143,
ldrb r0, [r0, #3] @ zero_extendqisi2 @ tmp146,
orr r3, r3, r1, asl sakindia123#8 @, tmp142, tmp139, tmp140,
orr r3, r3, r2, asl sakindia123#16 @, tmp145, tmp142, tmp143,
orr r0, r3, r0, asl #24 @,, tmp145, tmp146,
bx lr @
bar:
stmfd sp!, {r4, r5, r6, r7} @,
ldrb r2, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
ldrb r7, [r0, #1] @ zero_extendqisi2 @ tmp140,
ldrb r3, [r0, #4] @ zero_extendqisi2 @ tmp149,
ldrb r6, [r0, #5] @ zero_extendqisi2 @ tmp150,
ldrb r5, [r0, #2] @ zero_extendqisi2 @ tmp143,
ldrb r4, [r0, #6] @ zero_extendqisi2 @ tmp153,
ldrb r1, [r0, sakindia123#7] @ zero_extendqisi2 @ tmp156,
ldrb ip, [r0, #3] @ zero_extendqisi2 @ tmp146,
orr r2, r2, r7, asl sakindia123#8 @, tmp142, tmp139, tmp140,
orr r3, r3, r6, asl sakindia123#8 @, tmp152, tmp149, tmp150,
orr r2, r2, r5, asl sakindia123#16 @, tmp145, tmp142, tmp143,
orr r3, r3, r4, asl sakindia123#16 @, tmp155, tmp152, tmp153,
orr r0, r2, ip, asl #24 @,, tmp145, tmp146,
orr r1, r3, r1, asl #24 @,, tmp155, tmp156,
ldmfd sp!, {r4, r5, r6, r7}
bx lr
Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[steven@steven676.net: backport to 3.0: don't depend on asm-generic
wrapper support in Kbuild]
Change-Id: I37f8db38bfa2cd8680a8bec0cb4da8ec39c04861
Gamersab
pushed a commit
to Gamersab/android_kernel_htc_pico-old
that referenced
this pull request
Oct 7, 2014
This moves ARM over to the asm-generic/unaligned.h header. This has the
benefit of better code generated especially for ARMv7 on gcc 4.7+
compilers.
As Arnd Bergmann, points out: The asm-generic version uses the "struct"
version for native-endian unaligned access and the "byteshift" version
for the opposite endianess. The current ARM version however uses the
"byteshift" implementation for both.
Thanks to Nicolas Pitre for the excellent analysis:
Test case:
int foo (int *x) { return get_unaligned(x); }
long long bar (long long *x) { return get_unaligned(x); }
With the current ARM version:
foo:
ldrb r3, [r0, sachinthomaspj#2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
ldrb r1, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
ldrb r2, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
mov r3, r3, asl sakindia123#16 @ tmp154, MEM[(const u8 *)x_1(D) + 2B],
ldrb r0, [r0, sachinthomaspj#3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
orr r3, r3, r1, asl sakindia123#8 @, tmp155, tmp154, MEM[(const u8 *)x_1(D) + 1B],
orr r3, r3, r2 @ tmp157, tmp155, MEM[(const u8 *)x_1(D)]
orr r0, r3, r0, asl #24 @,, tmp157, MEM[(const u8 *)x_1(D) + 3B],
bx lr @
bar:
stmfd sp!, {r4, r5, r6, r7} @,
mov r2, #0 @ tmp184,
ldrb r5, [r0, sachinthomaspj#6] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 6B], MEM[(const u8 *)x_1(D) + 6B]
ldrb r4, [r0, sachinthomaspj#5] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 5B], MEM[(const u8 *)x_1(D) + 5B]
ldrb ip, [r0, sachinthomaspj#2] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 2B], MEM[(const u8 *)x_1(D) + 2B]
ldrb r1, [r0, sachinthomaspj#4] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 4B], MEM[(const u8 *)x_1(D) + 4B]
mov r5, r5, asl sakindia123#16 @ tmp175, MEM[(const u8 *)x_1(D) + 6B],
ldrb r7, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 1B], MEM[(const u8 *)x_1(D) + 1B]
orr r5, r5, r4, asl sakindia123#8 @, tmp176, tmp175, MEM[(const u8 *)x_1(D) + 5B],
ldrb r6, [r0, sakindia123#7] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 7B], MEM[(const u8 *)x_1(D) + 7B]
orr r5, r5, r1 @ tmp178, tmp176, MEM[(const u8 *)x_1(D) + 4B]
ldrb r4, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D)], MEM[(const u8 *)x_1(D)]
mov ip, ip, asl sakindia123#16 @ tmp188, MEM[(const u8 *)x_1(D) + 2B],
ldrb r1, [r0, sachinthomaspj#3] @ zero_extendqisi2 @ MEM[(const u8 *)x_1(D) + 3B], MEM[(const u8 *)x_1(D) + 3B]
orr ip, ip, r7, asl sakindia123#8 @, tmp189, tmp188, MEM[(const u8 *)x_1(D) + 1B],
orr r3, r5, r6, asl #24 @,, tmp178, MEM[(const u8 *)x_1(D) + 7B],
orr ip, ip, r4 @ tmp191, tmp189, MEM[(const u8 *)x_1(D)]
orr ip, ip, r1, asl #24 @, tmp194, tmp191, MEM[(const u8 *)x_1(D) + 3B],
mov r1, r3 @,
orr r0, r2, ip @ tmp171, tmp184, tmp194
ldmfd sp!, {r4, r5, r6, r7}
bx lr
In both cases the code is slightly suboptimal. One may wonder why
wasting r2 with the constant 0 in the second case for example. And all
the mov's could be folded in subsequent orr's, etc.
Now with the asm-generic version:
foo:
ldr r0, [r0, #0] @ unaligned @,* x
bx lr @
bar:
mov r3, r0 @ x, x
ldr r0, [r0, #0] @ unaligned @,* x
ldr r1, [r3, sachinthomaspj#4] @ unaligned @,
bx lr @
This is way better of course, but only because this was compiled for
ARMv7. In this case the compiler knows that the hardware can do
unaligned word access. This isn't that obvious for foo(), but if we
remove the get_unaligned() from bar as follows:
long long bar (long long *x) {return *x; }
then the resulting code is:
bar:
ldmia r0, {r0, r1} @ x,,
bx lr @
So this proves that the presumed aligned vs unaligned cases does have
influence on the instructions the compiler may use and that the above
unaligned code results are not just an accident.
Still... this isn't fully conclusive without at least looking at the
resulting assembly fron a pre ARMv6 compilation. Let's see with an
ARMv5 target:
foo:
ldrb r3, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
ldrb r1, [r0, #1] @ zero_extendqisi2 @ tmp140,
ldrb r2, [r0, sachinthomaspj#2] @ zero_extendqisi2 @ tmp143,
ldrb r0, [r0, sachinthomaspj#3] @ zero_extendqisi2 @ tmp146,
orr r3, r3, r1, asl sakindia123#8 @, tmp142, tmp139, tmp140,
orr r3, r3, r2, asl sakindia123#16 @, tmp145, tmp142, tmp143,
orr r0, r3, r0, asl #24 @,, tmp145, tmp146,
bx lr @
bar:
stmfd sp!, {r4, r5, r6, r7} @,
ldrb r2, [r0, #0] @ zero_extendqisi2 @ tmp139,* x
ldrb r7, [r0, #1] @ zero_extendqisi2 @ tmp140,
ldrb r3, [r0, sachinthomaspj#4] @ zero_extendqisi2 @ tmp149,
ldrb r6, [r0, sachinthomaspj#5] @ zero_extendqisi2 @ tmp150,
ldrb r5, [r0, sachinthomaspj#2] @ zero_extendqisi2 @ tmp143,
ldrb r4, [r0, sachinthomaspj#6] @ zero_extendqisi2 @ tmp153,
ldrb r1, [r0, sakindia123#7] @ zero_extendqisi2 @ tmp156,
ldrb ip, [r0, sachinthomaspj#3] @ zero_extendqisi2 @ tmp146,
orr r2, r2, r7, asl sakindia123#8 @, tmp142, tmp139, tmp140,
orr r3, r3, r6, asl sakindia123#8 @, tmp152, tmp149, tmp150,
orr r2, r2, r5, asl sakindia123#16 @, tmp145, tmp142, tmp143,
orr r3, r3, r4, asl sakindia123#16 @, tmp155, tmp152, tmp153,
orr r0, r2, ip, asl #24 @,, tmp145, tmp146,
orr r1, r3, r1, asl #24 @,, tmp155, tmp156,
ldmfd sp!, {r4, r5, r6, r7}
bx lr
Compared to the initial results, this is really nicely optimized and I
couldn't do much better if I were to hand code it myself.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[steven@steven676.net: backport to 3.0: don't depend on asm-generic
wrapper support in Kbuild]
Change-Id: I37f8db38bfa2cd8680a8bec0cb4da8ec39c04861
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.