Description
The per function TBAA information created for the Fortran dummy arguments may become invalid after LLVM inlining. I am not sure if we have existing issue for this.
I started from the following Fortran source:
subroutine test(x, y, i)
real, target :: x(4), y(4)
integer, value :: i
call inner(x, y)
call inner(y, x(i:i))
contains
subroutine inner(x, y)
real :: x(:), y(:)
x(1:4) = y(1:4)
end subroutine inner
end subroutine test
program main
interface
subroutine test(x, y, i)
real, target :: x(4), y(4)
integer, value :: i
end subroutine test
end interface
real :: x(4), y(4)
x = 1.0
y = 2.0
call test(x, y, 1)
print *, x
print *, y
end program main
repro1.ll.gz - LLVM IR on entry to LLVM produced by flang-new -Ofast alias.f90 -c -march=skylake
with an added noinline
attribute for test_
.
repro2.ll.gz - modified LLVM IR to demonstrate the problem.
After inlining and other optimization passes applied to repro1.ll
the LLVM IR for test_
looks like this:
; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite)
define void @test_(ptr nocapture %0, ptr nocapture %1, i32 %2) local_unnamed_addr #0 {
%4 = load <4 x float>, ptr %1, align 4, !tbaa !3
store <4 x float> %4, ptr %0, align 4, !tbaa !9
%5 = sext i32 %2 to i64
%6 = add nsw i64 %5, -1
%7 = getelementptr [4 x float], ptr %0, i64 0, i64 %6
%8 = load <4 x float>, ptr %7, align 4, !tbaa !3
store <4 x float> %8, ptr %1, align 4, !tbaa !9
ret void
}
!3 = !{!4, !4, i64 0}
!4 = !{!"dummy arg data/_QFtestFinnerEy", !5, i64 0}
!5 = !{!"dummy arg data", !6, i64 0}
!6 = !{!"any data access", !7, i64 0}
!7 = !{!"any access", !8, i64 0}
!8 = !{!"Flang function root _QFtestPinner"}
!9 = !{!10, !10, i64 0}
!10 = !{!"dummy arg data/_QFtestFinnerEx", !5, i64 0}
TBAA tags !3
and !9
indicate no aliasing for the instructions they are attached to. This is obviously incorrect for store %4
and %8
. I was not able to write a test that would force LLVM to incorrectly reorder the store and the load, so did it manually in repro2.ll
. The incorrect behavior may be seen with:
flang-new -O0 repro2.ll; ./a.out
:
2. 2. 2. 2.
1. 1. 1. 1.
Expected result (flang-new -O0 repro1.ll; ./a.out
):
2. 2. 2. 2.
2. 2. 2. 2.
@tblah, FYI