Skip to content

Conversation

@asl
Copy link
Contributor

@asl asl commented Nov 9, 2022

The PR reapplies #60520 along with correctness fix – we only do the transformation when it is safe (the inner values have single use)

Fixes #61731

@asl asl requested review from dan-zheng, meg-gupta and rxwei November 9, 2022 19:23
@dan-zheng
Copy link
Contributor

Thanks! Could you please highlight (attach a comment to) where the correctness fix is?

@asl
Copy link
Contributor Author

asl commented Nov 9, 2022

@dan-zheng Sure! It's in the separate commit (along with testcases): 63e0083

@asl
Copy link
Contributor Author

asl commented Nov 9, 2022

@swift-ci please test

@asl
Copy link
Contributor Author

asl commented Nov 9, 2022

Tagging @BradLarson

Copy link
Contributor

@dan-zheng dan-zheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dan-zheng Sure! It's in the separate commit (along with testcases): 63e0083

Thanks for the pointer! Thanks too for adding tests.


LGTM, though it would be good to get approval from current codebase maintainers. Hope there won't be more breakages.

@asl
Copy link
Contributor Author

asl commented Nov 9, 2022

@compnerd @meg-gupta Do CI tests on windows platform enough to ensure the issue is fixed? Or we'd need to do something additionally?

@asl
Copy link
Contributor Author

asl commented Nov 9, 2022

@dan-zheng Apparently it inhibits desired peephole optimizations (AutoDiff/e2e_optimizations.swift) on Windows. checking...

@asl asl changed the title Reapply "Implement several peephole optimizations to unblock further optimizations of autodiff code" with correctness fix [WIP] Reapply "Implement several peephole optimizations to unblock further optimizations of autodiff code" with correctness fix Nov 9, 2022
@compnerd
Copy link
Member

@asl, no it does not. You need to additionally build the full toolchain.

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@swift-ci please test Linux

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@swift-ci Please Build Toolchain Windows Platform

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@compnerd Looks like different code is generated for AutoDiff/e2e_optimizations.swift on Windows (as compared to Linux / MacOS), this is a bit worrisome to me. Is there a way to obtain SIL dump somehow somewhere?

@compnerd
Copy link
Member

Unfortunately the toolchain is not available to download from the CI (@shahmishal should be able to help with that). You would have to build the toolchain locally to get that. But I do agree that the difference is something that needs to be investigated before merging.

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@compnerd Can I build windows toolchain on Mac M1? I thought that swift and cross-compilation is not something very well compatible.

@compnerd
Copy link
Member

@asl there is no official support (particularly because there is lack of clarity on some of the legalities of doing so) for cross-compiling from macOS to Windows, but it should be possible to accomplish.

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@compnerd Ok. My biggest issue so far is that I need to find windows machine somewhere just to run the installer and get the pre-built SDK. Or maybe there is tarball / zip / whatever that is possible to unpack easily available somewhere for download?

@compnerd
Copy link
Member

You can extract the contents of the installer using dark from the WiX toolset. That should run fine under wine I imagine. For a quick one-off test, you could likely use the Windows developer VM image from Microsoft (https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/).

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

No wine on M1 :) as it's an aarch64 system. So the only possibility is essentially windows / arm :(

@shahmishal
Copy link
Member

Unfortunately the toolchain is not available to download from the CI (@shahmishal should be able to help with that). You would have to build the toolchain locally to get that. But I do agree that the difference is something that needs to be investigated before merging.

Installer is available if the build successfully passes.

@shahmishal
Copy link
Member

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@compnerd Is windows build / compiler reliable? And results are reproducible? I took the SDK / swift stdlib from the @shahmishal link listed above and tried to cross-compile (from Mac to x86_64-unknown-windows-msvc). I got the same optimized code as on Mac / Linux and test certainly passes. This differs from the FileCheck error dump on CI.

@compnerd
Copy link
Member

@asl yeah, it is definitely reliable and reproducible. It could be some subtle UB in the compiler itself that is giving different results on Windows compared to Linux/macOS. The Windows compiler is built with cl so it does optimize things differently from clang.

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

@compnerd Will it be possible for you to compile test/AutoDiff/e2e_optimizations.swift without optimizations with, say, Swift 5.7.1 on Windows and attach the SIL here?

@compnerd
Copy link
Member

@asl I might be able to do it with a recentish main snapshot, not 5.7.1. I would highly recommend that you build the toolchain if you need to get more details out of this. Can you please provide a command line invocation for the compilation?

@asl
Copy link
Contributor Author

asl commented Nov 10, 2022

I do not have a reasonable way to build a toolchain on Windows, ARM world is quite incompatible with x86 one. I'd not ask if I would have such possibility. Maybe we'd simply mark the test as XFAIL on windows then. I checked the generated code in the Mac / Linux vs cross-compilation on Windows and the only differences were some array-related functions.

Can you please provide a command line invocation for the compilation?

Just swiftc e2e_optimizations.swift -emit-sil would be enough.

Thanks!

@compnerd
Copy link
Member

compnerd commented Nov 11, 2022

I don't think that just marking it as XFAIL on windows is appropriate. You can run the Windows VM on ARM.

e2e_optimizations.sil

@asl
Copy link
Contributor Author

asl commented Nov 11, 2022

@compnerd Thanks! The only difference wrt cross-compiled code is the order of users annotation in couple of places. Investigating further.

@asl
Copy link
Contributor Author

asl commented Nov 13, 2022

After lots of plumbing I was able to build a windows toolchain. And the results are worrysome.

Result of swiftc -O, cross compilation on Mac:

sil hidden @test_gradient_float : $@convention(thin) () -> () {
[global: ]
bb0:
  %0 = float_literal $Builtin.FPIEEE32, 0x41200000 // 10 // users: %1, %21
  debug_value %0 : $Builtin.FPIEEE32, let, name "x0", argno 1, type $Float, expr op_fragment:#Float._value // id: %1
  %2 = float_literal $Builtin.FPIEEE32, 0x43480000 // 200 // users: %3, %9
  debug_value %2 : $Builtin.FPIEEE32, let, name "x2", type $Float, expr op_fragment:#Float._value // id: %3
  %4 = float_literal $Builtin.FPIEEE32, 0x42C80000 // 100 // users: %5, %10
  debug_value %4 : $Builtin.FPIEEE32, let, name "x3", type $Float, expr op_fragment:#Float._value // id: %5
  %6 = integer_literal $Builtin.Int64, 1          // user: %7
  %7 = builtin "sitofp_Int64_FPIEEE32"(%6 : $Builtin.Int64) : $Builtin.FPIEEE32 // users: %8, %9, %13
  debug_value %7 : $Builtin.FPIEEE32, let, name "x4", type $Float, expr op_fragment:#Float._value // id: %8
  %9 = builtin "fdiv_FPIEEE32"(%7 : $Builtin.FPIEEE32, %2 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // users: %14, %16, %15
  %10 = builtin "fneg_FPIEEE32"(%4 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %12
  %11 = float_literal $Builtin.FPIEEE32, 0x471C4000 // 4.0E+4 // user: %12
  %12 = builtin "fdiv_FPIEEE32"(%10 : $Builtin.FPIEEE32, %11 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %13
  %13 = builtin "fmul_FPIEEE32"(%12 : $Builtin.FPIEEE32, %7 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %16
  debug_value %9 : $Builtin.FPIEEE32, let, name "x3", type $Float, expr op_fragment:#Float._value // id: %14
  %15 = builtin "fneg_FPIEEE32"(%9 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %18
  %16 = builtin "fadd_FPIEEE32"(%9 : $Builtin.FPIEEE32, %13 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // users: %17, %18, %19
  debug_value %16 : $Builtin.FPIEEE32, let, name "x2", type $Float, expr op_fragment:#Float._value // id: %17
  %18 = builtin "fadd_FPIEEE32"(%16 : $Builtin.FPIEEE32, %15 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %19
  %19 = builtin "fadd_FPIEEE32"(%16 : $Builtin.FPIEEE32, %18 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // users: %20, %21
  debug_value %19 : $Builtin.FPIEEE32, let, name "x1", type $Float, expr op_fragment:#Float._value // id: %20
  %21 = builtin "fmul_FPIEEE32"(%0 : $Builtin.FPIEEE32, %19 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // users: %22, %22
  %22 = builtin "fadd_FPIEEE32"(%21 : $Builtin.FPIEEE32, %21 : $Builtin.FPIEEE32) : $Builtin.FPIEEE32 // user: %23
  %23 = struct $Float (%22 : $Builtin.FPIEEE32)   // users: %26, %24
  debug_value %23 : $Float, let, name "x0", argno 1 // id: %24
  // function_ref specialized blackHole
  %25 = function_ref @$s9blackHoleSf_Tg5 : $@convention(thin) (Float) -> Float // user: %26
  %26 = apply %25(%23) : $@convention(thin) (Float) -> Float
  %27 = tuple ()                                  // user: %28
  return %27 : $()                                // id: %28
} // end sil function 'test_gradient_float'

Result of swiftc -O natively on Windows:

sil hidden @test_gradient_float : $@convention(thin) () -> () {
bb0:
  %0 = float_literal $Builtin.FPIEEE32, 0x41200000 // 10 // user: %11
  // function_ref float(_:)
  %1 = function_ref @$s4fooO5floatyS2fF : $@convention(thin) (Float) -> Float // user: %2
  %2 = thin_to_thick_function %1 : $@convention(thin) (Float) -> Float to $@callee_guaranteed (Float) -> Float // users: %34, %24
  // function_ref forward-mode derivative of float(_:)
  %3 = function_ref @$s4fooO5floatyS2fFTJfSpSr : $@convention(thin) (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // user: %4
  %4 = thin_to_thick_function %3 : $@convention(thin) (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) to $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // users: %35, %25
  // function_ref reverse-mode derivative of float(_:)
  %5 = function_ref @$s4fooO5floatyS2fFTJrSpSr : $@convention(thin) (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // user: %6
  %6 = thin_to_thick_function %5 : $@convention(thin) (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) to $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // users: %36, %26
  // function_ref specialized thunk for @callee_guaranteed (@unowned Float) -> (@unowned Float, @owned @escaping @callee_guaranteed (@unowned Float) -> (@unowned Float))
  %7 = function_ref @$sS4fIegyd_Igydo_S2fxq_r0_lyS2fIsegnr_Iegnro_TR25$s4fooO5floatyS2fFTJrSpSrTf3nnpf_n : $@convention(thin) (@in_guaranteed Float) -> (@out Float, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>) // user: %8
  %8 = thin_to_thick_function %7 : $@convention(thin) (@in_guaranteed Float) -> (@out Float, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>) to $@callee_guaranteed (@in_guaranteed Float) -> (@out Float, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>) // users: %37, %9
  %9 = convert_function %8 : $@callee_guaranteed (@in_guaranteed Float) -> (@out Float, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>) to $@callee_guaranteed @substituted <τ_0_0, τ_0_1, τ_0_2, τ_0_3> (@in_guaranteed τ_0_0) -> (@out τ_0_1, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <τ_0_2, τ_0_3>) for <Float, Float, Float, Float> // user: %10
  %10 = convert_escape_to_noescape %9 : $@callee_guaranteed @substituted <τ_0_0, τ_0_1, τ_0_2, τ_0_3> (@in_guaranteed τ_0_0) -> (@out τ_0_1, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <τ_0_2, τ_0_3>) for <Float, Float, Float, Float> to $@noescape @callee_guaranteed @substituted <τ_0_0, τ_0_1, τ_0_2, τ_0_3> (@in_guaranteed τ_0_0) -> (@out τ_0_1, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <τ_0_2, τ_0_3>) for <Float, Float, Float, Float> // user: %16
  %11 = struct $Float (%0 : $Builtin.FPIEEE32)    // user: %14
  %12 = alloc_stack $Float                        // users: %31, %30, %27
  %13 = alloc_stack $Float                        // users: %18, %16, %14
  store %11 to %13 : $*Float                      // id: %14
  %15 = alloc_stack $Float                        // users: %17, %16
  %16 = apply %10(%15, %13) : $@noescape @callee_guaranteed @substituted <τ_0_0, τ_0_1, τ_0_2, τ_0_3> (@in_guaranteed τ_0_0) -> (@out τ_0_1, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <τ_0_2, τ_0_3>) for <Float, Float, Float, Float> // users: %28, %27
  dealloc_stack %15 : $*Float                     // id: %17
  dealloc_stack %13 : $*Float                     // id: %18
  %19 = alloc_stack $Float                        // users: %29, %27, %23
  %20 = integer_literal $Builtin.Int64, 1         // user: %21
  %21 = builtin "sitofp_Int64_FPIEEE32"(%20 : $Builtin.Int64) : $Builtin.FPIEEE32 // user: %22
  %22 = struct $Float (%21 : $Builtin.FPIEEE32)   // user: %23
  store %22 to %19 : $*Float                      // id: %23
  strong_retain %2 : $@callee_guaranteed (Float) -> Float // id: %24
  strong_retain %4 : $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // id: %25
  strong_retain %6 : $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // id: %26
  %27 = apply %16(%12, %19) : $@callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>
  strong_release %16 : $@callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float> // id: %28
  dealloc_stack %19 : $*Float                     // id: %29
  %30 = load %12 : $*Float                        // user: %33
  dealloc_stack %12 : $*Float                     // id: %31
  // function_ref specialized blackHole
  %32 = function_ref @$s9blackHoleSf_Tg5 : $@convention(thin) (Float) -> Float // user: %33
  %33 = apply %32(%30) : $@convention(thin) (Float) -> Float
  strong_release %2 : $@callee_guaranteed (Float) -> Float // id: %34
  strong_release %4 : $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // id: %35
  strong_release %6 : $@callee_guaranteed (Float) -> (Float, @owned @callee_guaranteed (Float) -> Float) // id: %36
  strong_release %8 : $@callee_guaranteed (@in_guaranteed Float) -> (@out Float, @owned @callee_guaranteed @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, Float>) // id: %37
  %38 = tuple ()                                  // user: %39
  return %38 : $()                                // id: %39
} // end sil function 'test_gradient_float'

I will probably open a separate issue to track this instability as it seems to be unrelated to this particular case

@asl
Copy link
Contributor Author

asl commented Nov 14, 2022

The root cause was identified and it is not related to this PR at all. See #62085 (comment) for complete explanation

@asl
Copy link
Contributor Author

asl commented Nov 14, 2022

@swift-ci please test

@asl
Copy link
Contributor Author

asl commented Nov 14, 2022

@swift-ci Please Build Toolchain Windows Platform

@asl asl changed the title [WIP] Reapply "Implement several peephole optimizations to unblock further optimizations of autodiff code" with correctness fix Reapply "Implement several peephole optimizations to unblock further optimizations of autodiff code" with correctness fix Nov 14, 2022
@asl
Copy link
Contributor Author

asl commented Nov 14, 2022

@compnerd As far as I can see, windows toolchain has some failed tests that were failing before this PR. Are we good to go?

@asl asl requested a review from compnerd November 14, 2022 18:23
@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@compnerd Will you please confirm that fails on windows toolchain are unrelated to this PR and it could be merged?

Thanks!

@compnerd
Copy link
Member

I don't think that it is unrelated ...

‘build/artifacts/*’ doesn’t match anything, but ‘*’ does. Perhaps that’s what you mean?
ERROR: Step ‘Archive the artifacts’ failed: No artifacts found that match the file pattern "build/artifacts/*". Configuration error?

Sounds like something failed to build and thus the packaging step failed.

@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@compnerd I'm seeing in the log:

The following tests FAILED:
	606 - TestFoundation.TestFileManager-test_homedirectoryForUser (ILLEGAL)
	611 - TestFoundation.TestFileManager-test_emptyFilename (Exit code 0xc0000409
)
	1281 - TestFoundation.TestNSString-test_NSHomeDirectoryForUser (Failed)
	1283 - TestFoundation.TestNSString-test_expandingTildeInPath (Failed)

As far as I can see, same fails in other PRs that are building windows toolchain, e.g.: #61768 or #61665

@compnerd
Copy link
Member

I think that we need to fix the rest of the build to be sure :/

I can try to run the Foundation test suite locally later to verify that those are actually failing without this change.

@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@compnerd Thanks!

@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@compnerd Here is the recent "clear" PR (w/o this patch) with the same set of test failures: #62101

asl and others added 5 commits November 16, 2022 20:10
…ions of autodiff code

1. Simplify differentiable_function_extract of differentiable_function.
Before:
%x = differentiable_function(%orig, %jvp, %vjp)
%y = differentiable_function_extract [original] %x
After:
%y = %orig

2. Push conversion instructions inside of differentiable_function.
This unblocks inlining and specialization.
Before:
%x = differentiable_function(%orig, %jvp, %vjp)
%y = convert_escape_to_noescape %x
After:
%orig' = convert_escape_to_noescape %orig
%jvp' = convert_escape_to_noescape %jvp
%vjp' = convert_escape_to_noescape %vjp
%y = differentiable_function(%orig', %jvp', %vjp')

3. Another peephole is needed for reordering function conversion instructions to enable full inlining:
(convert_escape_to_noescape (convert_function (thin_to_thick_function x)))
=>
(convert_escape_to_noescape (thin_to_thick_function (convert_function x)))

Co-authored-by: Dan Zheng <danielzheng@google.com>
…oken - It is certainly not something mandatory
@asl asl force-pushed the diff-folding-fix branch from 0835561 to 4e7d70e Compare November 16, 2022 19:10
@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@swift-ci please test

@asl
Copy link
Contributor Author

asl commented Nov 16, 2022

@swift-ci Please Build Toolchain Windows Platform

@asl
Copy link
Contributor Author

asl commented Nov 17, 2022

@swift-ci Please test Windows platform

@asl asl merged commit a5e8381 into main Nov 17, 2022
@asl asl deleted the diff-folding-fix branch November 17, 2022 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Invalid SIL with recent SILCombine changes

5 participants