Compiling in release mode #1189
Description
Due to the errors with linking S4TF into an arbitrary Swift executable (#1185 (comment)), I am currently very constrained in how I can test code that imports S4TF. For now, my only option is to replace the Swift package tests with the custom code I want to execute. Having to re-build S4TF repeatedly is a bottleneck in my workflow.
I profiled S4TF build times on Google Colab (dual-core x64) and found some interesting results. Running `swift test` always re-compiles your code, even if you compiled it previously via `swift build`. There is only one exception: when both `swift build` and `swift test` use debug mode, the redundant re-compilation is skipped. The speedup does not apply when both use `-Onone` release, which is otherwise the fastest-compiling option.
- Pre-build as release (`-Onone`) (excluding tests): 1 min 51 sec
  - Build tests as release (`-Onone`): 2 min 29 sec (everything)
    - Extrapolated time if excluding tests: 1 min 50 sec
  - Build tests as debug: 3 min 50 sec
    - Extrapolated time if excluding tests: 3 min 0 sec
- Pre-build as debug (excluding tests): 3 min 0 sec
  - Build tests as release (`-Onone`): 2 min 48 sec (everything)
    - Extrapolated time if excluding tests: 2 min 7 sec
  - Build tests as debug: 57 sec
    - Extrapolated time if excluding tests: 0 sec
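For context, this is a minimal sketch of the invocations behind the modes above; the exact flag spelling is my assumption (using SwiftPM's standard `-c` and `-Xswiftc` options), not copied from the notebooks.

```shell
# Debug is SwiftPM's default configuration.
swift build
swift test

# "Release with -Onone": release configuration, but with optimization
# switched back off via the -Xswiftc passthrough.
swift build -c release -Xswiftc -Onone
swift test  -c release -Xswiftc -Onone
```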
If I can find a way to import S4TF outside of its tests, compiling in unoptimized release mode seems to be the wisest option; that would take around 2 minutes. I could add a special command to Swift-Colab that caches the Swift package's build products folder. When you restart the runtime (which I do often), the session would link against the cached build products instead of re-compiling. The command would also cache the x10 binaries so you only download them from the network once. It would be implemented once there is a Swift toolchain that both runs S4TF and has the Python LLDB API.
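Purely as a sketch of what that command could do (none of this exists in Swift-Colab yet, and the cache location and x10 path below are placeholders I made up):

```shell
# Hypothetical caching scheme; paths are placeholders, not real Swift-Colab paths.
CACHE=/content/drive/MyDrive/s4tf-cache    # assumes Google Drive is mounted
X10_DIR=/path/to/x10-binaries              # placeholder for the x10 install location

# After one successful compile: save the build products and the x10 binaries.
mkdir -p "$CACHE"
rsync -a .build/     "$CACHE/build-products/"
rsync -a "$X10_DIR/" "$CACHE/x10/"

# After restarting the runtime: restore them so the session links against the
# cached build products instead of re-compiling, and skips the x10 download.
rsync -a "$CACHE/build-products/" .build/
rsync -a "$CACHE/x10/"            "$X10_DIR/"
```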
I previously heard that there were performance concerns about not compiling S4TF with full optimization. There are tight loops where debug mode could become a bottleneck, but where do those loops live? If they are in CTensorFlow, then it doesn't matter how S4TF itself is compiled, because CTensorFlow is pre-compiled into the x10 binary.
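To illustrate the distinction I mean (this is just a sketch, not a claim about where S4TF's actual hot loops are):

```swift
import TensorFlow

// Case 1: the loop is ordinary Swift code. An -Onone build of S4TF or of
// user code pays debug-level overhead on every iteration.
func sumOfScalars(_ t: Tensor<Float>) -> Float {
    var total: Float = 0
    for scalar in t.scalars {   // per-element loop runs in Swift
        total += scalar
    }
    return total
}

// Case 2: the work is a single TensorFlow op. The loop executes inside
// CTensorFlow / the pre-built x10 binary, so how S4TF itself was compiled
// barely matters.
func sumViaOp(_ t: Tensor<Float>) -> Float {
    t.sum().scalarized()
}
```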
When I tried compiling S4TF in fully optimized release mode, I got the compiler crash caused by BatchNorm, which is currently unsolved. The crash logs are in the Colab notebooks attached below. This crash did not happen in release mode when the `-Onone` flag was set; does that behavior reveal anything new about the bug?
crash_no_tests.ipynb.zip
crash_with_tests.ipynb.zip
I am compiling with the 2021-11-12 toolchain instead of the newest toolchain (2022-01-06). Newer toolchains (starting with 2021-12-23, possibly earlier) introduce a bug that prevents S4TF from compiling even in debug mode (#1184 (comment)).