-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
cfunction: reimplement, as originally planned, for reliable performance #57226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
524b478
to
07dce1e
Compare
I will read this in more detail soon but looks nice 👍 |
1be6ee4
to
65f1345
Compare
jl_method_instance_t *mi = jl_get_specialization1((jl_tupletype_t*)sigt, latestworld, 0); | ||
if (mi) { | ||
auto it = compiled_mi.find(mi); | ||
if (it != compiled_mi.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also do a cache lookup for 0b100
i.e looking for code in a packageimage
This implements several sources of bugfixes and improvements: - direct edges are correctly represented - performance does not degrade when edges trigger - the JIT does not call `jl_infer_type` until actually required - constant return can handle invalidation and emitting efficient code This generates the code for the exact signature specified by the user, instead of using `jl_infer_type` to compute a different signature which previously might cause unnecessary boxing and previously introduced unstable performance characteristics. This lets us defer generating the actual thunk required at runtime with the JIT when the required information is already available, and also to validate that information is still correct, and regenerate it when not correct anymore.
65f1345
to
198310f
Compare
Backported PRs: - [x] #57142 <!-- Add reference to time_ns in time --> - [x] #57241 <!-- Handle `waitpid` race condition when `SIGCHLD` is set to `SIG_IGN` --> - [x] #57249 <!-- restore non-freebsd-unix fix for profiling --> - [x] #57211 <!-- Ensure read/readavailable for BufferStream are threadsafe --> - [x] #57262 <!-- edit NEWS for v1.12 --> - [x] #57226 <!-- cfunction: reimplement, as originally planned, for reliable performance --> - [x] #57253 <!-- bpart: Fully switch to partitioned semantics --> - [x] #57273 <!-- fix "Right arrow autocompletes at line end" implementation --> - [x] #57280 <!-- dep: Update JuliaSyntax --> - [x] #57229 <!-- staticdata: Close data race after backedge insertion --> - [x] #57298 <!-- Updating binding version to fix MMTk CI --> - [x] #57248 <!-- improve concurrency safety for `Compiler.finish!` --> - [x] #57312 <!-- Profile.print: de-focus sleeping frames as gray --> - [x] #57289 <!-- Make `OncePerX` subtype `Function` --> - [x] #57310 <!-- Make ptls allocations at least 128 byte aligned --> - [x] #57311 <!-- Add a warning for auto-import of types --> - [x] #57338 <!-- fix typo in Float32 random number generation --> - [x] #57293 <!-- Fix getfield_tfunc when order or boundscheck is Vararg --> - [x] #57349 <!-- docs: fix-up world-age handling for META access --> - [x] #57344 <!-- Add missing type asserts when taking the queue out of the task struct --> - [x] #57348 <!-- 🤖 [master] Bump the SparseArrays stdlib from 212981b to 72c7cac --> - [x] #55040 <!-- Allow macrocall as function sig --> - [x] #57299 <!-- Add missing latestworld after parameterized type alias -->
@vtjnash This broke GPUCompiler.jl; compiling a single one-function kernel, |
-5 is the info code for an unhandled constant function and 0 is the expected result when nothing needed to be compiled since it can not get called |
The example I'm looking at indeed involves a constant function ( |
Sorry for the noob-ish question. What versions of julia will this hit? Will it be in 1.10 or 1.11? |
v1.12 |
This implements several sources of bugfixes and improvements:
jl_infer_type
until actually requiredThis generates the code for the exact signature specified by the user, instead of using
jl_infer_type
to compute a different signature which previously might cause unnecessary boxing and previously introduced unstable performance characteristics. This lets us defer generating the actual thunk required at runtime with the JIT when the required information is already available, and also to validate that information is still correct, and regenerate it when not correct anymore.