Skip to content

cfunction: reimplement, as originally planned, for reliable performance #57226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 6, 2025

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jan 31, 2025

This implements several sources of bugfixes and improvements:

  • direct edges are correctly represented
  • performance does not degrade when edges trigger
  • the JIT does not call jl_infer_type until actually required
  • constant return can handle invalidation and emitting efficient code

This generates the code for the exact signature specified by the user, instead of using jl_infer_type to compute a different signature which previously might cause unnecessary boxing and previously introduced unstable performance characteristics. This lets us defer generating the actual thunk required at runtime with the JIT when the required information is already available, and also to validate that information is still correct, and regenerate it when not correct anymore.

@vtjnash vtjnash added compiler:codegen Generation of LLVM IR and native code backport 1.12 Change should be backported to release-1.12 labels Jan 31, 2025
@vtjnash vtjnash requested a review from gbaraldi January 31, 2025 19:09
@oscardssmith oscardssmith added the performance Must go faster label Jan 31, 2025
@vtjnash vtjnash force-pushed the jn/fast-cfunction-update branch from 524b478 to 07dce1e Compare January 31, 2025 20:46
@JeffBezanson
Copy link
Member

I will read this in more detail soon but looks nice 👍

@vtjnash vtjnash force-pushed the jn/fast-cfunction-update branch 2 times, most recently from 1be6ee4 to 65f1345 Compare February 3, 2025 22:15
@KristofferC KristofferC mentioned this pull request Feb 4, 2025
32 tasks
jl_method_instance_t *mi = jl_get_specialization1((jl_tupletype_t*)sigt, latestworld, 0);
if (mi) {
auto it = compiled_mi.find(mi);
if (it != compiled_mi.end()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also do a cache lookup for 0b100 i.e looking for code in a packageimage

This implements several sources of bugfixes and improvements:
 - direct edges are correctly represented
 - performance does not degrade when edges trigger
 - the JIT does not call `jl_infer_type` until actually required
 - constant return can handle invalidation and emitting efficient code

This generates the code for the exact signature specified by the user,
instead of using `jl_infer_type` to compute a different signature which
previously might cause unnecessary boxing and previously introduced
unstable performance characteristics. This lets us defer generating the
actual thunk required at runtime with the JIT when the required
information is already available, and also to validate that information
is still correct, and regenerate it when not correct anymore.
@vtjnash vtjnash force-pushed the jn/fast-cfunction-update branch from 65f1345 to 198310f Compare February 5, 2025 20:38
@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Feb 5, 2025
@vtjnash vtjnash merged commit ca7cf30 into master Feb 6, 2025
6 of 8 checks passed
@vtjnash vtjnash deleted the jn/fast-cfunction-update branch February 6, 2025 04:45
@inkydragon inkydragon removed the merge me PR is reviewed. Merge when all tests are passing label Feb 6, 2025
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
KristofferC pushed a commit that referenced this pull request Feb 6, 2025
KristofferC added a commit that referenced this pull request Feb 13, 2025
Backported PRs:
- [x] #57142 <!-- Add reference to time_ns in time -->
- [x] #57241 <!-- Handle `waitpid` race condition when `SIGCHLD` is set
to `SIG_IGN` -->
- [x] #57249 <!-- restore non-freebsd-unix fix for profiling -->
- [x] #57211 <!-- Ensure read/readavailable for BufferStream are
threadsafe -->
- [x] #57262 <!-- edit NEWS for v1.12 -->
- [x] #57226 <!-- cfunction: reimplement, as originally planned, for
reliable performance -->
- [x] #57253 <!-- bpart: Fully switch to partitioned semantics -->
- [x] #57273 <!-- fix "Right arrow autocompletes at line end"
implementation -->
- [x] #57280 <!-- dep: Update JuliaSyntax -->
- [x] #57229 <!-- staticdata: Close data race after backedge insertion
-->
- [x] #57298 <!-- Updating binding version to fix MMTk CI -->
- [x] #57248 <!-- improve concurrency safety for `Compiler.finish!` -->
- [x] #57312 <!-- Profile.print: de-focus sleeping frames as gray -->
- [x] #57289 <!-- Make `OncePerX` subtype `Function` -->
- [x] #57310 <!-- Make ptls allocations at least 128 byte aligned -->
- [x] #57311 <!-- Add a warning for auto-import of types -->
- [x] #57338 <!-- fix typo in Float32 random number generation -->
- [x] #57293 <!-- Fix getfield_tfunc when order or boundscheck is Vararg
-->
- [x] #57349 <!-- docs: fix-up world-age handling for META access -->
- [x] #57344 <!-- Add missing type asserts when taking the queue out of
the task struct -->
- [x] #57348 <!-- 🤖 [master] Bump the SparseArrays stdlib from 212981b
to 72c7cac -->
- [x] #55040 <!-- Allow macrocall as function sig -->
- [x] #57299 <!-- Add missing latestworld after parameterized type alias
-->
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Feb 13, 2025
@maleadt
Copy link
Member

maleadt commented Mar 3, 2025

@vtjnash This broke GPUCompiler.jl; compiling a single one-function kernel, jl_get_function_id now returns -5 and 0, two seemingly invalid values for the func and specfunc LLVM function indices (on 1.11, they are simply 1 and 2 in this case). Can you elaborate what has to change?

@vtjnash
Copy link
Member Author

vtjnash commented Mar 3, 2025

-5 is the info code for an unhandled constant function and 0 is the expected result when nothing needed to be compiled since it can not get called

@maleadt
Copy link
Member

maleadt commented Mar 4, 2025

The example I'm looking at indeed involves a constant function (kernel() = nothing), but that used to compile fine. I am passing a valid sequence of [instance, info] to jl_emit_native (CC.use_const_api is true, so I end up calling CC.codeinfo_for_const).

@rekabrnalla
Copy link

Sorry for the noob-ish question. What versions of julia will this hit? Will it be in 1.10 or 1.11?

@vtjnash
Copy link
Member Author

vtjnash commented Apr 3, 2025

v1.12

serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants