Proposal: Run precompilation workloads at top-level instead of within package  #51905

Open

Description

Currently, as part of the PkgImages feature introduced in Julia 1.9 (and even in older versions of Julia), users are encouraged to run snoop workloads at the end of the module definition, in order to capture the compilation of the Julia generic functions that are invoked during that workload.

This introduces at least two problems:

  1. PrecompileTools doesn't run __init__() so some functionality may not work during package compilation? PrecompileTools.jl#32
    • For some modules that define __init__(), the expectation is that the module's functions will not be called until the module has been initialized, and initialization does not happen during precompilation.
    • We skip __init__() for good reason: we don't want to serialize the module's runtime data; that data should only be initialized at runtime.
    • But this is a conundrum! We need to init the module to run the snoop workload, but we must not init it if we want to preserve correctness.
    • Currently the best answer is to manually init the module's data, run the workload, then uninit the data at the end. :/
      • This is obviously tricky and error-prone.
  2. Task cannot be serialized error during precompilation disappeared in 1.9 #49513
    • As I understand it, this is the fundamental issue:
      • Since we serialize the entire module at the end of precompilation, if the module points to any running Tasks, we cannot (de)serialize those safely. So starting in 1.10, we introduced a mechanism to block until all of those tasks have finished.
      • But if you are running a complex snoop workload, and that workload creates some tasks or IO objects, it can be difficult and error-prone to track all of them down and shut them down correctly before the workload finishes.
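To make both problems concrete, here is a minimal sketch of the manual workaround described above. The module `Demo`, its `CACHE` global, and the `run_workload` helper are all hypothetical names invented for illustration; in a real package the workload call would sit at the end of the module and run only during precompilation.

```julia
module Demo

# Runtime-only state that __init__ would normally populate.
const CACHE = Ref{Union{Nothing,Dict{String,Int}}}(nothing)

__init__() = (CACHE[] = Dict{String,Int}(); nothing)

function lookup(key::String)
    cache = CACHE[]
    cache === nothing && error("Demo is not initialized")
    return get!(cache, key, length(key))
end

# Hypothetical helper: init by hand, run the workload, wait for tasks, undo the init.
function run_workload()
    CACHE[] = Dict{String,Int}()       # manual "init"
    try
        t = @async lookup("spawned")   # the workload happens to create a Task
        lookup("direct")
        wait(t)                        # every task has to finish before serialization
    finally
        CACHE[] = nothing              # manual "uninit": keep runtime data out of the image
    end
    return nothing
end

end # module
```

Forgetting either the `wait` or the reset in the `finally` branch is exactly the kind of mistake that is easy to make in a larger workload.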

I would like to propose that we introduce a language-supported mechanism to run a snoop workload after a module is closed.

Syntactically, I think it could be as simple as moving the snoop / precompile statements to after the module, maybe by registering them in a callback that will be called when the runtime is finished closing the module. Something like:

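module MyPackage
    # package definitions, including setup() and do_stuff()
end

# Base.precompilation is the proposed API; it does not exist today.
Base.precompilation(MyPackage) do
    # setup state
    MyPackage.setup()
    # run precompiles and/or snoop workload
    precompile(...)
    MyPackage.do_stuff()
end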
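The registration side of this idea can be mimicked in user code with a plain callback table. The sketch below is only an illustration of the shape of the API: `PRECOMPILE_CALLBACKS`, `register_precompilation`, and `run_precompilation` are hypothetical names, not part of Base, and the toy `MyPackage` body is made up so that the example runs.

```julia
# Hypothetical registry: maps a module to the workload registered for it.
const PRECOMPILE_CALLBACKS = Dict{Module,Function}()

# The callback comes first so that do-block syntax works.
function register_precompilation(f::Function, pkg::Module)
    PRECOMPILE_CALLBACKS[pkg] = f
    return nothing
end

# Stand-in for what the runtime would do once the module is closed.
function run_precompilation(pkg::Module)
    callback = get(PRECOMPILE_CALLBACKS, pkg, nothing)
    callback === nothing && return false   # nothing registered: no extra work
    callback()
    return true
end

module MyPackage
const STATE = Ref(0)
setup() = (STATE[] = 1; nothing)
do_stuff() = STATE[] + 1
end

register_precompilation(MyPackage) do
    MyPackage.setup()                        # setup state
    precompile(MyPackage.do_stuff, ())       # explicit precompile statement
    MyPackage.do_stuff()                     # snoop workload
end
```

Calling `run_precompilation(MyPackage)` then plays the role of the runtime invoking the callback after the module has been closed.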

Semantically, I propose that this would do something like the following:

  1. After the user's file is included, the module is closed just like it currently is.
  2. If the user provided a precompilation callback:
    1. We first make a deepcopy of the module. This checkpointed copy is what will be used for serialization.
    2. Then, we run the user-provided callback, which will mutate state in the live module and also trigger the compilations we want.
    3. Finally, we extract only the newly added method instances from the live module's method tables, and move/copy them into the checkpointed module.
  3. We then serialize the checkpointed module.
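The steps above can be modelled with a toy data structure. This is a loose analogy under stated assumptions, not a sketch of the real runtime: `ToyModule` and `snoop_and_checkpoint` are hypothetical, a set of strings stands in for method instances, and as far as I know `deepcopy` does not accept a real `Module` today, so the actual implementation would need runtime support.

```julia
# Toy stand-in for a module: runtime state plus a set of "compiled" specializations.
struct ToyModule
    state::Dict{Symbol,Any}
    compiled::Set{String}
end

# Hypothetical model of the proposed steps; returns what would be serialized.
function snoop_and_checkpoint(workload, live::ToyModule)
    checkpoint = deepcopy(live)                    # step 2.1: pristine copy for serialization
    workload(live)                                 # step 2.2: mutates state, triggers compilation
    new_entries = setdiff(live.compiled, checkpoint.compiled)
    union!(checkpoint.compiled, new_entries)       # step 2.3: carry over only the new compiled code
    return checkpoint                              # step 3: this is what would be serialized
end

toy = ToyModule(Dict{Symbol,Any}(:initialized => false), Set(["f(::Int)"]))

image = snoop_and_checkpoint(toy) do m
    m.state[:initialized] = true                   # workload-only state change
    push!(m.compiled, "g(::String)")               # pretend a new specialization was compiled
end
```

In this model `image` picks up the new "compiled" entry while its `state` stays as it was before the workload ran, which is the separation the proposal is after.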

This allows us to separate the concerns of defining a module and running a workload to snoop compile it.

It allows us to ensure that the snoop workload doesn't accidentally introduce state into the module that is serialized, causing unexpected behaviors.

It allows us to be able to robustly ignore "dangling tasks", which preserves the behavior that pre-1.9 users have with PackageCompiler.

The implementation doesn't seem too burdensome, and it costs nothing unless a package uses the new feature.

Thoughts?

Labels: compiler:precompilation (Precompilation of modules), feature (Indicates new feature / enhancement requests)