Proposal: Run precompilation workloads at top-level instead of within package 

Currently, as part of the PkgImages feature introduced in Julia 1.9 (and even in older versions of Julia), users are encouraged to **run snoop workloads at the end of the module definition** in order to capture compilation of the Julia generic functions that are invoked during that workload.

This introduces at least two problems:

1. https://github.com/JuliaLang/PrecompileTools.jl/issues/32
    - For some modules that use `__init__()`, the expectation is that the module's functions will not be called until the module has been initialized, but we do not run `__init__()` during precompilation.
    - We don't init the modules for good reason: we don't want to serialize the module's runtime data; we only want to initialize that data _at runtime._
    - But this is a conundrum! We need to init in order to run the snoop workload, but we must not init if we want to preserve correctness.
    - The best answer is currently to _manually init the module's data, run the workload, then uninit the data at the end._ :/
        - Obviously tricky and error-prone.
2. https://github.com/JuliaLang/julia/issues/49513
    - As I understand it, this is the fundamental issue:
        - Since we serialize the entire module at the end of precompilation, if the module points to any _running Tasks,_ we cannot (de)serialize those safely. So starting in 1.10, there is a mechanism to _block_ until all those tasks have finished.
        - But if you are running a complex snoop workload, and that workload creates some tasks or IO objects, it can be difficult and error-prone to track all of them down and correctly shut them down before finishing the workload.
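
To make the workaround concrete, here is a minimal sketch of the "manually init, run the workload, then uninit" pattern, which also waits on the one task it starts. `ToyPackage`, `CACHE`, `init_state!`, `reset_state!`, `lookup!` and `run_workload` are all hypothetical names made up for illustration; real packages usually guard the workload so it only runs during precompilation, which is omitted here to keep the sketch runnable as a plain script.

```julia
module ToyPackage  # hypothetical package, for illustration only

# Runtime-only state: meant to be populated by __init__ at load time,
# and not meant to be baked into the serialized package image.
const CACHE = Ref{Union{Nothing,Dict{String,Int}}}(nothing)

init_state!() = (CACHE[] = Dict{String,Int}(); nothing)
reset_state!() = (CACHE[] = nothing; nothing)

__init__() = init_state!()

function lookup!(key::String)
    cache = CACHE[]
    cache === nothing && error("ToyPackage used before initialization")
    return get!(cache, key, length(key))
end

# Snoop workload run at the end of the module definition (current practice).
function run_workload()
    init_state!()                            # 1. manually initialize
    try
        t = Threads.@spawn lookup!("hello")  # 2. run the workload
        return fetch(t)                      #    and wait for every task we started
    finally
        reset_state!()                       # 3. undo the initialization
    end
end

run_workload()

end # module
```

Every piece of state and every task the workload touches needs this kind of bookkeeping, which is what makes the pattern fragile once the workload grows.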

I would like to propose that we introduce a language-supported mechanism to **run a snoop workload after a module is closed.**

Syntactically, I think it could be as simple as moving the snoop / precompile statements to _after_ the module, maybe by registering them in a callback that will be called when the runtime is finished closing the module. Something like:
```julia
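module MyPackage
# ... package definitions ...
end

# Proposed API (does not exist today): register a workload to run
# once the runtime has finished closing the module.
Base.precompilation(MyPackage) do
    # setup state
    MyPackage.setup()
    # run precompiles and/or snoop workload
    precompile(...)
    MyPackage.do_stuff()
end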
```

Semantically, I propose that this would do something like the following:

1. After the user's file is included, the module is _closed_ just like it currently is.
2. If the user provided a precompilation callback:
    1. We first make a `deepcopy` of the module. This checkpointed copy is what will be used for serialization.
    2. Then, we run the user-provided callback, which will mutate state in the module and also trigger the compilations we want.
    3. Finally, we extract _only the newly added_ method instances from the module's method tables, and move/copy them into the checkpointed copy.
3. Then we serialize that checkpointed module.
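
The steps above can be sketched with a toy model. This is not how the runtime represents modules or method instances: `ToyModule`, its `state` and `compiled` fields, and `precompile_with_checkpoint` are hypothetical stand-ins that only illustrate the checkpoint / run / diff idea.

```julia
# Toy stand-in for a module: some mutable state plus the set of
# compiled specializations. All names here are hypothetical.
struct ToyModule
    state::Dict{Symbol,Any}
    compiled::Set{Tuple}
end

function precompile_with_checkpoint(workload!, live::ToyModule)
    checkpoint = deepcopy(live)      # 2.1 snapshot that will be serialized
    workload!(live)                  # 2.2 mutates state, triggers compilation
    # 2.3 keep only the specializations that are new since the checkpoint
    added = setdiff(live.compiled, checkpoint.compiled)
    union!(checkpoint.compiled, added)
    return checkpoint                # 3. what would be serialized
end

live = ToyModule(Dict{Symbol,Any}(), Set{Tuple}([(:f, Int)]))

image = precompile_with_checkpoint(live) do m
    m.state[:connection] = "open socket"  # runtime-only state
    push!(m.compiled, (:f, Float64))      # pretend f(::Float64) got compiled
end
```

In this model, `image` picks up the new `(:f, Float64)` entry while its `state` stays empty, so whatever the workload did to the live module's state never reaches the serialized copy.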


This allows us to separate the concerns of defining a module and running a workload to snoop compile it.

It allows us to ensure that the snoop workload doesn't accidentally introduce state into the module that is serialized, causing unexpected behaviors.

It allows us to robustly ignore "dangling tasks", which preserves the behavior that pre-1.9 users have with PackageCompiler.

And the implementation doesn't seem _too_ burdensome, and it costs nothing unless users opt in to the new feature.

Thoughts?

Proposal: Run precompilation workloads at top-level instead of within package #51905

NHDaly opened on Oct 27, 2023