Description
tl;dr: I think we can improve on the lookup strategy for compile-time preferences used by the current implementation (#37595). In particular, can we make it independent of the content of Manifest.toml files?
Continuing the discussion in #37595 (comment), I think we need to explore different strategies for looking up compile-time preferences in stacked environments before 1.6 is out and the spec is frozen.
(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl.)
What is the motivation for compile-time preferences?
Before discussing how to look up preferences, I think it would be better to have a shared vision of the use-cases of compile-time preferences.
I imagine that a common example would be choosing some kind of default "backend", such as CPU vs. GPU (JuliaLang/Pkg.jl#977). IIUC @timholy's ComputationalResources.jl achieves a similar effect with run-time @eval. FFTW's deps/build.jl uses a text file ~/.julia/prefs/FFTW to switch the provider of the external library; this could be migrated to the compile-time preferences system. Compile-time preferences are also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant NANSAFE_MODE_ENABLED to add debugging instructions.
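To make the "compile-time" part concrete, here is a minimal sketch of how a package might bake a preference into constants so that branches on it can be compiled away. Everything here is hypothetical: the module MyPackage, the table PREFS (a stand-in for whatever table the lookup strategy returns), and the keys "backend" and "nansafe_mode" are made up for illustration and are not ForwardDiff's or FFTW's actual code.

```julia
module MyPackage  # hypothetical package

# Hypothetical stand-in for the preference table that the lookup strategy
# returns for this package; the keys below are made up for illustration.
const PREFS = Dict{String,Any}("backend" => "gpu", "nansafe_mode" => true)

# Read once when the module is loaded; if the package is precompiled,
# these values are baked into the cache file.
const BACKEND = Symbol(get(PREFS, "backend", "cpu"))
const NANSAFE_MODE_ENABLED = get(PREFS, "nansafe_mode", false)::Bool

describe_backend() = BACKEND === :gpu ? "running on GPU" : "running on CPU"

function ratio(x::Float64, y::Float64)
    r = x / y
    # NANSAFE_MODE_ENABLED is a constant, so the compiler can drop this
    # debugging check entirely when the preference is off.
    if NANSAFE_MODE_ENABLED && isnan(r)
        error("NaN produced by ratio($x, $y)")
    end
    return r
end

end # module

println(MyPackage.describe_backend())
```

Because the values are fixed when the package is compiled, changing such a preference means the package has to be recompiled, which is why the lookup rule matters to the precompilation cache.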
I think another important use-case is handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (JuliaLang/Pkg.jl#458 and JuliaLang/Juleps#38) mentioned configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries from various sources. For example, libpython may come from a JLL, the OS's package manager, a custom build, conda, etc. Such a setting is inevitably machine-specific. Thus, recording it in Project.toml, which is meant to be shared, is a bad idea. At the same time, for reproducibility it is crucial to have per-project per-machine preferences in a self-contained file.
Are these good motivations? Can we agree that it's ideal to have both (1) per-project machine-agnostic preferences and (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.
Strategies
There are various ways to look up the preferences of stacked environments (i.e., Base.load_path()). To start the conversation, I discuss the following three strategies:
Strategy 1: First package hit in Manifest.toml files (current implementation as of #37595)
The current strategy for finding the preferences for a package is to walk through load_path() one by one, find the first manifest (environment) that includes the package, and look at the corresponding project file.
Strategy 2: First preference hit in Project.toml files
Search the Project.toml files in load_path() and find the first Project.toml file with preferences for the target package.
Strategy 3: First package hit in Project.toml files
Search the Project.toml files in load_path() and find the first Project.toml file with the target package.
Example
To illustrate the difference between these strategies, consider the following environment stack (i.e., Base.load_path() == [X, Y, Z]):
- Project X: Project.toml has package A, which has package B as a dependency (i.e., B is in Manifest.toml but not in Project.toml). Project.toml has no compile-preferences table.
- Project Y: Project.toml has the compile-preferences table for B. However, the deps table of Project.toml does not contain B.
- Project Z: Project.toml has the compile-preferences table for B, and Project.toml includes B in deps; i.e., the user ran pkg> add B while Z was active.
Strategy 1 finds the preferences for B in X (i.e., empty). Strategy 2 finds the preferences for B in Y. Strategy 3 finds the preferences for B in Z.
To summarize:
Project | deps | compile-preferences | Manifest.toml | found by
---|---|---|---|---
X | [A, ...] | empty | has B as an indirect dependency | Strategy 1
Y | [...] | has B's preferences | has B as an indirect dependency | Strategy 2
Z | [B] | has B's preferences | has B | Strategy 3
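The three lookup rules can be compared side by side on the X/Y/Z stack. This is a minimal sketch over a hypothetical in-memory model (the Env struct and the strategy1/strategy2/strategy3 functions are illustrative, not Base's code-loading API); Y's direct dependency "C" and the "mode" preference values are made up, since the table above leaves them unspecified.

```julia
# Hypothetical in-memory model of one environment in load_path().
struct Env
    name::String
    deps::Set{String}                     # [deps] of Project.toml
    manifest::Set{String}                 # all packages recorded in Manifest.toml
    prefs::Dict{String,Dict{String,Any}}  # compile-preferences tables, keyed by package
end

# Strategy 1: first environment whose Manifest.toml contains the package.
function strategy1(envs, pkg)
    for env in envs
        pkg in env.manifest && return env.name => get(env.prefs, pkg, Dict{String,Any}())
    end
    return nothing
end

# Strategy 2: first Project.toml that has a preferences table for the package.
function strategy2(envs, pkg)
    for env in envs
        haskey(env.prefs, pkg) && return env.name => env.prefs[pkg]
    end
    return nothing
end

# Strategy 3: first Project.toml that lists the package in [deps].
function strategy3(envs, pkg)
    for env in envs
        pkg in env.deps && return env.name => get(env.prefs, pkg, Dict{String,Any}())
    end
    return nothing
end

# The X/Y/Z stack from the table ("C" and the "mode" values are hypothetical).
X = Env("X", Set(["A"]), Set(["A", "B"]), Dict{String,Dict{String,Any}}())
Y = Env("Y", Set(["C"]), Set(["C", "B"]), Dict("B" => Dict{String,Any}("mode" => "y")))
Z = Env("Z", Set(["B"]), Set(["B"]), Dict("B" => Dict{String,Any}("mode" => "z")))
envs = [X, Y, Z]

for (label, lookup) in ("Strategy 1" => strategy1, "Strategy 2" => strategy2, "Strategy 3" => strategy3)
    println(label, " -> ", lookup(envs, "B").first)  # X, Y, Z respectively
end
```

Note that only Strategy 1 reads the manifest field at all; Strategies 2 and 3 depend solely on what is written in the Project.toml files.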
Analysis
As I discussed in #37595 (comment), I think Strategy 1 (first package hit in manifests) is not desirable because the fact that package A depends on B is (usually) an implementation detail. Package A's author may silently drop B from its dependencies when bumping v1.1 to v1.2. Then, after Pkg.update, Strategy 1 would pick up project Y as the source of preferences. OTOH, with Strategies 2 and 3, it is more explicit for the user which environment controls the preferences of a given package. I don't think it is ideal to rely on the state of Manifest.toml, since it is a large file that is opaque to users and is often not checked in to the version control system.
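The failure mode described above can be reproduced in a few lines. This is a minimal sketch over a hypothetical model in which an environment is a tuple (name, packages in Manifest.toml, preference tables); Y's dependency "C" and the "mode" value are made up for illustration.

```julia
# Hypothetical model: an environment is (name, packages in Manifest.toml,
# compile-preferences tables keyed by package).
const Prefs = Dict{String,Dict{String,Any}}

# Strategy 1: first environment whose manifest contains `pkg`.
function strategy1(envs, pkg)
    for (name, manifest, prefs) in envs
        pkg in manifest && return name => get(prefs, pkg, Dict{String,Any}())
    end
    return nothing
end

# Strategy 2: first environment with a preferences table for `pkg`.
function strategy2(envs, pkg)
    for (name, _, prefs) in envs
        haskey(prefs, pkg) && return name => prefs[pkg]
    end
    return nothing
end

Y = ("Y", Set(["C", "B"]), Prefs("B" => Dict{String,Any}("mode" => "y")))
X_before = ("X", Set(["A", "B"]), Prefs())  # A v1.1 depends on B
X_after  = ("X", Set(["A"]), Prefs())       # A v1.2 dropped B; Pkg.update removed it

println("Strategy 1: ", strategy1([X_before, Y], "B").first, " -> ", strategy1([X_after, Y], "B").first)  # X -> Y
println("Strategy 2: ", strategy2([X_before, Y], "B").first, " -> ", strategy2([X_after, Y], "B").first)  # Y -> Y
```

Nothing the user wrote changed between the two runs, yet Strategy 1 switches its source of preferences from X to Y; Strategy 2 is unaffected because it never looks at the manifest.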
Strategy 3 has an advantage over Strategy 2 in that the compatibility of the recorded preferences can be enforced via the compat entry. For example, the project can add a compat bound on the package that requires a version supporting the given preferences. The only disadvantage of Strategy 3 compared to Strategy 2 that I can think of is that the user may end up with a "stale" package in Project.toml that they added only to configure a transitive dependency.
Alternative: shallow-merge all preference tables?
It's also conceivable to aggressively combine the preference tables for a given package using merge(dicts...). That is to say, given
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 1
b = 2
and
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 10
c = 30
we'd have merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2)) (i.e., Dict("a" => 1, "b" => 2, "c" => 30)).
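The shallow merge can be sketched in a few lines of Base Julia. This is a minimal sketch assuming the first table comes from the environment listed earlier in load_path(), which is what the argument order of the merge call above implies; the vector `tables` is a hypothetical stand-in for the parsed compile-preferences.<uuid> tables of one package.

```julia
# Preference tables for one package (the contents of its
# compile-preferences.<uuid> tables), listed in load_path() order.
tables = [
    Dict{String,Any}("a" => 1, "b" => 2),    # earlier environment
    Dict{String,Any}("a" => 10, "c" => 30),  # later environment
]

# `merge` lets later arguments win, so fold over the reversed list to give
# earlier environments priority; `init` also covers an empty stack.
merged = foldl(merge, reverse(tables); init = Dict{String,Any}())
println(merged == Dict("a" => 1, "b" => 2, "c" => 30))  # true
```

Using foldl with an explicit init rather than merge(reverse(tables)...) avoids an error when no environment defines preferences for the package.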
Since this is a "shallow merge", each package can opt out of this behavior and use Strategy 2/3 by creating a sub-table explicitly:
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences] # note `.preferences` suffix
a = 1
b = 2
and
[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences]
a = 10
c = 30
As long as the specification is clearly documented, the package authors can use the appropriate behavior.
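To see why the sub-table opts out, here is a minimal sketch of a shallow merge (a foldl of merge over the tables in reversed load_path() order, with the first table assumed to come from the earliest environment) applied to the `.preferences` layout. The hard-coded Dicts are hypothetical stand-ins for the parsed TOML.

```julia
# Each package's table now holds a single "preferences" sub-table
# (the `.preferences` suffix in the TOML above), in load_path() order.
tables = [
    Dict{String,Any}("preferences" => Dict{String,Any}("a" => 1, "b" => 2)),
    Dict{String,Any}("preferences" => Dict{String,Any}("a" => 10, "c" => 30)),
]

# A shallow merge only combines top-level keys, so the whole "preferences"
# sub-table from the earliest environment wins; nothing inside it is merged.
merged = foldl(merge, reverse(tables); init = Dict{String,Any}())
println(merged["preferences"] == Dict("a" => 1, "b" => 2))  # true
```

Key "c" from the later environment is dropped, which is exactly the first-hit behavior of Strategy 2/3.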
Opinion
I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.
Appendix: Current implementation
The entry point for the precompilation cache manager is get_preferences_hash
Lines 325 to 348 in 6596f95
Lines 1458 to 1484 in 6596f95