Switch to a better lookup strategy for compile-time preferences in stacked environments before releasing 1.6? #37791

Closed
@tkf

tl;dr: I think we can improve on the lookup strategy for compile-time preferences currently implemented in #37595. In particular, can we make it independent of the content of Manifest.toml files?

Continuing the discussion in #37595 (comment), I think we need to explore different strategies of compile-time preference lookup for stacked environments before 1.6 is out and the spec is frozen.

(@staticfloat I'm opening the issue here since it's about code loading and I think resolving this is a blocker for 1.6. But let me know if you want to move this discussion to Preferences.jl)

cc @fredrikekre @KristofferC

What is the motivation of compile-time preference?

Before discussing how to look up preferences, I think it would be better to have a shared vision of the use-cases of compile-time preferences.

I imagine that a common example would be choosing some kind of default "backend" such as CPU vs GPU (JuliaLang/Pkg.jl#977). IIUC @timholy's ComputationalResources.jl achieves a similar effect with run-time @eval. FFTW's deps/build.jl uses a text file ~/.julia/prefs/FFTW to switch the provider of the external library. This can be migrated to the compile-time preferences system. It's also useful for toggling debugging support (in a semi-ad-hoc way). For example, ForwardDiff uses the constant NANSAFE_MODE_ENABLED for adding debugging instructions.

I think another important use-case is handling machine-specific configuration such as system libraries and hardware properties. For example, previous discussions of package options (JuliaLang/Pkg.jl#458 and JuliaLang/Juleps#38) mentioned configuring libpython for PyCall as an important use-case. In general, it is useful to be able to use Julia with external libraries from various sources. For example, libpython may come from JLL, the OS's package manager, a custom build, conda, etc. Such a setting is inevitably machine-specific. Thus, recording such information in a Project.toml that is meant to be shared is a bad idea. At the same time, it is crucial to have per-project per-machine preferences in a self-contained file for reproducibility.

Are they good motivations? Can we agree that it's ideal to have (1) per-project machine-agnostic preferences and (2) per-project per-machine preferences? If so, I think it's necessary to change the current lookup strategy.

Strategies

There are various ways to look up preferences in stacked environments (i.e., Base.load_path()). To start the conversation, I discuss the following three strategies:

Strategy 1: First package hit in Manifest.toml files (current implementation as of #37595)

The current strategy for finding the preference for a package is to walk through load_path() one by one, find a manifest (environment) that includes the package, and look at the corresponding project file.

Strategy 2: First preference hit in Project.toml files

Search Project.toml files in load_path() and find the first Project.toml file with the preference of the target package.

Strategy 3: First package hit in Project.toml files

Search Project.toml files in load_path() and find the first Project.toml file with the target package.

Example

To illustrate the difference between these strategies, consider the following environment stack (i.e., Base.load_path() == [X, Y, Z])

  • Project X: Project.toml has package A which has package B as a dependency (i.e., B is in Manifest.toml but not in Project.toml). Project.toml has no compile-preferences table.
  • Project Y: Project.toml has the compile-preferences table for B. However, Project.toml's deps table does not contain B.
  • Project Z: Project.toml has the compile-preferences table for B. Project.toml includes B in deps; i.e., the user ran pkg> add B while activating Z.

Strategy 1 finds the preferences for B in X (i.e., empty). Strategy 2 finds the preferences for B in Y. Strategy 3 finds the preferences for B in Z.

To summarize:

| Project | deps | compile-preferences | Manifest.toml | found by |
|---|---|---|---|---|
| X | [A, ...] | empty | has B as an indirect dependency | Strategy 1 |
| Y | [...] | has B's preferences | has B as an indirect dependency | Strategy 2 |
| Z | [B] | has B's preferences | has B | Strategy 3 |
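To make the comparison concrete, here is a minimal toy model of the three strategies. It is only a sketch under stated assumptions: packages are identified by name rather than UUID, the `Env` struct, the extra package `C`, and the `backend` preference are all hypothetical, and none of this is the code in base/loading.jl.

```julia
# Toy model of one environment in the stack (hypothetical, for illustration).
struct Env
    name::String
    deps::Set{String}                      # packages listed in Project.toml [deps]
    manifest::Set{String}                  # every package recorded in Manifest.toml
    prefs::Dict{String,Dict{String,Any}}   # compile-preferences table, keyed by package
end

# Name of the first environment in `stack` satisfying `pred`, or `nothing`.
function first_env(pred, stack)
    i = findfirst(pred, stack)
    return i === nothing ? nothing : stack[i].name
end

# Strategy 1: first environment whose Manifest.toml contains the package.
strategy1(stack, pkg) = first_env(e -> pkg in e.manifest, stack)
# Strategy 2: first environment whose Project.toml has preferences for the package.
strategy2(stack, pkg) = first_env(e -> haskey(e.prefs, pkg), stack)
# Strategy 3: first environment whose Project.toml lists the package in deps.
strategy3(stack, pkg) = first_env(e -> pkg in e.deps, stack)

b_prefs  = Dict("B" => Dict{String,Any}("backend" => "gpu"))  # hypothetical preference
no_prefs = Dict{String,Dict{String,Any}}()

# The stack from the example: load_path() == [X, Y, Z].
stack = [
    Env("X", Set(["A"]), Set(["A", "B"]), no_prefs),
    Env("Y", Set(["C"]), Set(["C", "B"]), b_prefs),
    Env("Z", Set(["B"]), Set(["B"]),      b_prefs),
]

println(strategy1(stack, "B"))  # X
println(strategy2(stack, "B"))  # Y
println(strategy3(stack, "B"))  # Z
```

Only `strategy1` reads the `manifest` field; the other two depend solely on what the user wrote in Project.toml.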

Analysis

As I discussed in #37595 (comment), I think Strategy 1 (First package hit in manifests) is not desirable because the fact that package A depends on B is (usually) an implementation detail. Package A's author may silently drop B from its dependencies when bumping from v1.1 to v1.2. Then, after Pkg.update, Strategy 1 would pick up project Y as the source of preferences. OTOH, with Strategy 2 and 3, the user controls more explicitly which environment changes the preferences of a given package. I don't think it is ideal to rely on the state of Manifest.toml since it is a large file that is opaque to users and it is often not checked into version control.
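The scenario with A dropping B can be sketched as follows. This is a simplified model, not the actual loader: each environment is just a name paired with the set of package names in its Manifest.toml, and the package `C` in Y is a hypothetical stand-in for whatever pulls in B there.

```julia
# Strategy 1 on a toy stack: each entry is `name => packages in Manifest.toml`.
function strategy1(stack, pkg)
    i = findfirst(env -> pkg in last(env), stack)
    return i === nothing ? nothing : first(stack[i])
end

# Before Pkg.update: A v1.1 depends on B, so B is in X's manifest.
before = ["X" => Set(["A", "B"]), "Y" => Set(["C", "B"]), "Z" => Set(["B"])]
# After Pkg.update: A v1.2 dropped B, so B silently disappears from X's manifest.
after  = ["X" => Set(["A"]),      "Y" => Set(["C", "B"]), "Z" => Set(["B"])]

println(strategy1(before, "B"))  # X
println(strategy1(after, "B"))   # Y
```

No Project.toml changed between the two calls, yet the source of B's preferences moved from X to Y.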

Strategy 3 has an advantage over Strategy 2 in that the compatibility of the recorded preferences can be enforced via the compat entry. For example, the project can add a compat bound restricting the package to versions that support the given preference. The only disadvantage of Strategy 3 compared to Strategy 2 that I can think of is that the user may end up having a "stale" package in Project.toml that they added just for configuring a transitive dependency.

Alternative: shallow-merge all preference tables?

It's also conceivable to aggressively combine preference tables for a given package using merge(dicts...). That is to say, given

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 1
b = 2

and

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d]
a = 10
c = 30

we'd have merge(Dict("a" => 10, "c" => 30), Dict("a" => 1, "b" => 2)) (i.e., Dict("a" => 1, "b" => 2, "c" => 30)).

Since this is a "shallow merge", each package can opt out of this behavior and use Strategy 2/3 by creating a sub-table explicitly:

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences] # note `.preferences` suffix
a = 1
b = 2

and

[compile-preferences.342fba16-3e17-4664-b1bb-a60ccdbe268d.preferences]
a = 10
c = 30

As long as the specification is clearly documented, the package authors can use the appropriate behavior.
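A minimal sketch of the shallow-merge idea, assuming the tables are collected in load-path order and that earlier environments take precedence (which matches the merge(...) call above). The function name `shallow_merge` is hypothetical.

```julia
# Shallow-merge preference tables given in load-path order.
# `merge(t, acc)` lets `acc` (the earlier environments) override `t`.
shallow_merge(tables) =
    foldl((acc, t) -> merge(t, acc), tables; init = Dict{String,Any}())

# Flat tables: keys are combined one by one.
upper = Dict{String,Any}("a" => 1, "b" => 2)
lower = Dict{String,Any}("a" => 10, "c" => 30)
flat = shallow_merge([upper, lower])
println(flat == Dict("a" => 1, "b" => 2, "c" => 30))  # true

# Opt-out: with an explicit `preferences` sub-table, the merge only sees a
# single key, so the whole sub-table of the first environment wins.
upper_nested = Dict{String,Any}("preferences" => Dict("a" => 1, "b" => 2))
lower_nested = Dict{String,Any}("preferences" => Dict("a" => 10, "c" => 30))
nested = shallow_merge([upper_nested, lower_nested])
println(nested["preferences"] == Dict("a" => 1, "b" => 2))  # true
```

In the nested case `c = 30` is not picked up, which is the first-hit behavior of Strategy 2/3.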

Opinion

I think Strategy 3 or the shallow-merge variant of Strategy 3 is better.

Appendix: Current implementation

The entry point for the precompilation cache manager is get_preferences_hash (the last line of the second excerpt below).

julia/base/loading.jl

Lines 325 to 348 in 6596f95

function uuid_in_environment(project_file::String, uuid::UUID, cache::TOMLCache)
    # First, check to see if we're looking for the environment itself
    proj_uuid = get(parsed_toml(cache, project_file), "uuid", nothing)
    if proj_uuid !== nothing && UUID(proj_uuid) == uuid
        return true
    end
    # Check to see if there's a Manifest.toml associated with this project
    manifest_file = project_file_manifest_path(project_file, cache)
    if manifest_file === nothing
        return false
    end
    manifest = parsed_toml(cache, manifest_file)
    for (dep_name, entries) in manifest
        for entry in entries
            entry_uuid = get(entry, "uuid", nothing)::Union{String, Nothing}
            if uuid !== nothing && UUID(entry_uuid) == uuid
                return true
            end
        end
    end
    # If all else fails, return `false`
    return false
end

julia/base/loading.jl

Lines 1458 to 1484 in 6596f95

# Find the Project.toml that we should load/store to for Preferences
function get_preferences_project_path(uuid::UUID, cache::TOMLCache = TOMLCache())
    for env in load_path()
        project_file = env_project_file(env)
        if !isa(project_file, String)
            continue
        end
        if uuid_in_environment(project_file, uuid, cache)
            return project_file
        end
    end
    return nothing
end

function get_preferences(uuid::UUID, cache::TOMLCache = TOMLCache();
                         prefs_key::String = "compile-preferences")
    project_path = get_preferences_project_path(uuid, cache)
    if project_path !== nothing
        preferences = get(parsed_toml(cache, project_path), prefs_key, Dict{String,Any}())
        if haskey(preferences, string(uuid))
            return preferences[string(uuid)]
        end
    end
    # Fall back to default value of "no preferences".
    return Dict{String,Any}()
end

get_preferences_hash(uuid::UUID, cache::TOMLCache = TOMLCache()) = UInt64(hash(get_preferences(uuid, cache)))
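For comparison, a Strategy 3 lookup that never touches Manifest.toml could look roughly like the sketch below. It is an illustration only, not a proposed patch: the function name `preferences_by_project_deps` is hypothetical, it takes already-parsed Project.toml tables instead of walking load_path(), it compares UUIDs as strings, and the UUID used for A and the `backend` key are made up. It relies on the TOML standard library (Julia 1.6 or later).

```julia
using TOML

# Hypothetical Strategy 3: return the preferences recorded in the first
# Project.toml (in load-path order) that lists `uuid` in its [deps] table.
function preferences_by_project_deps(projects, uuid::AbstractString;
                                     prefs_key::String = "compile-preferences")
    for project in projects
        deps = get(project, "deps", Dict{String,Any}())
        if uuid in values(deps)
            prefs = get(project, prefs_key, Dict{String,Any}())
            return get(prefs, uuid, Dict{String,Any}())
        end
    end
    # Fall back to "no preferences", as get_preferences does.
    return Dict{String,Any}()
end

b_uuid = "342fba16-3e17-4664-b1bb-a60ccdbe268d"

# X: depends on A only (the UUID for A is made up).
x = TOML.parse("""
[deps]
A = "11111111-1111-1111-1111-111111111111"
""")
# Y: has preferences for B, but B is not in [deps].
y = TOML.parse("""
[compile-preferences.$b_uuid]
backend = "from-Y"
""")
# Z: has B in [deps] and preferences for B.
z = TOML.parse("""
[deps]
B = "$b_uuid"

[compile-preferences.$b_uuid]
backend = "from-Z"
""")

result = preferences_by_project_deps([x, y, z], b_uuid)
println(result["backend"])  # from-Z
```

Switching the `uuid in values(deps)` check to `haskey(prefs, uuid)` would give Strategy 2 instead.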
