Reduce invalidations when loading JuliaData packages #47889
@@ -830,7 +830,7 @@ julia> hex2bytes(a)
 """
 function hex2bytes end
 
-hex2bytes(s) = hex2bytes!(Vector{UInt8}(undef, length(s) >> 1), s)
+hex2bytes(s) = hex2bytes!(Vector{UInt8}(undef, length(s)::Int >> 1), s)
Possibly controversial. However, JuliaHub does not list an InfiniteStrings package (there is an InfiniteArrays package).
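To illustrate what the `::Int` assertion in the diff buys, here is a minimal sketch. The helper names `half_len_unasserted` and `half_len_asserted` are hypothetical and not part of Base or this PR; only `length`, `>>`, and the assertion itself come from the change above. The idea is that when the argument type is unknown, the assertion lets inference treat everything after `length(s)` as operating on an `Int`.

```julia
# Hypothetical helpers mirroring the two versions of the expression in the diff.
half_len_unasserted(s) = length(s) >> 1
half_len_asserted(s)   = length(s)::Int >> 1

# Both give the same answer for ordinary inputs.
n = half_len_asserted("deadbeef")   # 8 characters, so 4 bytes

# With an argument of unknown type, the asserted version still has a
# concrete inferred return type, because `length(s)::Int` is known to be Int.
asserted_types = Base.return_types(half_len_asserted, (Any,))

# The unasserted version can be inspected the same way for comparison.
unasserted_types = Base.return_types(half_len_unasserted, (Any,))
```

The trade-off raised in the comment above is that the assertion throws a `TypeError` if some `length` method returns anything other than an `Int`.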
Woohoo; thanks @timholy!
Force-pushed from a0553a9 to 90ea0b5.
The doctest failure seems to come from changes in this PR?
Hmm, passes for me locally.
They're failing only on the new AWS runners. I'm not sure those are good doctests if results can be slightly different on different CPUs.
(cherry picked from commit e84634e)
This fixes some invalidations that hinder both CSV (@quinnj) and DataFrames (@bkamins and @nalimilan). Both packages were benchmarked in the discussion of #47184 and @giordano noted that DataFrames had a large load-time regression.
This PR, on top of #47184, together with JuliaLang/Pkg.jl#3275 delivers an unqualified gain in the upcoming Julia 1.9 (workloads are defined in detail farther below):
[Timing table; rows: using CSV, CSV.File(...), using DataFrames...]
The substantial load-time penalty on "1.9" with just #47184 is explained by the fact that Base.require is among the invalidated targets and therefore has to be recompiled while DataFrames is being loaded. This PR fixes that.
Here are the workloads:
- using CSV: @time using CSV
- CSV.File(...): @time @eval CSV.File(joinpath(pkgdir(CSV), "test", "testfiles", "precompile.csv"))
- using DataFrames...: @time begin using PooledArrays: PooledArrays, PooledArray; using DataFrames, Statistics; end
- DataFrames TTFX: uses the precompile workload.
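For readers unfamiliar with invalidation, the mechanism behind the Base.require recompilation described above can be sketched in a few lines of plain Julia. The names `f` and `g` are hypothetical and unrelated to the code touched by this PR; this is only an illustration of the general effect, not a reproduction of the actual invalidation chain.

```julia
# Hypothetical functions, for illustration only.
f(x) = 1
g(container) = f(container[1])   # element type is unknown for a Vector{Any}

# First call: g is compiled while f has exactly one method, so the compiler
# may specialize the call to f on that single method.
before = g(Any[1])

# Defining a more specific method changes which method f(1) should reach.
# Compiled code for g that assumed the single-method f is typically
# invalidated at this point.
f(x::Int) = 2

# Next call: g has to be compiled again before it can run, and it now
# reaches the new method.
after = g(Any[1])
```

Here `before` is 1 and `after` is 2. When the stale code lives in something as central as Base.require, the recompilation cost is paid during package loading, which would be consistent with the load-time penalty reported above.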
CC @vchuravy, @vtjnash
@bkamins, one thing I also noted is that loading both DataFrames and CSV (either before or after, order shouldn't matter) invalidates some of the code in DataFrames. Happy to consult with you about fixing it if you need help.
precompile_blockers seems useful in this context, as it led me directly to some DataFrames code that wasn't very inferrable.
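As a rough illustration of what "not very inferrable" means, the sketch below uses `Test.@inferred` from the standard library rather than precompile_blockers itself. The functions `pick_unstable` and `pick_stable` are hypothetical examples and are not taken from DataFrames.

```julia
using Test

# Hypothetical example: the return type depends on the runtime value,
# so inference can only conclude Union{Float64, Int}.
pick_unstable(x::Int) = x > 0 ? x : 0.0

# Hypothetical fix: always return a Float64.
pick_stable(x::Int) = x > 0 ? float(x) : 0.0

# @inferred throws when the inferred return type differs from the type
# of the value actually returned.
stable_ok = try
    @inferred pick_stable(1)
    true
catch
    false
end

unstable_ok = try
    @inferred pick_unstable(1)
    true
catch
    false
end
```

With these definitions `stable_ok` is true and `unstable_ok` is false. Poorly inferred code of this kind tends to be more exposed to invalidation, because its callers cannot be compiled against a single known method.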