-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Description
This is a write up from the issue reported in https://twitter.com/mberesewicz/status/1612721251209908224.
To repro add version 1.12.0 of JSON3::
pkg> add JSON3@1.12.0
Repro script:
using JSON3
import Downloads
import ZipFile
kprm_file = "https://github.com/SKNEstymator/kprm-job-offers/raw/main/data-raw/kprm-with-salary-update-20230102.json.zip"
kprm_file_zip = basename(kprm_file)
kprm_file_json = replace(kprm_file_zip, ".zip" => "")
if !isfile(kprm_file_zip)
Downloads.download(kprm_file, kprm_file_zip)
end
if !isfile(kprm_file_json)
zarchive = ZipFile.Reader(kprm_file_zip)
out = open(kprm_file_json,"w")
write(out,read(zarchive.files[1], String))
close(out)
end
col_names = ["id", "poz", "prac", "miej", "adr_urz", "adr_miej", "adr_urz_str", "adr_gdzie",
"data_um", "data_kon", "data_zgl", "wynik", "wyn", "wakaty", "etat", "status",
"zakres", "wykszt", "wymagania", "wymagania_dod", "kontakt", "liczba_odw"]
function json3_readlines(kprm_file_json)
for line in eachline(kprm_file_json)
JSON3.read(line)
end
end
@time json3_readlines(kprm_file_json)On 1.9 this takes about:
julia> @time json3_readlines(kprm_file_json)
2.537686 seconds (6.47 M allocations: 1019.936 MiB, 2.50% gc time, 0.46% compilation time)while on 1.8 it takes:
julia> @time json3_readlines(kprm_file_json)
0.599276 seconds (1.07 M allocations: 665.072 MiB, 21.89% gc time)Looking at the profile, we have on 1.9 a bunch of subtyping that comes in the call to read:
╎ 1262 @JSON3/src/read.jl:64; read(str::String; jsonlines::Bool, numbertype::Nothing, kw::Base.Pairs{Symbol, Union{}, Tup...
2╎ ╎ 1231 @JSON3/src/JSON3.jl:22; Array
╎ ╎ 1217 /Users/kristoffercarlsson/julia1.9/src/gf.c:2828; ijl_apply_generic
2╎ ╎ 1217 /Users/kristoffercarlsson/julia1.9/src/gf.c:0; _jl_invoke
1╎ ╎ 1085 /Users/kristoffercarlsson/julia1.9/src/builtins.c:1599; jl_f__compute_sparams
╎ ╎ ╎ 1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1682; jl_inst_arg_tuple_type
╎ ╎ ╎ 1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1907; jl_wrap_Type
╎ ╎ ╎ 1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1079; ijl_instantiate_unionall
╎ ╎ ╎ 1072 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1863; inst_type_w_
╎ ╎ ╎ 106 /Users/kristoffercarlsson/julia1.9/src/jltypes.c:1464; inst_datatype_inner
╎ ╎ ╎ ╎ 54 /Users/kristoffercarlsson/julia1.9/src/subtype.c:1986; ijl_types_equal
╎ ╎ ╎ ╎ 54 /Users/kristoffercarlsson/julia1.9/src/subtype.c:1432; forall_exists_subtype
╎ ╎ ╎ ╎ 54 /Users/kristoffercarlsson/julia1.9/src/subtype.c:1396; exists_subtype
which seems to come from the constructor in https://github.com/quinnj/JSON3.jl/blob/faa521e37317826a38bd211898b512be592be5c7/src/JSON3.jl#L22. This is not present in 1.8:
╎ 263 @JSON3/src/read.jl:30; read
╎ 207 @JSON3/src/read.jl:56; read(str::String; jsonlines::Bool, numbertype::Nothing, kw...
╎ 207 @JSON3/src/read.jl:86; read!
╎ 207 @JSON3/src/read.jl:90; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, len...
╎ 207 @JSON3/src/read.jl:311; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,), T...
1 ╎ ╎ 203 @JSON3/src/read.jl:332; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, l...
╎ ╎ 202 @JSON3/src/read.jl:86; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,), T...
╎ ╎ 142 @JSON3/src/read.jl:90; read!(buf::Base.CodeUnits{UInt8, String}, pos::Int64, ...
╎ ╎ 142 @JSON3/src/read.jl:311; (::JSON3.var"#read!##kw")(::NamedTuple{(:allow_inf,),...
An interesting thing is that the performance issue has been somewhat fixed on the main branch of JSON3 by commit quinnj/JSON3.jl@faa521e:
# With JSON3#main
julia> @time json3_readlines(kprm_file_json)
0.700113 seconds (1.07 M allocations: 663.462 MiB, 6.81% gc time)However, the difference this commit made what I can see is actually worse type inference https://github.com/quinnj/JSON3.jl/blob/faa521e37317826a38bd211898b512be592be5c7/src/read.jl#L65 now gets inferred as (looking at optimized IR):
(1.12.0) (slow)
│ %147 = Core.apply_type(JSON3.Array, %143, Base.CodeUnits{UInt8, String}, Vector{UInt64})::Type{JSON3.Array{_A, Base.CodeUnits{UInt8, String}, Vector{UInt64}}} where _A│
│ %148 = %new(%147, %1, %45, %139)::JSON3.Array{_A, Base.CodeUnits{UInt8, String}, Vector{UInt64}} where _A ││╻ Array
65 │ JSON3.populateinds!(%148)
main (fast)
│ %257 = Core.apply_type(JSON3.Array, %256)::Type{JSON3.Array{_A}} where _A │
└──── %258 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 0, 0))::Vector{Int64} ││╻ Array
90 ── nothing │
91 ── goto #92 │╻ getindex
92 ── %261 = (%257)(%61, %122, %258)::JSON3.Array{T, S, Vector{UInt64}} where {T, S<:AbstractVector{UInt8}} │
66 │ JSON3.populateinds!(%261)
So in the fast case we only know that the second type parameter S is <: AbstractVector{UInt8} but in the slow case we know that S == Base.CodeUnits{UInt8, String}.
It seems bad here that more precise type information seems to cause there to be a bunch of subtyping that slows down the code.