Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile: multithread getdict! (redo of #43805) #43816

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 24 additions & 9 deletions stdlib/Profile/src/Profile.jl
Original file line number Diff line number Diff line change
Expand Up @@ -349,19 +349,34 @@ function getdict(data::Vector{UInt})
end

function getdict!(dict::LineInfoDict, data::Vector{UInt})
for ip in data
# Lookup is expensive, so do it only once per ip.
haskey(dict, UInt64(ip)) && continue
st = lookup(convert(Ptr{Cvoid}, ip))
# To correct line numbers for moving code, put it in the form expected by
# Base.update_stackframes_callback[]
stn = map(x->(x, 1), st)
try Base.invokelatest(Base.update_stackframes_callback[], stn) catch end
dict[UInt64(ip)] = map(first, stn)
# we don't want metadata here as we're just looking up ips
unique_ips = unique(has_meta(data) ? strip_meta(data) : data)
n_unique_ips = length(unique_ips)
n_unique_ips == 0 && return dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expensive part is the lookup. Can't you just

iplookups = similar(unique_ips, Vector{StackFrame})
Threads.@threads for i = 1:n_unique_ips
    iplookups[i] = _lookup_corrected(unique_ips[i])
end

and then stash in the Dict in single-threaded mode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use @threads because @jpsamaroo pointed out @threads would get blocked if any thread is doing something that isn't yielding.

Though, regarding just putting it into a buffer, I was thinking that would be memory inefficient, but maybe I'm wrong because the dict is empty to start with..

I'll try removing the channel stuff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but you're not using the limited-capacity aspect of the Channel at all, so this is really just a glorified array-store. If you store by the index of the ip you don't have to worry about races.Indeed, you could replace the @threads with @spawned blocks.

iplookups = similar(unique_ips, Vector{StackFrame})
@sync for indexes_part in Iterators.partition(eachindex(unique_ips), div(n_unique_ips, Threads.nthreads(), RoundUp))
Threads.@spawn begin
for i in indexes_part
iplookups[i] = _lookup_corrected(unique_ips[i])
end
end
end
for i in eachindex(unique_ips)
dict[unique_ips[i]] = iplookups[i]
end
return dict
end

function _lookup_corrected(ip::UInt)
st = lookup(convert(Ptr{Cvoid}, ip))
# To correct line numbers for moving code, put it in the form expected by
# Base.update_stackframes_callback[]
stn = map(x->(x, 1), st)
# Note: Base.update_stackframes_callback[] should be data-race free
try Base.invokelatest(Base.update_stackframes_callback[], stn) catch end
return map(first, stn)
end

"""
flatten(btdata::Vector, lidict::LineInfoDict) -> (newdata::Vector{UInt64}, newdict::LineInfoFlatDict)

Expand Down