-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profile: multithread getdict! (redo of #43805) #43816
Profile: multithread getdict! (redo of #43805) #43816
Conversation
stdlib/Profile/src/Profile.jl
Outdated
Threads.@spawn begin | ||
unique_ips = unique(has_meta(data) ? strip_meta(data) : data) | ||
chnl = Channel{Tuple{UInt64,Vector{StackFrame}}}(length(unique_ips)) do ch | ||
Threads.@threads for ip in unique_ips |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will hang if a thread is already running in non-yielding code. Can we at a minimum make this threading optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Is there a tidy way to make a @threads
optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, I was wondering the same thing when I wrote this. Maybe refactor the internals into its own function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed what appears to be non-blocking in this case
12f5582
to
eaf99c0
Compare
# we don't want metadata here as we're just looking up ips | ||
unique_ips = unique(has_meta(data) ? strip_meta(data) : data) | ||
n_unique_ips = length(unique_ips) | ||
n_unique_ips == 0 && return dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expensive part is the lookup. Can't you just
iplookups = similar(unique_ips, Vector{StackFrame})
Threads.@threads for i = 1:n_unique_ips
iplookups[i] = _lookup_corrected(unique_ips[i])
end
and then stash in the Dict in single-threaded mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't use @threads
because @jpsamaroo pointed out @threads
would get blocked if any thread is doing something that isn't yielding.
Though, regarding just putting it into a buffer, I was thinking that would be memory inefficient, but maybe I'm wrong because the dict is empty to start with..
I'll try removing the channel stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but you're not using the limited-capacity aspect of the Channel at all, so this is really just a glorified array-store. If you store by the index of the ip
you don't have to worry about races.Indeed, you could replace the @threads
with @spawn
ed blocks.
This PR essentially just needs a parallel map. IMHO it doesn't seem like a good idea to write an ad-hoc inefficient parallel map everywhere that needs it. However, Alternatively, maybe we can write slightly clumsy parallel folds inside |
I guess a faster Regarding how to implement this if it stays in Note that:
For a small profile data buffer: This PR
Master
|
How does this look @vtjnash ? |
Bump |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I was hoping to get your review on this @vtjnash but will merge tomorrow if not |
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
…aLang#43816)" This reverts commit e6fa3ec.
Sometime between 1.7 and master, Profile printing has gotten way slower. Could this be the cause? |
Are you comparing both with the same number of threads? I don't see a regression, and in larger tests saw an improvement with this PR. 1.7.1
master
|
It was pointed out there were at least 2 problems with #43805 (which is now reverted)
This PR moves to a channel-based approach, and uses one
@spawn
pernthreads()
.I didn't use
Threads.@threads
here to avoid blocking, so this is possible:(I think the allocations and GC of the spawned loop are being picked up by
@time
here?)cc. @vtjnash @timholy