Skip to content

Surround the GIL with a ReentrantLock on the Julia side. #637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -7,3 +7,7 @@ dist/
.CondaPkg/
/jltest.*
uv.lock

# pixi environments
.pixi
*.egg-info
109 changes: 94 additions & 15 deletions src/GIL/GIL.jl
Original file line number Diff line number Diff line change
Expand Up @@ -9,44 +9,99 @@ module GIL

using ..C: C

# Ensure that only one Julia task tries to acquire the Python GIL.
# Avoid the potential issue that a task could miscompute whether
# it actually has the GIL simply because a different task that ran
# on the same thread that once had the GIL.
# https://github.com/JuliaPy/PythonCall.jl/issues/627
const _jl_gil_lock = ReentrantLock()

"""
hasgil()

Returns `true` if the current thread has the GIL or `false` otherwise.
"""
hasgil() = C.PyGILState_Check() == Cint(1)

"""
lock(f)
lock(f; exclusive=true)

Lock the GIL, compute `f()`, unlock the GIL, then return the result of `f()`.

Use this to run Python code from threads that do not currently hold the GIL, such as new
threads. Since the main Julia thread holds the GIL by default, you will need to
[`unlock`](@ref) the GIL before using this function.

Setting the `exclusive` keyword to `false` allows more than one Julia `Task`
to attempt to acquire the GIL. This may be useful if one Julia `Task` calls
code which releases the GIL, in which case another Julia task could acquire
it. However, this is probably not a sound approach.

See [`@lock`](@ref) for the macro form.
"""
function lock(f)
state = C.PyGILState_Ensure()
function lock(f; exclusive=true)
exclusive && Base.lock(_jl_gil_lock)
try
f()
state = C.PyGILState_Ensure()
try
f()
finally
C.PyGILState_Release(state)
end
finally
C.PyGILState_Release(state)
exclusive && Base.unlock(_jl_gil_lock)
end
end

"""
@lock expr
@lock [exclusive=true] expr

Lock the GIL, compute `expr`, unlock the GIL, then return the result of `expr`.

Use this to run Python code from threads that do not currently hold the GIL, such as new
threads. Since the main Julia thread holds the GIL by default, you will need to
[`@unlock`](@ref) the GIL before using this function.

Setting the `exclusive` parameter to `false` allows more than one Julia `Task`
to attempt to acquire the GIL. This may be useful if one Julia `Task` calls
code which releases the GIL, in which case another Julia task could acquire
it. However, this is probably not a sound approach.

The macro equivalent of [`lock`](@ref).
"""
macro lock(expr)
quote
state = C.PyGILState_Ensure()
Base.lock(_jl_gil_lock)
try
$(esc(expr))
state = C.PyGILState_Ensure()
try
$(esc(expr))
finally
C.PyGILState_Release(state)
end
finally
C.PyGILState_Release(state)
Base.unlock(_jl_gil_lock)
end
end
end

macro lock(parameter, expr)
parameter.head == :(=) &&
parameter.args[1] == :exclusive ||
throw(ArgumentError("The only accepted parameter to @lock is `exclusive`."))

do_lock = esc(parameter.args[2])
quote
$do_lock && Base.lock(_jl_gil_lock)
try
state = C.PyGILState_Ensure()
try
$(esc(expr))
finally
C.PyGILState_Release(state)
end
finally
$do_lock && Base.unlock(_jl_gil_lock)
end
end
end
Expand All @@ -63,11 +118,17 @@ Python code. That other thread can be a Julia thread, which must lock the GIL us
See [`@unlock`](@ref) for the macro form.
"""
function unlock(f)
state = C.PyEval_SaveThread()
_locked = Base.islocked(_jl_gil_lock)
_locked && Base.unlock(_jl_gil_lock)
try
f()
state = C.PyEval_SaveThread()
try
f()
finally
C.PyEval_RestoreThread(state)
end
finally
C.PyEval_RestoreThread(state)
_locked && Base.lock(_jl_gil_lock)
end
end

Expand All @@ -84,13 +145,31 @@ The macro equivalent of [`unlock`](@ref).
"""
macro unlock(expr)
quote
state = C.PyEval_SaveThread()
_locked = Base.islocked(_jl_gil_lock)
_locked && Base.unlock(_jl_gil_lock)
try
$(esc(expr))
state = C.PyEval_SaveThread()
try
$(esc(expr))
finally
C.PyEval_RestoreThread(state)
end
finally
C.PyEval_RestoreThread(state)
_locked && Base.lock(_jl_gil_lock)
end
end
end

#=
# Disable this for now since holding this lock will disable finalizers
# If the main thread already has the GIL, we should lock _jl_gil_lock.
function __init__()
if hasgil()
Base.lock(_jl_gil_lock)
end
end
Comment on lines +166 to +170
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking on __init__ is probably a bad idea even though we are technically holding the GIL since this will disable finalizers.

Rather we should consider invoking lock while executing other calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling this apparently allowed the python tests to succeed.

Copy link

@vtjnash vtjnash Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After sleeping on this, I propose you make the following changes to make this fix work:

  • Replace this ReentrantLock with a OncePerThread{@NamedTuple{stack::Vector{Task}, set::IdSet{Task}, condvar::Condition}} lockowners. The point being mostly to convert the state on ReentrantLock into something similar but protected by the GIL to avoid concurrency, finalizer, and multi-threading issues.
  • Perhaps, after locking the GIL on this thread, check that the current Task is in(lockowners().set, current_task) for safety (forcing the user to make explicit GIL usage on each Task with a call to lock on each Task)?
  • During the calls to lock (after getting GIL), push the current Task to the lockowners stack and add it to the set (if using that)
  • On unlock, block (on condvar) until the current_task is at the top of the stack (notify condvar after every unlock) to enforce that GIL unlock calls are correctly sequenced
  • Where necessary (anywhere internally that is holding the GIL to update a Julia data structure related to python), add calls to enable/disable finalizers to prevent memory corruption from concurrent mutation of the data structures (separate PR)
  • Whenever a Task interacts with python, force current_task().sticky = true to prevent it migrating to a thread that is not holding the GIL.
  • Adding __init__ is dangerous since that doesn't run on the main Task in some user applications even though it always does so during your CI tests. You might just want to add some special exemptions for when current_task() == root_task(0)?

Does that all sound logical and reasonable? My proposal already went through a few drafts as I tried to make it seem straightforward enough to explain and use, so it might not be the only way to deal with some of these constraints, and I'm probably forgetting at least some important details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OncePerThread is to be introduced in Julia 1.12 which has yet to be released at the time of this writing. Is there a lesser version of this that could be implemented prior to Julia 1.12, preferably to Julia 1.10 LTS?

=#

include("GlobalInterpreterLock.jl")

end
178 changes: 178 additions & 0 deletions src/GIL/GlobalInterpreterLock.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
"""
TaskState

When a `Task` acquires the GIL, save the GIL state and the stickiness of the
`Task` since we will force the `Task` to be sticky. We need to restore the GIL
state on release of the GIL via `C.PyGILState_Release`.
"""
struct TaskState
task::Task
sticky::Bool # original stickiness of the task
state::C.PyGILState_STATE
end

"""
TaskStack

For each thread the `TaskStack` maintains a first-in-last-out list of tasks
as well as the GIL state and their stickiness upon entering the stack. This
forces tasks to unlock the GIL in the reverse order of which they locked it.
"""
struct TaskStack
stack::Vector{TaskState}
count::IdDict{Task,Int}
condvar::Threads.Condition
function TaskStack()
return new(TaskState[], IdDict{Task,Int}(), Threads.Condition())
end
end
function Base.last(task_stack::TaskStack)::Task
return last(task_stack.stack).task
end
function Base.push!(task_stack::TaskStack, task::Task)
original_sticky = task.sticky
# The task should not migrate threads while acquiring or holding the GIL
task.sticky = true
gil_state = C.PyGILState_Ensure()

# Save the stickiness and state for when we release
state = TaskState(task, original_sticky, gil_state)
push!(task_stack.stack, state)

# Increment the count for this task
count = get(task_stack.count, task, 0)
task_stack.count[task] = count + 1

return task_stack
end
function Base.pop!(task_stack::TaskStack)::Task
state = pop!(task_stack.stack)
task = state.task
sticky = state.sticky
gil_state = state.state

# Decrement the count for this task
count = task_stack.count[task] - 1
if count == 0
# If 0, remove it from the key set
pop!(task_stack.count, task)
else
task_stack.count[task] = count
end

C.PyGILState_Release(gil_state)

Base.lock(task_stack.condvar) do
notify(task_stack.condvar)
end

# Restore sticky state after releasing the GIL
task.sticky = sticky

return task
end
Base.in(task::Task, task_stack::TaskStack) = haskey(task_stack.count)
Base.isempty(task_stack::TaskStack) = isempty(task_stack.stack)

if !isdefined(Base, :OncePerThread)

const PerThreadLock = Base.ThreadSynchronizer()

# OncePerThread is implemented in full in Julia 1.12
# This implementation is meant for compatibility with Julia 1.10 and 1.11.
# Using Julia 1.12 is recommended.
mutable struct OncePerThread{T,F} <: Function
@atomic xs::Dict{Int, T} # values
@atomic ss::Dict{Int, UInt8} # states: 0=initial, 1=hasrun, 2=error, 3==concurrent
const initializer::F
function OncePerThread{T,F}(initializer::F) where {T,F}
nt = Threads.maxthreadid()
return new{T,F}(Dict{Int,T}(), Dict{Int,UInt8}(), initializer)
end
end
OncePerThread{T}(initializer::Type{U}) where {T, U} = OncePerThread{T,Type{U}}(initializer)
(once::OncePerThread{T,F})() where {T,F} = once[Threads.threadid()]
function Base.getindex(once::OncePerThread, tid::Integer)
tid = Int(tid)
ss = @atomic :acquire once.ss
xs = @atomic :monotonic once.xs

if haskey(ss, tid) && ss[tid] == 1
return xs[tid]
end

Base.lock(PerThreadLock)
try
state = get(ss, tid, 0)
if state == 0
xs[tid] = once.initializer()
ss[tid] = 1
end
finally
Base.unlock(PerThreadLock)
end
return xs[tid]
end
end

"""
GlobalInterpreterLock

Provides a thread aware reentrant lock around Python's interpreter lock that
ensures that `Task`s acquiring the lock stay on the same thread.
"""
struct GlobalInterpreterLock <: Base.AbstractLock
lock_owners::OncePerThread{TaskStack}
function GlobalInterpreterLock()
return new(OncePerThread{TaskStack}(TaskStack))
end
end
function Base.lock(gil::GlobalInterpreterLock)
push!(gil.lock_owners(), current_task())
return nothing
end
function Base.unlock(gil::GlobalInterpreterLock)
lock_owner::TaskStack = gil.lock_owners()
last_owner::Task = if isempty(lock_owner)
current_task()
else
last(lock_owner)
end
while last_owner != current_task()
if istaskdone(last_owner) && !isempty(lock_owner)
# Last owner is done and unable to unlock the GIL
pop!(lock_owner)
error("Unlock from the wrong task. The Task that owned the GIL is done and did not unlock the GIL: $(last_owner)")
else
# This task does not own the GIL. Wait to unlock the GIL until
# another task successfully unlocks the GIL.
Base.lock(lock_owner.condvar) do
wait(lock_owner.condvar)
end
end
last_owner = if isempty(lock_owner)
current_task()
else
last(lock_owner)
end
end
if isempty(lock_owner)
error("Unlock from wrong task: $(current_task). No tasks on this thread own the lock.")
else
task = pop!(lock_owner)
end
@assert task == current_task()
return nothing
end
function Base.islocked(gil::GlobalInterpreterLock)
return any(!isempty(gil.lock_owners[thread_index]) for thread_index in 1:Threads.maxthreadid())
end
function haslock(gil::GlobalInterpreterLock, task::Task)
lock_owner::TaskStack = gil.lock_owners()
if isempty(lock_owner)
return false
end
return last(lock_owner)::Task == task
end

const _GIL = GlobalInterpreterLock()
10 changes: 10 additions & 0 deletions test/GC.jl
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
using TestItemRunner

@testitem "GC.gc()" begin
using PythonCall
let
pyobjs = map(pylist, 1:100)
PythonCall.GIL.@unlock Threads.@threads for obj in pyobjs
Expand All @@ -13,6 +16,7 @@
end

@testitem "GC.GCHook" begin
using PythonCall
let
pyobjs = map(pylist, 1:100)
PythonCall.GIL.@unlock Threads.@threads for obj in pyobjs
Expand All @@ -23,5 +27,11 @@ end
VERSION >= v"1.10.0-" &&
@test !isempty(PythonCall.GC.QUEUE.items)
GC.gc()

# Unlock and relocking the ReentrantLock allows this test to pass
# if _jl_gil_lock is locked on init
# Base.unlock(PythonCall.GIL._jl_gil_lock)
# Base.lock(PythonCall.GIL._jl_gil_lock)

@test isempty(PythonCall.GC.QUEUE.items)
end
Loading
Loading