Description
This slowdown was found with one of my favorite benchmarks: estimating the value of pi with the Monte Carlo method.
```python
import random
import time
from threading import Thread

def monte_carlo_pi_part(n: int, idx: int, results: list[int]) -> None:
    count = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1:  # point falls inside the quarter circle
            count += 1
    results[idx] = count

n = 10000
num_threads = 100
threads = []
results = [0] * num_threads

a = time.time()
for i in range(num_threads):
    t = Thread(target=monte_carlo_pi_part, args=(n, i, results))
    t.start()
    threads.append(t)
while threads:
    threads.pop().join()
b = time.time()

# hits / samples approximates pi / 4
print(sum(results) / (n * num_threads) * 4)
print(b - a)
```
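The contention comes from all 100 threads sharing the module-level `random` generator. A user-level mitigation (a sketch of a workaround, not the interpreter patch described below) is to give each thread its own `random.Random` instance, so no per-object critical section is shared:

```python
import random
import time
from threading import Thread

def monte_carlo_pi_part(n: int, idx: int, results: list[int]) -> None:
    rng = random.Random()  # per-thread generator: no shared critical section
    count = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1:
            count += 1
    results[idx] = count

n = 10000
num_threads = 100
results = [0] * num_threads
threads = [Thread(target=monte_carlo_pi_part, args=(n, i, results))
           for i in range(num_threads)]

a = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
b = time.time()

print(sum(results) / (n * num_threads) * 4)
print(b - a)
```

This sidesteps the contention entirely, at the cost of seeding one generator per thread.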
The slowdown is caused by acquiring critical sections in the `random` methods. Removing `@critical_section` from the method that uses `genrand_uint32`, and updating `genrand_uint32` to use atomic operations instead, brings performance back to an acceptable level.
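Conceptually, on the free-threaded build every `random.random()` call acquires the `Random` object's per-object lock before touching the shared Mersenne Twister state. A rough pure-Python emulation of that serialization (the class and names here are mine, for illustration only, not CPython's actual implementation):

```python
import random
import threading

class LockedRandom:
    """Stand-in for how @critical_section serializes access to the
    shared generator state on the free-threaded build."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._lock = threading.Lock()  # plays the role of the critical section

    def random(self) -> float:
        with self._lock:  # every calling thread contends on this one lock
            return self._rng.random()

shared = LockedRandom(1234)
# With 100 threads all funneling through shared.random(), the lock
# acquisition serializes them; that is the slowdown the benchmark measures.
print(shared.random())
```

The described patch avoids this by making the generator's state update lock-free with atomics, so threads no longer queue on a single lock.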
| Build | Elapsed (s) | PI |
|---|---|---|
| Default (with specialization) | 0.16528010368347168 | 3.144508 |
| Free-threading (with no specialization) | 0.548654317855835 | 3.1421 |
| Free-threading with my patch (with no specialization) | 0.2606849670410156 | 3.141108 |