Bug report
Bug description:
I've identified a significant performance regression in Python's free-threaded build when appending to a shared list (a list defined in an enclosing scope and kept alive across calls). In my test case, the append alone makes the run more than 10x slower than on the standard build (about 17x in the results below).
Test Case:
import itertools
import time


def performance_test(n_options=5, n_items=5, iterations=50):
    list = []

    def expensive_operation():
        # Create lists of tuples
        data = []
        for _ in range(n_options):
            data.append([(f"a{i}", f"b{i}") for i in range(n_items)])
        # Generate all combinations and create result tuples
        results = []
        for combo in itertools.product(*data):
            result = tuple((x[0], x[1], f"name_{i}") for i, x in enumerate(combo))
            results.append(result)
        # Commenting the following line solves the performance regression in free-threaded mode
        list.append(results)
        return results

    start = time.time()
    for _ in range(iterations):
        result = expensive_operation()
    duration = time.time() - start
    print(f"n_options={n_options}, n_items={n_items}, iterations={iterations}")
    print(f"Time: {duration:.4f}s, Combinations: {len(result)}")
    return duration


if __name__ == "__main__":
    print("Python Performance Regression Test")
    print("-" * 40)
    performance_test()
Results:
- Standard Python 3.13: 0.1290s
- Free-threaded Python 3.13t: 2.1643s
- Free-threaded Python 3.14.0a7: 2.1923s
- Free-threaded Python 3.13t with list.append commented out: 0.1332s
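When comparing timings like these, it can help to confirm which build each number came from. The following is a minimal sketch; describe_build is a hypothetical helper name, and the check assumes that sys._is_gil_enabled() is only present on 3.13 and later, so it falls back to "GIL enabled" elsewhere.

```python
import sys
import sysconfig


def describe_build():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; assume the GIL is on if it is missing.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return free_threaded, gil_enabled


if __name__ == "__main__":
    ft, gil = describe_build()
    version = sys.version.split()[0]
    print(f"{version}: free-threaded build={ft}, GIL enabled={gil}")
```

Printing this line next to each timing makes it clear whether a result came from the standard or the free-threaded interpreter.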
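The test case is single-threaded, so contention on locks or reference counts is unlikely to be the cause. The slowdown only appears when results is kept alive through list.append, so the number of live container objects grows with every iteration. That points at the garbage collector in the free-threaded build, which appears to be triggered far more often than necessary as the heap grows.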
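One way to check how much the garbage collector contributes is to count collections during a run and compare the timing with the collector disabled. This is a rough sketch using only the gc module; time_with_gc_stats and workload are hypothetical names, and workload is a small stand-in for the test case above rather than the reproducer itself.

```python
import gc
import time


def time_with_gc_stats(func, *, disable_gc=False):
    """Run func() once and return (seconds, number_of_gc_runs)."""
    runs = 0

    def on_gc(phase, info):
        # gc.callbacks entries are called with phase "start" and "stop".
        nonlocal runs
        if phase == "start":
            runs += 1

    was_enabled = gc.isenabled()
    gc.callbacks.append(on_gc)
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
    finally:
        gc.callbacks.remove(on_gc)
        # Restore the collector state we found on entry.
        if was_enabled:
            gc.enable()
    return elapsed, runs


def workload(n=20000):
    """Stand-in workload: keep many small container objects alive."""
    kept = []
    for i in range(n):
        kept.append([(i, str(i))])
    return kept


if __name__ == "__main__":
    for flag in (False, True):
        seconds, runs = time_with_gc_stats(workload, disable_gc=flag)
        print(f"gc disabled={flag}: {seconds:.4f}s, collections={runs}")
```

Passing performance_test instead of workload applies the same measurement to the reproducer. If the time drops sharply with the collector disabled, the append itself is not the expensive part.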
CPython versions tested on:
3.13, 3.14
Operating systems tested on:
Linux
Linked PRs
- gh-132917: Check resident set size (RSS) before GC trigger. #133399
- gh-132917: Use RSS + swap for estimate of process memory usage #133464
- gh-132917: Fix data race detected by tsan #133508
- gh-132917: Use /proc/self/status for mem usage info. #133544
- [3.14] gh-132917: Use /proc/self/status for mem usage info. (GH-133544) #133718
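Several of the PR titles above mention estimating process memory from RSS plus swap via /proc/self/status. For readers unfamiliar with that file, here is a small Python sketch of reading those two fields on Linux. It is not the CPython implementation, and parse_status_kib and rss_plus_swap_kib are hypothetical names used only for illustration.

```python
def parse_status_kib(text, fields=("VmRSS", "VmSwap")):
    """Extract the given fields (in KiB) from /proc/<pid>/status text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            parts = rest.split()
            # Lines look like "VmRSS:     12345 kB".
            if parts and parts[0].isdigit():
                values[name] = int(parts[0])
    return values


def rss_plus_swap_kib(path="/proc/self/status"):
    """Return resident plus swapped memory in KiB, or None if unavailable."""
    try:
        with open(path, encoding="ascii", errors="replace") as f:
            values = parse_status_kib(f.read())
    except OSError:
        return None  # not Linux, or /proc is not mounted
    if "VmRSS" not in values:
        return None
    return values["VmRSS"] + values.get("VmSwap", 0)


if __name__ == "__main__":
    print(f"RSS + swap: {rss_plus_swap_kib()} KiB")
```

Sampling this value before and after the test case gives a rough idea of how much the process footprint actually grows between collections.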