Conversation
- set_find_slot: equality result in eax was clobbered by pop rax,
causing false matches (e.g. {-2} contained -1). Save result in edi
before restoring stack.
- BINARY_OP inplace float fallback: TAG_FLOAT lacks TAG_RC_BIT so
inplace ops (x -= 0.5) fell through to dunder lookup. Add float
coercion check in the inplace→non-inplace fallback path.
- min/max: rewrite as shared DRY minmax_impl supporting both
multi-arg (min(a,b,c)) and single-iterable (min(list)) forms.
Previous code only handled multi-arg, returning the list itself
for min([1,2,3]).
- sum: dispatch to float_add when either operand is TAG_FLOAT,
fixing crash on sum([1.0, 2.0, 3.0]).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
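The tag-based dispatch behind the sum() fix can be pictured with a small C sketch. The struct-based tag layout here is illustrative only; the interpreter's real TAG_* encoding packs tags into the 64-bit word itself.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative tagged-value scheme; the interpreter's actual TAG_* encoding
   stores tags inside the word, which this struct only approximates. */
typedef enum { TAG_INT, TAG_FLOAT } Tag;
typedef struct { Tag tag; int64_t i; double f; } Val;

static Val make_int(int64_t i)  { Val v = {TAG_INT, i, 0.0};  return v; }
static Val make_float(double f) { Val v = {TAG_FLOAT, 0, f};  return v; }
static double as_double(Val v)  { return v.tag == TAG_FLOAT ? v.f : (double)v.i; }

/* sum()'s accumulation step: if EITHER operand is a float, dispatch to
   float addition (float_add in the asm); otherwise stay on the SmallInt path. */
static Val add_dispatch(Val a, Val b) {
    if (a.tag == TAG_FLOAT || b.tag == TAG_FLOAT)
        return make_float(as_double(a) + as_double(b));
    return make_int(a.i + b.i);
}
```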
- COMPARE_OP_INT now pushes TAG_BOOL instead of heap bool (no INCREF/DECREF)
- POP_JUMP_IF_FALSE/TRUE inline TAG_BOOL fast path (skip obj_is_true call)
- Fused COMPARE_OP_INT_JUMP_FALSE/TRUE superinstructions (opcodes 215/216)
- Specialized float binops: ADD/SUB/MUL/TRUEDIV (opcodes 217-220)
- Specialized SmallInt multiply (opcode 221) with imul+overflow check
- Float truediv includes zero-divisor guard that deopts to generic path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
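The payoff of fusing a compare with the following conditional jump can be sketched as a tiny switch-based stack machine in C. Opcode names and the program encoding are hypothetical; the point is that the fused case does the comparison and the branch in one dispatch, without materializing a bool on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; the real interpreter assigns 215/216 to the fused forms. */
enum { OP_PUSH, OP_CMP_LT, OP_JUMP_IF_FALSE, OP_CMP_LT_JUMP_FALSE, OP_HALT };

/* Tiny stack machine. The fused superinstruction replaces the two-opcode
   sequence CMP_LT; JUMP_IF_FALSE: one dispatch, no bool pushed and re-popped. */
static int64_t run(const int64_t *code) {
    int64_t stack[16]; int sp = 0; size_t pc = 0;
    for (;;) {
        switch (code[pc]) {
        case OP_PUSH: stack[sp++] = code[pc + 1]; pc += 2; break;
        case OP_CMP_LT: { int64_t b = stack[--sp], a = stack[--sp];
                          stack[sp++] = (a < b); pc += 1; break; }
        case OP_JUMP_IF_FALSE:
            pc = stack[--sp] ? pc + 2 : (size_t)code[pc + 1]; break;
        case OP_CMP_LT_JUMP_FALSE: {          /* fused: compare AND branch */
            int64_t b = stack[--sp], a = stack[--sp];
            pc = (a < b) ? pc + 2 : (size_t)code[pc + 1]; break; }
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```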
Add ap_memmove (forward rep movsq, backward manual qword loop) and replace 3 identical element-by-element shift loops with single calls. Eliminates redundant ob_item reload per iteration and leverages hardware-optimized string ops for the forward (pop/remove) case. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
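The direction choice in ap_memmove comes down to overlap: a low-to-high copy is safe when the destination is below the source (the pop/remove shift, where the asm can stream with rep movsq), and a high-to-low copy is required when it is above (the insert shift). A minimal C sketch of that rule, with an ordinary loop standing in for the string ops:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direction-aware qword move, mirroring the ap_memmove strategy: forward
   when dst < src (overlap-safe, and where "rep movsq"-style streaming wins),
   backward when dst > src, so overlapping regions are never corrupted. */
static void qword_move(uint64_t *dst, const uint64_t *src, size_t n) {
    if (dst == src || n == 0) return;
    if (dst < src) {
        for (size_t i = 0; i < n; i++) dst[i] = src[i];   /* forward  */
    } else {
        for (size_t i = n; i-- > 0; ) dst[i] = src[i];    /* backward */
    }
}
```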
…e zeroing optimization
Phase 1: Slice object freelist pool (16 entries) eliminates malloc/free churn for ephemeral slices in tight loops. list_ass_subscript shift loops replaced with ap_memmove. list_getslice step=1 uses ap_memcpy for contiguous copies instead of per-element imul loop.
Phase 2: SmallInt floor division fast path in op_binary_op with specialization to opcode 222 (op_binary_op_floordiv_int), bypassing the generic 8-comparison dispatch chain.
Phase 3: Unrolled zeroing for frames with 1-4 locals avoids rep stosq startup penalty for small functions (common case).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
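The freelist idea behind the 16-entry slice pool is simple enough to show in C: dealloc pushes onto a small stack instead of freeing, alloc pops before falling back to the allocator. The names below are illustrative, not the interpreter's actual symbols.

```c
#include <assert.h>
#include <stdlib.h>

/* Fixed-capacity freelist pool, in the spirit of the slice pool above:
   hot loops that create and destroy ephemeral objects hit the pool and
   never touch malloc/free after warm-up. */
#define POOL_CAP 16

typedef struct Slice { long start, stop, step; } Slice;

static Slice *pool[POOL_CAP];
static int    pool_len = 0;

static Slice *slice_alloc(void) {
    if (pool_len > 0) return pool[--pool_len];   /* reuse: no malloc */
    return malloc(sizeof(Slice));                /* cold path */
}

static void slice_dealloc(Slice *s) {
    if (pool_len < POOL_CAP) { pool[pool_len++] = s; return; }  /* pool it */
    free(s);                                     /* pool full: real free */
}
```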
…mallInt
- Tuple pool: freelist for sizes 1-3 (max 16 each), avoiding gc_alloc/gc_dealloc
- List header pool: freelist for 40-byte PyListObject headers (max 16)
- float_pow: sqrtsd fast path for **0.5 (~12 vs ~100+ cycles), mulsd for **2.0
- enumerate_iternext: inline SmallInt construction, removing int_from_i64 call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This pull request implements a comprehensive set of performance optimizations for a Python interpreter written in x86-64 assembly. The changes focus on reducing memory allocation overhead through object pooling, optimizing memory operations with specialized routines, and adding fast paths for common arithmetic and comparison operations.
Changes:
- Object pooling for frequently allocated types (tuples size 1-3, slices, and list headers) to reduce GC pressure
- Memory operation optimizations: new ap_memmove implementation and replacement of manual shift loops with bulk memory operations
- Arithmetic fast paths: specialized opcodes for int/float operations (add, sub, mul, div) and float power optimization for common exponents (0.5, 2.0)
- Control flow optimizations: superinstructions fusing COMPARE_OP_INT with POP_JUMP_IF_FALSE/TRUE, and fast path for TAG_BOOL in conditional jumps
- Iterator optimizations: inline SmallInt creation in enumerate, float handling in sum(), and unified min/max implementation with iterator support
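The unified min/max change in the last bullet boils down to one shared reducer whose comparison is flipped by a flag, with an argument-form check in front: a single argument is unpacked as an iterable, several arguments are reduced directly (the old code skipped the unpacking, so min([1,2,3]) returned the list itself). A C sketch of the shared reducer, with plain longs standing in for the interpreter's tagged values:

```c
#include <assert.h>
#include <stddef.h>

/* Shared body behind both builtins, in the spirit of minmax_impl: `is_max`
   flips the comparison, so min and max need no duplicated loop. In the
   interpreter, the single-argument form first drains the iterable into this
   same reduction; here the caller passes the elements directly. */
static long reduce_minmax(const long *items, size_t n, int is_max) {
    long best = items[0];
    for (size_t i = 1; i < n; i++)
        if (is_max ? items[i] > best : items[i] < best)
            best = items[i];
    return best;
}
```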
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_list_insert_pop.py | New test file for list insert/pop/remove operations |
| src/pyo/tuple.asm | Implements object pooling for 1-3 element tuples |
| src/pyo/slice.asm | Implements object pooling for slice objects |
| src/pyo/list.asm | Implements list header pooling and memcpy/memmove optimizations |
| src/pyo/set.asm | Fixes register preservation in set_find_slot |
| src/pyo/float.asm | Adds fast paths for pow(x, 0.5) and pow(x, 2.0) |
| src/opcodes_misc.asm | Adds specialized opcodes for arithmetic ops and fused compare-jump |
| src/eval.asm | Updates opcode dispatch table with new specialized opcodes |
| src/methods.asm | Replaces manual shift loops with ap_memmove in list methods |
| src/lib/memops.asm | Implements ap_memmove for overlapping memory regions |
| src/frame.asm | Unrolls zero-fill loops for small frames (1-4 slots) |
| src/itertools.asm | Inlines SmallInt creation in enumerate iterator |
| src/builtins_extra.asm | Adds float support to sum(), refactors min/max with shared implementation |
    ; Only gc_track if freshly allocated (pooled tuples are already tracked)
    ; Check: if tuple came from pool, ob_type is already set from previous use
    ; For fresh alloc, gc_alloc sets ob_type. We can skip gc_track for pooled.
    ; Pooled tuples were gc_untracked in dealloc, so we must gc_track them again.
The comment at lines 98-101 is contradictory. It states "Only gc_track if freshly allocated (pooled tuples are already tracked)" but then says "Pooled tuples were gc_untracked in dealloc, so we must gc_track them again." The code calls gc_track unconditionally at line 103, which is correct since pooled tuples ARE untracked in dealloc (line 224). However, the comment should be clarified to avoid confusion.
Suggested change:
-    ; Only gc_track if freshly allocated (pooled tuples are already tracked)
-    ; Check: if tuple came from pool, ob_type is already set from previous use
-    ; For fresh alloc, gc_alloc sets ob_type. We can skip gc_track for pooled.
-    ; Pooled tuples were gc_untracked in dealloc, so we must gc_track them again.
+    ; Ensure the tuple is GC-tracked regardless of origin:
+    ; - Fresh tuples from gc_alloc start out untracked.
+    ; - Pooled tuples were gc_untracked in dealloc before being put back.
+    ; Therefore every tuple returned from tuple_new must be gc_track'ed here.
    lea rdi, [rbx - GC_HEAD_SIZE]
    call ap_free
When freeing a slice that was allocated with gc_alloc (at line 69), the code should use gc_dealloc instead of manually adjusting the pointer and calling ap_free. The current approach at lines 137-138 bypasses the GC deallocation logic and directly frees memory, which may cause issues with GC bookkeeping. Change to call gc_dealloc(rbx) instead.
Suggested change:
-    lea rdi, [rbx - GC_HEAD_SIZE]
-    call ap_free
+    mov rdi, rbx
+    call gc_dealloc
    INCREF_VAL r14, rdx
    DECREF_VAL rax, rdx    ; DECREF iternext result
At lines 2376-2377, the code INCREFs the first element and then immediately DECREFs it. Since iternext returns a new reference that we're taking ownership of, these operations cancel out and are redundant. Consider removing both the INCREF_VAL and DECREF_VAL lines to avoid unnecessary reference count manipulation.
Suggested change (delete both lines):
-    INCREF_VAL r14, rdx
-    DECREF_VAL rax, rdx    ; DECREF iternext result