gh-126835: Avoid creating unnecessary tuple when looking for constant sequence during constant folding #131054

Merged: 7 commits merged into python:main on Mar 12, 2025

Conversation

WolframAlph (Contributor) opened this pull request and commented Mar 10, 2025

WolframAlph (Contributor, Author):

@iritkatriel what would be a good way to test this?

iritkatriel (Member) left a review comment:

It's a good change. Some naming issues.

Inline review thread on these lines in Python/flowgraph.c:

return SUCCESS;

nop_out(bb, seq, BINOP_OPERAND_COUNT);
return instr_make_load_const(binop, newconst, consts, const_cache);
iritkatriel (Member):

It might make sense for instr_make_load_const to do the nop_out. It already changes the bytecode with INSTR_SET_OP1, so it makes sense for it to do the whole transformation.

It would need an arg for BINOP_OPERAND_COUNT, etc.

WolframAlph (Contributor, Author):

I wouldn't do it. Functions do precisely what their names suggest. What if we need to use instr_make_load_const in the future without NOPing out?

iritkatriel (Member):

Then we can pass 0 for the "how many to NOP out" arg. We can also, if we want, add assertions in this function that the net change to stack effect of what we're doing is 0.

If you replace BUILD_TUPLE 5 by a LOAD_CONST without the NOPing, then you end up with invalid code.
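
For illustration, a combined helper along the lines suggested here might look like the sketch below. The name is hypothetical and the nop_out / instr_make_load_const calls simply mirror the snippet quoted at the top of this thread; this is not the code that was merged.

/* Hypothetical combined helper (illustrative only): NOP out the
 * instructions whose results were folded away, then rewrite `instr`
 * into a LOAD_CONST of `newconst`. */
static int
instr_make_load_const_and_nop_out(basicblock *bb, cfg_instr *instr,
                                  int seq, int nop_count,
                                  PyObject *newconst,
                                  PyObject *consts, PyObject *const_cache)
{
    /* Generalizes the nop_out(bb, seq, BINOP_OPERAND_COUNT) call quoted
     * above; callers with nothing to remove could pass nop_count == 0,
     * as suggested.  One could also assert here that the net stack
     * effect of the whole rewrite is 0. */
    nop_out(bb, seq, nop_count);
    /* instr_make_load_const already rewrites the instruction in place
     * with INSTR_SET_OP1 and registers newconst via consts/const_cache. */
    return instr_make_load_const(instr, newconst, consts, const_cache);
}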

WolframAlph (Contributor, Author), Mar 11, 2025:

I see. My only concern is that the function would be doing too much, and it's kind of hard to find a proper name for it that reflects everything it is doing.

WolframAlph (Contributor, Author):

There is a usage of nop_out without instr_make_load_const, so I would probably leave this for another PR, as it would require adjusting some code unrelated to this PR.

iritkatriel (Member):

> My only concern is that the function would be doing too much, and it's kind of hard to find a proper name for it that reflects everything it is doing.

The current name of the function doesn't reflect what it does (and that it can leave the bytecode in a bad state).

> There is a usage of nop_out without instr_make_load_const, so I would probably leave this for another PR, as it would require adjusting some code unrelated to this PR.

You can just leave that usage as is. It won't be impacted by moving the nop_out in other places into instr_make_load_const.

Could also be in another PR.

iritkatriel (Member):

> @iritkatriel what would be a good way to test this?

The existing functional tests should cover this because this PR changes only implementation details and not behaviour.

Might be worth making sure that we have tests for this on the boundaries of the relevant constants (_PY_STACK_USE_GUIDELINE and MIN_CONST_SEQUENCE_SIZE, so each of these and ±1).

WolframAlph (Contributor, Author) commented Mar 12, 2025:

MIN_CONST_SEQUENCE_SIZE is covered by test_optimize_if_const_set & test_optimize_if_const_list in test_peepholer.py.

_PY_STACK_USE_GUIDELINE is not covered anywhere AFAIK. I will add a test in test_compiler_codegen.py for the _PY_STACK_USE_GUIDELINE boundary.

Additionally, I would like to assert that we are not missing an optimization by returning early if seq_size > _PY_STACK_USE_GUIDELINE: basically a small function that counts how many consecutive constants there are and returns count > _PY_STACK_USE_GUIDELINE. It would be stripped out on non-debug builds however, so we would need to #ifdef it (or use some other mechanism, e.g. a macro). WDYT?

WolframAlph (Contributor, Author):

> It would be stripped out on non-debug builds however, so we would need to #ifdef it

Actually, we already have such a section in flowgraph.c:

cpython/Python/flowgraph.c, lines 386 to 405 at 98fa4a4:

#ifndef NDEBUG
static bool
cfg_builder_check(cfg_builder *g)
{
    assert(g->g_entryblock->b_iused > 0);
    for (basicblock *block = g->g_block_list; block != NULL; block = block->b_list) {
        assert(!_PyMem_IsPtrFreed(block));
        if (block->b_instr != NULL) {
            assert(block->b_ialloc > 0);
            assert(block->b_iused >= 0);
            assert(block->b_ialloc >= block->b_iused);
        }
        else {
            assert (block->b_iused == 0);
            assert (block->b_ialloc == 0);
        }
    }
    return true;
}
#endif

So maybe just put it there?

iritkatriel (Member):

> Basically a small function that counts how many consecutive constants there are and returns count > _PY_STACK_USE_GUIDELINE.

I think it's possible to have a long sequence of LOAD_CONSTS which are not later turned into a tuple, so this check is too harsh.

>>> def f():
...     g(1,2,3,4,5,6,7,8,9,10)
...     
>>> dis.dis(f)
  1           RESUME                   0

  2           LOAD_GLOBAL              1 (g + NULL)
              LOAD_SMALL_INT           1
              LOAD_SMALL_INT           2
              LOAD_SMALL_INT           3
              LOAD_SMALL_INT           4
              LOAD_SMALL_INT           5
              LOAD_SMALL_INT           6
              LOAD_SMALL_INT           7
              LOAD_SMALL_INT           8
              LOAD_SMALL_INT           9
              LOAD_SMALL_INT          10
              CALL                    10
              POP_TOP
              LOAD_CONST               0 (None)
              RETURN_VALUE

WolframAlph (Contributor, Author):

Then maybe assert that the constant sequence is within _PY_STACK_USE_GUIDELINE upon entering fold_tuple_of_constants & optimize_lists_and_sets?

static bool
const_folding_within_stack_use_guideline(basicblock *bb, int i)
{
    assert(i < bb->b_iused);

    /* Walk backwards from instruction i, counting the run of constant
       loads (skipping NOPs). */
    int count = 0;
    for (; i >= 0; i--) {
        int opcode = bb->b_instr[i].i_opcode;
        if (opcode == NOP) {
            continue;
        }
        if (!loads_const(opcode)) {
            break;
        }
        count++;
    }

    return count <= _PY_STACK_USE_GUIDELINE;
}

static int
fold_tuple_of_constants(basicblock *bb, int i, PyObject *consts, PyObject *const_cache)
{
    /* Pre-conditions */
    assert(PyDict_CheckExact(const_cache));
    assert(PyList_CheckExact(consts));
    assert(const_folding_within_stack_use_guideline(bb, i-1));
    ...

iritkatriel (Member):

I don't see what you're testing here that we're not covering with the normal tests.

iritkatriel merged commit 3618240 into python:main on Mar 12, 2025. 45 checks passed.
iritkatriel (Member):

I merged, so we won't get more conflicts. We can always add more tests later.

WolframAlph (Contributor, Author):

Alright. My point is that since we return early with if (seq_size > _PY_STACK_USE_GUIDELINE), maybe it's worth adding an assertion to check that we are not missing an optimization due to this, in case codegen for some reason emits more consecutive constant stack loads than _PY_STACK_USE_GUIDELINE in the future. But maybe I'm wrong and we don't need it.

plashchynski pushed a commit to plashchynski/cpython that referenced this pull request Mar 17, 2025
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025