Skip to content

GH-115506: Improve handling of constants in tier two #124809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Sep 30, 2024

This adds a refs tuple to executor objects, which contains constants created during optimization of tier two traces. This isn't deduplicated at all, since not all of our "constants" are actually "constant", and arbitrary hashes/comparisons can open up a whole can of worms. If we want to go that route, we can probably re-use _PyCode_ConstantKey for known safe, immutable types, and compare everything else by identity.

This also updates some parts of the optimizer to improve the handling of known constants (such as adding peepholing for _POP_TOP_LOAD_CONST_INLINE_BORROW, _REPLACE_WITH_TRUE, and _COPY/_LOAD_FAST with known constant values).

Performance and memory are in the noise... perhaps a bit faster if you squint hard enough. But it's working: the stats show lots of instructions like _REPLACE_WITH_TRUE, _POP_TOP_LOAD_CONST_INLINE_BORROW, _LOAD_FAST, _COPY, and _BINARY_OP_ADD_INT being replaced with simpler instructions like _LOAD_CONST_INLINE_BORROW, _POP_TOP, and _LOAD_CONST_INLINE.

My next step will be experimenting with no-refcount variants of _COPY, _LOAD_FAST, _STORE_FAST, and _POP_TOP with known immortal values.

@brandtbucher brandtbucher added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 30, 2024
@brandtbucher brandtbucher self-assigned this Sep 30, 2024
@markshannon
Copy link
Member

I think this belongs in the (yet to be created) partial evaluation pass.

Any optimization that does evaluation, or transforms representation belongs in that pass.

In that pass, the _BINARY_OP_MULTIPLY_FLOAT case would look something like:

op(_BINARY_OP_MULTIPLY_FLOAT, (left, right -- res)) {
    if (is_const(left) && is_const(right)) {
        assert(PyFloat_CheckExact(get_const(left)));
        assert(PyFloat_CheckExact(get_const(right)));
        PyObject *temp = PyFloat_FromDouble(
            PyFloat_AS_DOUBLE(sym_get_const(left)) *
            PyFloat_AS_DOUBLE(sym_get_const(right)));
        res = virtual_const(temp, ctx);
    }
    else {
        materialize(left, ctx);
        materialize(right, ctx);
        emit(_BINARY_OP_MULTIPLY_FLOAT, ctx);
        res = concrete_tos(ctx);
     }
}

@Fidget-Spinner
Copy link
Member

I think we could extract the useful part (the API to maintain a constant pool in the executor) though, and make the PR just that.

@brandtbucher
Copy link
Member Author

brandtbucher commented Oct 2, 2024

Well, I think the rest of the PR is useful too... 🙂

I'm not sure I understand the desire to have an entirely new pass when the same work could be easily done as part of this one. It seems like a lot of extra code (and computation), for little gain. I also suspect that we would be repeating a ton of the same work in the second pass.

Is there a paper or something that explains why two passes are better than one? My intuition would be to reduce the number of passes, rather than increasing it (for example, by doing abstract interpretation during trace projection).

@Fidget-Spinner
Copy link
Member

Is there a paper or something that explains why two passes are better than one? My intuition would be to reduce the number of passes, rather than increasing it (for example, by doing abstract interpretation during trace projection).

Actually the opposite, the literature suggests combining passes produces better code, not the latter 😉. I think we just want a clean separation of concerns though.

@brandtbucher
Copy link
Member Author

This can wait until the mythical second pass is implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants