Skip to content

GH-105229: Replace some superinstructions with single instruction equivalent. #105230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 5, 2023

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Jun 2, 2023

Shows a surprising, but pleasing, 1.7% speedup.
These are common instructions, so the reduction in code size and memory reads is considerable.

In faster-cpython/ideas#585 I refer these to as "single instruction superinstructions", which is a rather clumsy name. I'm not sure if these need a name, but suggestions for a better name are welcome.

I've left the superinstructions involving LOAD_CONST alone, as we are looking at other approaches to speed up constants.

@markshannon markshannon changed the title Replace some superinstructions with single instruction equivalent. GH-105229: Replace some superinstructions with single instruction equivalent. Jun 2, 2023
Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these need a name, but suggestions for a better name are welcome.

"Combined instructions", maybe?

@@ -222,6 +222,9 @@ def pseudo_op(name, op, real_ops):
def_op('DICT_MERGE', 164)
def_op('DICT_UPDATE', 165)

def_op('LOAD_FAST_LOAD_FAST', 168)
def_op('STORE_FAST_LOAD_FAST', 169)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, how are the stats on this? Is it worth keeping?

I would expect most of these pairs to span multiple lines. Although maybe comprehensions and generator expressions make up for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, as you say it might be quite low.
I think we should keep it for this PR anyway, as we are replacing two-codeunit superinstructions with the one-codeunit equivalent. We can change the choice of superinstructions in another PR.

@@ -0,0 +1 @@
Replace some dynamic superinstructions will single instruction equivalents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Replace some dynamic superinstructions will single instruction equivalents.
Replace some dynamic superinstructions with single instruction equivalents.

@@ -1586,6 +1586,54 @@ optimize_cfg(cfg_builder *g, PyObject *consts, PyObject *const_cache)
return SUCCESS;
}

static void
make_super_instruction(cfg_instr *inst1, cfg_instr *inst2, int super_op)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
make_super_instruction(cfg_instr *inst1, cfg_instr *inst2, int super_op)
make_combined_instruction(cfg_instr *inst1, cfg_instr *inst2, int combined_op)

@@ -0,0 +1 @@
Replace some dynamic superinstructions will single instruction equivalents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Replace some dynamic superinstructions will single instruction equivalents.
Replace some dynamic superinstructions with single instruction equivalents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Crossed posts.

Comment on lines 1592 to 1594
/* Skip if instructions are on different lines */
if (inst1->i_loc.lineno != inst2->i_loc.lineno) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a comment from @brandtbucher that this requirement eliminated most super-instructions (faster-cpython/ideas#585 (comment)). And yet your pyperformance results on this PR are quite good. I wonder how these two things are both true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instructions without observable side effects (e.g. LOAD_FAST, or maybe POP_TOP), can't we still do a super instruction and keep the NOP for the extra lineno?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw a comment

Oh, I realize now that this comment was specific to STORE_FAST__LOAD_FAST.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how these two things are both true.

Oh, I guess the most likely answer is in faster-cpython/ideas#585 (comment) :

it appears that the reduction in code size more than compensates.

Copy link
Member Author

@markshannon markshannon Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is just my hypothesis.

The goal is to remove superinstructions that cross block and line boundaries, as they are problematic.
I was just trying to not slow things down by removing them. A speedup was just a pleasant surprise.
Even if the speedup is mainly from layout changes, or cache effects, the main motivation of this PR remains valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants