Perform reference count analysis in the bytecode compiler #308

markshannon · 2022-03-02T12:15:27Z

markshannon
Mar 2, 2022
Collaborator

Using some localized dataflow (or using the global dataflow if we implement #306),
we can track some refcounts in the compiler instead of the interpreter.

Consider x = a + b + c, which compiles as follows:

LOAD_FAST a
LOAD_FAST b
BINARY_OP '+'
LOAD_FAST c
BINARY_OP '+'
STORE_FAST x

Now annotate with reference count operations:

LOAD_FAST a # inc a
LOAD_FAST b # inc b
BINARY_OP '+' # dec a; dec b
LOAD_FAST c # inc c
BINARY_OP '+' # dec temporary; dec c
STORE_FAST x # dec old-value

Now consider the ideal sequence:

LOAD_FAST_BORROW a 
LOAD_FAST_BORROW b 
BINARY_OP '+', 00
LOAD_FAST_BORROW c 
BINARY_OP '+', 10 # dec temporary
STORE_FAST x # dec old-value

That's 8 refcount operations reduced to 2 (or 1 with #306).
This is admittedly an extreme case, but there is clearly room for improvement.
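As a rough sanity check on the counts above, the tally can be sketched in Python. The tuple encoding, the `count_refcount_ops` helper, and the reading of the two-digit operand as one "decref this operand" flag per input (left operand first) are assumptions made for illustration; none of this is CPython API. The plain BINARY_OP of the annotated listing is written here as if it carried the flags 11.

```python
# Hypothetical encoding: (opname, operand, decref_flags).
# decref_flags is only meaningful for BINARY_OP: "1" = decref that operand.

def count_refcount_ops(code):
    """Count the incref/decref operations implied by a bytecode listing."""
    ops = 0
    for opname, _operand, flags in code:
        if opname == "LOAD_FAST":
            ops += 1                  # incref the local being pushed
        elif opname == "LOAD_FAST_BORROW":
            pass                      # borrowed reference: no refcount traffic
        elif opname == "BINARY_OP":
            ops += flags.count("1")   # one decref per flagged operand
        elif opname == "STORE_FAST":
            ops += 1                  # decref the old value of the local
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return ops

# x = a + b + c, as annotated in the post.
annotated = [
    ("LOAD_FAST", "a", ""),
    ("LOAD_FAST", "b", ""),
    ("BINARY_OP", "+", "11"),   # dec a; dec b
    ("LOAD_FAST", "c", ""),
    ("BINARY_OP", "+", "11"),   # dec temporary; dec c
    ("STORE_FAST", "x", ""),
]

# The "ideal" sequence from the post.
ideal = [
    ("LOAD_FAST_BORROW", "a", ""),
    ("LOAD_FAST_BORROW", "b", ""),
    ("BINARY_OP", "+", "00"),
    ("LOAD_FAST_BORROW", "c", ""),
    ("BINARY_OP", "+", "10"),   # dec temporary
    ("STORE_FAST", "x", ""),
]

print(count_refcount_ops(annotated), count_refcount_ops(ideal))
```

Under these assumptions the two listings come out at 8 and 2 operations, matching the figures quoted above.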

There is some cost to checking the "decref bits" on the operations, but there is also a potential advantage: it would let us restrict the refcount == 1 trick to only those cases where the operation actually decrements the refcount.

markshannon · 2022-03-02T12:38:38Z

markshannon
Mar 2, 2022
Collaborator Author

This will have to wait until #235 is complete.


gramster · 2022-03-02T18:44:13Z

gramster
Mar 2, 2022
Maintainer

Just as an aside, as I am interested in whether it was ever considered: would there be value in making refcount operations explicit bytecode ops rather than implicit ones? I can imagine that many of them could be optimized away at the function level. Or is the cost of those operations so negligible relative to other things that requiring an extra pass through the eval loop for each of them would far outweigh any benefit?


markshannon · 2022-03-03T20:15:45Z

markshannon
Mar 3, 2022
Collaborator Author

The extra instruction will generally cost more than the refcount operation itself, but the idea of lowering the bytecode to make refcounting explicit has value.

You can consider instructions with refcounting to be a form of superinstruction.
E.g. LOAD_FAST could be viewed as a superinstruction combining LOAD_FAST_BORROW and INCREF.
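To make the superinstruction view concrete, here is a minimal sketch of fusing a LOAD_FAST_BORROW followed by an INCREF back into a single LOAD_FAST. The `(opname, operand)` tuples and the `fuse_increfs` helper are hypothetical and exist only for this illustration; they are not part of CPython.

```python
def fuse_increfs(code):
    """Fuse each LOAD_FAST_BORROW immediately followed by INCREF into LOAD_FAST."""
    fused = []
    for opname, operand in code:
        if opname == "INCREF" and fused and fused[-1][0] == "LOAD_FAST_BORROW":
            _, local = fused.pop()
            fused.append(("LOAD_FAST", local))   # the combined superinstruction
        else:
            fused.append((opname, operand))
    return fused

low_level = [
    ("LOAD_FAST_BORROW", "a"),
    ("INCREF", None),
    ("LOAD_FAST_BORROW", "b"),   # no INCREF follows, so this one stays borrowed
    ("BINARY_OP", "+"),
]

for instruction in fuse_increfs(low_level):
    print(instruction)
```

Only adjacent pairs are fused in this sketch; an INCREF that does not directly follow a borrow is left as it is.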

So, one way to implement a refcount removal scheme would be for the front-end to emit low-level bytecodes; the optimizer would then try to remove as many refcount operations as possible. The assembler would combine the remnants into larger instructions where possible.
E.g. for x = a + b, the front end produces:

LOAD_FAST_BORROW a 
INCREF

LOAD_FAST_BORROW b 
INCREF

BINARY_OP '+', 11
STORE_FAST x

The optimizer would sink the INCREFs to produce:

LOAD_FAST_BORROW a
LOAD_FAST_BORROW b
BINARY_OP '+', 00
STORE_FAST x

In cases where the optimizer could not remove the INCREFs, the assembler would combine many of them back into the usual form.
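The sinking step described in this post can be sketched over an abstract stack. This is a toy model under stated assumptions, not the proposed implementation: the tuple encoding and the `sink_increfs` helper are hypothetical, the two flags on BINARY_OP are read as "decref left operand, decref right operand", and only straight-line code with these four instructions is handled.

```python
def sink_increfs(code):
    """Cancel each INCREF against the decref flag of the BINARY_OP consuming it.

    `code` is a list of (opname, operand) tuples; for BINARY_OP the operand is
    (operator, flags), where flags is a two-character string, left operand first.
    """
    out = []     # rewritten instructions; a cancelled INCREF becomes None
    stack = []   # per stack slot: [source local or None, index of its INCREF or None]
    for opname, operand in code:
        if opname == "LOAD_FAST_BORROW":
            stack.append([operand, None])
            out.append((opname, operand))
        elif opname == "INCREF":
            stack[-1][1] = len(out)          # remember where this slot's INCREF lives
            out.append((opname, operand))
        elif opname == "BINARY_OP":
            operator, flags = operand
            right = stack.pop()
            left = stack.pop()
            new_flags = ""
            for slot, flag in ((left, flags[0]), (right, flags[1])):
                if flag == "1" and slot[1] is not None:
                    out[slot[1]] = None      # the INCREF and the decref cancel out
                    new_flags += "0"
                else:
                    new_flags += flag        # e.g. a temporary still needs its decref
            stack.append([None, None])       # the result is a fresh, owned temporary
            out.append((opname, (operator, new_flags)))
        elif opname == "STORE_FAST":
            stack.pop()                      # a stored value must stay owned
            for slot in stack:               # borrows of the overwritten local would
                if slot[0] == operand:       # dangle, so pin their INCREFs in place
                    slot[1] = None
            out.append((opname, operand))
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return [instruction for instruction in out if instruction is not None]

# Front-end output for x = a + b + c, in the style used in this post.
front_end = [
    ("LOAD_FAST_BORROW", "a"), ("INCREF", None),
    ("LOAD_FAST_BORROW", "b"), ("INCREF", None),
    ("BINARY_OP", ("+", "11")),
    ("LOAD_FAST_BORROW", "c"), ("INCREF", None),
    ("BINARY_OP", ("+", "11")),
    ("STORE_FAST", "x"),
]

for instruction in sink_increfs(front_end):
    print(instruction)
```

In this model the second BINARY_OP keeps the flag for its left operand, because that operand is a temporary created by the first addition rather than a borrowed local. An INCREF that survives the pass (for example, when a borrowed local is stored directly) is what the assembler would then fold back into an ordinary LOAD_FAST.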

1 reply

iritkatriel Mar 3, 2022
Maintainer

How does x = a + b + c work with this idea?

sbrunthaler · 2022-03-17T13:15:35Z

sbrunthaler
Mar 17, 2022

Just as a quick reference: I did implement this optimization (i.e., reference count operation elimination through quickening) in my DLS'10 paper (https://www.unibw.de/ucsrl/pubs/dls10.pdf/view).

Performance improved somewhat relative to the ECOOP'10 paper, which presented the original idea but did not apply it to many instructions. Although the improvement was not as large as I initially expected, this approach did lead me to the multi-level quickening (MLQ) work. MLQ effectively subsumes reference-count operation elimination by manipulating native-machine data objects, which need not be managed through automatic memory management. By way of unboxing, data locality also increases, and the implementation of an interpreter instruction can be cut down to a few machine instructions. Once interpreter instructions are at that performance level, type-based superinstructions are the final step in eliminating instruction dispatch costs.
