Skip to content

[analysis] Add a "Shared" lattice to represent shared state #6067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 2, 2023

Conversation

tlively
Copy link
Member

@tlively tlively commented Oct 31, 2023

The analysis framework stores a separate lattice element for each basic block
being analyzed to represent the program state at the beginning of the block.
However, in many analyses a significant portion of program state is not
flow-sensitive, so does not benefit from having a separate copy per block. For
example, an analysis might track constraints on the types of locals that do not
vary across blocks, so it really only needs a single copy of the constrains for
each local. It would be correct to simply duplicate the state across blocks
anyway, but it would not be efficient.

To make it possible to share a single copy of a lattice element across basic
blocks, introduce a Shared<L> lattice. Mathematically, this lattice represents
a single ascending chain in the underlying lattice and its elements are ordered
according to sequence numbers corresponding to positions in that chain.
Concretely, though, the Shared<L> lattice only ever materializes a single,
monotonically increasing element of L and all of its elements provide access
to that shared underlying element.

Shared<L> will let us get the benefits of having mutable shared state in the
concrete implementation of analyses without losing the benefits of keeping those
analyses expressible purely in terms of the monotone framework.

@kripken
Copy link
Member

kripken commented Oct 31, 2023

Mathematically, this lattice represents a single ascending chain in the underlying lattice

I followed up to here, and I got stuck.

Can you maybe give a concrete example of this?

As context, after reading the first paragraph I was expecting to see something like a "wrapper" for a lattice element that allows us to refer to an element without copying it. But I don't know why we need some new kind of structure here (the new chain) or even what that precisely means.

@tlively
Copy link
Member Author

tlively commented Oct 31, 2023

Unfortunately we can't just have a std::shared_ptr to a mutable underlying lattice element and call it a day, even though that's kind of what we would like to have. This new lattice needs to be a lattice so that it fits properly into the framework. This isn't just to make the compiler happy; it's important that join returns true to the analysis worklist algorithm when and only when a block needs to be re-analyzed because its input has changed since the last time it was analyzed.

To determine whether a block needs to be re-analyzed, we keep a sequence number alongside the pointer to the shared underlying element. When we propagate new results to the beginning of the block, we know the block needs to be re-analyzed if and only if the propagated sequence number is higher than the old sequence number. Every time the value of the shared underlying element increases, we increase the current sequence number so that any block that was analyzed based on a previous value will be re-analyzed.

The sequence numbers are an important part of getting the new lattice to do what we need it to do, but they're also mathematically interesting because each successive sequence number corresponds to a higher value that the shared underlying element has taken on at some point in time. The sequence of those values through time form an increasing chain in the underlying lattice and the sequence numbers are both timestamps and indices into that chain.

It's also nice to know that this Shared utility isn't just a hack to get shared, mutable state into the framework. It's actually a mathematically justifiable lattice. That gives me more confidence that it won't end up violating core assumptions and requiring hacky workarounds or special casing in the future.

@tlively tlively force-pushed the refactor-lattice-gtest branch from e8e0ab4 to f65d8a0 Compare November 1, 2023 01:19
@tlively tlively force-pushed the refactor-lattice-gtest branch from f65d8a0 to 3e6b7dd Compare November 1, 2023 17:16
Base automatically changed from refactor-lattice-gtest to main November 1, 2023 18:09
The analysis framework stores a separate lattice element for each basic block
being analyzed to represent the program state at the beginning of the block.
However, in many analyses a significant portion of program state is not
flow-sensitive, so does not benefit from having a separate copy per block. For
example, an analysis might track constraints on the types of locals that do not
vary across blocks, so it really only needs a single copy of the constrains for
each local. It would be correct to simply duplicate the state across blocks
anyway, but it would not be efficient.

To make it possible to share a single copy of a lattice element across basic
blocks, introduce a `Shared<L>` lattice. Mathematically, this lattice represents
a single ascending chain in the underlying lattice and its elements are ordered
according to sequence numbers corresponding to positions in that chain.
Concretely, though, the `Shared<L>` lattice only ever materializes a single,
monotonically increasing element of `L` and all of its elements provide access
to that shared underlying element.

`Shared<L>` will let us get the benefits of having mutable shared state in the
concrete implementation of analyses without losing the benefits of keeping those
analyses expressible purely in terms of the monotone framework.
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the idea now makes sense to me, and lgtm % comments

// Internally, there is only ever a single monotonically increasing element of L
// materialized. Dereferencing any element of the Shared lattice will produce
// the current value of that single element of L, which is generally safe
// because the current value always overapproximates the value at the time of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "overapproximate" mean here? A "higher" value in the lattice L?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep 👍

template<Lattice L> struct Shared {
class Element {
const typename L::Element* val;
uint32_t seq = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to see some of the comments in the PR itself added as comments here, to explain what seq is etc.

// sequence number. The sequence numbers monotonically increase along with
// `val` and serve to provide ordering between elements of this lattice.
mutable typename L::Element val;
mutable uint32_t seq = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be the 2nd and 3rd time we use mutable in this entire codebase... can we avoid that?

If we can't please add a comment explaining why this dangerous pattern is necessary here.

template<Lattice L> struct Shared {
class Element {
const typename L::Element* val;
uint32_t seq = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint32_t seq = 0;
Index seq = 0;

Index seems appropriate as this number is tied to the number of basic blocks (and iterations on them), which is tied to maximum sizes in wasm files. That is, if wasm supports more then 32 bits here, we'd want this type to change, which Index will do for us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number is not tied to the number of basic blocks. It is the number of times the shared value is increased, which can be dramatically higher than the number of basic blocks. Since analysis continues to a fixed point, it can visit basic blocks an unbounded number of times and each visit to a basic block may potentially increase the value multiple times.

Since this isn't actually an index into anything that is ever materialized, I would prefer to keep it a uint32_t for now, or possibly make it a uint64_t right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm. How about using a name for it for convenience later if we need to change it, using Seq = uint32_t; ?


auto zero = shared.getBottom();

auto one = zero;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having trouble reading a line that says literally 1 == 0...

(And later down I see 2 == 1. By induction, all natural numbers are apparently 0... 🤔 😆 )

How about a, b, c?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to increase a shared value is via a join, so to create our one value we have to start from zero and add one via the join on the next line. I'll clarify this in the code.


EXPECT_EQ(shared.compare(one, zero), analysis::GREATER);
EXPECT_EQ(shared.compare(one, one), analysis::EQUAL);
EXPECT_EQ(shared.compare(one, uno), analysis::EQUAL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I follow, uno had an extra join done on it, but it was of the same value as we had before, so the sequence number was not increased, hence it is equal to one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right 👍

@tlively
Copy link
Member Author

tlively commented Nov 2, 2023

Merge activity

@tlively tlively merged commit bc88d5d into main Nov 2, 2023
@tlively tlively deleted the shared-lattice branch November 2, 2023 18:31
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
…bly#6067)

The analysis framework stores a separate lattice element for each basic block
being analyzed to represent the program state at the beginning of the block.
However, in many analyses a significant portion of program state is not
flow-sensitive, so does not benefit from having a separate copy per block. For
example, an analysis might track constraints on the types of locals that do not
vary across blocks, so it really only needs a single copy of the constrains for
each local. It would be correct to simply duplicate the state across blocks
anyway, but it would not be efficient.

To make it possible to share a single copy of a lattice element across basic
blocks, introduce a `Shared<L>` lattice. Mathematically, this lattice represents
a single ascending chain in the underlying lattice and its elements are ordered
according to sequence numbers corresponding to positions in that chain.
Concretely, though, the `Shared<L>` lattice only ever materializes a single,
monotonically increasing element of `L` and all of its elements provide access
to that shared underlying element.

`Shared<L>` will let us get the benefits of having mutable shared state in the
concrete implementation of analyses without losing the benefits of keeping those
analyses expressible purely in terms of the monotone framework.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants