Skip to content

bpart: Turn on invalidation for guard->defined transitions #57615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

Keno
Copy link
Member

@Keno Keno commented Mar 3, 2025

This addresses one of the last remaining TODOs of the binding partition work by performing invalidations when bindings transition from being undefined to being defined. This in particular finally addresses the performance issue that #54733 was intended to address (the issue was closed when we merged the mechanism, but it had so far been turned off). Turning on the invalidations themselves were always easy (a one line deletion). What is harder is making sure that the additional invalidations don't take extra time.

To this end, we add two additional flags, one on Bindings, and one on methods. The flag on bindings tells us whether any method scan has so far found an implicit (not tracked in ->backedges) reference to this binding in any method body. The insight here is that most undefined bindings will not have been referenced previously (because they did not exist), so with a simple one bit saturating counter of the number of edges that would exist (if we did store them), we can fast-path the invalidation.

However, this is not quite sufficient, as people often do things like:

foo() = bar()
bar() = ...
...

which, without further improvements would incur an invalidation upon the definition of bar.

The second insight (and what the flag on Method is for) is that we don't actually need to scan the method body until there is something to invalidate (i.e. until some CodeInstance has been created for the method). By defering the scanning until the first time that inference accesses the lowered code (with a flag to only do it once), we can easily avoid invalidation in the above scenario (while still invalidating if foo() was called before the definition of bar).

As a further bonus, this also speeds up bootstrap by about 20% (putting us about back to where we used to be before the full bpart change) by skipping unnecessary invalidations even for non-guard transitions.

Finally, this does not yet turn on inference's ability to infer guard partitions as Union{}. The reason for this is that such partitions can be replaced by backdated constants without invalidation. However, as soon as we remove the backdated const mechanism, this PR will allow us to turn on that change, further speeding up inference (by cutting off inference on branches known to error due to missing bindings).

@Keno Keno added the backport 1.12 Change should be backported to release-1.12 label Mar 3, 2025
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Base automatically changed from kf/bpartflagrename to master March 3, 2025 07:36
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added 2 commits March 3, 2025 08:34
On master, when we invalidate a CodeInstance, we also invalidate the
entire associated MethodInstance. However, this is highly problematic,
because we have a lot of CodeInstances that are associated with
`getproperty(::Module, ::Symbol)` through constant propagation.
If one of these CodeInstances gets invalidated (e.g. because
the resolution of const-propagated M.s binding changed), it
would invalidate essentially the entire world.

Prevent this by re-checking the forward edges list to make sure
that the code instance we're invalidating is actually in there.
This addresses one of the last remaining TODOs of the binding partition
work by performing invalidations when bindings transition from being
undefined to being defined. This in particular finally addresses the
performance issue that #54733 was intended to address (the issue was
closed when we merged the mechanism, but it had so far been turned
off). Turning on the invalidations themselves were always easy (a
one line deletion). What is harder is making sure that the additional
invalidations don't take extra time.

To this end, we add two additional flags, one on Bindings, and one on
methods. The flag on bindings tells us whether any method scan has so
far found an implicit (not tracked in ->backedges) reference to this
binding in any method body. The insight here is that most undefined
bindings will not have been referenced previously (because they did
not exist), so with a simple one bit saturating counter of the number
of edges that would exist (if we did store them), we can fast-path
the invalidation.

However, this is not quite sufficient, as people often do things like:
```
foo() = bar()
bar() = ...
...
```
which, without further improvements would incur an invalidation upon
the definition of `bar`.

The second insight (and what the flag on `Method` is for) is that we
don't actually need to scan the method body until there is something
to invalidate (i.e. until some `CodeInstance` has been created for
the method). By defering the scanning until the first time that inference
accesses the lowered code (with a flag to only do it once), we can
easily avoid invalidation in the above scenario (while still invalidating
if `foo()` was called before the definition of `bar`).

As a further bonus, this also speeds up bootstrap by about 20% (putting
us about back to where we used to be before the full bpart change) by
skipping unnecessary invalidations even for non-guard transitions.

Finally, this does not yet turn on inference's ability to infer guard
partitions as `Union{}`. The reason for this is that such partitions
can be replaced by backdated constants without invalidation. However,
as soon as we remove the backdated const mechanism, this PR will allow
us to turn on that change, further speeding up inference (by cutting off
inference on branches known to error due to missing bindings).
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added a commit that referenced this pull request Mar 3, 2025
Our implicit edge tracking for bindings does not explicitly store
any edges for bindings in the *current* module. The idea behind
this is that this is a good time-space tradeoff for validation,
because substantially all binding references in a module will be
to its defining module, while the total number of methods within
a module is limited and substantially smaller than the total number
of methods in the entire system.

However, we have an issue where the code that stores these edges
and the invalidation code disagree on which module is the *current*
one. The edge storing code was using the module in which the method
was defined, while the invalidation code was using the one in which
the MethodTable is defined. With these being misaligned, we can
miss necessary invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also
the module that we are likely to have a lot of references to (since
they get referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615
to also add any scanned modules to an explicit list in `Module`. This
costs some space, but only proportional to the number of defined methods,
(and thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this
issue (#57617).
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added a commit that referenced this pull request Mar 3, 2025
Our implicit edge tracking for bindings does not explicitly store
any edges for bindings in the *current* module. The idea behind
this is that this is a good time-space tradeoff for validation,
because substantially all binding references in a module will be
to its defining module, while the total number of methods within
a module is limited and substantially smaller than the total number
of methods in the entire system.

However, we have an issue where the code that stores these edges
and the invalidation code disagree on which module is the *current*
one. The edge storing code was using the module in which the method
was defined, while the invalidation code was using the one in which
the MethodTable is defined. With these being misaligned, we can
miss necessary invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also
the module that we are likely to have a lot of references to (since
they get referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615
to also add any scanned modules to an explicit list in `Module`. This
costs some space, but only proportional to the number of defined methods,
(and thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this
issue (#57617).
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added a commit that referenced this pull request Mar 3, 2025
Our implicit edge tracking for bindings does not explicitly store
any edges for bindings in the *current* module. The idea behind
this is that this is a good time-space tradeoff for validation,
because substantially all binding references in a module will be
to its defining module, while the total number of methods within
a module is limited and substantially smaller than the total number
of methods in the entire system.

However, we have an issue where the code that stores these edges
and the invalidation code disagree on which module is the *current*
one. The edge storing code was using the module in which the method
was defined, while the invalidation code was using the one in which
the MethodTable is defined. With these being misaligned, we can
miss necessary invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also
the module that we are likely to have a lot of references to (since
they get referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615
to also add any scanned modules to an explicit list in `Module`. This
costs some space, but only proportional to the number of defined methods,
(and thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this
issue (#57617).
Keno added a commit that referenced this pull request Mar 3, 2025
This implements the optimizations I promised in #57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over #57615 is minimal (though non-zero).
Keno added a commit that referenced this pull request Mar 3, 2025
Our implicit edge tracking for bindings does not explicitly store
any edges for bindings in the *current* module. The idea behind
this is that this is a good time-space tradeoff for validation,
because substantially all binding references in a module will be
to its defining module, while the total number of methods within
a module is limited and substantially smaller than the total number
of methods in the entire system.

However, we have an issue where the code that stores these edges
and the invalidation code disagree on which module is the *current*
one. The edge storing code was using the module in which the method
was defined, while the invalidation code was using the one in which
the MethodTable is defined. With these being misaligned, we can
miss necessary invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also
the module that we are likely to have a lot of references to (since
they get referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615
to also add any scanned modules to an explicit list in `Module`. This
costs some space, but only proportional to the number of defined methods,
(and thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this
issue (#57617).
Keno added a commit that referenced this pull request Mar 4, 2025
Our implicit edge tracking for bindings does not explicitly store any
edges for bindings in the *current* module. The idea behind this is that
this is a good time-space tradeoff for validation, because substantially
all binding references in a module will be to its defining module, while
the total number of methods within a module is limited and substantially
smaller than the total number of methods in the entire system.

However, we have an issue where the code that stores these edges and the
invalidation code disagree on which module is the *current* one. The
edge storing code was using the module in which the method was defined,
while the invalidation code was using the one in which the MethodTable
is defined. With these being misaligned, we can miss necessary
invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also the
module that we are likely to have a lot of references to (since they get
referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615 to
also add any scanned modules to an explicit list in `Module`. This costs
some space, but only proportional to the number of defined methods, (and
thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this issue
(#57617).
@Keno
Copy link
Member Author

Keno commented Mar 4, 2025

Updated version merged in #57625.

@Keno Keno closed this Mar 4, 2025
KristofferC pushed a commit that referenced this pull request Mar 4, 2025
Our implicit edge tracking for bindings does not explicitly store any
edges for bindings in the *current* module. The idea behind this is that
this is a good time-space tradeoff for validation, because substantially
all binding references in a module will be to its defining module, while
the total number of methods within a module is limited and substantially
smaller than the total number of methods in the entire system.

However, we have an issue where the code that stores these edges and the
invalidation code disagree on which module is the *current* one. The
edge storing code was using the module in which the method was defined,
while the invalidation code was using the one in which the MethodTable
is defined. With these being misaligned, we can miss necessary
invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also the
module that we are likely to have a lot of references to (since they get
referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in #57615 to
also add any scanned modules to an explicit list in `Module`. This costs
some space, but only proportional to the number of defined methods, (and
thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this issue
(#57617).

(cherry picked from commit 274d80e)
serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
This implements the optimizations I promised in JuliaLang#57602 by checking
in invalidation whether or not the information that inference sees
changes. The primary situation in which this is useful is avoiding
an invalidation for `export` flag changes or changes of which module
a binding is being imported from that do not change what the actual
binding being imported is. Base itself uses these patterns sparingly,
so the bootstrap impact over JuliaLang#57615 is minimal (though non-zero).
serenity4 pushed a commit to serenity4/julia that referenced this pull request May 1, 2025
Our implicit edge tracking for bindings does not explicitly store
any edges for bindings in the *current* module. The idea behind
this is that this is a good time-space tradeoff for validation,
because substantially all binding references in a module will be
to its defining module, while the total number of methods within
a module is limited and substantially smaller than the total number
of methods in the entire system.

However, we have an issue where the code that stores these edges
and the invalidation code disagree on which module is the *current*
one. The edge storing code was using the module in which the method
was defined, while the invalidation code was using the one in which
the MethodTable is defined. With these being misaligned, we can
miss necessary invalidations.

Both options are in principle possible, but I think the former is
better, because the module in which the method is defined is also
the module that we are likely to have a lot of references to (since
they get referenced implicitly by just writing symbols in the code).

However, this presents a problem: We don't actually have a way to
iterate all the methods defined in a particular module, without just
doing the brute force thing of scanning all methods and filtering.

To address this, build on the deferred scanning code added in JuliaLang#57615
to also add any scanned modules to an explicit list in `Module`. This
costs some space, but only proportional to the number of defined methods,
(and thus proportional to the written source code).

Note that we don't actually observe any issues in the test suite on
master due to this bug. However, this is because we are grossly
over-invalidating, which hides the missing invalidations from this
issue (JuliaLang#57617).
@KristofferC KristofferC removed the backport 1.12 Change should be backported to release-1.12 label Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants