forked from jax-ml/jax
CI: 05/28/25 upstream sync #436
Open
rocm-repo-management-api-2 wants to merge 1,706 commits into rocm-main from ci-upstream-sync-203_1
Conversation
A lot of this logic was confusingly phrased as conditions over both CPU and GPU build flags. But we can decompose it into:
* dependencies we add for CPU tests, and
* additional dependencies we add for GPU tests.
While we are here, also add the necessary PyPI dependency for TPU tests.
Hold references to raw buffers instead of PjRtBuffers. This fixes an issue where the buffers could be deleted before the transfer completed, but introduces another problem: if the buffers are donated, this codepath will now silently read from donated arrays. Once the underlying runtime exposes usage holds properly, this new codepath should take a usage hold and the old PjRtBuffer path should be removed. PiperOrigin-RevId: 758819621
PiperOrigin-RevId: 758833461
These had been accidentally broken at some point in the plugin switchover. PiperOrigin-RevId: 758833461
`slice` is not hashable before Python 3.12. This change mitigates that by converting the slice into a hashable value. PiperOrigin-RevId: 758905560
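A minimal sketch of the mitigation described here (the helper name is hypothetical, not the actual JAX code): before Python 3.12, `hash(slice(...))` raises `TypeError`, so a slice can be folded into a hashable surrogate before being used as a dictionary or cache key.

```python
def slice_key(s):
    """Return a hashable surrogate for a slice; pass other keys through.

    `slice` objects only became hashable in Python 3.12, so code that
    caches on slices needs a stand-in on older interpreters.
    """
    if isinstance(s, slice):
        return ("slice", s.start, s.stop, s.step)
    return s

cache = {}
cache[slice_key(slice(0, 8, 2))] = "value"
assert cache[slice_key(slice(0, 8, 2))] == "value"
```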
PiperOrigin-RevId: 758915292
We must not depend on the nvidia_nvshmem_cu12 pip package directly since it does not exist on Windows and Mac platforms. PiperOrigin-RevId: 758917499
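A platform restriction like the one above can be expressed with a PEP 508 environment marker, so pip only resolves the package on Linux. This is a hedged sketch; the exact requirement line used in the build files may differ:

```
nvidia_nvshmem_cu12 ; sys_platform == "linux"
```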
The errors are too verbose and mostly not very useful. PiperOrigin-RevId: 759025165
We weren't handling them correctly, meaning you couldn't use a `shard_map`/`ManualComputationOp` which has callbacks inside. PiperOrigin-RevId: 759072597
http://github.com/openxla/xla/commit/ab7cea20271d8a24a7309e09fc5af486dde8e155. PiperOrigin-RevId: 759095567
The "add a token" part of the `callback` primitive's MLIR lowering was incorrectly adding a ranked sharding by using the sharding of a ranked tensor. So instead, create an unranked sharding explicitly. PiperOrigin-RevId: 759135477
…extra PiperOrigin-RevId: 759203972
PiperOrigin-RevId: 759221096
PiperOrigin-RevId: 759252455
PiperOrigin-RevId: 759294851
PiperOrigin-RevId: 759301396
Shouldn't affect existing behaviors or trace time.

The main implementation ideas:
* each Trace is tagged with a `requires_low: bool`
* each Jaxpr
  * is tagged with an `is_high: bool`, default False but set True while tracing if any hijax primitives are encountered
  * includes a `mut_types: dict[Var, HijaxType]` indicating final types for type-changing mutable hijax types
* each AbstractValue is tagged with a `mutable: bool`, which is read to populate `mut_types`
* each Primitive
  * has an `is_high(**params) -> bool` method (depends on params for HOPs)
  * has a `to_lojax(*args, **params)` method taking and returning hijax-types-wrapping-lowtracers
* in `Primitive.bind`, we check whether `prim.is_high(**params) and trace.requires_low`, and if so we call `prim.to_lojax`

Co-authored-by: Dougal Maclaurin <dougalm@google.com>
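The dispatch rule in the last bullet can be sketched as follows. The names follow the commit message, but all classes here are illustrative stubs, not the actual JAX internals:

```python
class Trace:
    def __init__(self, requires_low):
        self.requires_low = requires_low

class Primitive:
    def is_high(self, **params):
        return False  # default: an ordinary ("lojax") primitive

    def to_lojax(self, *args, **params):
        raise NotImplementedError

    def bind(self, trace, *args, **params):
        # When a hijax primitive meets a trace that requires low-level ops,
        # lower it immediately via to_lojax; otherwise bind as usual.
        if self.is_high(**params) and trace.requires_low:
            return self.to_lojax(*args, **params)
        return ("bound", args, params)

class HiPrim(Primitive):
    def is_high(self, **params):
        return True

    def to_lojax(self, *args, **params):
        return ("lowered", args, params)

p = HiPrim()
print(p.bind(Trace(requires_low=True), 1, 2))   # routed through to_lojax
print(p.bind(Trace(requires_low=False), 1, 2))  # ordinary bind
```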
PiperOrigin-RevId: 759336328
…tly it looks like this.

```
ValueError: Pytree for `in_specs` and inputs do not match. There are 1 mismatches, including:
* `in_specs` is a tuple of length 1 but inputs is a tuple of length 4, so the lengths do not match
```

PiperOrigin-RevId: 759499528
http://github.com/openxla/xla/commit/5fee96f09a42daa80283dde9fb7090ba90d9d07a. PiperOrigin-RevId: 759564260
…t_dict_merge PiperOrigin-RevId: 759579563
PiperOrigin-RevId: 759602792
The implementation currently forces O=0 due to a suspected bug in the NVPTX backend.

To get source information:
* set `MOSAIC_GPU_LINE_INFO=1`
* run with `--jax_include_full_tracebacks_in_locations=true`

PiperOrigin-RevId: 759608368
http://github.com/openxla/xla/commit/5a5e232f7bb9a2fa0d79f461f86a3cfa2c78f2cf. PiperOrigin-RevId: 763372229
The C128 matmuls will be routed to cuBLAS rather than being handled by the loop emitter, causing a very slight numerical difference. Therefore, the comparison should not be overly strict. PiperOrigin-RevId: 763397887
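A pure-Python sketch of the comparison point above: two complex-double ("C128") matmuls that accumulate terms in different orders, as different backends (e.g. cuBLAS vs. a loop emitter) might, can disagree in the last few bits, so a test should compare with a relative tolerance rather than exact equality. Sizes and tolerance here are arbitrary:

```python
import cmath
import random

random.seed(0)
n = 8
a = [[complex(random.random(), random.random()) for _ in range(n)] for _ in range(n)]
b = [[complex(random.random(), random.random()) for _ in range(n)] for _ in range(n)]

def matmul(x, y, order):
    # `order` fixes the accumulation order of the inner dot products.
    return [[sum(x[i][k] * y[k][j] for k in order) for j in range(n)]
            for i in range(n)]

fwd = matmul(a, b, list(range(n)))            # one accumulation order
rev = matmul(a, b, list(reversed(range(n))))  # another

# Loose (relative-tolerance) comparison instead of exact equality.
assert all(cmath.isclose(fwd[i][j], rev[i][j], rel_tol=1e-12)
           for i in range(n) for j in range(n))
```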
PiperOrigin-RevId: 763697379
…om-ptxas-and-llvm PiperOrigin-RevId: 763701410
http://github.com/openxla/xla/commit/cb67f2f7ce4787f63f5fc80dc5c30cd3dee8f4e3. PiperOrigin-RevId: 763710186
…yout in some ops

I can't explain it, but if we don't do it then the verifier sometimes fails... I'm not even sure how to properly trigger this in a test right now, but worst case it would result in more verifier failures to fix, so I think it's fine to merge as is. PiperOrigin-RevId: 763711454
I thought this didn't work, but it does! Still, adding a test to make sure we don't regress it. PiperOrigin-RevId: 763717665
If we don't synchronize the warps, some of them can go on and schedule e.g. async copies without waiting for the memory transactions of other warps in the warpgroup to complete. PiperOrigin-RevId: 763721411
…rue` PiperOrigin-RevId: 763730217
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, and leads to improved build and iteration times. This was unblocked by moving ad, batching, and custom_transpose to their own rules in prior changes. It required one small code refactoring: moving an effects registration to the location where the effect is defined. PiperOrigin-RevId: 763736189
…TPU interpret mode. Since dimensions with parallel semantics must now appear as the leading dimensions of the grid, this CL also makes the sequential iteration over cores in the simulation never re-visit a core after the simulation has moved on to the next core. This enables the simulation to correctly omit loads and stores of kernel buffers if the same (slice of a) buffer is processed by multiple kernel invocations on the same core. PiperOrigin-RevId: 763737647
…on ASAN. PiperOrigin-RevId: 763756072
We already call `xla::sdy::addSdyRoundTripExportPipeline` in `xla::SerializeUsingVersionedStablehlo` so no need for this anymore. PiperOrigin-RevId: 763762358
Just to give us extra confidence while we make changes. PiperOrigin-RevId: 763767275
We sometimes access NVSHMEM functions from the host code too, which means we should include the NVSHMEM host library in the context of the ExecutionEngine. PiperOrigin-RevId: 763777731
This will make it much simpler to make the kernel persistent. PiperOrigin-RevId: 763782577
Before this fix, the test would finish before execution was done, and profiling would thus yield nothing. PiperOrigin-RevId: 763783695
http://github.com/openxla/xla/commit/a566a66e53c489f947eb6c04fe44205013250922. PiperOrigin-RevId: 763822788
…ToXlaComputation`. PiperOrigin-RevId: 763837933
…nsertion Enabling this flag can introduce races into certain kernels, which is why it's False by default. Still, there's plenty of kernels where it's unnecessary and a few of those suffer performance regressions when it is on. So it makes sense to at least allow users to opt out. PiperOrigin-RevId: 763853668
PiperOrigin-RevId: 763862020
PiperOrigin-RevId: 763865376
…effects PiperOrigin-RevId: 763886950
PiperOrigin-RevId: 763950695
Previously, the result of a vmapped RA2A was the concatenation of a flattened result. PiperOrigin-RevId: 763958632
PiperOrigin-RevId: 764019664
Daily sync with upstream