Description
- optimisation: sort offsets in post-process (@n-io)
- check that the library supports num_chunks > 1 ( 🤷 )
- padded_z_dim unused (@dk949)
- empty comptime_structs (@dk949)
- offset cast from i16 to i32 to i16
- pass params from layout module to program module in `lower-csl-wrapper`
- pass nd accumulator for communicate-only apply
- transformations: (csl-stencil-bufferize) AccessOp to read to_tensor's underlying memref #3663
- transforms: (memref-to-dsd) Support 1d subview of nd memref #3653
- dialects: (csl) Switch dsds to use affine maps #3657
- transformations (csl): Add prefetch lowering #3584
- bug: (lower-csl-stencil) Zero-out accumulator for full reduction access #3520
- misc: Stencil and csl lowering fixes #3442
- transforms: add restrict flag to StencilShapeMinimize pass #3411 (`--stencil-shape-minimize{restrict=32,32,32}`)
- transformations: New test-add-timers-to-top-level-funcs pass #3407 (`--test-add-timers-to-top-level-funcs`)
- transformations: Persist func arg names as arg_attr #3395 (`--function-persist-arg-names`)
- canonicalise csl load var ops
- core: Print fp literals losslessly #3381 (avoids losing precision going between mlir-opt and xdsl, simplifying constant logic on both ends)
- transformations: (linalg-to-csl) Lower generic to fmac(h|s) #3345
- transformations: New linalg-fuse-multiply-add pass #3347 (this always needs to be run with `require_scalar_factor=true` in our pipeline, and optionally with `require_erasable_mul=true` @dk949)
- transformations: (lift-arith-to-linalg) Add generic FMA #3344
- transformations: (csl-stencil-bufferize) Inject accumulator in all csl-stencil linalg ops #3343
- dialects: (csl-stencil) Add coefficients to apply op #3320 (optimisation: use API to set coefficients)
- transformations: Support devito timers in the csl pipeline #3312
- transformations: Split varith into neighbour and own data across csl_stencil regions #3307
- transforms: Add convert-varith-to-arith pass #3309
- transformations: (lower-csl-stencil) Add iter args to first region #3304
- transformations: (lower-csl-stencil) Optimise full-stencil access #3271
- transformations: Add convert-arith-to-varith pass #3242
- dialects: (varith) Add varith (variadic arithmetic) dialect #3241
- transformations: (lower-csl-stencil) Promote args before outlining #3237
- transformations: (memref-to-dsd) Handle csl variables #3236
- transformations: New stencil-shape-minimize pass #3229 (devito over-estimates grid sizes)
- transformations: (lower-csl-stencil) Send only core data #3223
- transformations: New csl-stencil-materialize-stores pass #3222 (`--csl-stencil-materialize-stores`, isBorderRegionPE fix)
- transformations: (lower-csl-stencil) Check isBorderRegionPE #3213
- fix: (csl) adjusted width and height of the PE #3209 (layout module has wrong dimension on width and height params)
- transformations: Add CSL dsd canonicalisation #3208
- backend: (csl) allow `get_dir` to be inlined #3202
- transformations: (lower-csl-stencil) Store results to apply.dest #3203
- transformations: (csl-stencil-to-csl-wrapper) Add `unblock_cmd_stream` call #3198 (was in the wrong place)
- Scale offset in callback func by chunk_size
- Add @set_rectangle(width, height); in layout module
- Get mem_dsd needs a comma
- Add exports in layout module
- Hoist allocs out of main func (`--csl-wrapper-hoist-buffers`)
- Printer fix for get_dsd (don't print underlying symbol)
- Add memcpy.unblock_cmd_stream(); to program module
- Add `@rpc(@get_data_task_id(memcpy.LAUNCH));` to program module
- Add support for loops and async cflow (`--csl-stencil-handle-async-flow`)