
Commit 56481ca

igorban-intel authored and sys-cmllvm committed
Add lsc_merge instructions
For predicated loads, merge the destination for disabled lanes with values from an additional input.
1 parent f0fb932 commit 56481ca
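
As a rough illustration of the merge behavior described in the commit message, the Python sketch below models only the per-lane selection step. The function name `merge_lanes` and the plain-list representation of vectors are assumptions made for this example and are not part of the GenX intrinsics API. The sketch also assumes one predicate bit per lane, and it does not model memory access, caching, or addressing.

```python
from typing import List, Sequence


def merge_lanes(predicate: Sequence[bool],
                loaded: Sequence[int],
                passthru: Sequence[int]) -> List[int]:
    """Hypothetical model of the merge step of lsc_load_merge.

    Enabled lanes (predicate true) take the value read from memory;
    disabled lanes keep the corresponding value from the extra input (arg12).
    """
    if not (len(predicate) == len(loaded) == len(passthru)):
        raise ValueError("all three vectors must have the same number of lanes")
    return [new if enabled else old
            for enabled, new, old in zip(predicate, loaded, passthru)]


# Lanes 1 and 3 are disabled, so they keep the pass-through values.
result = merge_lanes([True, False, True, False],
                     [10, 11, 12, 13],
                     [-1, -2, -3, -4])
print(result)  # [10, -2, 12, -4]
```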

1 file changed, 106 additions and 0 deletions


GenXIntrinsics/include/llvm/GenXIntrinsics/Intrinsic_definitions.py

@@ -1990,6 +1990,112 @@
         "attributes" : "ReadMem"
     },
 
+### ``llvm.genx.lsc.load.merge.*.<return type if not void>.<any type>.<any type>`` : lsc_load merge instructions
+### ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+###
+### * ``llvm.genx.lsc.load.merge.slm`` :
+### * ``llvm.genx.lsc.load.merge.bti`` :
+### * ``llvm.genx.lsc.load.merge.stateless`` :
+###
+### * Exec_size ignored unless operation is transposed (DataOrder == Transpose)
+### * arg0: {1,32}Xi1 predicate (overloaded)
+### * arg1: i8 Subopcode, [MBZ]
+### * arg2: i8 Caching behavior for L1, [MBC]
+### * arg3: i8 Caching behavior for L3, [MBC]
+### * arg4: i16 Address scale, [MBC]
+### * arg5: i32 Immediate offset added to each address, [MBC]
+### * arg6: i8 The datum size, [MBC]
+### * arg7: i8 Number of elements to load per address (vector size), [MBC]
+### * arg8: i8 Indicates if the data is transposed during the transfer, [MBC]
+### * arg9: i8 Channel mask for quad versions, [MBC]
+### * arg10: {1,32}Xi{16,32,64} The vector register holding offsets (overloaded)
+###     for flat version Base Address + Offset[i] goes here
+### * arg11: i32 surface to use for this operation. This can be an immediate or a register
+###     for flat and bindless version pass zero here
+### * arg12: VXi{16,32,64} The data to merge into disabled channels (overloaded)
+###
+### * Return value: the value read, merged with arg12 by predicate
+###
+### Cache mappings are:
+###
+### - 0 -> .df (default)
+### - 1 -> .uc (uncached)
+### - 2 -> .ca (cached)
+### - 3 -> .wb (writeback)
+### - 4 -> .wt (writethrough)
+### - 5 -> .st (streaming)
+### - 6 -> .ri (read-invalidate)
+###
+### Only certain combinations of CachingL1 with CachingL3 are valid on hardware.
+###
+### +---------+-----+------------------------------------------------------------+
+### | L1      | L3  | Notes                                                      |
+### +---------+-----+------------------------------------------------------------+
+### | .df     | .df | default behavior on both L1 and L3 (L3 uses MOCS settings) |
+### +---------+-----+------------------------------------------------------------+
+### | .uc     | .uc | uncached (bypass) both L1 and L3                           |
+### +---------+-----+------------------------------------------------------------+
+### | .st     | .uc | streaming L1 / bypass L3                                   |
+### +---------+-----+------------------------------------------------------------+
+### | .uc     | .ca | bypass L1 / cache in L3                                    |
+### +---------+-----+------------------------------------------------------------+
+### | .ca     | .uc | cache in L1 / bypass L3                                    |
+### +---------+-----+------------------------------------------------------------+
+### | .ca     | .ca | cache in both L1 and L3                                    |
+### +---------+-----+------------------------------------------------------------+
+### | .st     | .ca | streaming L1 / cache in L3                                 |
+### +---------+-----+------------------------------------------------------------+
+### | .ri     | .ca | read-invalidate (e.g. last-use) on L1 loads / cache in L3  |
+### +---------+-----+------------------------------------------------------------+
+###
+### Immediate offset. The compiler may be able to fuse this add into the message; otherwise,
+### additional instructions are generated to honor the semantics.
+### Alternative to the predicated variant of loads: the destination for disabled
+### lanes is merged with values from an additional input (arg12).
+###
+### Datum size mapping is:
+###
+### - 1 = :u8
+### - 2 = :u16
+### - 3 = :u32
+### - 4 = :u64
+### - 5 = :u8u32 (load 8b, zero extend to 32b; store the opposite),
+### - 6 = :u16u32 (load 16b, zero extend to 32b; store the opposite),
+### - 7 = :u16u32h (load 16b into high 16 of each 32b; store the high 16)
+###
+    "lsc_load_merge_slm" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_stateless" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_bindless" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_bti" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_quad_slm" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_quad_stateless" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_quad_bindless" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+    "lsc_load_merge_quad_bti" : { "result" : "anyvector",
+        "arguments" : ["any","char","char","char","short","int","char","char","char","char","any","int"],
+        "attributes" : "ReadMem"
+    },
+
 ### ``llvm.genx.lsc.store.*.<any type>.<any type>.<any vector>`` : lsc_store instructions
 ### ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ###
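
The cache encodings and the L1/L3 combination table in the comment block lend themselves to a small lookup. The sketch below is one possible way to check a pair of encoded caching arguments (arg2, arg3) against that table. The names `CACHE_SUFFIX`, `VALID_LOAD_COMBOS`, and `is_valid_load_caching` are hypothetical and exist only in this example; the values are copied from the comment block, and nothing here reflects how the compiler itself validates them.

```python
# Encodings copied from the "Cache mappings" list in the comment block.
CACHE_SUFFIX = {
    0: ".df", 1: ".uc", 2: ".ca", 3: ".wb",
    4: ".wt", 5: ".st", 6: ".ri",
}

# (L1, L3) pairs copied from the combination table for loads.
VALID_LOAD_COMBOS = {
    (".df", ".df"), (".uc", ".uc"), (".st", ".uc"), (".uc", ".ca"),
    (".ca", ".uc"), (".ca", ".ca"), (".st", ".ca"), (".ri", ".ca"),
}


def is_valid_load_caching(l1: int, l3: int) -> bool:
    """Return True if the encoded (L1, L3) pair appears in the table."""
    try:
        pair = (CACHE_SUFFIX[l1], CACHE_SUFFIX[l3])
    except KeyError:
        # Encoding outside the documented 0..6 range.
        return False
    return pair in VALID_LOAD_COMBOS


print(is_valid_load_caching(2, 2))  # .ca / .ca is listed
print(is_valid_load_caching(3, 3))  # .wb / .wb is not listed for loads
```

Keeping the table as a set of suffix pairs rather than integer pairs makes it easy to compare against the comment block by eye.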

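
The datum size encoding (arg6) listed in the comment block can be decoded in a similar way. The sketch below is illustrative only: `DatumSize`, `DATUM_SIZES`, and `decode_datum_size` are hypothetical names, and encoding 6 is taken to be a 16-bit load, as the `:u16u32` suffix implies. Register widths for the plain types (1 to 4) are not stated in the comment block, so the sketch records only the memory width and whether the value is widened to 32 bits.

```python
from typing import NamedTuple


class DatumSize(NamedTuple):
    suffix: str           # assembly-style suffix from the comment block
    memory_bits: int      # width of each element as read from memory
    widened_to_32: bool   # True if each element occupies 32 bits after the load


# Encodings copied from the datum size list; code 6 is assumed to load 16 bits.
DATUM_SIZES = {
    1: DatumSize(":u8", 8, False),
    2: DatumSize(":u16", 16, False),
    3: DatumSize(":u32", 32, False),
    4: DatumSize(":u64", 64, False),
    5: DatumSize(":u8u32", 8, True),
    6: DatumSize(":u16u32", 16, True),
    7: DatumSize(":u16u32h", 16, True),
}


def decode_datum_size(code: int) -> DatumSize:
    """Look up a datum size encoding, rejecting undocumented values."""
    try:
        return DATUM_SIZES[code]
    except KeyError:
        raise ValueError(f"unknown datum size encoding: {code}") from None


print(decode_datum_size(5).suffix)
```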