Commit 08f0dc8

Support breakpoints on AVX-512 instructions (#89705)
* Update debugger AMD64 disassembly tables
  Regenerate with gdb 12. Fix various bugs in the table generation, such as due to incorrect ModRM/reg escape handling, and new instructions.
* Remove 3DNow! table support
* Remove obsolete AMD XOP encoding support
* Update walker for new tables
  Especially, removed 3DNow, XOP. Also, update README.md
* Remove special-case XOP instructions
* Remove unused InstrForm
  Due to removing XOP instructions
* Remove another removed InstrForm
* Support EVEX encoding
* Add more logging for patch decode; handle 64-byte memory in patch
* Support EVEX embedded broadcast
* Update comment about EVEX `disp8*N` addressing
* Fix build
1 parent 0dfe81a commit 08f0dc8

8 files changed: +1894 -1380 lines changed

src/coreclr/debug/ee/amd64/amd64InstrDecode.h

Lines changed: 833 additions & 996 deletions
Large diffs are not rendered by default.

src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/Amd64InstructionTableGenerator.cs

Lines changed: 484 additions & 188 deletions
Large diffs are not rendered by default.

src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/README.md

Lines changed: 72 additions & 53 deletions
@@ -18,59 +18,79 @@ gcc -g opcodes.cpp -o opcodes
1818
gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" > opcodes.intel
1919

2020
# Parse disassembly and generate code
21-
cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
21+
# Build as a separate step so it will display build errors, if any.
22+
../../../../../../dotnet.sh build
23+
cat opcodes.intel | ../../../../../../dotnet.sh run > new_amd64InstrDecode.h
2224
```
2325

26+
After checking it, copy the generated new_amd64InstrDecode.h to ../amd64InstrDecode.h.
27+
28+
This process can be run using the `createTables.sh` script in this directory.
29+
2430
## Technical design
2531

26-
`amd64InstrDecode.h`'s primary purpose is to provide a reliable
27-
and accurately mechanism to implement
28-
`Amd64 NativeWalker::DecodeInstructionForPatchSkip(..)`.
32+
The primary purpose of `amd64InstrDecode.h` is to provide a reliable
33+
and accurate mechanism to implement the `amd64`
34+
`NativeWalker::DecodeInstructionForPatchSkip(..)` function.
2935

3036
This function needs to be able to decode an arbitrary `amd64`
3137
instruction. The decoder currently must be able to identify:
3238

33-
- Whether the instruction includes an instruction pointer relative memory access
39+
- Whether the instruction includes an instruction pointer relative memory access (RIP relative addressing)
3440
- The location of the memory displacement within the instruction
3541
- The instruction length in bytes
3642
- The size of the memory operation in bytes
3743

38-
To get this right is complicated, because the `amd64` instruction set is
39-
complicated.
44+
To get this right is complicated, because the `amd64` instruction set is complicated.
45+
46+
A high level view of the `amd64` instruction set can be seen by looking at:
47+
48+
`AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions`
49+
`Section 1.1 Instruction Encoding Overview`
50+
`Figure 1-1. Instruction Encoding Syntax`
51+
52+
also:
53+
54+
`Intel(R) 64 and IA-32 Architectures Software Developer's Manual`
55+
`Volume 2: Instruction Set Reference, A-Z`
56+
`Chapter 2 Instruction Format`
4057

41-
A high level view of the `amd64` instruction set can be seen by looking at
42-
`AMD64 Architecture Programmer’s
43-
Manual Volume 3:
44-
General-Purpose and System Instructions`
45-
`Section 1.1 Instruction Encoding Overview`
46-
`Figure 1-1. Instruction Encoding Syntax`
58+
Also useful in the manuals:
59+
60+
AMD: `Appendix A: Opcode and Operand Encodings`
61+
Intel: `Volume 2, Appendix A: Opcode Map`
4762

4863
The general behavior of each instruction can be modified by many of the
49-
bytes in the 1-15 byte instruction.
64+
bytes in the 1-15 byte instruction (15 is the maximum byte length of an instruction).
5065

5166
This set of files generates a metadata table by extracting the data from
5267
sample instruction disassembly.
5368

54-
The process entails
69+
The process entails:
5570
- Generating a necessary set of instructions
5671
- Generating parsable disassembly for the instructions
5772
- Parsing the disassembly
73+
- Generating the tables
5874

5975
### Generating a necessary set of instructions
6076

77+
What set of possible instruction encodings are needed to extract the information
78+
needed in the tables?
79+
6180
#### The necessary set
6281

63-
- All instruction forms which use instruction pointer relative memory accesses.
82+
- All instruction forms which use RIP relative memory accesses.
6483
- All combinations of modifier bits which affect the instruction form
6584
- presence and/or size of the memory access
6685
- size or presence of immediates
86+
- vector size bits
6787

68-
So with modrm.mod = 0, modrm.rm = 0x5 (instruction pointer relative memory access)
88+
So with modrm.mod = 0, modrm.rm = 0x5 (RIP relative memory access)
6989
we need all combinations of:
7090
- `opcodemap`
7191
- `opcode`
7292
- `modrm.reg`
73-
- `pp`, `W`, `L`
93+
- `pp`, `W`, `L`, `L'L`
7494
- Some combinations of `vvvv`
7595
- Optional prefixes: `repe`, `repne`, `opSize`
7696

@@ -80,7 +100,7 @@ We will iterate through all the necessary set. Many of these combinations
80100
will lead to invalid/undefined encodings. This will cause the disassembler
81101
to give up and mark the disassemble as bad.
82102

83-
The disassemble will then resume trying to disassemble at the next boundary.
103+
The disassembly will then resume trying to disassemble at the next boundary.
84104

85105
To make sure the disassembler attempts to disassemble every instruction,
86106
we need to make sure the preceding instruction is always valid and terminates
@@ -97,8 +117,7 @@ instruction.
97117

98118
Using a fixed suffix makes disassembly parsing simpler.
99119

100-
After the modrm byte, the generated instructions always include a
101-
`postamble`,
120+
After the modrm byte, the generated instructions always include a `postamble`,
102121

103122
```C++
104123
const char* postamble = "0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,\n";
@@ -109,14 +128,14 @@ This meets the padding consistency needs.
109128
#### Ordering
110129

111130
As a convenience to the parser the encoded instructions are logically
112-
ordered. The ordering is generally, but can vary slightly depending on
131+
ordered. The ordering is generally as follows, but can vary slightly depending on
113132
the needs of the particular opcode map:
114133

115134
- map
116135
- opcode
117136
- pp & some prefixes
118137
- modrm.reg
119-
- W, L, vvvv
138+
- W, L, L'L, vvvv
120139

121140
This is to keep related instruction grouped together.
122141

@@ -160,6 +179,7 @@ their sizes. For instance:
160179
- "OWORD PTR [rip+0x53525150]"
161180
- "XMMWORD PTR [rip+0x53525150]"
162181
- "YMMWORD PTR [rip+0x53525150]"
182+
- "ZMMWORD PTR [rip+0x53525150]"
163183
- "FWORD PTR [rip+0x53525150]"
164184
- "TBYTE PTR [rip+0x53525150]"
165185

@@ -177,7 +197,7 @@ gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" >
177197

178198
#### Alternative disassemblers
179199

180-
It seems `objdump` could provide similar results. Untested, the parser may need to
200+
It seems `objdump` could provide similar results. This is untested. The parser may need to
181201
be modified for subtle differences.
182202
```bash
183203
objdump -D -M intel -b --insn-width=15 -j .data opcodes
@@ -186,27 +206,28 @@ objdump -D -M intel -b --insn-width=15 -j .data opcodes
186206
The lldb parser aborts parsing when it observes bad instruction. It
187207
might be usable with additional python scripts.
188208

189-
Windows disassembler may also work. Not attempted.
209+
Windows disassembler may also work. It has not been tried.
190210

191211
### Parsing the disassembly
212+
192213
```bash
193214
# Parse disassembly and generate code
194215
cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
195216
```
217+
196218
#### Finding relevant disassembly lines
197219

198220
We are not interested in all lines in the disassembly. The disassembler
199-
stray comments, recovery and our padding introduce lines we need to ignore.
221+
stray comments, recovery, and our padding introduce lines we need to ignore.
200222

201223
We filter out and ignore non-disassembly lines using a `Regex` for a
202224
disassembly line.
203225

204226
We expect the generated instruction samples to be in a group. The first
205-
instruction in the group is the only one we are interested in. This is
206-
the one we are interested in.
227+
instruction in the group is the only one we are interested in.
207228

208229
The group is terminated by a pair of instructions. The first terminal
209-
instruction must have `0x58` as the last byte in it encoding. The final
230+
instruction must have `0x58` as the last byte in its encoding. The final
210231
terminal instruction must be a `0x59\tpop`.
211232

212233
We continue parsing the first line of each group.
@@ -216,7 +237,7 @@ We continue parsing the first line of each group.
216237
Many encodings are not valid. For `gdb`, these instructions are marked
217238
`(bad)`. We filter and ignore these.
218239

219-
#### Parsing the disassambly for each instruction sample
240+
#### Parsing the disassembly for each instruction sample
220241

221242
For each sample, we need to calculate the important properties:
222243
- mnemonic
@@ -226,9 +247,9 @@ For each sample, we need to calculate the important properties:
226247
- map
227248
- opcode position
228249
- Encoding Flags
229-
- pp, W, L, prefix and encoding flags
250+
- pp, W, L, L'L, prefix and encoding flags
230251
- `SuffixFlags`
231-
- presence of instruction relative accesses
252+
- presence of RIP relative accesses
232253
- size of operation
233254
- position in the list of operands
234255
- number of immediate bytes
@@ -242,37 +263,32 @@ unknown sizes.
242263

243264
#### `opCodeExt`
244265

245-
To facilitate identifying sets of instructions, the creates an `opCodeExt`.
266+
To facilitate identifying sets of instructions, the tool creates an `opCodeExt`.
246267

247268
For the `Primary` map this is simply the encoded opcode from the instruction
248269
shifted left by 4 bits.
249270

250-
For the 3D Now `NOW3D` map this is simply the encoded immediate from the
251-
instruction shifted left by 4 bits.
252-
253-
For the `Secondary` `F38`, and `F39` maps this is the encoded opcode from
254-
the instruction shifted left by 4 bits orred with a synthetic `pp`. The
271+
For the `Secondary`, `F38`, and `F39` maps this is the encoded opcode from
272+
the instruction shifted left by 4 bits or'ed with a synthetic `pp`. The
255273
synthetic `pp` is constructed to match the rules of
256274
`Table 1-22. VEX/XOP.pp Encoding` from the
257-
`AMD64 Architecture Programmer’s
258-
Manual Volume 3:
259-
General-Purpose and System Instructions`. For the case where the opSize
260-
0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
261-
to encode `pp`.
275+
`AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions`.
276+
For the case where the opSize 0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
277+
to encode `pp`.
262278

263-
For the `VEX*` and `XOP*` maps this is the encoded opcode from
264-
the instruction shifted left by 4 bits orred with `pp`.
279+
For the `VEX*` maps this is the encoded opcode from
280+
the instruction shifted left by 4 bits or'ed with `pp`.
265281

266282
#### Identifying sets of instructions
267283

268-
For most instructions, the opCodeExt will uniquely identify the instruction.
284+
For most instructions, the `opCodeExt` will uniquely identify the instruction.
269285

270-
For many instructions, `modrm.reg` is used to uniquely identify the instruction.
286+
For many instructions, `modrm.reg` is used to help uniquely identify the instruction.
271287
These instruction typically change mnemonic and behavior as `modrm.reg`
272288
changes. These become problematic, when the form of these instructions vary.
273289

274-
For a few other instructions the `L`, `W`, `vvvv` value may the instruction
275-
change behavior. Usually these do not change mnemonic.
290+
For a few other instructions the `L`, `W`, `vvvv` values may change the instruction
291+
behavior. Usually these do not change mnemonic.
276292

277293
The set of instructions is therefore usually grouped by the opcode map and
278294
`opCodeExt` generated above. For these a change in `opCodeExt` or `map`
@@ -291,7 +307,9 @@ set based on the encoding flags. These are the `sometimesFlags`
291307
`sometimesFlags`, check each rule by calling `TestHypothesis`. This
292308
determines if the rule corresponds to the set of observations.
293309

294-
Encode the rule as a string.
310+
Encode the rule as a string. The rule might encode that the `W` bit or `L`
311+
bit causes a different memory/immediate behavior for the particular
312+
`<map, opCodeExt>` entry.
295313

296314
Add the rule to the set of all observed rules.
297315
Add the set's rule with comment to a dictionary.
@@ -328,7 +346,7 @@ samples, is restricted to a max compilation/link unit size. Early drafts
328346
were generating more instructions, and couldn't be compiled.
329347

330348
However, there is no restriction that all the samples must come from
331-
single executable. These could easily be separated by opcode map...
349+
single executable. These could easily be separated by opcode map.
332350

333351
## Risks
334352

@@ -342,7 +360,7 @@ Until a reasonably featured disassembler is created, the new instruction
342360
set can not be supported by this methodology.
343361

344362
The previous methodology of manually encoding these new instruction set
345-
would still be possible....
363+
would still be possible.
346364

347365
### Disassembler errors
348366

@@ -351,6 +369,7 @@ of the disassembler may have disassembly bugs. Using newer disassemblers
351369
would mitigate this to some extent.
352370

353371
### Bugs
372+
354373
- Inadequate samples. Are there other bits which modify instruction
355374
behavior which we missed?
356375
- Parser/Table generator implementation bugs. Does the parser do what it
@@ -368,4 +387,4 @@ Regenerate and compare.
368387

369388
### New debugger feature requires more metadata
370389

371-
Add new feature code, regenerate
390+
Add new feature code, regenerate.
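
Two of the rules the README states can be illustrated with a minimal sketch. The helper names `IsRipRelative` and `MakeOpCodeExt` are hypothetical and are not taken from the runtime sources or the table generator. The sketch assumes the standard ModRM layout (`mod` in bits 7..6, `reg` in bits 5..3, `rm` in bits 2..0) and a 2-bit `pp` value. It shows the `modrm.mod = 0`, `modrm.rm = 0x5` test for RIP relative addressing and the "opcode shifted left by 4 bits or'ed with `pp`" construction of `opCodeExt`.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the runtime's actual code.

// ModRM layout: mod = bits 7..6, reg = bits 5..3, rm = bits 2..0.
// In 64-bit mode, mod == 0 with rm == 5 selects [rip + disp32].
constexpr bool IsRipRelative(std::uint8_t modrm)
{
    return ((modrm >> 6) & 0x3) == 0 && (modrm & 0x7) == 5;
}

// opCodeExt as the README describes it: the encoded opcode shifted left
// by 4 bits, or'ed with a 2-bit pp (pass pp = 0 for the Primary map).
constexpr std::uint32_t MakeOpCodeExt(std::uint8_t opcode, std::uint8_t pp)
{
    return (static_cast<std::uint32_t>(opcode) << 4) | (pp & 0x3u);
}

// Compile-time checks of the two rules.
static_assert(IsRipRelative(0x05), "mod=0, reg=0, rm=5 is RIP relative");
static_assert(IsRipRelative(0x0D), "modrm.reg does not affect the test");
static_assert(!IsRipRelative(0x45), "mod=1, rm=5 is [rbp+disp8]");
static_assert(!IsRipRelative(0x04), "mod=0, rm=4 selects a SIB byte");
static_assert(MakeOpCodeExt(0x8B, 0) == 0x8B0, "Primary map: opcode << 4");
static_assert(MakeOpCodeExt(0x10, 1) == 0x101, "opcode << 4 or'ed with pp");
```

Because `modrm.reg` does not take part in the RIP relative test, the generator still has to iterate over every `modrm.reg` value, as the README's necessary set describes.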
