Commit bbeb108

namhyung authored and acmel committed
perf mem: Document new output fields (op, cache, mem, dtlb, snoop)

Update the documentation of the new fields with examples and caveats.
Also update the related documentation for AMD IBS.

Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250610005742.2173050-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

1 parent 11cfaf3 commit bbeb108

2 files changed: +92 -17 lines changed
tools/perf/Documentation/perf-amd-ibs.txt

Lines changed: 42 additions & 17 deletions
@@ -171,23 +171,48 @@ Below is a simple example of the perf mem tool.
 # perf mem report

 A normal perf mem report output will provide detailed memory access profile.
-However, it can also be aggregated based on output fields. For example:
-
-# perf mem report -F mem,sample,snoop
-Samples: 3M of event 'ibs_op//', Event count (approx.): 23524876
-Memory access                           Samples  Snoop
-N/A                                     1903343  N/A
-L1 hit                                  1056754  N/A
-L2 hit                                    75231  N/A
-L3 hit                                     9496  HitM
-L3 hit                                     2270  N/A
-RAM hit                                    8710  N/A
-Remote node, same socket RAM hit           3241  N/A
-Remote core, same node Any cache hit       1572  HitM
-Remote core, same node Any cache hit        514  N/A
-Remote node, same socket Any cache hit     1216  HitM
-Remote node, same socket Any cache hit      350  N/A
-Uncached hit                                 18  N/A
+New output fields will show related access info together. For example:
+
+# perf mem report -F overhead,cache,snoop,comm
+...
+# Samples: 92K of event 'ibs_op//'
+# Total weight : 531104
+#
+#          ---------- Cache ----------- --- Snoop ----
+# Overhead      L1     L2 L1-buf  Other    HitM  Other  Command
+# ........ ............................ ..............  ..........
+#
+    76.07%    5.8%  35.7%   0.0%  34.6%   23.3%  52.8%  cc1
+     5.79%    0.2%   0.0%   0.0%   5.6%    0.1%   5.7%  make
+     5.78%    0.1%   4.4%   0.0%   1.2%    0.5%   5.3%  gcc
+     5.33%    0.3%   3.9%   0.0%   1.1%    0.2%   5.2%  as
+     5.00%    0.1%   3.8%   0.0%   1.0%    0.3%   4.7%  sh
+     1.56%    0.1%   0.1%   0.0%   1.4%    0.6%   0.9%  ld
+     0.28%    0.1%   0.0%   0.0%   0.2%    0.1%   0.2%  pkg-config
+     0.09%    0.0%   0.0%   0.0%   0.1%    0.0%   0.1%  git
+     0.03%    0.0%   0.0%   0.0%   0.0%    0.0%   0.0%  rm
+...
+
+Also, it can be aggregated based on various memory access info using the
+sort keys. For example:
+
+# perf mem report -s mem,snoop
+...
+# Samples: 92K of event 'ibs_op//'
+# Total weight : 531104
+# Sort order : mem,snoop
+#
+# Overhead       Samples  Memory access                            Snoop
+# ........  ............  .......................................  ............
+#
+    47.99%          1509  L2 hit                                   N/A
+    25.08%           338  core, same node Any cache hit            HitM
+    10.24%         54374  N/A                                      N/A
+     6.77%         35938  L1 hit                                   N/A
+     6.39%           101  core, same node Any cache hit            N/A
+     3.50%            69  RAM hit                                  N/A
+     0.03%           158  LFB/MAB hit                              N/A
+     0.00%             2  Uncached hit                             N/A

 Please refer to their man page for more detail.
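
Reviewer note on the new `-F overhead,cache,snoop,comm` example: the Cache columns and the Snoop columns each appear to split the row's Overhead, so each group should add back up to the Overhead value within rounding. The Python sketch below checks that against the numbers in the patch. The tuple layout, the helper names, and the rounding tolerance are assumptions made for this illustration; none of it is part of perf.

```python
# Rows copied from the `perf mem report -F overhead,cache,snoop,comm` example.
# Layout (chosen for this sketch):
#   (overhead, (L1, L2, L1-buf, Other), (HitM, Other), command)
ROWS = [
    (76.07, (5.8, 35.7, 0.0, 34.6), (23.3, 52.8), "cc1"),
    (5.79, (0.2, 0.0, 0.0, 5.6), (0.1, 5.7), "make"),
    (5.78, (0.1, 4.4, 0.0, 1.2), (0.5, 5.3), "gcc"),
    (5.33, (0.3, 3.9, 0.0, 1.1), (0.2, 5.2), "as"),
    (5.00, (0.1, 3.8, 0.0, 1.0), (0.3, 4.7), "sh"),
    (1.56, (0.1, 0.1, 0.0, 1.4), (0.6, 0.9), "ld"),
    (0.28, (0.1, 0.0, 0.0, 0.2), (0.1, 0.2), "pkg-config"),
    (0.09, (0.0, 0.0, 0.0, 0.1), (0.0, 0.1), "git"),
    (0.03, (0.0, 0.0, 0.0, 0.0), (0.0, 0.0), "rm"),
]


def group_gap(overhead, parts):
    """Absolute difference between a row's overhead and the sum of one group."""
    return abs(overhead - sum(parts))


def check_rows(rows, per_cell_error=0.05):
    """Return (command, cache_gap, snoop_gap, ok) for every row.

    Each printed cell is rounded to one decimal place, so a group of n cells
    may drift from the overhead by up to n * 0.05. The extra 0.01 is an
    assumed margin for the overhead's own rounding and float noise.
    """
    results = []
    for overhead, cache, snoop, comm in rows:
        cache_gap = group_gap(overhead, cache)
        snoop_gap = group_gap(overhead, snoop)
        ok = (cache_gap <= len(cache) * per_cell_error + 0.01
              and snoop_gap <= len(snoop) * per_cell_error + 0.01)
        results.append((comm, cache_gap, snoop_gap, ok))
    return results


if __name__ == "__main__":
    for comm, cache_gap, snoop_gap, ok in check_rows(ROWS):
        status = "ok" if ok else "MISMATCH"
        print(f"{comm:<10} cache gap {cache_gap:.2f}  snoop gap {snoop_gap:.2f}  {status}")
```

Every row in the example passes this check, which supports reading each column group as a breakdown of the row's Overhead rather than as an independent percentage.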

tools/perf/Documentation/perf-mem.txt

Lines changed: 50 additions & 0 deletions
@@ -119,6 +119,22 @@ REPORT OPTIONS
 And the default sort keys are changed to local_weight, mem, sym, dso,
 symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.

+-F::
+--fields=::
+Specify output field - multiple keys can be specified in CSV format.
+Please see linkperf:perf-report[1] for details.
+
+In addition to the default fields, 'perf mem report' will provide the
+following fields to break down sample periods.
+
+- op: operation in the sample instruction (load, store, prefetch, ...)
+- cache: location in CPU cache (L1, L2, ...) where the sample hit
+- mem: location in memory or other places the sample hit
+- dtlb: location in Data TLB (L1, L2) where the sample hit
+- snoop: snoop result for the sampled data access
+
+Please take a look at the OUTPUT FIELD SELECTION section for caveats.
+
 -T::
 --type-profile::
 Show data-type profile result instead of code symbols. This requires
@@ -156,6 +172,40 @@ but one sample with weight 180 and the other with weight 20:
 90% [k] memcpy
 10% [.] strcmp

+OUTPUT FIELD SELECTION
+----------------------
+"perf mem report" adds a number of new output fields specific to data source
+information in the sample. Some of them have the same name with the existing
+sort keys ("mem" and "snoop"). So unlike other fields and sort keys, they'll
+behave differently when it's used by -F/--fields or -s/--sort.
+
+Using those two as output fields will aggregate samples altogether and show
+breakdown.
+
+$ perf mem report -F mem,snoop
+...
+# ------ Memory ------- --- Snoop ----
+#     RAM Uncach  Other    HitM  Other
+# ..................... ..............
+#
+     3.5%   0.0%  96.5%   25.1%  74.9%
+
+But using the same name for sort keys will aggregate samples for each type
+separately.
+
+$ perf mem report -s mem,snoop
+# Overhead       Samples  Memory access                            Snoop
+# ........  ............  .......................................  ............
+#
+    47.99%          1509  L2 hit                                   N/A
+    25.08%           338  core, same node Any cache hit            HitM
+    10.24%         54374  N/A                                      N/A
+     6.77%         35938  L1 hit                                   N/A
+     6.39%           101  core, same node Any cache hit            N/A
+     3.50%            69  RAM hit                                  N/A
+     0.03%           158  LFB/MAB hit                              N/A
+     0.00%             2  Uncached hit                             N/A
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]
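
Reviewer note on OUTPUT FIELD SELECTION: the two example outputs are consistent with each other, and the relationship can be reproduced by hand. The Python sketch below starts from the per-type rows of the `-s mem,snoop` output and collapses them into the single breakdown line that `-F mem,snoop` prints. The bucket mapping (only "RAM hit" counts as RAM, only "Uncached hit" counts as Uncach, everything else is Other) is an assumption inferred from the numbers in the patch, not taken from the perf source, and the function names are hypothetical.

```python
# Rows copied from the `perf mem report -s mem,snoop` example.
# Layout (chosen for this sketch): (overhead %, memory access, snoop)
ROWS = [
    (47.99, "L2 hit", "N/A"),
    (25.08, "core, same node Any cache hit", "HitM"),
    (10.24, "N/A", "N/A"),
    (6.77, "L1 hit", "N/A"),
    (6.39, "core, same node Any cache hit", "N/A"),
    (3.50, "RAM hit", "N/A"),
    (0.03, "LFB/MAB hit", "N/A"),
    (0.00, "Uncached hit", "N/A"),
]


def mem_bucket(mem):
    """Map a 'Memory access' string to a Memory column (assumed mapping)."""
    if mem == "RAM hit":
        return "RAM"
    if mem == "Uncached hit":
        return "Uncach"
    return "Other"


def snoop_bucket(snoop):
    """Map a 'Snoop' string to a Snoop column (assumed mapping)."""
    return "HitM" if snoop == "HitM" else "Other"


def breakdown(rows, column, bucket_of, buckets):
    """Sum the overhead of all rows into the given buckets."""
    totals = {name: 0.0 for name in buckets}
    for row in rows:
        totals[bucket_of(row[column])] += row[0]
    return totals


if __name__ == "__main__":
    mem = breakdown(ROWS, 1, mem_bucket, ("RAM", "Uncach", "Other"))
    snoop = breakdown(ROWS, 2, snoop_bucket, ("HitM", "Other"))
    print("Memory:", "  ".join(f"{k} {v:.1f}%" for k, v in mem.items()))
    print("Snoop: ", "  ".join(f"{k} {v:.1f}%" for k, v in snoop.items()))
    # Memory: RAM 3.5%  Uncach 0.0%  Other 96.5%
    # Snoop:  HitM 25.1%  Other 74.9%
```

The totals match the `-F mem,snoop` line in the patch (3.5% / 0.0% / 96.5% and 25.1% / 74.9%). Used as output fields, the keys fold all samples into one row with a breakdown; used as sort keys, they keep one row per distinct value.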
