Commit ba11ba6

ahunter6 authored and acmel committed
perf intel-pt: Add mispred-all config option to aid use with autofdo
autofdo incorrectly expects branch flags to include either mispred or
predicted. In fact mispred = predicted = 0 is valid and means the flags
are not supported, which they aren't by Intel PT. To make autofdo work,
add a config option which will cause Intel PT decoder to set the mispred
flag on all branches.

Below is an example of using Intel PT with autofdo. The example is also
added to the Intel PT documentation. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble sort
example is from the AutoFDO tutorial
(https://gcc.gnu.org/wiki/AutoFDO/Tutorial) amended to take the number of
elements as a parameter.

	$ gcc-5 -O3 sort.c -o sort_optimized
	$ ./sort_optimized 30000
	Bubble sorting array of 30000 elements
	2254 ms

	$ cat ~/.perfconfig
	[intel-pt]
		mispred-all

	$ perf record -e intel_pt//u ./sort 3000
	Bubble sorting array of 3000 elements
	58 ms
	[ perf record: Woken up 2 times to write data ]
	[ perf record: Captured and wrote 3.939 MB perf.data ]
	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
	$ ./sort_autofdo 30000
	Bubble sorting array of 30000 elements
	2155 ms

Note there is currently no advantage to using Intel PT instead of LBR,
but that may change in the future if greater use is made of the data.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1443186956-18718-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
1 parent f56fb98 commit ba11ba6
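
The commit message describes three possible states for the two prediction flags of a branch entry. As a rough illustration only, the sketch below shows how a consumer could tell them apart, and what setting the mispred flag on every branch does to an entry. The struct, enum, and function names are hypothetical stand-ins invented for this example; they are not the perf or autofdo definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two prediction bits of a branch entry. */
struct example_branch_flags {
	unsigned int mispred:1;
	unsigned int predicted:1;
};

/* The three states described in the commit message. */
enum example_prediction {
	EXAMPLE_PRED_UNSUPPORTED,	/* mispred = predicted = 0 */
	EXAMPLE_PRED_CORRECT,		/* predicted = 1 */
	EXAMPLE_PRED_MISPREDICTED,	/* mispred = 1 */
};

/* Classify an entry without assuming that one of the flags is always set. */
static enum example_prediction
example_classify(const struct example_branch_flags *flags)
{
	assert(flags != NULL);

	if (flags->mispred)
		return EXAMPLE_PRED_MISPREDICTED;
	if (flags->predicted)
		return EXAMPLE_PRED_CORRECT;
	return EXAMPLE_PRED_UNSUPPORTED;
}

/*
 * Mirror of the idea behind the option: when it is enabled, every branch
 * is reported as mispredicted; the predicted bit is left untouched.
 */
static void example_apply_mispred_all(struct example_branch_flags *flags,
				      bool mispred_all)
{
	assert(flags != NULL);

	flags->mispred = mispred_all;
}
```

With the option off, an entry whose flags are both zero classifies as unsupported; with it on, the same entry classifies as mispredicted, which is the case a tool expecting one of the two flags can handle.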

2 files changed: +43 -0 lines changed

tools/perf/Documentation/intel-pt.txt

Lines changed: 29 additions & 0 deletions
@@ -764,3 +764,32 @@ perf inject also accepts the --itrace option in which case tracing data is
 removed and replaced with the synthesized events. e.g.
 
 	perf inject --itrace -i perf.data -o perf.data.new
+
+Below is an example of using Intel PT with autofdo. It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5. The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial)
+amended to take the number of elements as a parameter.
+
+	$ gcc-5 -O3 sort.c -o sort_optimized
+	$ ./sort_optimized 30000
+	Bubble sorting array of 30000 elements
+	2254 ms
+
+	$ cat ~/.perfconfig
+	[intel-pt]
+		mispred-all
+
+	$ perf record -e intel_pt//u ./sort 3000
+	Bubble sorting array of 3000 elements
+	58 ms
+	[ perf record: Woken up 2 times to write data ]
+	[ perf record: Captured and wrote 3.939 MB perf.data ]
+	$ perf inject -i perf.data -o inj --itrace=i100usle --strip
+	$ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ ./sort_autofdo 30000
+	Bubble sorting array of 30000 elements
+	2155 ms
+
+Note there is currently no advantage to using Intel PT instead of LBR, but
+that may change in the future if greater use is made of the data.

tools/perf/util/intel-pt.c

Lines changed: 14 additions & 0 deletions
@@ -64,6 +64,7 @@ struct intel_pt {
 	bool data_queued;
 	bool est_tsc;
 	bool sync_switch;
+	bool mispred_all;
 	int have_sched_switch;
 	u32 pmu_type;
 	u64 kernel_start;
@@ -943,6 +944,7 @@ static void intel_pt_update_last_branch_rb(struct intel_pt_queue *ptq)
 	be->flags.abort = !!(state->flags & INTEL_PT_ABORT_TX);
 	be->flags.in_tx = !!(state->flags & INTEL_PT_IN_TX);
 	/* No support for mispredict */
+	be->flags.mispred = ptq->pt->mispred_all;
 
 	if (bs->nr < ptq->pt->synth_opts.last_branch_sz)
 		bs->nr += 1;
@@ -1967,6 +1969,16 @@ static bool intel_pt_find_switch(struct perf_evlist *evlist)
 	return false;
 }
 
+static int intel_pt_perf_config(const char *var, const char *value, void *data)
+{
+	struct intel_pt *pt = data;
+
+	if (!strcmp(var, "intel-pt.mispred-all"))
+		pt->mispred_all = perf_config_bool(var, value);
+
+	return 0;
+}
+
 static const char * const intel_pt_info_fmts[] = {
 	[INTEL_PT_PMU_TYPE] = " PMU Type %"PRId64"\n",
 	[INTEL_PT_TIME_SHIFT] = " Time Shift %"PRIu64"\n",
@@ -2011,6 +2023,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (!pt)
 		return -ENOMEM;
 
+	perf_config(intel_pt_perf_config, pt);
+
 	err = auxtrace_queues__init(&pt->queues);
 	if (err)
 		goto err_free;
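
The patch above wires the option in through a config callback: a function is invoked once per config entry, matches the key "intel-pt.mispred-all", and stores a boolean in the decoder state. The following is a minimal standalone sketch of that pattern. Every `example_` name is a hypothetical stand-in written for this illustration, and the boolean parser is a simplified assumption (a key with no value counts as true, as the `~/.perfconfig` example implies); it is not the perf implementation of `perf_config()` or `perf_config_bool()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the decoder state that holds the option. */
struct example_pt {
	bool mispred_all;
};

/* Hypothetical config entry, e.g. parsed from a "[section] key = value" file. */
struct example_entry {
	const char *var;	/* "section.key" */
	const char *value;	/* NULL when the key has no value */
};

typedef int (*example_config_fn)(const char *var, const char *value, void *data);

/* Simplified boolean parser (an assumption): a key with no value is true. */
static bool example_config_bool(const char *value)
{
	if (!value)
		return true;
	return !strcmp(value, "true") || !strcmp(value, "yes") ||
	       !strcmp(value, "on") || !strcmp(value, "1");
}

/* Callback with the same shape as the one added by the patch. */
static int example_pt_config(const char *var, const char *value, void *data)
{
	struct example_pt *pt = data;

	assert(var != NULL);

	if (!strcmp(var, "intel-pt.mispred-all"))
		pt->mispred_all = example_config_bool(value);

	return 0;	/* unknown keys are ignored rather than treated as errors */
}

/* Hypothetical driver: call fn for each entry, stopping on the first error. */
static int example_config(const struct example_entry *entries, size_t n,
			  example_config_fn fn, void *data)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(entries[i].var, entries[i].value, data);

		if (ret)
			return ret;
	}
	return 0;
}
```

Returning 0 for keys the callback does not recognise keeps it composable: the same config file can carry settings for other subsystems without this callback rejecting them.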
