Skip to content

Commit

Permalink
Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/p…
Browse files Browse the repository at this point in the history
…ub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "New features:

   - Add 'trace' subcommand for 'perf ftrace', setting the stage for
     more 'perf ftrace' subcommands. Not using a subcommand yields the
     previous behaviour of 'perf ftrace'.

   - Add 'latency' subcommand to 'perf ftrace', that can use the
     function graph tracer or a BPF optimized one, via the -b/--use-bpf
     option.

     E.g.:

	$ sudo perf ftrace latency -a -T mutex_lock sleep 1
	#   DURATION     |      COUNT | GRAPH                          |
	     0 - 1    us |       4596 | ########################       |
	     1 - 2    us |       1680 | #########                      |
	     2 - 4    us |       1106 | #####                          |
	     4 - 8    us |        546 | ##                             |
	     8 - 16   us |        562 | ###                            |
	    16 - 32   us |          1 |                                |
	    32 - 64   us |          0 |                                |
	    64 - 128  us |          0 |                                |
	   128 - 256  us |          0 |                                |
	   256 - 512  us |          0 |                                |
	   512 - 1024 us |          0 |                                |
	     1 - 2    ms |          0 |                                |
	     2 - 4    ms |          0 |                                |
	     4 - 8    ms |          0 |                                |
	     8 - 16   ms |          0 |                                |
	    16 - 32   ms |          0 |                                |
	    32 - 64   ms |          0 |                                |
	    64 - 128  ms |          0 |                                |
	   128 - 256  ms |          0 |                                |
	   256 - 512  ms |          0 |                                |
	   512 - 1024 ms |          0 |                                |
	     1 - ...   s |          0 |                                |

     The original implementation of this command was in the bcc tool.

   - Support --cputype option for hybrid events in 'perf stat'.

  Improvements:

   - Call chain improvements for ARM64.

   - No need to do any affinity setup when profiling pids.

   - Reduce multiplexing with duration_time in 'perf stat' metrics.

   - Improve error message for uncore events, stating that some event
     groups are can only be used in system wide (-a) mode.

   - perf stat metric group leader fixes/improvements, including arch
     specific changes to better support Intel topdown events.

   - Probe non-deprecated sysfs path first, i.e. try the path
     /sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
     the old /sys/devices/system/cpu/cpuN/topology/core_cpus.

   - Disable debuginfod by default in 'perf record', to avoid stalls on
     distros such as Fedora 35.

   - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.

   - Enable ignore_missing_thread in 'perf trace'

  Fixes:

   - Avoid TUI crash when navigating in the annotation of recursive
     functions.

   - Fix hex dump character output in 'perf script'.

   - Fix JSON indentation to 4 spaces standard in the ARM vendor event
     files.

   - Fix use after free in metric__new().

   - Fix IS_ERR_OR_NULL() usage in the perf BPF loader.

   - Fix up cross-arch register support, i.e. when printing register
     names take into account the architecture where the perf.data file
     was collected.

   - Fix SMT fallback with large core counts.

   - Don't lower case MetricExpr when parsing JSON files so as not to
     lose info such as the ":G" event modifier in metrics.

  perf test:

   - Add basic stress test for sigtrap handling to 'perf test'.

   - Fix 'perf test' failures on s/390

   - Enable system wide for metricgroups test in 'perf test´.

   - Use 3 digits for test numbering now we can have more tests.

  Arch specific:

   - Add events for Arm Neoverse N2 in the ARM JSON vendor event files

   - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
     single patch series, where I incorrectly merged the kernel bits,
     that were then reverted after coordination with Michael Ellerman
     and Stephen Rothwell.

   - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.

   - Update AMD documentation, with info on raw event encoding.

   - Add support for global and local variants of the "p_stage_cyc" sort
     key, applicable to perf.data files collected on powerpc.

   - Remove duplicate and incorrect aux size checks in the ARM CoreSight
     ETM code.

  Refactorings:

   - Add a perf_cpu abstraction to disambiguate CPUs and CPU map
     indexes, fixing problems along the way.

   - Document CPU map methods.

  UAPI sync:

   - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
     mem memcpy'

   - Sync UAPI files with the kernel sources: drm, msr-index,
     cpufeatures.

  Build system

   - Enable warnings through HOSTCFLAGS.

   - Drop requirement for libstdc++.so for libopencsd check

  libperf:

   - Make libperf adopt perf_counts_values__scale() from tools/perf/util/.

   - Add a stat multiplexing test to libperf"

* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
  perf record: Disable debuginfod by default
  perf evlist: No need to do any affinity setup when profiling pids
  perf cpumap: Add is_dummy() method
  perf metric: Fix metric_leader
  perf cputopo: Fix CPU topology reading on s/390
  perf metricgroup: Fix use after free in metric__new()
  libperf tests: Update a use of the new cpumap API
  perf arm: Fix off-by-one directory path
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h header
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
  perf pmu-events: Don't lower case MetricExpr
  perf expr: Add debug logging for literals
  perf tools: Probe non-deprecated sysfs path 1st
  perf tools: Fix SMT fallback with large core counts
  perf cpumap: Give CPUs their own type
  perf stat: Correct first_shadow_cpu to return index
  perf script: Fix flipped index and cpu
  perf c2c: Use more intention revealing iterator
  ...
  • Loading branch information
torvalds committed Jan 18, 2022
2 parents f003368 + 9bce13e commit 57d1737
Show file tree
Hide file tree
Showing 153 changed files with 4,685 additions and 2,175 deletions.
1 change: 1 addition & 0 deletions tools/arch/x86/include/asm/cpufeatures.h
Original file line number Diff line number Diff line change
Expand Up @@ -315,6 +315,7 @@
#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
#define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */

/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
Expand Down
17 changes: 17 additions & 0 deletions tools/arch/x86/include/asm/msr-index.h
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,23 @@

#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

/* AMD Collaborative Processor Performance Control MSRs */
#define MSR_AMD_CPPC_CAP1 0xc00102b0
#define MSR_AMD_CPPC_ENABLE 0xc00102b1
#define MSR_AMD_CPPC_CAP2 0xc00102b2
#define MSR_AMD_CPPC_REQ 0xc00102b3
#define MSR_AMD_CPPC_STATUS 0xc00102b4

#define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff)
#define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff)
#define AMD_CPPC_NOMINAL_PERF(x) (((x) >> 16) & 0xff)
#define AMD_CPPC_HIGHEST_PERF(x) (((x) >> 24) & 0xff)

#define AMD_CPPC_MAX_PERF(x) (((x) & 0xff) << 0)
#define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8)
#define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16)
#define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24)

/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9

Expand Down
12 changes: 6 additions & 6 deletions tools/arch/x86/lib/memcpy_64.S
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ SYM_FUNC_START_WEAK(memcpy)
rep movsq
movl %edx, %ecx
rep movsb
ret
RET
SYM_FUNC_END(memcpy)
SYM_FUNC_END_ALIAS(__memcpy)
EXPORT_SYMBOL(memcpy)
Expand All @@ -53,7 +53,7 @@ SYM_FUNC_START_LOCAL(memcpy_erms)
movq %rdi, %rax
movq %rdx, %rcx
rep movsb
ret
RET
SYM_FUNC_END(memcpy_erms)

SYM_FUNC_START_LOCAL(memcpy_orig)
Expand Down Expand Up @@ -137,7 +137,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
movq %r9, 1*8(%rdi)
movq %r10, -2*8(%rdi, %rdx)
movq %r11, -1*8(%rdi, %rdx)
retq
RET
.p2align 4
.Lless_16bytes:
cmpl $8, %edx
Expand All @@ -149,7 +149,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
movq -1*8(%rsi, %rdx), %r9
movq %r8, 0*8(%rdi)
movq %r9, -1*8(%rdi, %rdx)
retq
RET
.p2align 4
.Lless_8bytes:
cmpl $4, %edx
Expand All @@ -162,7 +162,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
movl -4(%rsi, %rdx), %r8d
movl %ecx, (%rdi)
movl %r8d, -4(%rdi, %rdx)
retq
RET
.p2align 4
.Lless_3bytes:
subl $1, %edx
Expand All @@ -180,7 +180,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
movb %cl, (%rdi)

.Lend:
retq
RET
SYM_FUNC_END(memcpy_orig)

.popsection
6 changes: 3 additions & 3 deletions tools/arch/x86/lib/memset_64.S
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ SYM_FUNC_START(__memset)
movl %edx,%ecx
rep stosb
movq %r9,%rax
ret
RET
SYM_FUNC_END(__memset)
SYM_FUNC_END_ALIAS(memset)
EXPORT_SYMBOL(memset)
Expand All @@ -63,7 +63,7 @@ SYM_FUNC_START_LOCAL(memset_erms)
movq %rdx,%rcx
rep stosb
movq %r9,%rax
ret
RET
SYM_FUNC_END(memset_erms)

SYM_FUNC_START_LOCAL(memset_orig)
Expand Down Expand Up @@ -125,7 +125,7 @@ SYM_FUNC_START_LOCAL(memset_orig)

.Lende:
movq %r10,%rax
ret
RET

.Lbad_alignment:
cmpq $7,%rdx
Expand Down
2 changes: 1 addition & 1 deletion tools/build/Build.include
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXX
###
## HOSTCC C flags

host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))

# output directory for tests below
TMPOUT = .tmp_$$$$
Expand Down
18 changes: 18 additions & 0 deletions tools/include/uapi/drm/drm.h
Original file line number Diff line number Diff line change
Expand Up @@ -1096,6 +1096,24 @@ extern "C" {
#define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer)
#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL DRM_IOWR(0xCD, struct drm_syncobj_timeline_array)

/**
* DRM_IOCTL_MODE_GETFB2 - Get framebuffer metadata.
*
* This queries metadata about a framebuffer. User-space fills
* &drm_mode_fb_cmd2.fb_id as the input, and the kernels fills the rest of the
* struct as the output.
*
* If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles
* will be filled with GEM buffer handles. Planes are valid until one has a
* zero handle -- this can be used to compute the number of planes.
*
* Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid
* until one has a zero &drm_mode_fb_cmd2.pitches.
*
* If the framebuffer has a format modifier, &DRM_MODE_FB_MODIFIERS will be set
* in &drm_mode_fb_cmd2.flags and &drm_mode_fb_cmd2.modifier will contain the
* modifier. Otherwise, user-space must ignore &drm_mode_fb_cmd2.modifier.
*/
#define DRM_IOCTL_MODE_GETFB2 DRM_IOWR(0xCE, struct drm_mode_fb_cmd2)

/*
Expand Down
5 changes: 4 additions & 1 deletion tools/include/uapi/linux/perf_event.h
Original file line number Diff line number Diff line change
Expand Up @@ -1332,7 +1332,10 @@ union perf_mem_data_src {

/* hop level */
#define PERF_MEM_HOPS_0 0x01 /* remote core, same node */
/* 2-7 available */
#define PERF_MEM_HOPS_1 0x02 /* remote node, same socket */
#define PERF_MEM_HOPS_2 0x03 /* remote socket, same board */
#define PERF_MEM_HOPS_3 0x04 /* remote board */
/* 5-7 available */
#define PERF_MEM_HOPS_SHIFT 43

#define PERF_MEM_S(a, s) \
Expand Down
11 changes: 6 additions & 5 deletions tools/lib/perf/Documentation/libperf.txt
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ SYNOPSIS
int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
bool perf_cpu_map__empty(const struct perf_cpu_map *map);
int perf_cpu_map__max(struct perf_cpu_map *map);
bool perf_cpu_map__has(const struct perf_cpu_map *map, int cpu);

#define perf_cpu_map__for_each_cpu(cpu, idx, cpus)
--
Expand Down Expand Up @@ -135,16 +136,16 @@ SYNOPSIS
int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
struct perf_thread_map *threads);
void perf_evsel__close(struct perf_evsel *evsel);
void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu_map_idx);
int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
void perf_evsel__munmap(struct perf_evsel *evsel);
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu_map_idx, int thread);
int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread,
struct perf_counts_values *count);
int perf_evsel__enable(struct perf_evsel *evsel);
int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu);
int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx);
int perf_evsel__disable(struct perf_evsel *evsel);
int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu);
int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu_map_idx);
struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel);
struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel);
struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel);
Expand Down
Loading

0 comments on commit 57d1737

Please sign in to comment.