@dvalinrh commented Jan 9, 2026

Description

  1. Adds the open metric file for the test metrics themselves.
  2. Handles iterations better: each iteration now gets its own set of PCP data.
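For readers unfamiliar with the format, here is a minimal sketch of what an OpenMetrics-style text file holding the two HPL metrics could look like. It is an illustration only and not the code in this PR: the helper name `format_openmetrics` is hypothetical, and declaring the metrics as gauges and ending with the `# EOF` marker are assumptions about how the file would be written.

```python
def format_openmetrics(metrics):
    """Render {name: value} as OpenMetrics-style text.

    Assumption: every metric is exposed as a gauge, and the file is
    terminated with the "# EOF" marker from the OpenMetrics text format.
    """
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


# Metric names taken from the log lines in this PR; values from the After run.
print(format_openmetrics({"hpl_time": 261.50, "hpl_gflops": 130.40}))
```

Each metric gets a type declaration followed by a `name value` sample line, which is the shape a collector would need in order to pick up `hpl_time` and `hpl_gflops`.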

Before/After Comparison

Before
Logging results hpl_time 261.59
Unexpected metric logged. Check for a typo.
Logging results hpl_gflops 1.3036e+02
Unexpected metric logged. Check for a typo.
Send result to PCP archive for iteration 1
Stopping PCP subset
Stop PCP
Ran

          o.w.hpl_gflops  o.w.hpl_time  o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency
13:21:15             N/A       261.590          0.000        1.000           0.000          NaN             NaN          NaN
13:21:16             N/A       261.590          0.000        1.000           0.000          NaN             NaN          NaN
13:21:17         130.360       261.590          0.000        0.000           0.000          NaN             NaN          NaN
13:21:18         130.360       261.590          0.000        0.000           0.000          NaN             NaN          NaN

After

Logging results hpl_time 261.50
hpl_time NaN
Logging results hpl_gflops 1.3040e+02
hpl_gflops NaN
Stopping PCP subset
Ran

          o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.hpl_time  o.w.hpl_gflops
16:23:00          1.000        1.000           0.000          NaN             NaN          NaN       261.500             NaN
16:23:01          1.000        1.000           0.000          NaN             NaN          NaN       261.500             NaN
16:23:02          1.000        1.000           0.000          NaN             NaN          NaN       261.500         130.400
16:23:03          1.000        1.000           0.000          NaN             NaN          NaN       261.500         130.400
16:23:04          1.000        0.000           0.000          NaN             NaN          NaN       261.500         130.400

When --iteration is passed to the test, one set of PCP data is produced for each iteration.

Clerical Stuff

This closes #58

Relates to JIRA: RPOPC-759

Test results

=======================
csv file

Test general meta start

Test: auto_hpl

Results version: 1.0

Host: m5.xlarge

Sys environ: aws

Tuned: virtual-guest

OS: 5.14.0-611.5.1.el9_7.x86_64

Numa nodes: 1

CPU family: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz

Number cpus: 4

Memory: 15727856kB

Test general meta end

Test meta data start

/usr/lib64/openmpi/bin/mpirun --allow-run-as-root -np 1 --mca btl self,vader --report-bindings --map-by l3cache -x OMP_NUM_THREADS=2 ./xhpl

Test meta data end

T/V:N:NB:P:Q:Time:Gflops
WR12R2R4:37120:256:1:1:261.50:1.3040e+02
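As a quick sanity check on the csv line above, the sketch below parses the colon-delimited row and recomputes Gflops from N and Time. It assumes the standard HPL operation count of 2/3·N³ + 3/2·N²; the function names `parse_results` and `expected_gflops` are made up for this illustration and are not part of the wrapper.

```python
import csv
import io

# Colon-delimited results block copied from the csv file above.
RESULTS = """T/V:N:NB:P:Q:Time:Gflops
WR12R2R4:37120:256:1:1:261.50:1.3040e+02
"""


def parse_results(text):
    """Return one dict per result row, with numeric fields converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=":"):
        rows.append({
            "variant": row["T/V"],
            "N": int(row["N"]),
            "NB": int(row["NB"]),
            "P": int(row["P"]),
            "Q": int(row["Q"]),
            "time": float(row["Time"]),
            "gflops": float(row["Gflops"]),
        })
    return rows


def expected_gflops(n, seconds):
    """Gflops implied by the assumed HPL operation count 2/3*N^3 + 3/2*N^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


for result in parse_results(RESULTS):
    recomputed = expected_gflops(result["N"], result["time"])
    print(f'{result["variant"]}: reported {result["gflops"]:.2f}, '
          f'recomputed {recomputed:.2f} Gflops')
```

Under that assumption the recomputed figure agrees with the reported 1.3040e+02 to the precision shown, which suggests the Time and Gflops fields in the csv are consistent with each other.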

====================================
pcp snippet

Iteration 1
          o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.hpl_time  o.w.hpl_gflops
16:17:09          1.000        1.000           0.000          NaN             NaN          NaN       261.700             NaN
16:17:10          1.000        1.000           0.000          NaN             NaN          NaN       261.700             NaN
16:17:11          1.000        1.000           0.000          NaN             NaN          NaN       261.700         130.300
16:17:12          1.000        1.000           0.000          NaN             NaN          NaN       261.700         130.300
16:17:13          1.000        0.000           0.000          NaN             NaN          NaN       261.700         130.300
16:17:14          1.000        0.000           0.000          NaN             NaN          NaN       261.700         130.300

Iteration 2
          o.w.iteration  o.w.running  o.w.numthreads  o.w.runtime  o.w.throughput  o.w.latency  o.w.hpl_time  o.w.hpl_gflops
16:23:00          1.000        1.000           0.000          NaN             NaN          NaN       261.500             NaN
16:23:01          1.000        1.000           0.000          NaN             NaN          NaN       261.500             NaN
16:23:02          1.000        1.000           0.000          NaN             NaN          NaN       261.500         130.400
16:23:03          1.000        1.000           0.000          NaN             NaN          NaN       261.500         130.400
16:23:04          1.000        0.000           0.000          NaN             NaN          NaN       261.500         130.400
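To read the snippet above programmatically, something like the following could be used. It is a rough sketch, not part of this change: the column order is taken from the header line in the snippet, while the names `COLUMNS`, `parse_sample`, and `first_reported` are hypothetical helpers introduced only for this example.

```python
import math

# Column order as shown in the header of the pcp snippet (without the "o.w." prefix).
COLUMNS = [
    "iteration", "running", "numthreads", "runtime",
    "throughput", "latency", "hpl_time", "hpl_gflops",
]

# A few rows copied from the second iteration above.
SAMPLES = [
    "16:23:00 1.000 1.000 0.000 NaN NaN NaN 261.500 NaN",
    "16:23:01 1.000 1.000 0.000 NaN NaN NaN 261.500 NaN",
    "16:23:02 1.000 1.000 0.000 NaN NaN NaN 261.500 130.400",
    "16:23:04 1.000 0.000 0.000 NaN NaN NaN 261.500 130.400",
]


def parse_sample(line):
    """Split one 'HH:MM:SS v1 v2 ...' row into (timestamp, {column: float})."""
    timestamp, *fields = line.split()
    values = {name: float(field) for name, field in zip(COLUMNS, fields)}
    return timestamp, values


def first_reported(lines, metric):
    """Return (timestamp, value) for the first sample where metric is not NaN."""
    for line in lines:
        timestamp, values = parse_sample(line)
        if not math.isnan(values[metric]):
            return timestamp, values[metric]
    return None


print(first_reported(SAMPLES, "hpl_gflops"))
```

With the rows above this picks out the 16:23:02 sample as the first one carrying a Gflops value, while `hpl_time` is already populated from the first row.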

===================
Test output

================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver

An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 37120
NB : 256
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Right
BCAST : 2ring
DEPTH : 1
SWAP : Spread-roll (long)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words


  • The matrix A is randomly generated for each test.
  • The following scaled residual check will be computed:
    ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
  • The relative machine precision (eps) is taken to be 1.110223e-16
  • Computational tests pass if scaled residuals are less than 16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
WR12R2R4       37120   256     1     1             262.00             1.3015e+02
HPL_pdgesv() start time Fri Jan 9 17:17:19 2026

HPL_pdgesv() end time Fri Jan 9 17:21:41 2026


||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.12924418e-03 ...... PASSED

Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.

End of Tests.

@dvalinrh changed the title from "Fix pcp" to "Add test metrics to pcp." Jan 13, 2026

@malucius-rh left a comment

LGTM

@sayalibhavsar left a comment

LGTM
