Blue Gene/Q Network Performance Counters Monitoring Library
BGQNCL is a library to monitor and record network performance counters on the 5D torus interconnection network of IBM’s Blue Gene/Q platform. It accesses the Universal Performance Counter (UPC) hardware counters through the Blue Gene/Q Hardware Performance Monitoring (HPM) API.
The library transparently intercepts MPI_Init, MPI_Pcontrol, and MPI_Finalize. Call MPI_Pcontrol around regions of interest; the argument passed to MPI_Pcontrol identifies the region. If the same argument is used for several regions, the counter values recorded during those executions are summed. The value 0 is assumed at startup and is used by the library as a stopper. Use a positive value to mark the start of a region of interest and 0 to mark its end. Starting a new region implicitly ends the current one.
Limitation: The argument must be less than or equal to 9.
Set the environment variable BGQ_COUNTER_FILE to the path of the output file to which the counters should be written. If the file cannot be created, the output is written to stdout.
Each line of the output file contains the data for one physical node. The metadata for a line is:
Pcontrol_region world_rank coords[0] coords[1] coords[2] coords[3] coords[4] coords[5] ** linkdata
linkdata = (d_A-) (d_A+) (d_B-) (d_B+) (d_C-) (d_C+) (d_D-) (d_D+) (d_E-) (d_E+)
d_* = sent_chunks (32 bytes) dynamic_chunks deterministic_chunks col_packets (ignore) recv_packets (512 bytes) fifo_length
Hence, the link data consists of 60 entries (10 links with 6 values each).
Edit the Makefile so that BGPM points to the installation location of bgpm, then type:
make
Link the profiler before your MPI library. Here is a sample link line:
$(CC) -o mypgm mybin.o libprofiler.a -L $(BGPM)/lib -lbgpm -lrt -lstdc++
make test-all
Run the binary named simple without any arguments on at least 2 processes.
Any published work that utilizes this library should include the following reference:
Abhinav Bhatele, Nikhil Jain, Katherine E. Isaacs, Ronak Buch, Todd Gamblin,
Steven H. Langer, and Laxmikant V. Kale. Optimizing the performance of parallel
applications on a 5D torus via task mapping. In Proceedings of IEEE
International Conference on High Performance Computing, HiPC '14. IEEE Computer
Society, December 2014. LLNL-CONF-655465.
Copyright (c) 2013, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory.
Written by:
Nikhil Jain <nikhil.jain@acm.org>
Abhinav Bhatele <bhatele@llnl.gov>
LLNL-CODE-678958. All rights reserved.
This file is part of BGQNCL. For details, see: https://github.com/LLNL/bgqncl Please also read the LICENSE file for our notice and the LGPL.