@PerfLab-EXaCT

Performance Lab for EXtreme Computing and daTa

DataFlowDrs: Performance optimization of HPC workflows

  • DataFlowDrs: 🆕 Scientific workflows are critical in many areas of scientific exploration. Because these workflows tend to be data intensive, severe bottlenecks emerge in storage systems and I/O networks. We introduce DataFlowDrs, a new comprehensive suite of tools for performance optimization of HPC workflows that especially focuses on data flow and storage. DataFlowDrs introduces (a) lightweight high-resolution measurement and visualization tools for workflow profiling and tracing; (b) rapid modeling and analysis that reduces analysis data by compressing common repeated coordination patterns; (c) novel methods for predicting data flow scaling using automatically generated interpretable models of data flow; (d) effective performance analysis and bottleneck detection that can automatically quantify and rank bottlenecks for different combinations of task parallelism and storage resources; (e) actionable performance optimization in the form of new schedules and resource assignments. DataFlowDrs automates several previously difficult manual analyses and substantially reduces the impact of data flow bottlenecks by recommending the right tradeoffs between task parallelism and storage performance.

    Tools: DataLife, DaYu, Dataflow Performance Matcher (DPM), FlowForecaster, FastFlow, QoSFlow

  • TAZeR Remote I/O and BigFlowSim I/O simulator-emulator

    Details
    • TAZeR: TAZeR (Transparent Asynchronous Zero-copy Remote I/O) is a remote I/O framework for transparently minimizing the access latencies of remote I/O in workflows. TAZeR captures dynamic and irregular inter-task locality, both temporal and spatial, via adaptive hierarchical staging that keeps the most frequently accessed data "close".

    • BigFlowSim: BigFlowSim is a workflow I/O simulator-emulator and trace generator that captures several parameters that affect local and remote I/O performance. BigFlowSim generates a large variety of flows within and between tasks of distributed workflows. The BigFlowSim Driver is helpful for conducting experiments.
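
The idea of hierarchical staging mentioned for TAZeR can be illustrated with a small tiered cache. This is only a sketch under stated assumptions, not TAZeR's implementation or API: the class `TieredCache`, its `read` method, and the `fetch_remote` callback are hypothetical names, and plain LRU eviction stands in for TAZeR's adaptive policy. Data is served from the closest level that holds it, and items evicted from a closer level are demoted to the next one rather than discarded.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical multi-level staging cache (illustration only).

    Level 0 is the "closest" level. LRU eviction is an assumption made
    for this sketch; it is not taken from TAZeR.
    """

    def __init__(self, capacities, fetch_remote):
        self.capacities = list(capacities)
        self.levels = [OrderedDict() for _ in self.capacities]
        self.fetch_remote = fetch_remote  # called only on a full miss
        self.hits = [0] * len(self.capacities)
        self.remote_fetches = 0

    def read(self, key):
        # Search from the closest level outward.
        for i, level in enumerate(self.levels):
            if key in level:
                value = level.pop(key)
                self.hits[i] += 1
                self._insert(0, key, value)  # promote to the closest level
                return value
        # Full miss: go to the remote source and stage the result.
        value = self.fetch_remote(key)
        self.remote_fetches += 1
        self._insert(0, key, value)
        return value

    def _insert(self, i, key, value):
        if i >= len(self.levels):
            return  # evicted from the last level: drop it
        level = self.levels[i]
        level[key] = value
        if len(level) > self.capacities[i]:
            # Demote the least recently used entry to the next level.
            old_key, old_value = level.popitem(last=False)
            self._insert(i + 1, old_key, old_value)
```

With capacities `[1, 2]`, reading `a`, `b`, `a` triggers two remote fetches; the second read of `a` is served from level 1 and promoted back to level 0.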

AI Systems • Data Analytics

  • MassiveGNN: Training Graph Neural Networks (GNNs) on massively connected (distributed) graphs poses significant challenges: even with the best methods, GNN training usually suffers from communication bottlenecks and load imbalance. MassiveGNN introduces performant and productive training for massively connected (distributed) graphs within the state-of-the-art Amazon DistDGL (distributed Deep Graph Library). It offers practical trade-offs for reducing the sampling and communication overheads of representation learning on distributed graphs by developing a parameterized continuous prefetch and eviction scheme.

  • PowerTrip 🆕 and PowerMorph 🆕, for addressing the power constraints of large-scale training with federated heterogeneous datacenter power and intelligent adaptation of demand-response power.

    Details
    • PowerTrip: As AI training's power demands exceed the capacity of a single datacenter, intelligent power federation becomes critical. Effective solutions must overcome the challenges of dynamic power availability and communication cost. PowerTrip intelligently harnesses and federates the residual power of multiple datacenters for distributed AI training, selecting the best combination of power and communication cost.

    • PowerMorph: To ensure secure operation of power grids, datacenters must provide demand response: large jobs must adjust power consumption dynamically based on specific energy requirements. PowerMorph effectively and reliably tracks power demand-response curves, achieving higher training throughput without compromising training accuracy. The tracking error is minimal (about 2% in experiments).

  • AIZ 🆕 Image compression for workflows that provides best-in-class image quality (preserving texture), compression ratio, and speed using a novel AI-based compression pipeline.

    Details
    • AIZ: Scientific images are essential for agentic interpretation of experimental science. Effective image compression must be fast, parallelizable, achieve high compression ratios, and preserve important domain-specific features. Existing compressors can distort critical textures at high compression ratios. AI-based compressors, on the other hand, have high quality and compression ratios, but are extraordinarily slow. AIZ is a high-performance AI-based compressor that not only preserves visual semantics, but addresses both compression latency and throughput with a modular high-performance pipeline.
  • SAMIAm: Microstructure segmentation for transmission electron microscopy that recognizes geometric and textural features and that is based on semantic boosting of the Segment Anything Model (SAM).

    Details
    • SAMIAm: Image segmentation is a critical enabler for tasks ranging from medical diagnostics to autonomous driving. However, the correct segmentation semantics (where are boundaries located? what segments are logically similar?) change depending on the domain, such that state-of-the-art foundation models can generate meaningless and incorrect results. Moreover, in certain domains, fine-tuning and retraining techniques are infeasible: obtaining labels is costly and time-consuming; domain images (micrographs) can be exponentially diverse; and data sharing (for third-party retraining) is restricted. To enable rapid adaptation of the best segmentation technology, we define semantic boosting: given a zero-shot foundation model, guide its segmentation and adjust results to match domain expectations. We apply semantic boosting to the Segment Anything Model (SAM) to obtain microstructure segmentation for transmission electron microscopy. Our booster, SAM-I-Am, extracts geometric and textural features of various intermediate masks to perform mask removal and mask merging operations.

    • SuperSAM: Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. Compared with traditional NAS methods, recently proposed one-shot NAS methods are more efficient. One-shot NAS works by generating a single weight-sharing supernetwork that acts as a search space (container) of subnetworks. Despite its achievements, designing the one-shot search space remains a major challenge. In this work we propose a search space design strategy for Vision Transformer (ViT)-based architectures. In particular, we convert the Segment Anything Model (SAM) into a weight-sharing supernetwork called SuperSAM. Our approach automates the search space design via layer-wise structured pruning and parameter prioritization. While the structured pruning applies probabilistic removal of certain transformer layers, parameter prioritization performs weight reordering and slicing of MLP blocks in the remaining layers. We train supernetworks on several datasets using the sandwich rule. For deployment, we enhance subnetwork discovery by using a program autotuner to identify efficient subnetworks within the search space. The resulting subnetworks are 30-70% smaller than the original pre-trained SAM ViT-B, yet outperform the pre-trained model. Our work introduces a new and effective method for ViT NAS search-space design.

    • Demo

    • SAMIAm-LabelStudio

Hardware/Software Co-design • Application Performance Analysis

  • MemGaze/MemFriend: 🆕 MemGaze is a memory analysis toolset that combines high-resolution trace analysis and low-overhead measurement, both with respect to time and space. MemGaze provides high resolution by collecting word-level memory access traces, where the highest resolution supported is back-to-back sequences. MemGaze provides several post-mortem trace processing methods, including multi-resolution analysis for locations vs. operations, accesses vs. spatio-temporal reuse, and reuse (distance, rate, volume) vs. access patterns.

    MemGaze now includes MemFriend, a new analysis module that introduces spatial and temporal locality analysis that captures affinity (access correlation) between pairs of memory locations. MemFriend's multi-resolution analysis identifies significant memory segments and simultaneously prunes the analysis space such that time and space complexity is modest. MemFriend creates signatures, selectable at 3D, 2D, and 1D resolutions, that provide novel insights and enable predictive reasoning about application performance. The results aid data layout optimizations and data placement decisions.

  • OCEAN 🆕 (Open-source CXL Emulation at Hyperscale Architecture and Networking), an emerging tool for emulating CXL-extended distributed memory systems.

  • Palm: Palm is a suite of performance modeling tools (Palm, Palm-Task, Representative-Paths, Palm/FastFootprints, MIAMI-NW) to assist performance analysis and predictive model generation. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. Palm has been used to model irregular applications with sparse data structures and unpredictable access patterns. Recent additions focus on rapid characterization of memory behavior.

  • QuaL²M (QuaLM): Quantitative Learned Latency Model [Extra datasets]
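
The reuse distance metric mentioned for MemGaze can be made concrete with a small example. The sketch below uses the common definition (the number of distinct addresses touched between two consecutive accesses to the same address); it is not MemGaze code, and the function name `reuse_distances` and the list-of-addresses trace format are assumptions made for illustration. It favors clarity over speed and is quadratic in the trace length.

```python
def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`.

    The reuse distance of an access is the number of distinct addresses
    seen since the previous access to the same address. First-time
    accesses have no previous access and are reported as None.
    """
    last_seen = {}  # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses strictly between the two accesses.
            between = trace[last_seen[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = i
    return distances


if __name__ == "__main__":
    # Hypothetical word-level trace of symbolic addresses.
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
    # prints [None, None, None, 2, 2, 0]
```

Small distances indicate accesses that are likely to hit in a nearby cache level; large distances or `None` indicate accesses that must go further out in the memory hierarchy.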

Workload Benchmarking and Characterization

  • Scientific workflows: A suite of distributed scientific workflows with a range of workload characteristics

  • SEAK Suite: The SEAK Suite is a collection of constraining problems for common embedded computing challenges. A constraining problem is a mission-centric and goal-oriented problem specification that separates problem-domain constraints from solution implementations so as to encourage creative solutions that meet goals but that may deviate from standard implementations.

  • PERFECT Suite: The PERFECT Suite consists of kernels and applications for evaluating tradeoffs between performance, power, and architecture within the domains of radar and image processing.

  • miniVite-x: Mini-application to demonstrate different memory patterns and test memory analysis tools.

Miscellaneous tools for performance analysis and modeling

Pinned

  1. ubench (C)

  2. utools (Python)
