This repository is the one-stop destination for everything related to DPC4. It contains, among other things, (1) the code of the competing prefetchers, (2) the simulation infrastructure, (3) workload traces, (4) evaluation metrics, and (5) any other auxiliary scripts used in DPC4. More information about the championship can be found at https://sites.google.com/view/dpc4-2026/home
In total, eight submissions were part of the DPC4 main program. Their code can be found inside submissions/. The eight submissions were:
- Crossing the Boundary: Virtual-Address Based Inter-Page Prefetching for Lower Level Caches
- SPPAM: Signature Pattern Prediction and Access-Map Prefetcher
- Emender: Optimizing Prefetch Priority and Throttling in VBerti+Pythia
- sBerti: Enhancing Berti with a Smart Stride Prefetcher for Better Coverage
- Pushing the limits of the Berti Prefetcher
- The Entangling Data Prefetcher
- Performance-Driven Composite Prefetching with Bandits
- Global Berti: Simultaneous Streaming and Spatial Prefetching
DPC4 used ChampSim as the simulation infrastructure to evaluate all submissions. The infra can be found here.
The workload traces used in DPC4 can be found here: https://console.cloud.google.com/storage/browser/dpc4-all-traces
This storage bucket contains all 610 traces. More information about the composition of the traces can be found in our slides. In total, these traces may take ~3 TB of storage.
- `SPEC17` traces were open-sourced by DPC3. We are grateful to Prof. Daniel Jiménez and Prof. Mike Ferdman for capturing and maintaining these traces.
- `Graph/GAP` traces were open-sourced by ML-Based Data Prefetching Competition.
- `gtrace_v2` traces were originally open-sourced by Google and later converted to ChampSim. We are grateful to Matthew Giordano and Akanksha Jain for this effort.
Every submission was evaluated to receive three scores:
- FullBW score: measured in a single-core system with 4800 MTPS main memory bandwidth
- LimitBW score: measured in a single-core system with 800 MTPS main memory bandwidth
- MC score: measured in a four-core system with 4800 MTPS main memory bandwidth
The FullBW and LimitBW scores are calculated as the non-weighted geomean of speedup (i.e., the ratio of the IPC with the submitted prefetcher to the IPC of the baseline) across all workload traces.
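As a rough illustration of the single-core scores described above, the following sketch computes a non-weighted geomean of per-trace speedups. The function names, the dictionary layout, and the IPC values are hypothetical and are not taken from the DPC4 scripts.

```python
import math


def geomean(values):
    """Non-weighted geometric mean of a sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def single_core_score(ipc_submission, ipc_baseline):
    """Geomean of per-trace speedups.

    Both arguments map a trace name to its IPC (hypothetical layout).
    Speedup of a trace = IPC with the submission / IPC of the baseline.
    """
    speedups = [ipc_submission[t] / ipc_baseline[t] for t in ipc_baseline]
    return geomean(speedups)


# Made-up IPC values for two hypothetical traces.
baseline = {"trace_a": 1.0, "trace_b": 2.0}
submission = {"trace_a": 2.0, "trace_b": 4.0}
score = single_core_score(submission, baseline)
```

The same function would apply to both FullBW and LimitBW; only the IPC values fed into it would differ with the memory bandwidth configuration.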
The MC Score is defined in three steps. The definition uses the following convention: $IPC<i,shared,baseline>$ denotes the IPC of the i-th core running together with other traces with baseline prefetchers.

- For a given trace mix k, compute the HarmonicSpeedup of the baseline ($BaselineHS_k$) as $\frac{IPC<i,shared,baseline>}{IPC<i,alone,baseline>}$.
- For the same trace mix, compute the HarmonicSpeedup of the submission ($SubmissionHS_k$) as $\frac{IPC<i,shared,submission>}{IPC<i,alone,baseline>}$.
- Compute the MC Score as the non-weighted geometric mean of the ratio $\frac{SubmissionHS_k}{BaselineHS_k}$ across all trace mixes.
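The three steps above can be sketched in Python as follows. One assumption is made loudly here: the text gives only the per-core IPC ratio, so this sketch combines the per-core ratios of a mix with a harmonic mean, as the name HarmonicSpeedup suggests. The function names, dictionary keys, and IPC values are hypothetical and do not come from the DPC4 scripts.

```python
import math


def harmonic_speedup(ipc_shared, ipc_alone_baseline):
    """Harmonic mean of per-core ratios IPC_shared / IPC_alone_baseline.

    ASSUMPTION: the harmonic-mean aggregation across cores is inferred
    from the metric's name; the README only states the per-core ratio.
    """
    ratios = [s / a for s, a in zip(ipc_shared, ipc_alone_baseline)]
    return len(ratios) / sum(1.0 / r for r in ratios)


def mc_score(mixes):
    """Non-weighted geometric mean of SubmissionHS_k / BaselineHS_k.

    Each mix is a dict of per-core IPC lists (hypothetical keys):
    'alone', 'shared_baseline', 'shared_submission'.
    """
    log_sum = 0.0
    for mix in mixes:
        baseline_hs = harmonic_speedup(mix["shared_baseline"], mix["alone"])
        submission_hs = harmonic_speedup(mix["shared_submission"], mix["alone"])
        log_sum += math.log(submission_hs / baseline_hs)
    return math.exp(log_sum / len(mixes))


# One made-up four-core mix.
example_mixes = [
    {
        "alone": [2.0, 2.0, 2.0, 2.0],
        "shared_baseline": [1.0, 1.0, 1.0, 1.0],
        "shared_submission": [1.5, 1.5, 1.5, 1.5],
    }
]
mc = mc_score(example_mixes)
```

In this made-up mix the baseline HarmonicSpeedup is 0.5 and the submission's is 0.75, so the ratio for the mix is 1.5.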
The overall score of a submission is calculated as the non-weighted geometric mean of all three individual scores.
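A minimal sketch of the overall score, assuming the three individual scores have already been computed. The function name and the numeric scores are placeholders, not actual DPC4 results.

```python
import math


def overall_score(full_bw, limit_bw, mc):
    """Non-weighted geometric mean of the three individual scores."""
    return math.exp((math.log(full_bw) + math.log(limit_bw) + math.log(mc)) / 3)


# Placeholder scores for illustration only.
overall = overall_score(1.2, 1.0, 1.1)
```

Because the mean is geometric, a submission cannot offset a slowdown in one configuration with an equal additive gain in another; the three scores contribute multiplicatively.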
Check inside the scripts directory. scripts/README.md documents how to use the scripts.
We are thankful to all the competitors for submitting their ideas and pushing the state of the art in prefetching. We are grateful to all the program committee members for their valuable feedback on the submissions: Akanksha Jain, Alaa Alameldeen, Alberto Ros, Anant Nori, Biswabandan Panda, Leeor Peled, Mike Ferdman, Paul Gratz, and Pierre Michaud. Thanks go to the HPCA 2026 organizers for giving us the opportunity to host the workshop.