PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems (SoCC 2019)

PerfDebug

Summary of PerfDebug

Performance is a key factor for big data applications, and much research has been devoted to optimizing these applications. While prior work can diagnose and correct data skew, the problem of computation skew—abnormally high computation costs for a small subset of input data—has been largely overlooked. Computation skew commonly occurs in real-world applications and yet no tool is available for developers to pinpoint underlying causes.

To enable a user to debug applications that exhibit computation skew, we develop a post-mortem performance debugging tool. PerfDebug automatically finds input records responsible for such abnormalities in a big data application by reasoning about deviations in performance metrics such as job execution time, garbage collection time, and serialization time. The key to PerfDebug’s success is a data provenance-based technique that computes and propagates record-level computation latency to keep track of abnormally expensive records throughout the pipeline. Finally, the input records that have the largest latency contributions are presented to the user for bug fixing. We evaluate PerfDebug via in-depth case studies and observe that remediation such as removing the single most expensive record or simple code rewrite can achieve up to 16X performance improvement.
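The idea of attributing latency to individual records and tracing it back to the inputs can be illustrated with a small, self-contained sketch. This is not PerfDebug's implementation: it runs no Spark job, and the cost model (`map_cost`), the rule that a group's latency is the sum of its records' latencies, and all function names are assumptions made purely for illustration. Deterministic "work units" stand in for measured time so the example is reproducible.

```python
from collections import defaultdict


def map_cost(text):
    """Hypothetical cost model: pretend the per-record UDF is quadratic in word count."""
    return len(text.split()) ** 2


def run_with_latency(records):
    """Simulate a map followed by a group-by-key, tracking per-record latency.

    records: iterable of (record_id, key, text).
    Returns (totals, contributions), where totals maps key -> group latency and
    contributions maps key -> {record_id: latency}, a toy form of lineage.
    """
    contributions = defaultdict(dict)
    for record_id, key, text in records:
        contributions[key][record_id] = map_cost(text)
    # Assumption: a group's latency is the sum of its records' latencies.
    totals = {key: sum(by_id.values()) for key, by_id in contributions.items()}
    return totals, contributions


def most_expensive_inputs(records, top_n=1):
    """Find the slowest output group, then rank the input records behind it."""
    totals, contributions = run_with_latency(records)
    slowest_key = max(totals, key=totals.get)
    ranked = sorted(
        contributions[slowest_key].items(), key=lambda kv: kv[1], reverse=True
    )
    return slowest_key, ranked[:top_n]


records = [
    (0, "a", "short line"),
    (1, "a", "another short line"),
    (2, "b", "word " * 50),  # one abnormally expensive record
    (3, "b", "tiny"),
]

totals, _ = run_with_latency(records)
slowest_key, culprits = most_expensive_inputs(records)
print(slowest_key, culprits)  # b [(2, 2500)]
```

In this toy data, group `b` is no larger than group `a` (two records each), yet a single record accounts for nearly all of its cost. That is the kind of computation skew, as opposed to data skew, that the summary describes.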

Team

This project is developed by Professor Miryung Kim's Software Engineering and Analysis Laboratory at UCLA. If you encounter any problems, please open an issue or feel free to contact us:

  • Jason Teoh: PhD student, jteoh@cs.ucla.edu
  • Muhammad Ali Gulzar: Professor at Virginia Tech, gulzar@cs.vt.edu
  • Guoqing Harry Xu: Professor at UCLA, harryxu@cs.ucla.edu
  • Miryung Kim: Professor at UCLA, miryung@cs.ucla.edu

How to cite

For more details, please refer to our SoCC 2019 paper, "PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems".

Bibtex

@inproceedings{10.1145/3357223.3362727,
  author = {Teoh, Jason and Gulzar, Muhammad Ali and Xu, Guoqing Harry and Kim, Miryung},
  title = {PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems},
  year = {2019},
  isbn = {9781450369732},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3357223.3362727},
  doi = {10.1145/3357223.3362727},
  abstract = {Performance is a key factor for big data applications, and much research has been devoted to optimizing these applications. While prior work can diagnose and correct data skew, the problem of computation skew---abnormally high computation costs for a small subset of input data---has been largely overlooked. Computation skew commonly occurs in real-world applications and yet no tool is available for developers to pinpoint underlying causes. To enable a user to debug applications that exhibit computation skew, we develop a post-mortem performance debugging tool. PerfDebug automatically finds input records responsible for such abnormalities in a big data application by reasoning about deviations in performance metrics such as job execution time, garbage collection time, and serialization time. The key to PerfDebug's success is a data provenance-based technique that computes and propagates record-level computation latency to keep track of abnormally expensive records throughout the pipeline. Finally, the input records that have the largest latency contributions are presented to the user for bug fixing. We evaluate PerfDebug via in-depth case studies and observe that remediation such as removing the single most expensive record or simple code rewrite can achieve up to 16X performance improvement.},
  booktitle = {Proceedings of the ACM Symposium on Cloud Computing},
  pages = {465--476},
  numpages = {12},
  keywords = {fault localization, data intensive scalable computing, data provenance, Performance debugging, big data systems},
  location = {Santa Cruz, CA, USA},
  series = {SoCC '19}
}

DOI Link: https://doi.org/10.1145/3357223.3362727

Benchmarks:

The PerfDebug paper evaluates a set of benchmark programs across Spark, Titian, and PerfDebug. To keep everything in one repository, the benchmarks are included here as separate branches. Note, however, that the programs require the Spark libraries (jars) for the corresponding evaluation setting (Spark/Titian/PerfDebug), so we recommend keeping them in a checkout separate from your PerfDebug code (i.e., clone a second copy of the PerfDebug repo for use with the benchmark branches).

The benchmark locations are as follows:

  1. Spark (baseline) evaluation benchmarks: https://github.com/UCLA-SEAL/PerfDebug/tree/spark-benchmarks
  2. Titian evaluation benchmarks: https://github.com/UCLA-SEAL/PerfDebug/tree/titian-benchmarks
  3. PerfDebug evaluation benchmarks: https://github.com/UCLA-SEAL/PerfDebug/tree/perfdebug-benchmarks

Other Notes:

  • This project originated as a fork of BigDebug, which is itself based on Apache Spark. The original project code can be found in BigDebug's perf-ignite branch.
  • We recommend using IntelliJ, as it was the recommended IDE for Spark development at the time this project was forked.