liuhb86/spark-code-reading

Spark Source code reading (draft)

Table of contents

  1. Preface
  2. Core Spark Workflow
  3. Standalone Cluster
  4. Important Internal Infrastructure
    • What does BlockManager do?
    • How does shuffle work between stages? (TODO)
    • What is inside the cache? (TODO)
    • How are the caches managed? (TODO)
    • How does recovery work? (TODO)
    • How does broadcast work? (TODO)
    • How do accumulators work? (TODO)
    • How are objects serialized? (TODO)
    • RPC
    • Task Pool
  5. Performance Optimization
    • How does Spark respect locality? (TODO)
    • What are the physical plan and logical plan? (TODO)
    • What does the query optimizer do? (TODO)
    • How does code generation work? (TODO)
    • How do speculative tasks work? (TODO)
    • How does off-heap memory work? (TODO)
  6. Streaming/Machine Learning (not planned)
  7. Appendix A Minor concepts

About

Notes on reading Apache Spark source code
