- Preface
- Core Spark Workflow
- Standalone Cluster
- How do driver programs communicate with the master?
- How does the master communicate with workers? (TODO)
- How does the driver communicate with the executors? (TODO)
- Important Internal Infrastructure
- What does BlockManager do?
- How does shuffle work between stages? (TODO)
- What is inside the cache? (TODO)
- How are the caches managed? (TODO)
- How does the recovery work? (TODO)
- How does broadcast work? (TODO)
- How do accumulators work? (TODO)
- How are objects serialized? (TODO)
- RPC
- Task Pool
- Performance Optimization
- How does Spark respect locality? (TODO)
- What are the physical plan and logical plan? (TODO)
- What does the query optimizer do? (TODO)
- How does code generation work? (TODO)
- How does speculative task execution work? (TODO)
- How does off-heap memory work? (TODO)
- Streaming/Machine Learning (not planned)
- Appendix A Minor concepts
liuhb86/spark-code-reading
Notes on reading Apache Spark source code