A collection of good learning resources on distributed systems, big data, and cloud computing
- MIT: Distributed systems (with Go), Spring 2016 version, including 4 great course projects: MapReduce, Raft, and two key/value stores.
- Brown University: Distributed systems (with Go), with good projects for building Chord, Tapestry, Raft, and PuddleStore.
- CMU: There are a couple of versions of the distributed systems course. I only recommend the Go versions. Personally, my favorite is the Fall15 version and its GitHub repository; it extends the original version taught by David Andersen and replaces the last "Design your own distributed system" project with a "Paxos" project. Alternatively, you can try the latest Spring16 version.
- "Log, what every engineer should know ..." is a very good article written by LinkedIn's principal engineer, the lead of Kafka and Samza.
- Big Data: Principles and best practices of scalable realtime data systems. This is a really well-written book by the creator of Apache Storm. Because it is somewhat dated (as far as I know, it was written in 2012 and published in 2015), some chapters about specific tools can be skipped.
- Spark paper, NSDI '12: This is a paper everybody can understand. Interestingly, the source code of the first version of Spark is available, and it consists of only a few thousand lines of Scala code.
- 500 Lines or Less: a good collection of short projects written by domain experts. It includes a couple of DB examples and a Paxos example in Python.