learning-distributed-systems

A list of all the papers, books, blogs, articles, and repos that I come across while studying distributed systems.

collections

distributed systems theory for the distributed systems engineer

https://www.the-paper-trail.org/post/2014-08-09-distributed-systems-theory-for-the-distributed-systems-engineer/

awesome-distributed-systems

https://github.com/theanalyst/awesome-distributed-systems

seminal papers in distributed systems (quora)

http://www.quora.com/What-are-the-seminal-papers-in-distributed-systems-Why

distributed systems (quora)

best papers in computer science (since 1996)

https://jeffhuang.com/best_paper_awards/

amazon builder's library

https://aws.amazon.com/builders-library/

papers we love

https://github.com/papers-we-love/papers-we-love

https://paperswelove.org/

timilearning

https://timilearning.com/

the architecture of open source applications

Perhaps a little bit off-topic but still valuable.

http://www.aosabook.org/en/

courses

principles of distributed computing

https://disco.ethz.ch/courses/podc_allstars/

pmp distributed systems (university of washington)

http://courses.cs.washington.edu/courses/cse552/07sp/

MIT distributed systems

https://pdos.csail.mit.edu/6.824/schedule.html

books

distributed systems

https://www.amazon.com/Distributed-Systems-2nd-Sape-Mullender/dp/0201624273

guide to reliable distributed systems: building high-assurance applications and cloud-hosted services

https://www.amazon.com/Guide-Reliable-Distributed-Systems-High-Assurance/dp/1447124154/

distributed computing through combinatorial topology

https://www.amazon.com/Distributed-Computing-Through-Combinatorial-Topology/dp/0124045782

specifying systems

https://www.amazon.com/Specifying-Systems-Language-Hardware-Engineers/dp/032114306X

designing data-intensive applications: the big ideas behind reliable, scalable, and maintainable systems

https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321

elements of distributed computing

https://www.amazon.com/Elements-Distributed-Computing-Vijay-Garg/dp/8126551755

mathematical tools for data mining: set theory, partial orders, combinatorics

https://www.amazon.com/Mathematical-Tools-Data-Mining-Combinatorics/dp/1447164067

designing distributed control systems: a pattern language approach

https://www.amazon.com/Designing-Distributed-Control-Systems-Language/dp/1118694155

blogs

james hamilton

https://perspectives.mvdirona.com/

werner vogels

https://www.allthingsdistributed.com/

martin kleppmann

https://martin.kleppmann.com/

replication, atomicity and order in distributed systems

http://afeinberg.github.io/2011/06/17/replication-atomicity-and-order-in-distributed-systems.html

fault tolerance in a high volume, distributed system

http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html

high scalability

http://highscalability.com/all-time-favorites/

marc brooker

https://brooker.co.za/blog/

notes on distributed systems for young bloods

https://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/

facebook scribe

https://engineering.fb.com/data-infrastructure/scribe/

papers

xen and the art of virtualization

https://www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf

a hundred impossibility proofs for distributed computing

https://apps.dtic.mil/dtic/tr/fulltext/u2/a216391.pdf

the power of two random choices: a survey of techniques and results

http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/twosurvey.pdf

implementing fault-tolerant services using the state machine approach: a tutorial

https://www.cs.cornell.edu/fbs/publications/SMSurvey.pdf

a simple totally ordered broadcast protocol

https://www.datadoghq.com/pdf/zab.totally-ordered-broadcast-protocol.2008.pdf

linearizability: a correctness condition for concurrent objects

https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf

wait-free synchronization

https://dl.acm.org/doi/10.1145/114005.102808

https://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf

the dangers of replication and a solution

https://dsf.berkeley.edu/cs286/papers/dangers-sigmod1996.pdf

practical byzantine fault tolerance and proactive recovery

http://www.pmg.csail.mit.edu/papers/bft-tocs.pdf

fault-scalable byzantine fault-tolerant services

https://www.pdl.cmu.edu/PDL-FTP/PASIS/sosp05.pdf

dapper, a large-scale distributed systems tracing infrastructure

https://research.google/pubs/pub36356/

chukwa: a system for reliable large-scale log collection

https://www.usenix.org/legacy/event/lisa10/tech/full_papers/Rabkin.pdf

latency lags bandwidth

https://dl.acm.org/doi/10.1145/1022594.1022596

apache hadoop goes realtime at facebook

https://research.fb.com/wp-content/uploads/2011/06/apache-hadoop-goes-realtime-at-facebook.pdf?

on brewing fresh espresso: linkedin’s distributed data serving platform

https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/p1135-qiao.pdf

the case for ramclouds: scalable high-performance storage entirely in dram

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.420.872&rep=rep1&type=pdf

in search of an understandable consensus algorithm

https://raft.github.io/raft.pdf

pacifica: replication in log-based distributed storage systems

https://www.microsoft.com/en-us/research/wp-content/uploads/2008/02/tr-2008-25.pdf

corfu: a distributed shared log

https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/a10-balakrishnan.pdf

building linkedin’s real-time activity data pipeline

http://sites.computer.org/debull/A12june/pipeline.pdf

building a replicated logging system with apache kafka

https://dl.acm.org/doi/10.14778/2824032.2824063

kafka: a distributed messaging system for log processing

http://notes.stephenholiday.com/Kafka.pdf

how to build a highly available system using consensus

http://research.microsoft.com/en-us/um/people/blampson/58-Consensus/Acrobat.pdf

distributed computing meets game theory: combining insights from two fields

http://www.cs.utexas.edu/~lorenzo/papers/Abraham11Distributed.pdf

impossibility of distributed consensus with one faulty process

http://macs.citadel.edu/rudolphg/csci604/ImpossibilityofConsensus.pdf

the implementation of reliable distributed multiprocess systems

https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Implementation-of-Reliable-Distributed-Multiprocess-Systems.pdf

amazon aurora: design considerations for high throughput cloud-native relational databases

https://pdos.csail.mit.edu/6.824/papers/aurora.pdf

dynamo: amazon’s highly available key-value store

https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

cassandra - a decentralized structured storage system

https://research.cs.cornell.edu/ladis2009/papers/lakshman-ladis2009.pdf

the chubby lock service for loosely-coupled distributed systems

http://static.googleusercontent.com/media/research.google.com/en/us/archive/chubby-osdi06.pdf

zookeeper: wait-free coordination for internet-scale systems

https://www.usenix.org/legacy/event/usenix10/tech/full_papers/Hunt.pdf

topics

acid

https://en.wikipedia.org/wiki/ACID

paxos

chain replication

distributed state machines

https://en.wikipedia.org/wiki/State_machine_replication
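The core idea fits in a few lines. Every replica starts in the same state and applies the same deterministic commands in the same log order, so every replica ends up in the same state. The sketch below assumes the log order has already been agreed on, which is the job of a consensus protocol such as Paxos or Raft. `KVStateMachine`, `replay`, and the `put`/`get`/`delete` command format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical command format: (op, key, value); value is unused for get/delete.
Command = Tuple[str, str, Optional[str]]


@dataclass
class KVStateMachine:
    """A deterministic key-value state machine (illustrative sketch)."""
    data: Dict[str, Optional[str]] = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, cmd: Command) -> Optional[str]:
        op, key, value = cmd
        if op == "put":
            self.data[key] = value
            result = value
        elif op == "delete":
            result = self.data.pop(key, None)
        elif op == "get":
            result = self.data.get(key)
        else:
            raise ValueError(f"unknown op {op!r}")
        self.applied += 1  # only count commands that were actually applied
        return result


def replay(log: List[Command], sm: KVStateMachine) -> None:
    """Apply every log entry this replica has not applied yet, in log order."""
    for cmd in log[sm.applied:]:
        sm.apply(cmd)
```

Two replicas can see the log at different times. One can apply a prefix and catch up later, and as long as both replay the same agreed log, they converge to the same `data`.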

exponential backoff

https://www.awsarchitectureblog.com/2015/03/backoff.html
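A minimal sketch of capped exponential backoff with one common jitter variant, often called "full jitter". Before retry n, the caller sleeps a uniformly random time in `[0, min(cap, base * 2**n)]`, so clients that failed together do not retry in lockstep. The function name and default values are assumptions for illustration. `sleep` and `rng` are injectable so the behavior can be tested without real delays.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_full_jitter(
    op: Callable[[], T],
    base: float = 0.1,         # assumed initial backoff ceiling, in seconds
    cap: float = 5.0,          # assumed maximum backoff ceiling, in seconds
    max_attempts: int = 5,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op(), retrying retryable failures with capped exponential backoff and full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The ceiling doubles each attempt but never exceeds cap.
            ceiling = min(cap, base * (2 ** attempt))
            # Full jitter: sleep a uniformly random duration in [0, ceiling].
            sleep(rng.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

In real code, `retryable` should be narrowed to errors that are actually transient, such as timeouts or throttling, rather than every `Exception`.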

load balancing
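One classic load-balancing technique is the one from the "power of two random choices" survey listed under papers. Sample two servers at random and send the request to the less loaded one. The sketch below is an assumption-laden illustration. It samples two *distinct* servers, a common practical variant; the balls-into-bins analysis usually samples with replacement. It uses an in-memory request count as the load signal, and `pick_server` and `assign` are hypothetical names.

```python
import random
from typing import List


def pick_server(loads: List[int], rng: random.Random) -> int:
    """Return the index of the less loaded of two distinct randomly sampled servers."""
    if not loads:
        raise ValueError("no servers")
    if len(loads) == 1:
        return 0
    a, b = rng.sample(range(len(loads)), 2)  # two distinct candidates
    return a if loads[a] <= loads[b] else b  # ties go to the first sample


def assign(n_servers: int, n_requests: int, seed: int = 0) -> List[int]:
    """Simulate assigning requests one by one and return the final per-server loads."""
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[pick_server(loads, rng)] += 1
    return loads
```

Compared with picking a single server uniformly at random, checking just one extra candidate sharply reduces the maximum load, which is the result the survey explores.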

crdt
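A small example of a state-based CRDT is the grow-only counter (G-Counter). Each replica increments only its own slot. Merging takes the per-replica maximum, which is commutative, associative, and idempotent, so replicas converge no matter how often or in what order they exchange state. This is a minimal sketch, and the class and method names are illustrative.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT (state-based): one slot per replica, merge = per-slot max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # Decrements would break the per-slot max merge; a PN-Counter pairs two G-Counters for that.
        if amount < 0:
            raise ValueError("G-Counter only supports non-negative increments")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Join: take the maximum seen for each replica's slot.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```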

gossip protocols

https://en.wikipedia.org/wiki/Gossip_protocol
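A toy simulation of push-style gossip (rumor spreading). It counts how many synchronous rounds it takes for a rumor starting at node 0 to reach all `n` nodes when every informed node pushes it to `fanout` random peers per round. The synchronous rounds, full membership knowledge, and absence of message loss are simplifying assumptions, and the function name is hypothetical.

```python
import random


def push_gossip_rounds(n: int, fanout: int = 1, seed: int = 0) -> int:
    """Return the number of rounds until all n nodes have heard a rumor started by node 0."""
    if n < 1:
        raise ValueError("need at least one node")
    if n == 1:
        return 0
    if not 1 <= fanout <= n - 1:
        raise ValueError("fanout must be between 1 and n - 1")
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for node in informed:
            # Each informed node pushes the rumor to `fanout` distinct random peers.
            peers = [p for p in range(n) if p != node]
            newly_informed.update(rng.sample(peers, fanout))
        informed |= newly_informed
        rounds += 1
    return rounds
```

With `fanout=1`, the informed set can at most double per round, so the count grows roughly logarithmically in `n`. This is why gossip spreads information quickly without any central coordinator.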

hystrix

https://github.com/Netflix/Hystrix/wiki