Commit

Format final changes for Dagoba.

MichaelDiBernardo committed Mar 1, 2016
1 parent a8c4bc0 commit cb46a98

Showing 21 changed files with 1,602 additions and 1,468 deletions.
22 changes: 12 additions & 10 deletions dagoba/dagoba.markdown
@@ -1,17 +1,17 @@
# Dagoba: an in-memory graph database [^titlefoot]
title: Dagoba: an in-memory graph database
author: Dann Toliver

[^titlefoot] This database started life as a library for managing Directed Acyclic Graphs, or DAGs. It was originally intended to come with a silent 'h' at the end, an homage to the swampy fictional planet, but reading the back of a chocolate bar one day we discovered the sans-h version refers to a place for silently contemplating the connections between things, which seems even more fitting.
_[Dann](https://twitter.com/dann) enjoys building things, like programming languages, databases, distributed systems, communities of smart friendly humans, and pony castles with his two-year-old._


## Prologue

> "When we try to pick out anything by itself we find that it is bound fast by a thousand invisible cords that cannot be broken, to everything in the universe."
> --- John Muir
 

> "What went forth to the ends of the world to traverse not itself, God, the sun, Shakespeare, a commercial traveller, having itself traversed in reality itself becomes that self."
> --- James Joyce
## Prologue

A long time ago, when the world was still young, all data walked happily in single file. If you wanted your data to jump over a fence, you just set the fence down in its path and each datum jumped it in turn. Punch cards in, punch cards out. Life was easy and programming was a breeze.

Then came the random access revolution, and data grazed freely across the hillside. Herding data became a serious concern: if you can access any piece of data at any time, how do you know which one to pick next? Techniques were developed for corralling the data by forming links between items [^items], marshaling groups of units into formation through their linking assemblage. Questioning data meant picking a sheep and pulling along everything connected to it.
@@ -24,12 +24,14 @@ The distributed revolution changed everything, again. Data broke free of spacial

[^items]: One of the very first database designs was the hierarchical model, which grouped items into tree-shaped hierarchies and is still used as the basis of IBM's IMS product, a high-speed transaction processing system. Its influence can also be seen in XML, file systems and geographic information storage. The network model, invented by Charles Bachman and standardized by CODASYL, generalized the hierarchical model by allowing multiple parents, forming a DAG instead of a tree. These navigational database models came into vogue in the 1960s and continued their dominance until performance gains made relational databases usable in the 1980s.

[^relationaltheory]: Edgar F. Codd developed relational database theory while working at IBM, but Big Blue feared that a relational database would cannibalize the sales of IMS. While IBM eventually built a research prototype called System R, it was based around a new non-relational language called SEQUEL, instead of Codd's original Alpha language. The SEQUEL language was copied by Larry Ellison in his Oracle Database based on pre-launch conference papers, and the name changed to SQL to avoid trademark disputes.]
[^relationaltheory]: Edgar F. Codd developed relational database theory while working at IBM, but Big Blue feared that a relational database would cannibalize the sales of IMS. While IBM eventually built a research prototype called System R, it was based around a new non-relational language called SEQUEL, instead of Codd's original Alpha language. The SEQUEL language was copied by Larry Ellison in his Oracle Database based on pre-launch conference papers, and the name changed to SQL to avoid trademark disputes.


## Take One

Within this chapter we're going to build a graph database. As we build it we're going to explore the problem space, generate multiple solutions for our design decisions, compare those solutions to understand the tradeoffs between them, and finally choose the right solution for our system. A higher-than-usual precedence is put on code compactness, but the process will otherwise mirror that used by software professionals since time immemorial. The purpose of this chapter is to teach this process. And to build a graph database.[^purpose]
Within this chapter we're going to build a graph database[^dagoba]. As we build it we're going to explore the problem space, generate multiple solutions for our design decisions, compare those solutions to understand the tradeoffs between them, and finally choose the right solution for our system. A higher-than-usual precedence is put on code compactness, but the process will otherwise mirror that used by software professionals since time immemorial. The purpose of this chapter is to teach this process. And to build a graph database.[^purpose]

[^dagoba]: This database started life as a library for managing Directed Acyclic Graphs, or DAGs. Its name "Dagoba" was originally intended to come with a silent 'h' at the end, an homage to the swampy fictional planet, but reading the back of a chocolate bar one day we discovered the sans-h version refers to a place for silently contemplating the connections between things, which seems even more fitting.

[^purpose]: The two purposes of this chapter are to teach this process, to build a graph database, and to have fun.

@@ -361,9 +363,9 @@ Dagoba.fauxPipetype = function(_, _, maybe_gremlin) { // pass the result upstr
}
```

See those underscores? We use those to label params that won't be used in our function. Most other pipetypes will use all three parameters, and have all three parameter names. This allows us to distinguish at a glance which parameters a particular pipetype relies on.[^underscores]
See those underscores? We use those to label params that won't be used in our function. Most other pipetypes will use all three parameters, and have all three parameter names. This allows us to distinguish at a glance which parameters a particular pipetype relies on.

[^underscores]: Actually, we only used this underscore technique here to make the comments line up nicely. No, seriously. If programs "must be written for people to read, and only incidentally for machines to execute", [citation: Structure and Interpretation of Computer Programs, Abelson and Sussman] then it immediately follows that our predominant concern should be making code pretty.
This underscore technique is also important because it makes the comments line up nicely. No, seriously. If programs ["must be written for people to read, and only incidentally for machines to execute"](https://mitpress.mit.edu/sicp/front/node3.html), then it immediately follows that our predominant concern should be making code pretty.


#### Vertex
8 changes: 8 additions & 0 deletions tex/500L.tex
@@ -260,8 +260,16 @@
\mainmatter


\include{image-filters}

\include{dagoba}

\include{ocr}

\include{contingent}

\include{same-origin-policy}

\include{blockcode}

\include{interpreter}
27 changes: 19 additions & 8 deletions tex/blockcode.tex
@@ -1,4 +1,10 @@
\begin{aosachapter}{Blockcode: A visual programming toolkit}{s:blockcode}{Dethe Elze}
\begin{aosachapter}{Blockcode: A visual programming toolkit}{s:blockcode}{Dethe Elza}

\emph{\href{https://twitter.com/dethe}{Dethe} is a geek dad, aesthetic
programmer, mentor, and creator of the
\href{http://waterbearlang.com/}{Waterbear} visual programming tool. He
co-hosts the Vancouver Maker Education Salons and wants to fill the
world with robotic origami rabbits.}

In block-based programming languages, you write programs by dragging and
connecting blocks that represent parts of the program. Block-based
@@ -51,6 +57,11 @@
graphics, and it is a small enough domain to be able to capture in a
tightly constrained project such as this.

If you would like to get a feel for what a block-based language is like,
you can experiment with the program that is built in this chapter from
the author's \href{https://dethe.github.io/500lines/blockcode/}{GitHub
repository}.

\aosasecti{Goals and Structure}\label{goals-and-structure}

I want to accomplish a couple of things with this code. First and
@@ -233,8 +244,8 @@

\begin{verbatim}
function createBlock(name, value, contents){
var item = elem('div',
{'class': 'block', draggable: true, 'data-name': name},
var item = elem('div',
{'class': 'block', draggable: true, 'data-name': name},
[name]
);
if (value !== undefined && value !== null){
@@ -245,7 +256,7 @@
elem('div', {'class': 'container'}, contents.map(function(block){
return createBlock.apply(null, block);
})));
}else if (typeof contents === 'string'){
}else if (typeof contents === 'string'){
// Add units (degrees, etc.) specifier
item.appendChild(document.createTextNode(' ' + contents));
}
@@ -286,8 +297,8 @@
}
function blockUnits(block){
if (block.children.length > 1 &&
block.lastChild.nodeType === Node.TEXT_NODE &&
if (block.children.length > 1 &&
block.lastChild.nodeType === Node.TEXT_NODE &&
block.lastChild.textContent){
return block.lastChild.textContent.slice(1);
}
@@ -398,7 +409,7 @@
return;
}
// Necessary. Allows us to drop.
if (evt.preventDefault) { evt.preventDefault(); }
if (evt.preventDefault) { evt.preventDefault(); }
if (dragType === 'menu'){
// See the section on the DataTransfer object.
evt.dataTransfer.dropEffect = 'copy';
@@ -425,7 +436,7 @@
var dropType = 'script';
if (matches(dropTarget, '.menu')){ dropType = 'menu'; }
// stops the browser from redirecting.
if (evt.stopPropagation) { evt.stopPropagation(); }
if (evt.stopPropagation) { evt.stopPropagation(); }
if (dragType === 'script' && dropType === 'menu'){
trigger('blockRemoved', dragTarget.parentElement, dragTarget);
dragTarget.parentElement.removeChild(dragTarget);
11 changes: 9 additions & 2 deletions tex/ci.tex
@@ -1,5 +1,12 @@
\begin{aosachapter}{A Continuous Integration System}{s:ci}{Malini Das}

\emph{Malini Das is a software engineer who is passionate about
developing quickly (but safely!), and solving cross-functional problems.
She has worked at Mozilla as a tools engineer and is currently honing
her skills at Twitch. Follow Malini on
\href{https://twitter.com/malinidas}{Twitter} or on her
\href{http://malinidas.com/}{blog}.}

\aosasecti{What is a Continuous Integration
System?}\label{what-is-a-continuous-integration-system}

@@ -178,7 +185,7 @@
$ cp -r /this/directory/tests /path/to/test_repo/
$ cd /path/to/test\_repo
$ git add tests/
$ git commit -m”add tests”
$ git commit -m ”add tests”
\end{verbatim}

Now you have a commit in the master repository.
@@ -226,7 +233,7 @@

The observer must know which repository to observe. We previously
created a clone of our repository at
\texttt{/path/to/test\_repo\_clone\_obs}. The repository will use this
\texttt{/path/to/test\_repo\_clone\_obs}. The observer will use this
clone to detect changes. To allow the repository observer to use this
clone, we pass it the path when we invoke the \texttt{repo\_observer.py}
file. The repository observer will use this clone to pull from the main
9 changes: 9 additions & 0 deletions tex/cluster.tex
@@ -1,5 +1,14 @@
\begin{aosachapter}{Clustering by Consensus}{s:cluster}{Dustin J. Mitchell}

\emph{Dustin is an open source software developer and release engineer
at Mozilla. He has worked on projects as varied as a host configuration
system in Puppet, a Flask-based web framework, unit tests for firewall
configurations, and a continuous integration framework in Twisted
Python. Find him as \href{http://github.com/djmitche}{@djmitche} on
GitHub or at \href{mailto:dustin@mozilla.com}{dustin@mozilla.com}.}

\aosasecti{Introduction}\label{introduction}

In this chapter, we'll explore implementation of a network protocol
designed to support reliable distributed computation. Network protocols
can be difficult to implement correctly, so we'll look at some
100 changes: 61 additions & 39 deletions tex/crawler.tex
@@ -1,5 +1,19 @@
\begin{aosachapter}{A Web Crawler With asyncio Coroutines}{s:crawler}{A. Jesse Jiryu Davis and Guido van Rossum}

\emph{A. Jesse Jiryu Davis is a staff engineer at MongoDB in New York.
He wrote Motor, the async MongoDB Python driver, and he is the lead
developer of the MongoDB C Driver and a member of the PyMongo team. He
contributes to asyncio and Tornado. He writes at
\url{http://emptysqua.re}.}

\emph{Guido van Rossum is the creator of Python, one of the major
programming languages on and off the web. The Python community refers to
him as the BDFL (Benevolent Dictator For Life), a title straight from a
Monty Python skit. Guido's home on the web is
\url{http://www.python.org/~guido/}.}

\aosasecti{Introduction}\label{introduction}

Classical computer science emphasizes efficient algorithms that complete
computations as quickly as possible. But many networked programs spend
their time not computing, but holding open many connections that are
@@ -157,7 +171,7 @@
default selector:

\begin{verbatim}
from selectors import DefaultSelector
from selectors import DefaultSelector, EVENT_WRITE
selector = DefaultSelector()
@@ -316,7 +330,7 @@
def connected(self, key, mask):
print('connected!')
selector.unregister(key.fd)
request = 'GET {} HTTP/1.0\r\nHost: xkcd.com\r\n\r\n'.format(url)
request = 'GET {} HTTP/1.0\r\nHost: xkcd.com\r\n\r\n'.format(self.url)
self.sock.send(request.encode('ascii'))
# Register the next callback.
@@ -496,7 +510,7 @@
\begin{verbatim}
@asyncio.coroutine
def fetch(self, url):
response = yield from aiohttp.request('get', url)
response = yield from self.session.get(url)
body = yield from response.read()
\end{verbatim}

@@ -514,14 +528,13 @@
There are many implementations of coroutines; even in Python there are
several. The coroutines in the standard ``asyncio'' library in Python
3.4 are built upon generators, a Future class, and the ``yield from''
statement. Starting in Python 3.5, coroutines will be a native feature
of the language itself\footnote{Python 3.5's built-in coroutines are
statement. Starting in Python 3.5, coroutines are a native feature of
the language itself\footnote{Python 3.5's built-in coroutines are
described in \href{https://www.python.org/dev/peps/pep-0492/}{PEP
492}, ``Coroutines with async and await syntax.'' At the time of this
writing, Python 3.5 was in beta, due for release in September 2015.};
however, understanding coroutines as they were first implemented in
Python 3.4, using pre-existing language facilities, is the foundation to
tackle Python 3.5's native coroutines.
492}, ``Coroutines with async and await syntax.''}; however,
understanding coroutines as they were first implemented in Python 3.4,
using pre-existing language facilities, is the foundation to tackle
Python 3.5's native coroutines.
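
To make that distinction concrete, here is a minimal, self-contained
sketch of a Python 3.4-style generator-based coroutine (the
\texttt{compute} and \texttt{main} names are invented for illustration;
this is not code from the crawler): the \texttt{@asyncio.coroutine}
decorator marks a generator as a coroutine, and \texttt{yield from}
suspends it until the work it delegates to has finished.

\begin{verbatim}
import asyncio

@asyncio.coroutine
def compute(x):
    # Suspend here; the event loop is free to run other work.
    yield from asyncio.sleep(0.1)
    return x * 2

@asyncio.coroutine
def main():
    result = yield from compute(21)  # delegate to a sub-coroutine
    print(result)                    # prints 42

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
\end{verbatim}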

To explain Python 3.4's generator-based coroutines, we will engage in an
exposition of generators and how they are used as coroutines in asyncio,
@@ -936,7 +949,7 @@
And from inside \texttt{gen}, we cannot tell if values are sent in from
\texttt{caller} or from outside it. The \texttt{yield from} statement is
a frictionless channel, through which values flow in and out of
\texttt{gen} until it \texttt{gen} completes.
\texttt{gen} until \texttt{gen} completes.
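
One way to see that channel in action is a toy delegation chain (the
\texttt{inner} and \texttt{outer} names here are invented, not taken
from the chapter's code): a value sent by the caller passes straight
through the delegating generator into the inner one, and the inner
generator's return value becomes the value of the \texttt{yield from}
expression.

\begin{verbatim}
def inner():
    x = yield 'ready'    # receives the value the caller sends in
    return x + 1         # becomes the value of "yield from inner()"

def outer():
    result = yield from inner()  # values flow through untouched
    yield result

gen = outer()
print(next(gen))     # 'ready' -- yielded by inner(), through outer()
print(gen.send(41))  # 42 -- inner() returned 42, which outer() yields
\end{verbatim}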

A coroutine can delegate work to a sub-coroutine with
\texttt{yield from} and receive the result of the work. Notice, above,
@@ -1124,7 +1137,7 @@
\begin{verbatim}
@asyncio.coroutine
def fetch(self, url):
response = yield from aiohttp.request('get', url)
response = yield from self.session.get(url)
body = yield from response.read()
\end{verbatim}

Expand Down Expand Up @@ -1172,10 +1185,11 @@
coroutine and run asyncio's event loop until \texttt{crawl} finishes:

\begin{verbatim}
loop = asyncio.get_event_loop()
crawler = crawling.Crawler('http://xkcd.com',
max_redirect=10)
loop = asyncio.get_event_loop()
loop.run_until_complete(crawler.crawl())
\end{verbatim}

@@ -1192,6 +1206,10 @@
self.q = Queue()
self.seen_urls = set()
# aiohttp's ClientSession does connection pooling and
# HTTP keep-alives for us.
self.session = aiohttp.ClientSession(loop=loop)
# Put (URL, max_redirect) in the queue.
self.q.put((root_url, self.max_redirect))
\end{verbatim}
@@ -1372,27 +1390,31 @@
@asyncio.coroutine
def fetch(self, url, max_redirect):
# Handle redirects ourselves.
response = yield from aiohttp.request(
'get', url, allow_redirects=False)
if is_redirect(response):
if max_redirect > 0:
next_url = response.headers['location']
if next_url in self.seen_urls:
# We have been down this path before.
return
# Remember we have seen this URL.
self.seen_urls.add(next_url)
# Follow the redirect. One less redirect remains.
self.q.put_nowait((next_url, max_redirect - 1))
else:
links = yield from self.parse_links(response)
# Python set-logic:
for link in links.difference(self.seen_urls):
self.q.put_nowait((link, self.max_redirect))
self.seen_urls.update(links)
response = yield from self.session.get(
url, allow_redirects=False)
try:
if is_redirect(response):
if max_redirect > 0:
next_url = response.headers['location']
if next_url in self.seen_urls:
# We have been down this path before.
return
# Remember we have seen this URL.
self.seen_urls.add(next_url)
# Follow the redirect. One less redirect remains.
self.q.put_nowait((next_url, max_redirect - 1))
else:
links = yield from self.parse_links(response)
# Python set-logic:
for link in links.difference(self.seen_urls):
self.q.put_nowait((link, self.max_redirect))
self.seen_urls.update(links)
finally:
# Return connection to pool.
yield from response.release()
\end{verbatim}

If the response is a page, rather than a redirect, \texttt{fetch} parses
@@ -1584,11 +1606,11 @@
This chapter was written during a renaissance in the history of Python
and async. Generator-based coroutines, whose devising you have just
learned, were released in the ``asyncio'' module with Python 3.4 in
March 2014. In September 2015, Python 3.5 will be released with
coroutines built in to the language itself. These native coroutines will
be declared with the new syntax ``async def'', and instead of ``yield
from'', they will use the new ``await'' keyword to delegate to a
coroutine or wait for a Future.
March 2014. In September 2015, Python 3.5 was released with coroutines
built into the language itself. These native coroutines are declared
with the new syntax ``async def'', and instead of ``yield from'', they
use the new ``await'' keyword to delegate to a coroutine or wait for a
Future.
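
For comparison, the generator-based sketch from earlier looks like this
with the native syntax; this is an illustrative example rather than
code from the crawler itself.

\begin{verbatim}
import asyncio

async def compute(x):
    await asyncio.sleep(0.1)   # "await" replaces "yield from"
    return x * 2

async def main():
    print(await compute(21))   # prints 42

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
\end{verbatim}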

Despite these advances, the core ideas remain. Python's new native
coroutines will be syntactically distinct from generators but work very