diff --git a/.gitignore b/.gitignore index 644ffd486..f013fa1a0 100644 --- a/.gitignore +++ b/.gitignore @@ -44,6 +44,8 @@ tex/modeller-images tex/modeller.markdown tex/objmodel-images tex/objmodel.markdown +tex/ocr-images +tex/ocr.markdown tex/pedometer-images tex/pedometer.markdown tex/sample-images diff --git a/blockcode/blockcode.markdown b/blockcode/blockcode.markdown index 35a8f39c8..aa04a5f26 100644 --- a/blockcode/blockcode.markdown +++ b/blockcode/blockcode.markdown @@ -1,9 +1,11 @@ title: Blockcode: A visual programming toolkit -author: Dethe Elze +author: Dethe Elza + +_[Dethe](https://twitter.com/dethe) is a geek dad, aesthetic programmer, mentor, and creator of the [Waterbear](http://waterbearlang.com/) visual programming tool. He co-hosts the Maker Education Salons and wants to fill the world with robotic origami rabbits._ In block-based programming languages, you write programs by dragging and connecting blocks that represent parts of the program. Block-based languages differ from conventional programming languages, in which you type words and symbols. -Learning a programming language can be difficult because they are extremely sensitive to even the slightest of typos. Most programming languages are case-sensitive, have obscure syntax, and will refuse to run if you get so much as a semicolon in the wrong place --- or worse, leave one out. Further, most programming languages in use today are based on English and their syntax cannot be localized. +Learning a programming language can be difficult because they are extremely sensitive to even the slightest of typos. Most programming languages are case-sensitive, have obscure syntax, and will refuse to run if you get so much as a semicolon in the wrong place --- or worse, leave one out. Further, most programming languages in use today are based on English and their syntax cannot be localized. In contrast, a well-done block language can eliminate syntax errors completely. 
You can still create a program which does the wrong thing, but you cannot create one with the wrong syntax: the blocks just won't fit that way. Block languages are more discoverable: you can see all the constructs and libraries of the language right in the list of blocks. Further, blocks can be localized into any human language without changing the meaning of the programming language. @@ -33,11 +35,11 @@ There is nothing stopping us from adding additional stages to be more like a tra ### Web Applications -In order to make the tool available to the widest possible audience, it is web-native. It's written in HTML, CSS, and JavaScript, so it should work in most browsers and platforms. +In order to make the tool available to the widest possible audience, it is web-native. It's written in HTML, CSS, and JavaScript, so it should work in most browsers and platforms. -Modern web browsers are powerful platforms, with a rich set of tools for building great apps. If something about the implementation became too complex, I took that as a sign that I wasn't doing it "the web way" and, where possible, tried to re-think how to better leverage the tools built into the browser. +Modern web browsers are powerful platforms, with a rich set of tools for building great apps. If something about the implementation became too complex, I took that as a sign that I wasn't doing it "the web way" and, where possible, tried to re-think how to better leverage the tools built into the browser. -An important difference between web applications and traditional desktop or server applications is the lack of a `main()` or other entry point. There is no explicit run loop because that is already built into the browser and implicit on every web page. All our code will be parsed and executed on load, at which point we can register for events we are interested in for interacting with the user. 
After the first run, all further interaction with our code will be through callbacks we set up and register, whether we register those for events (like mouse movement), timeouts (fired with the periodicity we specify), or frame handlers (called for each screen redraw, generally 60 frames per second). The browser does not expose full-featured threads either (only shared-nothing web workers). +An important difference between web applications and traditional desktop or server applications is the lack of a `main()` or other entry point. There is no explicit run loop because that is already built into the browser and implicit on every web page. All our code will be parsed and executed on load, at which point we can register for events we are interested in for interacting with the user. After the first run, all further interaction with our code will be through callbacks we set up and register, whether we register those for events (like mouse movement), timeouts (fired with the periodicity we specify), or frame handlers (called for each screen redraw, generally 60 frames per second). The browser does not expose full-featured threads either (only shared-nothing web workers). ## Stepping Through the Code @@ -45,7 +47,7 @@ I've tried to follow some conventions and best practices throughout this project The code style is procedural, not object-oriented or functional. We could do the same things in any of these paradigms, but that would require more setup code and wrappers to impose on what exists already for the DOM. Recent work on [Custom Elements](http://webcomponents.org/) make it easier to work with the DOM in an OO way, and there has been a lot of great writing on [Functional JavaScript](https://leanpub.com/javascript-allonge/read), but either would require a bit of shoe-horning, so it felt simpler to keep it procedural. -There are eight source files in this project, but `index.html` and `blocks.css` are basic structure and style for the app and won't be discussed. 
Two of the JavaScript files won't be discussed in any detail either: `util.js` contains some helpers and serves as a bridge between different browser implementations --- similar to a library like jQuery but in less than 50 lines of code. `file.js` is a similar utility used for loading and saving files and serializing scripts. +There are eight source files in this project, but `index.html` and `blocks.css` are basic structure and style for the app and won't be discussed. Two of the JavaScript files won't be discussed in any detail either: `util.js` contains some helpers and serves as a bridge between different browser implementations --- similar to a library like jQuery but in less than 50 lines of code. `file.js` is a similar utility used for loading and saving files and serializing scripts. These are the remaining files: @@ -77,8 +79,8 @@ The `createBlock(name, value, contents)` function returns a block as a DOM eleme ```javascript function createBlock(name, value, contents){ - var item = elem('div', - {'class': 'block', draggable: true, 'data-name': name}, + var item = elem('div', + {'class': 'block', draggable: true, 'data-name': name}, [name] ); if (value !== undefined && value !== null){ @@ -89,7 +91,7 @@ The `createBlock(name, value, contents)` function returns a block as a DOM eleme elem('div', {'class': 'container'}, contents.map(function(block){ return createBlock.apply(null, block); }))); - }else if (typeof contents === 'string'){ + }else if (typeof contents === 'string'){ // Add units (degrees, etc.) specifier item.appendChild(document.createTextNode(' ' + contents)); } @@ -97,7 +99,7 @@ The `createBlock(name, value, contents)` function returns a block as a DOM eleme } ``` -We have some utilities for handling blocks as DOM elements: +We have some utilities for handling blocks as DOM elements: - `blockContents(block)` retrieves the child blocks of a container block. 
It always returns a list if called on a container block, and always returns null on a simple block - `blockValue(block)` returns the numerical value of the input on a block if the block has an input field of type number, or null if there is no input element for the block @@ -116,8 +118,8 @@ We have some utilities for handling blocks as DOM elements: } function blockUnits(block){ - if (block.children.length > 1 && - block.lastChild.nodeType === Node.TEXT_NODE && + if (block.children.length > 1 && + block.lastChild.nodeType === Node.TEXT_NODE && block.lastChild.textContent){ return block.lastChild.textContent.slice(1); } @@ -195,7 +197,7 @@ While we are dragging, the `dragenter`, `dragover`, and `dragout` events give us return; } // Necessary. Allows us to drop. - if (evt.preventDefault) { evt.preventDefault(); } + if (evt.preventDefault) { evt.preventDefault(); } if (dragType === 'menu'){ // See the section on the DataTransfer object. evt.dataTransfer.dropEffect = 'copy'; @@ -216,7 +218,7 @@ When we release the mouse, we get a `drop` event. This is where the magic happen var dropType = 'script'; if (matches(dropTarget, '.menu')){ dropType = 'menu'; } // stops the browser from redirecting. - if (evt.stopPropagation) { evt.stopPropagation(); } + if (evt.stopPropagation) { evt.stopPropagation(); } if (dragType === 'script' && dropType === 'menu'){ trigger('blockRemoved', dragTarget.parentElement, dragTarget); dragTarget.parentElement.removeChild(dragTarget); @@ -277,7 +279,7 @@ We use `scriptDirty` to keep track of whether the script has been modified since var scriptDirty = false; ``` -When we want to notify the system to run the script during the next frame handler, we call `runSoon()` which sets the `scriptDirty` flag to `true`. The system calls `run()` on every frame, but returns immediately unless `scriptDirty` is set. 
When `scriptDirty` is set, it runs all the script blocks, and also triggers events to let the specific language handle any tasks it needs before and after the script is run. This decouples the blocks-as-toolkit from the turtle language to make the blocks re-usable (or the language pluggable, depending how you look at it). +When we want to notify the system to run the script during the next frame handler, we call `runSoon()` which sets the `scriptDirty` flag to `true`. The system calls `run()` on every frame, but returns immediately unless `scriptDirty` is set. When `scriptDirty` is set, it runs all the script blocks, and also triggers events to let the specific language handle any tasks it needs before and after the script is run. This decouples the blocks-as-toolkit from the turtle language to make the blocks re-usable (or the language pluggable, depending how you look at it). As part of running the script, we iterate over each block, calling `runEach(evt)` on it, which sets a class on the block, then finds and executes its associated function. If we slow things down, you should be able to watch the code execute as each block highlights to show when it is running. @@ -342,7 +344,7 @@ We define `repeat(block)` here, outside of the turtle language, because it is ge \aosafigure[240pt]{blockcode-images/turtle_example.png}{Example of Turtle code running}{500l.blockcode.turtle} -Turtle programming is a style of graphics programming, first popularized by Logo, where you have an imaginary turtle carrying a pen walking on the screen. You can tell the turtle to pick up the pen (stop drawing, but still move), put the pen down (leaving a line everywhere it goes), move forward a number of steps, or turn a number of degrees. Just those commands, combined with looping, can create amazingly intricate images. +Turtle programming is a style of graphics programming, first popularized by Logo, where you have an imaginary turtle carrying a pen walking on the screen. 
You can tell the turtle to pick up the pen (stop drawing, but still move), put the pen down (leaving a line everywhere it goes), move forward a number of steps, or turn a number of degrees. Just those commands, combined with looping, can create amazingly intricate images. In this version of turtle graphics we have a few extra blocks. Technically we don't need both `turn right` and `turn left` because you can have one and get the other with negative numbers. Likewise `move back` can be done with `move forward` and negative numbers. In this case it felt more balanced to have both. @@ -359,7 +361,7 @@ The image above was formed by putting two loops inside another loop and adding a var WIDTH, HEIGHT, position, direction, visible, pen, color; ``` -The `reset()` function clears all the state variables to their defaults. If we were to support multiple turtles, these variables would be encapsulated in an object. We also have a utility, `deg2rad(deg)`, because we work in degrees in the UI, but we draw in radians. Finally, `drawTurtle()` draws the turtle itself. The default turtle is simply a triangle, but you could override this to get a more "turtle-looking" turtle. +The `reset()` function clears all the state variables to their defaults. If we were to support multiple turtles, these variables would be encapsulated in an object. We also have a utility, `deg2rad(deg)`, because we work in degrees in the UI, but we draw in radians. Finally, `drawTurtle()` draws the turtle itself. The default turtle is simply a triangle, but you could override this to get a more "turtle-looking" turtle. Note that `drawTurtle` uses the same primitive operations that we define to implement the turtle drawing. Sometimes you don't want to reuse code at different abstraction layers, but when the meaning is clear it can be a big win for code size and performance. 
@@ -390,7 +392,7 @@ Note that `drawTurtle` uses the same primitive operations that we define to impl } ``` -We have a special block to draw a circle with a given radius at the current mouse position. We special-case `drawCircle` because, while you can certainly draw a circle by repeating `MOVE 1 RIGHT 1` 360 times, controlling the size of the circle is very difficult that way. +We have a special block to draw a circle with a given radius at the current mouse position. We special-case `drawCircle` because, while you can certainly draw a circle by repeating `MOVE 1 RIGHT 1` 360 times, controlling the size of the circle is very difficult that way. ```javascript function drawCircle(radius){ @@ -491,7 +493,7 @@ Now we can use the functions above, with the `Menu.item` function from `menu.js` ### Why Not Use MVC? -Model-View-Controller (MVC) was a good design choice for Smalltalk programs in the '80s and it can work in some variation or other for web apps, but it isn't the right tool for every problem. All the state (the "model" in MVC) is captured by the block elements in a block language anyway, so replicating it into Javascript has little benefit unless there is some other need for the model (if we were editing shared, distributed code, for instance). +Model-View-Controller (MVC) was a good design choice for Smalltalk programs in the '80s and it can work in some variation or other for web apps, but it isn't the right tool for every problem. All the state (the "model" in MVC) is captured by the block elements in a block language anyway, so replicating it into Javascript has little benefit unless there is some other need for the model (if we were editing shared, distributed code, for instance). An early version of Waterbear went to great lengths to keep the model in JavaScript and sync it with the DOM, until I noticed that more than half the code and 90% of the bugs were due to keeping the model in sync with the DOM. 
Eliminating the duplication allowed the code to be simpler and more robust, and with all the state on the DOM elements, many bugs could be found simply by looking at the DOM in the developer tools. So in this case there is little benefit to building further separation of MVC than we already have in HTML/CSS/JavaScript. @@ -501,14 +503,14 @@ Building a small, tightly scoped version of the larger system I work on has been #### Small Experiments Make Failure OK -Some of the experiments I was able to do with this stripped-down block language were: +Some of the experiments I was able to do with this stripped-down block language were: -- using HTML5 drag-and-drop, +- using HTML5 drag-and-drop, - running blocks directly by iterating through the DOM calling associated functions, - separating the code that runs cleanly from the HTML DOM, - simplified hit testing while dragging, -- building our own tiny vector and sprite libraries (for the game blocks), and -- "live coding" where the results are shown whenever you change the block script. +- building our own tiny vector and sprite libraries (for the game blocks), and +- "live coding" where the results are shown whenever you change the block script. The thing about experiments is that they do not have to succeed. We tend to gloss over failures and dead ends in our work, where failures are punished instead of treated as important vehicles for learning, but failures are essential if you are going to push forward. While I did get the HTML5 drag-and-drop working, the fact that it isn't supported at all on any mobile browser means it is a non-starter for Waterbear. Separating the code out and running code by iterating through the blocks worked so well that I've already begun bringing those ideas to Waterbear, with excellent improvements in testing and debugging. The simplified hit testing, with some modifications, is also coming back to Waterbear, as are the tiny vector and sprite libraries. 
Live coding hasn't made it to Waterbear yet, but once the current round of changes stabilizes I may introduce it. diff --git a/build.py b/build.py index 4ad1e3472..681de9eac 100644 --- a/build.py +++ b/build.py @@ -15,6 +15,7 @@ def main(chapters=[], epub=False, pdf=False, html=False, mobi=False, pandoc_epub chapter_dirs = [ 'dagoba', + 'ocr', 'contingent', 'same-origin-policy', 'blockcode', @@ -68,6 +69,7 @@ def main(chapters=[], epub=False, pdf=False, html=False, mobi=False, pandoc_epub ] image_paths = [ + './ocr/ocr-images', './contingent/contingent-images', './same-origin-policy/same-origin-policy-images', './blockcode/blockcode-images', @@ -193,7 +195,7 @@ def build_mobi(): def build_html(chapter_markdowns): run('mkdir -p html/content/pages') temp = 'python _build/preprocessor.py --chapter {chap} --html-refs --html-paths --output={md}.1 --latex {md}' - temp2 = 'pandoc --csl=minutiae/ieee.csl --bibliography=tex/500L.bib -t html -f markdown+citations -o html/content/pages/{basename}.md {md}.1' + temp2 = 'pandoc --csl=minutiae/ieee.csl --mathjax --bibliography=tex/500L.bib -t html -f markdown+citations -o html/content/pages/{basename}.md {md}.1' temp3 = './_build/fix_html_title.sh html/content/pages/{basename}.md' for i, markdown in enumerate(chapter_markdowns): basename = os.path.splitext(os.path.split(markdown)[1])[0] diff --git a/ci/README.rst b/ci/README.rst index 2746a5e6a..549607f52 100644 --- a/ci/README.rst +++ b/ci/README.rst @@ -65,7 +65,7 @@ Copy the tests/ folder from this code base to test_repo and commit it:: cp -r /this/directory/tests /path/to/test_repo/ cd /path/to/test_repo git add tests/ - git commit -m”add tests” + git commit -m "add tests" The repo observer will need its own clone of the code:: @@ -110,7 +110,7 @@ to make a new commit. 
Go to your master repo and make an arbitrary change:: cd /path/to/test_repo touch new_file git add new_file - git commit -m"new file" new_file + git commit -m "new file" new_file then repo_observer.py will realize that there's a new commit and will notify the dispatcher. You can see the output in their respective shells, so you diff --git a/ci/ci.markdown b/ci/ci.markdown index 8cbe242c4..fd618f655 100644 --- a/ci/ci.markdown +++ b/ci/ci.markdown @@ -179,7 +179,7 @@ Copy the tests folder from this code base to `test_repo` and commit it: $ cp -r /this/directory/tests /path/to/test_repo/ $ cd /path/to/test\_repo $ git add tests/ -$ git commit -m”add tests” +$ git commit -m "add tests" ``` Now you have a commit in the master repository. @@ -227,7 +227,7 @@ modified this assumption for simplicity. The observer must know which repository to observe. We previously created a clone of our repository at `/path/to/test_repo_clone_obs`. -The repository will use this clone to detect changes. To allow the +The observer will use this clone to detect changes. To allow the repository observer to use this clone, we pass it the path when we invoke the `repo_observer.py` file. The repository observer will use this clone to pull from the main repository. diff --git a/contingent/contingent.markdown b/contingent/contingent.markdown index 474727669..4b1cc29cb 100644 --- a/contingent/contingent.markdown +++ b/contingent/contingent.markdown @@ -19,8 +19,7 @@ things; he loves seeing the spark of wonder and delight in people's eyes when someone shares a novel, surprising, or beautiful idea. Daniel lives in Atlanta with a microbiologist and four aspiring rocketeers._ -Introduction -============ +## Introduction Build systems have long been a standard tool within computer programming. @@ -32,7 +31,7 @@ It not only lets you declare that an output file depends upon one (or more) inputs, but lets you do this recursively. A program, for example, might depend upon an object file 
A program, for example, might depend upon an object file -which itself depends upon the corresponding source code:: +which itself depends upon the corresponding source code: ``` prog: main.o @@ -70,8 +69,7 @@ The problem, again, is cross-referencing. Where do cross-references tend to emerge? In text documents, documentation, and printed books! -The Problem: Building Document Systems -====================================== +## The Problem: Building Document Systems Systems to rebuild formatted documents from source texts always seem to do too much work, or too little. @@ -132,7 +130,7 @@ If you later reconsider the tutorial’s chapter title — after all, the word “newcomer” sounds so antique, as if your users are settlers who have just arrived in pioneer Wyoming — then you would edit the first line of `tutorial.rst` -and write something better:: +and write something better: ``` -Newcomers Tutorial @@ -279,8 +277,7 @@ This can happen for many kinds of cross reference that Sphinx supports: chapter titles, section titles, paragraphs, classes, methods, and functions. -Build Systems and Consistency -============================= +## Build Systems and Consistency The problem outlined above is not specific to Sphinx. Not only does it haunt other document systems, like LaTeX, @@ -290,7 +287,7 @@ with the venerable `make` utility, if their assets happen to cross-reference in interesting ways. As the problem is ancient and universal, -its solution is of equally long lineage:: +its solution is of equally long lineage: ```bash $ rm -r _build/ @@ -332,8 +329,7 @@ while performing the fewest possible rebuild steps. While Contingent can be applied to any problem domain, we will run it against a small version of the problem outlined above. -Linking Tasks To Make a Graph -============================= +## Linking Tasks To Make a Graph Any build system needs a way to link inputs and outputs. The three markup texts in our discussion above, @@ -544,8 +540,7 @@ at either end of the edge. 
But in return for this redundancy, the data structure supports the fast lookup that Contingent needs. -The Proper Use of Classes -========================= +## The Proper Use of Classes You may have been surprised by the absence of classes in the above discussion @@ -637,7 +632,7 @@ and that the nodes themselves in these early examples are simply strings. Coming from other languages and traditions, one might have expected to see -user-defined classes and interfaces for everything in the system:: +user-defined classes and interfaces for everything in the system: ```java Graph g = new ConcreteGraph(); @@ -862,8 +857,7 @@ will eventually have Contingent do for us: the graph `g` captures the inputs and consequences for the various artifacts in our project's documentation. -Learning Connections -==================== +## Learning Connections We now have a way for Contingent to keep track of tasks and the relationships between them. @@ -1311,8 +1305,7 @@ at its disposal, Contingent knows all the things to rebuild if the inputs to any tasks change. -Chasing Consequences -==================== +## Chasing Consequences Once the initial build has run to completion, Contingent needs to monitor the input files for changes. @@ -1542,8 +1535,7 @@ nevertheless returned the same value means that all further downstream tasks were insulated from the change and did not get re-invoked. -Conclusion -========== +## Conclusion There exist languages and programming methodologies under which Contingent would be a suffocating forest of tiny classes diff --git a/functionalDB/functionalDB.markdown b/functionalDB/functionalDB.markdown index e2c32dcc5..174523568 100644 --- a/functionalDB/functionalDB.markdown +++ b/functionalDB/functionalDB.markdown @@ -721,7 +721,7 @@ Our data model is based on accumulation of facts (i.e., datoms) over time. For t ### Query Language -Let's look at an example query in our proposed language. 
This query asks: "What are the names and birthday of entities who like pizza, speak English, and who have a birthday this month?" +Let's look at an example query in our proposed language. This query asks: "What are the names and birthdays of entities who like pizza, speak English, and who have a birthday this month?" ```clojure { :find [?nm ?bd ] :where [ @@ -1281,12 +1281,12 @@ The twist to the index structure is that now we hold a binding pair of the entit At the end of phase 3 of our example execution, we have the following structure at hand: ```clojure - {[1 "?e"] { - [:likes nil] ["Pizza" nil] - [:name nil] ["USA" "?nm"] - [:speaks nil] ["English" nil] - [:birthday nil] ["July 4, 1776" "?bd"]} - }} +{[1 "?e"] { +  [:likes nil] ["Pizza" nil] +  [:name nil] ["USA" "?nm"] +  [:speaks nil] ["English" nil] +  [:birthday nil] ["July 4, 1776" "?bd"] +}} ``` #### Phase 4: Unify and Report diff --git a/ocr/data.csv b/ocr/code/data.csv similarity index 100% rename from ocr/data.csv rename to ocr/code/data.csv diff --git a/ocr/dataLabels.csv b/ocr/code/dataLabels.csv similarity index 100% rename from ocr/dataLabels.csv rename to ocr/code/dataLabels.csv diff --git a/ocr/neural_network_design.py b/ocr/code/neural_network_design.py similarity index 100% rename from ocr/neural_network_design.py rename to ocr/code/neural_network_design.py diff --git a/ocr/nn.json b/ocr/code/nn.json similarity index 100% rename from ocr/nn.json rename to ocr/code/nn.json diff --git a/ocr/ocr.html b/ocr/code/ocr.html similarity index 100% rename from ocr/ocr.html rename to ocr/code/ocr.html diff --git a/ocr/ocr.js b/ocr/code/ocr.js similarity index 100% rename from ocr/ocr.js rename to ocr/code/ocr.js diff --git a/ocr/ocr.py b/ocr/code/ocr.py similarity index 96% rename from ocr/ocr.py rename to ocr/code/ocr.py index 90bdff3ac..fb304fe6e 100644 --- a/ocr/ocr.py +++ b/ocr/code/ocr.py @@ -75,13 +75,13 @@ def train(self, training_data_array): actual_vals = [0] * 10 # actual_vals is a python 
list for easy initialization and is later turned into an np matrix (2 lines down). actual_vals[data['label']] = 1 output_errors = np.mat(actual_vals).T - np.mat(y2) - hiddenErrors = np.multiply(np.dot(np.mat(self.theta2).T, output_errors), self.sigmoid_prime(sum1)) + hidden_errors = np.multiply(np.dot(np.mat(self.theta2).T, output_errors), self.sigmoid_prime(sum1)) # Step 4: Update weights - self.theta1 += self.LEARNING_RATE * np.dot(np.mat(hiddenErrors), np.mat(data['y0'])) + self.theta1 += self.LEARNING_RATE * np.dot(np.mat(hidden_errors), np.mat(data['y0'])) self.theta2 += self.LEARNING_RATE * np.dot(np.mat(output_errors), np.mat(y1).T) self.hidden_layer_bias += self.LEARNING_RATE * output_errors - self.input_layer_bias += self.LEARNING_RATE * hiddenErrors + self.input_layer_bias += self.LEARNING_RATE * hidden_errors def predict(self, test): y1 = np.dot(np.mat(self.theta1), np.mat(test).T) diff --git a/ocr/server.py b/ocr/code/server.py similarity index 94% rename from ocr/server.py rename to ocr/code/server.py index a40076028..b8a3e77f6 100644 --- a/ocr/server.py +++ b/ocr/code/server.py @@ -24,8 +24,8 @@ class JSONHandler(BaseHTTPServer.BaseHTTPRequestHandler): def do_POST(s): response_code = 200 response = "" - varLen = int(s.headers.get('Content-Length')) - content = s.rfile.read(varLen); + var_len = int(s.headers.get('Content-Length')) + content = s.rfile.read(var_len); payload = json.loads(content); if payload.get('train'): diff --git a/ocr/ocr-images/ann.png b/ocr/ocr-images/ann.png new file mode 100644 index 000000000..6a503c073 Binary files /dev/null and b/ocr/ocr-images/ann.png differ diff --git a/ocr/ocr.markdown b/ocr/ocr.markdown new file mode 100644 index 000000000..01cc6ba14 --- /dev/null +++ b/ocr/ocr.markdown @@ -0,0 +1,779 @@ +title: Optical Character Recognition (OCR) +author: Marina Samuel + +## Introduction + +What if your computer could wash your dishes, do your laundry, cook you dinner, +and clean your home? 
I think I can safely say that most people would be happy +to get a helping hand! But what would it take for a computer to be able to +perform these tasks, in exactly the same way that humans can? + +The famous computer scientist Alan Turing proposed the Turing Test as a way to +identify whether a machine could have intelligence indistinguishable from that +of a human being. The test involves a human posing questions to two hidden +entities, one human, and the other a machine, and trying to identify which is +which. If the interrogator is unable to identify the machine, then the machine +is considered to have human-level intelligence. + +While there is a lot of controversy surrounding whether the Turing Test is a +valid assessment of intelligence, and whether we can build such intelligent +machines, there is no doubt that machines with some degree of intelligence +already exist. There is currently software that helps robots navigate an office +and perform small tasks, or helps those suffering from Alzheimer's. More common +examples of Artificial Intelligence (A.I.) are the way that Google estimates +what you’re looking for when you search for some keywords, or the way that +Facebook decides what to put in your news feed. + +One well-known application of A.I. is Optical Character Recognition (OCR). An +OCR system is a piece of software that can take images of handwritten +characters as input and interpret them as machine-readable text. While you +may not think twice when depositing a handwritten cheque into a bank machine +that confirms the deposit value, there is some interesting work going on in the +background. This chapter will examine a working example of a simple OCR system +that recognizes numerical digits using an Artificial Neural Network (ANN). But +first, let’s establish a bit more context. + + +## What is Artificial Intelligence? 
\label{sec.ocr.ai} +While Turing’s definition of intelligence sounds reasonable, at the end of the +day what constitutes intelligence is fundamentally a philosophical debate. +Computer scientists have, however, categorized certain types of systems and +algorithms into branches of AI. Each branch is used to solve certain sets of +problems. These branches include the following examples, as well as [many +others](http://www-formal.stanford.edu/jmc/whatisai/node2.html): + +- Logical and probabilistic deduction and inference based on some predefined + knowledge of a world. e.g. [Fuzzy + inference](http://www.cs.princeton.edu/courses/archive/fall07/cos436/HIDDEN/Knapp/fuzzy004.htm) + can help a thermostat decide to turn on the air conditioning when it + detects that the temperature is hot and the atmosphere is humid +- Heuristic search. e.g. Searching can be used to find the best possible next + move in a game of chess by searching all possible moves and choosing the one + that most improves your position +- Machine learning (ML) with feedback models. e.g. Pattern-recognition problems + like OCR. + +In general, ML involves using large data sets to train a system to identify +patterns. The training data sets may be labelled, meaning the system’s expected +outputs are specified for given inputs, or unlabelled, meaning expected outputs +are not specified. Algorithms that train systems with unlabelled data are +called _unsupervised_ algorithms and those that train with labelled data are +called _supervised_. Although many ML algorithms and techniques exist for +creating OCR systems, ANNs are one simple approach. + +## Artificial Neural Networks +### What Are ANNs? +\label{sec.ocr.ann} +An ANN is a structure consisting of interconnected nodes that communicate with +one another. The structure and its functionality are inspired by neural +networks found in a biological brain. 
[Hebbian
+Theory](http://www.nbb.cornell.edu/neurobio/linster/BioNB420/hebb.pdf) explains
+how these networks can learn to identify patterns by physically altering their
+structure and link strengths. Similarly, a typical ANN (shown in
+\aosafigref{500l.ocr.ann}) has connections between nodes, each with a weight
+that is updated as the network learns. The nodes labelled "+1" are called
+_biases_. The leftmost blue column of nodes are _input nodes_, the middle
+column contains _hidden nodes_, and the rightmost column contains _output
+nodes_. There may be many columns of hidden nodes, known as _hidden layers_.
+
+\aosafigure[360pt]{ocr-images/ann.png}{An Artificial Neural Network}{500l.ocr.ann}
+
+The values inside all of the circular nodes in \aosafigref{500l.ocr.ann}
+represent the output of the node. If we call the output of the $n$th node from
+the top in layer $L$ $a^{(L)}_n$, and the weight of the connection between the
+$i$th node in layer $L$ and the $j$th node in layer $L+1$ $w^{(L)}_{ji}$, then
+the output of node $a^{(2)}_2$ is:
+
+$$
+a^{(2)}_2 = f(w^{(1)}_{21}x_1 + w^{(1)}_{22}x_2 + b^{(1)}_{2})
+$$
+
+where $f(.)$ is known as the _activation function_ and $b$ is the _bias_. An
+activation function is the decision-maker for what type of output a node has.
+A bias is an additional node with a fixed output of 1 that may be added to an
+ANN to improve its accuracy. We’ll see more details on both of these in
+\aosasecref{sec.ocr.feedforward}.
+
+This type of network topology is called a _feedforward_ neural network because
+there are no cycles in the network. ANNs with nodes whose outputs feed into
+their inputs are called _recurrent_ neural networks. There are many algorithms
+that can be applied to train feedforward ANNs; one commonly used algorithm is
+called _backpropagation_. The OCR system we will implement in this chapter will
+use backpropagation.
+
+### How Do We Use ANNs?
+As with most other ML approaches, the first step in using backpropagation is to
+decide how to transform or reduce our problem into one that can be solved by an
+ANN. In other words, how can we manipulate our input data so we can feed it
+into the ANN? For the case of our OCR system, we can use the positions of the
+pixels for a given digit as input. It is worth noting that choosing the input
+data format is often not this simple. If we were analyzing large images to
+identify shapes in them, for instance, we may need to pre-process the
+image to identify contours within it. These contours would be the input.
+
+Once we’ve decided on our input data format, what’s next? Since backpropagation
+is a supervised algorithm, it will need to be trained with labelled data, as
+mentioned in \aosasecref{sec.ocr.ai}. Thus, when passing the pixel positions as
+training input, we must also pass the associated digit. This means that we must
+find or gather a large data set of drawn digits and associated values.
+
+The next step is to partition the data set into a training set and a validation
+set. The training data is used to run the backpropagation algorithm to set the
+weights of the ANN. The validation data is used to make predictions using the
+trained network and compute its accuracy. If we were comparing the performance
+of backpropagation vs. another algorithm on our data, we would [split the
+data](http://www-group.slac.stanford.edu/sluo/Lectures/stat_lecture_files/sluo2006lec7.pdf)
+into 50% for training, 25% for comparing the performance of the two algorithms
+(validation set), and the final 25% for testing the accuracy of the chosen
+algorithm (test set). Since we’re not comparing algorithms, we can group one of
+the 25% sets as part of the training set and use 75% of the data to train the
+network and 25% for validating that it was trained well.
+
+The purpose of identifying the accuracy of the ANN is two-fold. First, it is to
+avoid the problem of _overfitting_.
Overfitting occurs when the network has a
+much higher accuracy when predicting the training set than the validation set.
+Overfitting tells us that the chosen training data does not generalize well
+enough and needs to be refined. Second, testing the accuracy achieved with
+several different numbers of hidden layers and hidden nodes helps in choosing
+an optimal ANN size. An optimal ANN size will have enough hidden nodes and
+layers to make accurate predictions, but also as few nodes/connections as
+possible to reduce the computational overhead that may slow down training and
+predictions. Once the optimal size has been decided and the network has been
+trained, it’s ready to make predictions!
+
+## Design Decisions in a Simple OCR System
+\label{sec.ocr.decisions}
+In the last few paragraphs we’ve gone over some of the basics of feedforward
+ANNs and how to use them. Now it’s time to talk about how we can build an OCR
+system.
+
+First off, we must decide what we want our system to be able to do. To keep
+things simple, let’s allow users to draw a single digit, and either train
+the OCR system with that drawn digit or request that the system predict what
+the drawn digit is. While an OCR system could run locally on a single machine,
+having a client-server setup gives much more flexibility. It makes
+crowd-sourced training of an ANN possible and allows powerful servers to handle
+intensive computations.
+
+Our OCR system will consist of five main components, divided into five files.
+There will be:
+
+- a client (`ocr.js`)
+- a server (`server.py`)
+- a simple user interface (`ocr.html`)
+- an ANN trained via backpropagation (`ocr.py`)
+- an ANN design script (`neural_network_design.py`)
+
+The user interface will be simple: a canvas to draw digits on and buttons to
+either train the ANN or request a prediction.
The client will gather the drawn +digit, translate it into an array, and pass it to the server to be processed +either as a training sample or as a prediction request. The server will simply +route the training or prediction request by making API calls to the ANN module. +The ANN module will train the network with an existing data set on its first +initialization. It will then save the ANN weights to a file and re-load them on +subsequent startups. This module is where the core of training and prediction +logic happens. Finally, the design script is for experimenting with different +hidden node counts and deciding what works best. Together, these pieces give us +a very simplistic, but functional OCR system. + +Now that we've thought about how the system will work at a high level, it's +time to put the concepts into code! + +### A Simple Interface (`ocr.html`) +As mentioned earlier, the first step is to gather data for training the +network. We could upload a sequence of hand-written digits to the server, but +that would be awkward. Instead, we could have users actually handwrite the +digits on the page using an HTML canvas. We could then give them a couple of +options to either train or test the network, where training the network also +involves specifying what digit was drawn. This way it is possible to easily +outsource the data collection by pointing people to a website to receive their +input. Here’s some HTML to get us started. + +```html + +
+
+<p>Before reading this, try reading our
+<a href="tutorial.html">Beginners Tutorial</a>!</p>
+\end{verbatim} + +What if you now make another edit to the title at the top of the +\texttt{tutorial.rst} file? You will have invalidated \emph{three} +output files: + +\begin{aosaenumerate} +\def\labelenumi{\arabic{enumi}.} +\item + The title at the top of \texttt{tutorial.html} is now out of date, so + the file needs to be rebuilt. +\item + The table of contents in \texttt{index.html} still has the old title, + so that document needs to be rebuilt. +\item + The embedded cross reference in the first paragraph of + \texttt{api.html} still has the old chapter title, and also needs to + be rebuilt. +\end{aosaenumerate} + +What does Sphinx do? + +\begin{verbatim} + writing output... [ 50%] index + writing output... [100%] tutorial +\end{verbatim} + +Whoops. + +Only two files were rebuilt, not three. Sphinx has failed to correctly +rebuild your documentation. + +If you now push your HTML to the web, users will see the old title in +the cross reference at the top of \texttt{api.html} but then a different +title --- the new one --- once the link has carried them to +\texttt{tutorial.html} itself. This can happen for many kinds of cross +reference that Sphinx supports: chapter titles, section titles, +paragraphs, classes, methods, and functions. + +\aosasecti{Build Systems and +Consistency}\label{build-systems-and-consistency} + +The problem outlined above is not specific to Sphinx. Not only does it +haunt other document systems, like LaTeX, but it can even plague +projects that are simply trying to direct compilation steps with the +venerable \texttt{make} utility, if their assets happen to +cross-reference in interesting ways. + +As the problem is ancient and universal, its solution is of equally long +lineage: + +\begin{verbatim} + $ rm -r _build/ + $ make html +\end{verbatim} + +If you remove all of the output, you are guaranteed a complete rebuild! 
+Some projects even alias \texttt{rm -r} to a target named
+\texttt{clean} so that only a quick \texttt{make clean} is
+necessary to wipe the slate.
+
+By eliminating every copy of every intermediate or output asset, a hefty
+\texttt{rm -r} is able to force the build to start over again
+with nothing cached --- with no memory of its earlier state that could
+possibly lead to a stale product!
+
+But could we develop a better approach?
+
+What if your build system were a persistent process that noticed every
+chapter title, every section title, and every cross-referenced phrase as
+it passed from the source code of one document into the text of another?
+Its decisions about whether to rebuild other documents after a change to
+a single source file could be precise, instead of mere guesses, and
+correct, instead of leaving the output in an inconsistent state.
+
+The result would be a system like the old static \texttt{make} tool, but
+one which learned the dependencies between files as they were built ---
+that added and removed dependencies dynamically as cross references were
+added, updated, and then later deleted.
+
+In the sections that follow we will construct such a tool in Python,
+named Contingent, that guarantees correctness in the presence of dynamic
+dependencies while performing the fewest possible rebuild steps. While
+Contingent can be applied to any problem domain, we will run it against
+a small version of the problem outlined above.
+
+\aosasecti{Linking Tasks To Make a
+Graph}\label{linking-tasks-to-make-a-graph}
+
+Any build system needs a way to link inputs and outputs. The three
+markup texts in our discussion above, for example, each produce a
+corresponding HTML output file. The most natural way to express these
+relationships is as a collection of boxes and arrows --- or, in
+mathematical terminology, \emph{nodes} and \emph{edges} forming a
+\emph{graph} (\aosafigref{500l.contingent.graph}).
+
+\aosafigure[240pt]{contingent-images/figure1.png}{Three files generated by parsing three input texts.}{500l.contingent.graph}
+
+Each language in which a programmer might tackle writing a build system
+will offer various data structures with which such a graph of nodes and
+edges might be represented.
+
+How could we represent such a graph in Python?
+
+The Python language gives priority to four generic data structures by
+giving them direct support in the language syntax. You can create new
+instances of these big-four data structures by simply typing their
+literal representation into your source code, and their four type
+objects are available as built-in symbols that can be used without being
+imported.
+
+The \textbf{tuple} is a read-only sequence used to hold heterogeneous
+data --- each slot in a tuple typically means something different. Here,
+a tuple holds together a hostname and port number, and would lose
+its meaning if the elements were re-ordered:
+
+\begin{verbatim}
+('dropbox.com', 443)
+\end{verbatim}
+
+The \textbf{list} is a mutable sequence used to hold homogeneous data ---
+each item usually has the same structure and meaning as its peers. Lists
+can be used either to preserve data's original input order, or can be
+rearranged or sorted to establish a new and more useful order.
+
+\begin{verbatim}
+['C', 'Awk', 'TCL', 'Python', 'JavaScript']
+\end{verbatim}
+
+The \textbf{set} does not preserve order. Sets remember only whether a
+given value has been added, not how many times, and are therefore the
+go-to data structure for removing duplicates from a data stream. For
+example, the following two sets, once the language has built them, will
+each have three elements:
+
+\begin{verbatim}
+{3, 4, 5}
+{3, 4, 5, 4, 4, 3, 5, 4, 5, 3, 4, 5}
+\end{verbatim}
+
+The \textbf{dict} is an associative data structure for storing values
+accessible by a key.
Dicts let the programmer choose the key by which
+each value is indexed, instead of using automatic integer indexing like
+the tuple and list. The lookup is backed by a hash table, which means
+that dict key lookup runs at the same speed whether the dict has a dozen
+or a million keys!
+
+\begin{verbatim}
+{'ssh': 22, 'telnet': 23, 'domain': 53, 'http': 80}
+\end{verbatim}
+
+A key to Python's flexibility is that these four data structures are
+composable. The programmer can arbitrarily nest them inside each other
+to produce more complex data stores whose rules and syntax remain the
+simple ones of the underlying tuples, lists, sets, and dicts.
+
+Given that each of our graph edges needs to know at least its origin
+node and its destination node, the simplest possible representation
+would be a tuple. The top edge in Figure~1 might look like:
+
+\begin{verbatim}
+    ('tutorial.rst', 'tutorial.html')
+\end{verbatim}
+
+How can we store several edges? While our initial impulse might be to
+simply throw all of our edge tuples into a list, that would have
+disadvantages. A~list is careful to maintain order, but it is not
+meaningful to talk about an absolute order for the edges in a graph. And
+a list would be perfectly happy to hold several copies of exactly the
+same edge, even though we only want it to be possible to draw a single
+arrow between \texttt{tutorial.rst} and \texttt{tutorial.html}. The
+correct choice is thus the set, which would have us represent
+\aosafigref{500l.contingent.graph} as:
+
+\begin{verbatim}
+    {('tutorial.rst', 'tutorial.html'),
+     ('index.rst', 'index.html'),
+     ('api.rst', 'api.html')}
+\end{verbatim}
+
+This would allow quick iteration across all of our edges, fast insert
+and delete operations for a single edge, and a quick way to check
+whether a particular edge was present.
+
+Unfortunately, those are not the only operations we need.
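Before moving on, here is a runnable sketch of the operations the flat set does support well, reusing the file names from the figure (an illustration only, not Contingent's code):

```python
# A graph stored as a bare set of (input, output) edge tuples.
edges = {('tutorial.rst', 'tutorial.html'),
         ('index.rst', 'index.html'),
         ('api.rst', 'api.html')}

# Membership testing, insertion, and deletion are all fast,
# and a duplicate edge can never sneak in.
assert ('api.rst', 'api.html') in edges
edges.add(('api.rst', 'api.html'))      # adding it again is a no-op
assert len(edges) == 3

edges.add(('api.rst', 'api-title'))     # a brand-new edge
edges.discard(('index.rst', 'index.html'))
assert len(edges) == 3
```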
+ +A build system like Contingent needs to understand the relationship +between a given node and all the nodes connected to it. For example, +when \texttt{api.rst} changes, Contingent needs to know which assets are +affected by that change, if any, in order to minimize the work performed +while also ensuring a complete build. To answer this question --- ``what +nodes are downstream from \texttt{api.rst}?'' --- we need to examine the +\emph{outgoing} edges from \texttt{api.rst}. But building the dependency +graph requires that Contingent be concerned with a node's \emph{inputs} +as well. What inputs were used, for example, when the build system +assembled the output document \texttt{tutorial.html}? It is by watching +the input to each node that Contingent can know that \texttt{api.html} +depends on \texttt{api.rst} but that \texttt{tutorial.html} does not. As +sources change and rebuilds occur, Contingent rebuilds the incoming +edges of each changed node to remove potentially stale edges and +re-learn which resources a task uses this time around. + +Our set-of-tuples does not make answering either of these questions +easy. If we needed to know the relationship between \texttt{api.html} +and the rest of the graph, we would need to traverse the entire set +looking for edges that start or end at the \texttt{api.html} node. 
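With nothing but the flat set, each of those questions forces a walk over every edge. A quick sketch of that full traversal (illustrative only; these helper names are not part of Contingent):

```python
edges = {('tutorial.rst', 'tutorial.html'),
         ('index.rst', 'index.html'),
         ('api.rst', 'api.html')}

def consequences_of(task):
    # Which nodes are downstream of `task`? Requires touching every edge.
    return {output for (input_, output) in edges if input_ == task}

def inputs_of(task):
    # Which nodes feed into `task`? A second full scan of the same set.
    return {input_ for (input_, output) in edges if output == task}

print(consequences_of('api.rst'))  # {'api.html'}
print(inputs_of('api.html'))       # {'api.rst'}
```

Both helpers run in time proportional to the total number of edges, no matter how few edges actually touch the node in question.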
+ +An associative data structure like Python's dict would make these chores +easier by allowing direct lookup of all the edges from a particular +node: + +\begin{verbatim} + {'tutorial.rst': {('tutorial.rst', 'tutorial.html')}, + 'tutorial.html': {('tutorial.rst', 'tutorial.html')}, + 'index.rst': {('index.rst', 'index.html')}, + 'index.html': {('index.rst', 'index.html')}, + 'api.rst': {('api.rst', 'api.html')}, + 'api.html': {('api.rst', 'api.html')}} +\end{verbatim} + +Looking up the edges of a particular node would now be blazingly fast, +at the cost of having to store every edge twice: once in a set of +incoming edges, and once in a set of outgoing edges. But the edges in +each set would have to be examined manually to see which are incoming +and which are outgoing. It is also slightly redundant to keep naming the +node over and over again in its set of edges. + +The solution to both of these objections is to place incoming and +outgoing edges in their own separate data structures, which will also +absolve us of having to mention the node over and over again for every +one of the edges in which it is involved. + +\begin{verbatim} + incoming = { + 'tutorial.html': {'tutorial.rst'}, + 'index.html': {'index.rst'}, + 'api.html': {'api.rst'}, + } + + outgoing = { + 'tutorial.rst': {'tutorial.html'}, + 'index.rst': {'index.html'}, + 'api.rst': {'api.html'}, + } +\end{verbatim} + +Notice that \texttt{outgoing} represents, directly in Python syntax, +exactly what we drew in \aosafigref{500l.contingent.graph} earlier: the +source documents on the left will be transformed by the build system +into the output documents on the right. For this simple example each +source points to only one output --- all the output sets have only one +element --- but we will see examples shortly where a single input node +has multiple downstream consequences. 
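Repeating the two dictionaries from above in runnable form shows how either question now costs a single key lookup instead of a scan (an illustration only):

```python
incoming = {
    'tutorial.html': {'tutorial.rst'},
    'index.html': {'index.rst'},
    'api.html': {'api.rst'},
}

outgoing = {
    'tutorial.rst': {'tutorial.html'},
    'index.rst': {'index.html'},
    'api.rst': {'api.html'},
}

# "What is downstream of api.rst?" -- one dict lookup, no traversal.
print(outgoing['api.rst'])        # {'api.html'}

# "Which inputs built tutorial.html?" -- equally direct.
print(incoming['tutorial.html'])  # {'tutorial.rst'}
```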
+
+Every edge in this dictionary-of-sets data structure does get
+represented twice, once as an outgoing edge from one node
+(\texttt{tutorial.rst} → \texttt{tutorial.html}) and again as an
+incoming edge to the other (\texttt{tutorial.html} ←
+\texttt{tutorial.rst}). These two representations capture precisely the
+same relationship, just from the opposite perspectives of the two nodes
+at either end of the edge. But in return for this redundancy, the data
+structure supports the fast lookup that Contingent needs.
+
+\aosasecti{The Proper Use of Classes}\label{the-proper-use-of-classes}
+
+You may have been surprised by the absence of classes in the above
+discussion of Python data structures. After all, classes are a frequent
+mechanism for structuring applications and a hardly less frequent
+subject of heated debate among their adherents and detractors. Classes
+were once thought important enough that entire educational curricula
+were designed around them, and the majority of popular programming
+languages include dedicated syntax for defining and using them.
+
+But it turns out that classes are often orthogonal to the question of
+data structure design. Rather than offering us an entirely alternative
+data modeling paradigm, classes simply repeat data structures that we
+have already seen:
+
+\begin{aosaitemize}
+
+\item
+  A class instance is \emph{implemented} as a dict.
+\item
+  A class instance is \emph{used} like a mutable tuple.
+\end{aosaitemize}
+
+The class offers key lookup into its attribute dictionary through a
+prettier syntax, where you get to say \texttt{graph.incoming} instead of
+\texttt{graph{[}"incoming"{]}}. But, in practice, class instances are
+almost never used as generic key-value stores. Instead, they are used to
+organize related but heterogeneous data by attribute name, with
+implementation details encapsulated behind a consistent and memorable
+interface.
+ +So instead of putting a hostname and a port number together in a tuple +and having to remember later which came first and which came second, you +create an \texttt{Address} class whose instances each have a +\texttt{host} and a \texttt{port} attribute. You can then pass +\texttt{Address} objects around where otherwise you would have had +anonymous tuples. Code becomes easier to read and easier to write. But +using a class instance does not really change any of the questions we +faced above when doing data design: it just provides a prettier and less +anonymous container. + +The true value of classes, then, is not that they change the science of +data design. The value of classes is that they let you \emph{hide} your +data design from the rest of a program! + +Successful application design hinges upon our ability to exploit the +powerful built-in data structures Python offers us while minimizing the +volume of details we are required to keep in our heads at any one time. +Classes provide the mechanism for resolving this apparent quandary: used +effectively, a class provides a \emph{facade} around some small subset +of the system's overall design. When working within one subset --- a +\texttt{Graph}, for example --- we can forget the implementation details +of other subsets as long as we can remember their interfaces. In this +way, programmers often find themselves navigating among several levels +of abstraction in the course of writing a system, now working with the +specific data model and implementation details for a particular +subsystem, now connecting higher-level concepts through their +interfaces. + +For example, from the outside, code can simply ask for a new +\texttt{Graph} instance: + +\begin{verbatim} +>>> from contingent import graphlib +>>> g = graphlib.Graph() +\end{verbatim} + +without needing to understand the details of how \texttt{Graph} works. 
+Code that is simply using the graph sees only interface verbs --- the
+method calls --- when manipulating a graph, as when an edge is added or
+some other operation performed:
+
+\begin{verbatim}
+>>> g.add_edge('index.rst', 'index.html')
+>>> g.add_edge('tutorial.rst', 'tutorial.html')
+>>> g.add_edge('api.rst', 'api.html')
+\end{verbatim}
+
+Careful readers will have noticed that we added edges to our graph
+without explicitly creating ``node'' and ``edge'' objects, and that the
+nodes themselves in these early examples are simply strings. Coming from
+other languages and traditions, one might have expected to see
+user-defined classes and interfaces for everything in the system:
+
+\begin{verbatim}
+    Graph g = new ConcreteGraph();
+    Node indexRstNode = new StringNode("index.rst");
+    Node indexHtmlNode = new StringNode("index.html");
+    Edge indexEdge = new DirectedEdge(indexRstNode, indexHtmlNode);
+    g.addEdge(indexEdge);
+\end{verbatim}
+
+The Python language and community explicitly and intentionally emphasize
+using simple, generic data structures to solve problems, instead of
+creating custom classes for every minute detail of the problem we want
+to tackle. This is one facet of the notion of ``Pythonic'' solutions
+that you may have read about. Pythonic solutions try to minimize
+syntactic overhead and leverage Python's powerful built-in tools and
+extensive standard library.
+
+With these considerations in mind, let's return to the \texttt{Graph}
+class, examining its design and implementation to see the interplay
+between data structures and class interfaces.
When a new \texttt{Graph} +instance is constructed, a pair of dictionaries has already been built +to store edges using the logic we outlined in the previous section: + +\begin{verbatim} +class Graph: + """A directed graph of the relationships among build tasks.""" + + def __init__(self): + self._inputs_of = defaultdict(set) + self._consequences_of = defaultdict(set) +\end{verbatim} + +The leading underscore in front of the attribute names +\texttt{\_inputs\_of} and \texttt{\_consequences\_of} is a common +convention in the Python community to signal that an attribute is +private. This convention is one way the community suggests that +programmers pass messages and warnings through space and time to each +other. Recognizing the need to signal differences among public versus +internal object attributes, the community adopted the single leading +underscore as a concise and fairly consistent indicator to other +programmers, including our future selves, that the attribute is best +treated as part of the invisible internal machinery of the class. + +Why are we using a ``defaultdict'' instead of a standard dict? A common +problem when composing dicts with other data structures is handling +missing keys. With a normal dict, retrieving a key that does not exist +raises a \texttt{KeyError}: + +\begin{verbatim} +>>> consequences_of = {} +>>> consequences_of['index.rst'].add('index.html') +Traceback (most recent call last): + ... 
+KeyError: 'index.rst' +\end{verbatim} + +Using a normal dict requires special checks throughout the code to +handle this specific case, for example when adding a new edge: + +\begin{verbatim} + # Special case to handle “we have not seen this task yet”: + + if input_task not in self._consequences_of: + self._consequences_of[input_task] = set() + + self._consequences_of[input_task].add(consequence_task) +\end{verbatim} + +This need is so common that Python includes a special utility, the +defaultdict, which lets you provide a function that returns a value for +absent keys. When we ask about an edge that the \texttt{Graph} hasn't +yet seen, we will get back an empty \texttt{set} instead of an +exception: + +\begin{verbatim} +>>> from collections import defaultdict +>>> consequences_of = defaultdict(set) +>>> consequences_of['api.rst'] +set() +\end{verbatim} + +Structuring our implementation this way means that each key's first use +can look identical to second-and-subsequent-times that a particular key +is used: + +\begin{verbatim} +>>> consequences_of['index.rst'].add('index.html') +>>> 'index.html' in consequences_of['index.rst'] +True +\end{verbatim} + +Given these techniques, let's examine the implementation of +\texttt{add\_edge}, which we earlier used to build the graph for +\aosafigref{500l.contingent.graph}. + +\begin{verbatim} + def add_edge(self, input_task, consequence_task): + """Add an edge: `consequence_task` uses the output of `input_task`.""" + self._consequences_of[input_task].add(consequence_task) + self._inputs_of[consequence_task].add(input_task) +\end{verbatim} + +This method hides the fact that two, not one, storage steps are required +for each new edge so that we know about it in both directions. And +notice how \texttt{add\_edge()} does not know or care whether either +node has been seen before. 
Because the inputs and consequences data +structures are each a \texttt{defaultdict(set)}, the +\texttt{add\_edge()} method remains blissfully ignorant as to the +novelty of a node --- the \texttt{defaultdict} takes care of the +difference by creating a new \texttt{set} object on the fly. As we saw +above, \texttt{add\_edge()} would be three times longer had we not used +\texttt{defaultdict}. More importantly, it would be more difficult to +understand and reason about the resulting code. This implementation +demonstrates a Pythonic approach to problems: simple, direct, and +concise. + +Callers should also be given a simple way to visit every edge without +having to learn how to traverse our data structure: + +\begin{verbatim} + def edges(self): + """Return all edges as ``(input_task, consequence_task)`` tuples.""" + return [(a, b) for a in self.sorted(self._consequences_of) + for b in self.sorted(self._consequences_of[a])] +\end{verbatim} + +The \texttt{Graph.sorted()} method, if you want to examine it later, +makes an attempt to sort the nodes in case they have a natural sort +order (such as alphabetical) that can provide a stable output order for +the user. + +By using this traversal method we can see that, following our three +``add'' method calls earlier, \texttt{g} now represents the same graph +that we saw in Figure~1. + +\begin{verbatim} +>>> from pprint import pprint +>>> pprint(g.edges()) +[('api.rst', 'api.html'), + ('index.rst', 'index.html'), + ('tutorial.rst', 'tutorial.html')] +\end{verbatim} + +Since we now have a real live Python object, and not just a figure, we +can ask it interesting questions! 
For example, when Contingent is +building a blog from source files, it will need to know things like +``What depends on \texttt{api.rst}?'' when the content of +\texttt{api.rst} changes: + +\begin{verbatim} +>>> g.immediate_consequences_of('api.rst') +['api.html'] +\end{verbatim} + +This \texttt{Graph} is telling Contingent that, when \texttt{api.rst} +changes, \texttt{api.html} is now stale and must be rebuilt. How about +\texttt{index.html}? + +\begin{verbatim} +>>> g.immediate_consequences_of('index.html') +[] +\end{verbatim} + +An empty list has been returned, signalling that \texttt{index.html} is +at the right edge of the graph and so nothing further needs to be +rebuilt if it changes. This query can be expressed very simply thanks to +the work that has already gone in to laying out our data: + +\begin{verbatim} + def immediate_consequences_of(self, task): + """Return the tasks that use `task` as an input.""" + return self.sorted(self._consequences_of[task]) +\end{verbatim} + +\begin{verbatim} + >>> from contingent.rendering import as_graphviz + >>> open('figure1.dot', 'w').write(as_graphviz(g)) and None +\end{verbatim} + +\aosafigref{500l.contingent.graph} ignored one of the most important +relationships that we discovered in the opening section of our chapter: +the way that document titles appear in the table of contents. Let's fill +in this detail. We will create a node for each title string that needs +to be generated by parsing an input file and then passed to one of our +other routines: + +\begin{verbatim} +>>> g.add_edge('api.rst', 'api-title') +>>> g.add_edge('api-title', 'index.html') +>>> g.add_edge('tutorial.rst', 'tutorial-title') +>>> g.add_edge('tutorial-title', 'index.html') +\end{verbatim} + +The result is a graph (\aosafigref{500l.contingent.graph2}) that could +properly handle rebuilding the table of contents that we discussed in +the opening of this chapter. 
+ +\aosafigure[240pt]{contingent-images/figure2.png}{Being prepared to rebuild `index.html` whenever any title that it mentions gets changed.}{500l.contingent.graph2} + +This manual walk-through illustrates what we will eventually have +Contingent do for us: the graph \texttt{g} captures the inputs and +consequences for the various artifacts in our project's documentation. + +\aosasecti{Learning Connections}\label{learning-connections} + +We now have a way for Contingent to keep track of tasks and the +relationships between them. If we look more closely at Figure 2, +however, we see that it is actually a little hand wavy and vague: +\emph{how} is \texttt{api.html} produced from \texttt{api.rst}? How do +we know that \texttt{index.html} needs the title from the tutorial? And +how is this dependency resolved? + +Our intuitive notion of these ideas served when we were constructing +consequences graphs by hand, but unfortunately computers are not +terribly intuitive, so we will need to be more precise about what we +want. + +What are the steps required to produce output from sources? How are +these steps defined and executed? And how can Contingent know the +connections between them? + +In Contingent, build tasks are modeled as functions plus arguments. The +functions define actions that a particular project understands how to +perform. The arguments provide the specifics: \emph{which} source +document should be read, \emph{which} blog title is needed. As they are +running, these functions may in turn invoke \emph{other} task functions, +passing whatever arguments they need answers for. + +To see how this works, we will actually now implement the documentation +builder described at the beginning of the chapter. In order to prevent +ourselves from wallowing around in a bog of details, for this +illustration we will work with simplified input and output document +formats. 
Our input documents will consist of a title on the first line,
+with the remainder of the text forming the body. Cross-references will
+simply be source file names enclosed in backticks, which in the output
+are replaced with the title of the corresponding document.
+
+Here is the content of our example \texttt{index.txt}, \texttt{api.txt},
+and \texttt{tutorial.txt}, illustrating titles, document bodies, and
+cross-references from our little document format:
+
+\begin{verbatim}
+>>> index = """
+... Table of Contents
+... -----------------
+... * `tutorial.txt`
+... * `api.txt`
+... """
+
+>>> tutorial = """
+... Beginners Tutorial
+... ------------------
+... Welcome to the tutorial!
+... We hope you enjoy it.
+... """
+
+>>> api = """
+... API Reference
+... -------------
+... You might want to read
+... the `tutorial.txt` first.
+... """
+\end{verbatim}
+
+Now that we have some source material to work with, what functions would
+a Contingent-based blog builder need?
+
+In the simplistic examples above, the HTML output files proceed directly
+from the source, but in a realistic system, turning source into markup
+involves several steps: reading the raw text from disk, parsing the text
+to a convenient internal representation, processing any directives the
+author may have specified, resolving cross-references or other external
+dependencies (such as include files), and applying one or more view
+transformations to convert the internal representation to its output
+form.
+
+Contingent manages tasks by grouping them into a \texttt{Project}, a
+sort of build system busybody that injects itself into the middle of the
+build process, noting every time one task talks to another to construct
+a graph of the relationships between all the tasks. 
+ +\begin{verbatim} +>>> from contingent.projectlib import Project, Task +>>> project = Project() +>>> task = project.task +\end{verbatim} + +A build system for the example given at the beginning of the chapter +might involve a few basic tasks. + +Our \texttt{read()} task will pretend to read the files from disk. Since +we really defined the source text in variables, all it needs to do is +convert from a filename to the corresponding text. + +\begin{verbatim} + >>> filesystem = {'index.txt': index, + ... 'tutorial.txt': tutorial, + ... 'api.txt': api} + ... + >>> @task + ... def read(filename): + ... return filesystem[filename] +\end{verbatim} + +The \texttt{parse()} task interprets the raw text of the file contents +according to the specification of our document format. Our format is +very simple: the title of the document appears on the first line, and +the rest of the content is considered the document's body. + +\begin{verbatim} + >>> @task + ... def parse(filename): + ... lines = read(filename).strip().splitlines() + ... title = lines[0] + ... body = '\n'.join(lines[2:]) + ... return title, body +\end{verbatim} + +Because the format is so simple, the parser is a little silly, +admittedly, but it illustrates the interpretive responsibilities that +parsers are required to carry out. Parsing in general is a very +interesting subject and many books have been written either partially or +completely dedicated to it. In a system like Sphinx, the parser must +understand the many markup tokens, directives, and commands defined by +the system, transforming the input text into something the rest of the +system can work with. + +Notice the connection point between \texttt{parse()} and \texttt{read()} +--- the first task in parsing is to pass the filename it has been given +to \texttt{read()}, which finds and returns the contents of that file. + +The \texttt{title\_of()} task, given a source file name, returns the +document's title: + +\begin{verbatim} + >>> @task + ... 
def title_of(filename):
+ ...     title, body = parse(filename)
+ ...     return title
+\end{verbatim}
+
+This task nicely illustrates the separation of responsibilities between
+the parts of a document processing system. The \texttt{title\_of()}
+function works directly from an in-memory representation of a document
+--- in this case, a tuple --- instead of taking it upon itself to
+re-parse the entire document just to find the title. The
+\texttt{parse()} function alone produces the in-memory representation,
+in accordance with the contract of the system specification, and the
+rest of the blog builder processing functions like \texttt{title\_of()}
+simply use its output as their authority.
+
+If you are coming from an orthodox object-oriented tradition, this
+function-oriented design may look a little weird. In an OO solution,
+\texttt{parse()} would return some sort of \texttt{Document} object that
+has \texttt{title\_of()} as a method or property. In fact, Sphinx works
+exactly this way: its \texttt{Parser} subsystem produces a ``Docutils
+document tree'' object for the other parts of the system to use.
+
+Contingent is not opinionated with regard to these differing design
+paradigms and supports either approach equally well. For this chapter we
+are keeping things simple.
+
+The final task, \texttt{render()}, turns the in-memory representation of
+a document into an output form. It is, in effect, the inverse of
+\texttt{parse()}. Whereas \texttt{parse()} takes an input document
+conforming to a specification and converts it to an in-memory
+representation, \texttt{render()} takes an in-memory representation and
+produces an output document conforming to some specification.
+
+\begin{verbatim}
+ >>> import re
+ >>>
+ >>> LINK = '<a href="{}">{}</a>'
+ >>> PAGE = '<h1>{}</h1>\n<p>\n{}\n<p>'
+ >>>
+ >>> def make_link(match):
+ ...     filename = match.group(1)
+ ...     return LINK.format(filename, title_of(filename))
+ ...
+ >>> @task
+ ... def render(filename):
+ ...     title, body = parse(filename)
+ ...     body = re.sub(r'`([^`]+)`', make_link, body)
+ ...     return PAGE.format(title, body)
+\end{verbatim}
+
+Here is an example run that will invoke every stage of the above logic
+--- rendering \texttt{tutorial.txt} to produce its output:
+
+\begin{verbatim}
+>>> print(render('tutorial.txt'))
+<h1>Beginners Tutorial</h1>
+<p>
+Welcome to the tutorial!
+We hope you enjoy it.
+<p>
+\end{verbatim}
+
+\aosafigref{500l.contingent.graph3} illustrates the task graph that
+transitively connects all the tasks required to produce the output,
+from reading the input file through parsing and transforming the
+document to rendering the result:
+
+\aosafigure[240pt]{contingent-images/figure3.png}{A task graph.}{500l.contingent.graph3}
+
+It turns out that \aosafigref{500l.contingent.graph3} was not hand-drawn
+for this chapter, but has been generated directly from Contingent!
+Building this graph is possible for the \texttt{Project} object because
+it maintains its own call stack, similar to the stack of live execution
+frames that Python maintains to remember which function to continue
+running when the current one returns.
+
+Every time that a new task is invoked, Contingent can assume that it has
+been called --- and that its output will be used --- by the task
+currently at the top of the stack. Maintaining the stack will require
+that several extra steps surround the invocation of a task~\emph{T}:
+
+\begin{aosaenumerate}
+\def\labelenumi{\arabic{enumi}.}
+
+\item
+  Push \emph{T} onto the stack.
+\item
+  Execute \emph{T}, letting it call any other tasks it needs.
+\item
+  Pop \emph{T} off the stack.
+\item
+  Return its result.
+\end{aosaenumerate}
+
+To intercept task calls, the \texttt{Project} leverages a key Python
+feature: \emph{function decorators}. A~decorator is allowed to process
+or transform a function at the moment that it is being defined. The
+\texttt{Project.task} decorator uses this opportunity to package every
+task inside another function, a \emph{wrapper}, which allows a clean
+separation of responsibilities between the wrapper --- which will worry
+about graph and stack management on behalf of the Project --- and our
+task functions that focus on document processing. 
Here is what the
+\texttt{task} decorator boilerplate looks like:
+
+\begin{verbatim}
+from functools import wraps
+
+def task(function):
+    @wraps(function)
+    def wrapper(*args):
+        # wrapper body, that will call function()
+    return wrapper
+\end{verbatim}
+
+This is an entirely typical Python decorator declaration. It can then be
+applied to a function by naming it after an \texttt{@} character atop the
+\texttt{def} that creates the function:
+
+\begin{verbatim}
+@task
+def title_of(filename):
+    title, body = parse(filename)
+    return title
+\end{verbatim}
+
+When this definition is complete, the name \texttt{title\_of} will refer
+to the wrapped version of the function. The wrapper can access the
+original version of the function via the name \texttt{function}, calling
+it at the appropriate time. The body of the Contingent wrapper runs
+something like this:
+
+\begin{verbatim}
+def task(function):
+    @wraps(function)
+    def wrapper(*args):
+        task = Task(wrapper, args)
+
+        if self._task_stack:
+            self._graph.add_edge(task, self._task_stack[-1])
+
+        self._graph.clear_inputs_of(task)
+        self._task_stack.append(task)
+        try:
+            value = function(*args)
+        finally:
+            self._task_stack.pop()
+
+        return value
+    return wrapper
+\end{verbatim}
+
+This wrapper performs several crucial maintenance steps:
+
+\begin{aosaenumerate}
+\def\labelenumi{\arabic{enumi}.}
+\item
+  Packages the task --- a function plus its arguments --- into a small
+  object for convenience. The \texttt{wrapper} here names the wrapped
+  version of the task function.
+\item
+  If this task has been invoked by a current task that is already
+  underway, add an edge capturing the fact that this task is an input to
+  the already-running task. 
+\item
+  Forget whatever we might have learned last time about the task, since
+  it might make new decisions this time --- if the source text of the
+  API guide no longer mentions the Tutorial, for example, then its
+  \texttt{render()} will no longer ask for the \texttt{title\_of()} of
+  the Tutorial document.
+\item
+  Push this task onto the top of the task stack in case it decides, in
+  its turn, to invoke further tasks in the course of doing its work.
+\item
+  Invoke the task inside of a \texttt{try...finally} block that ensures
+  we correctly remove the finished task from the stack even if it dies
+  by raising an exception.
+\item
+  Return the task's return value, so that callers of this wrapper will
+  not be able to tell that they have not simply invoked the plain task
+  function itself.
+\end{aosaenumerate}
+
+Steps 4 and 5 maintain the task stack itself, which is then used by step
+2 to perform the consequences tracking that is our whole reason for
+building a task stack in the first place.
+
+Since each task gets surrounded by its own copy of the wrapper function,
+the mere invocation and execution of the normal stack of tasks will
+produce a graph of relationships as an invisible side effect. That is
+why we were careful to use the wrapper around every one of the
+processing steps we defined:
+
+\begin{verbatim}
+@task
+def read(filename):
+    # body of read
+
+@task
+def parse(filename):
+    # body of parse
+
+@task
+def title_of(filename):
+    # body of title_of
+
+@task
+def render(filename):
+    # body of render
+\end{verbatim}
+
+Thanks to these wrappers, when we called \texttt{parse('tutorial.txt')}
+the decorator learned the connection between \texttt{parse} and
+\texttt{read}. 
We can ask about the relationship by building another +\texttt{Task} tuple and asking what the consequences would be if its +output value changed: + +\begin{verbatim} +>>> task = Task(read, ('tutorial.txt',)) +>>> print(task) +read('tutorial.txt') +>>> project._graph.immediate_consequences_of(task) +[parse('tutorial.txt')] +\end{verbatim} + +The consequence of re-reading the \texttt{tutorial.txt} file and finding +its contents have changed is that we need to re-execute the +\texttt{parse()} routine for that document. What happens if we render +the entire set of documents? Will Contingent be able to learn the entire +build process with its interrelationships? + +\begin{verbatim} +>>> for filename in 'index.txt', 'tutorial.txt', 'api.txt': +... print(render(filename)) +... print('=' * 30) +... +
+<h1>Table of Contents</h1>
+<p>
+* <a href="tutorial.txt">Beginners Tutorial</a>
+* <a href="api.txt">API Reference</a>
+<p>
+==============================
+<h1>Beginners Tutorial</h1>
+<p>
+Welcome to the tutorial!
+We hope you enjoy it.
+<p>
+==============================
+<h1>API Reference</h1>
+<p>
+You might want to read
+the <a href="tutorial.txt">Beginners Tutorial</a> first.
+<p>
+==============================
+\end{verbatim}
+
+It worked! From the output, we can see that our transform substituted
+the document titles for the directives in our source documents,
+indicating that Contingent was able to discover the connections between
+the various tasks needed to build our documents.
+
+\aosafigure[240pt]{contingent-images/figure4.png}{The complete set of relationships
+  between our input files and our HTML outputs.}{500l.contingent.graph4}
+
+By watching one task invoke another through the \texttt{task} wrapper
+machinery, \texttt{Project} has automatically learned the graph of
+inputs and consequences. Since it has a complete consequences graph at
+its disposal, Contingent knows all the things to rebuild if the inputs
+to any tasks change.
+
+\aosasecti{Chasing Consequences}\label{chasing-consequences}
+
+Once the initial build has run to completion, Contingent needs to
+monitor the input files for changes. When the user finishes a new edit
+and runs ``Save,'' both the \texttt{read()} task and its consequences
+need to be invoked.
+
+This will require us to walk the graph in the opposite order from the
+one in which it was created. It was built, you will recall, by calling
+\texttt{render()} for the API Reference and having that call
+\texttt{parse()} which finally invoked the \texttt{read()} task. Now we
+go in the other direction: we know that \texttt{read()} will now return
+new content, and we need to figure out what consequences lie downstream.
+
+The process of compiling consequences is a recursive one, as each
+consequence can itself have further tasks that depend on it. 
We could
+perform this recursion manually through repeated calls to the graph
+(note that we are here taking advantage of the fact that the Python
+prompt saves the last value displayed under the name \texttt{\_} for use
+in the subsequent expression):
+
+\begin{verbatim}
+>>> task = Task(read, ('api.txt',))
+>>> project._graph.immediate_consequences_of(task)
+[parse('api.txt')]
+>>> t1, = _
+>>> project._graph.immediate_consequences_of(t1)
+[render('api.txt'), title_of('api.txt')]
+>>> t2, t3 = _
+>>> project._graph.immediate_consequences_of(t2)
+[]
+>>> project._graph.immediate_consequences_of(t3)
+[render('index.txt')]
+>>> t4, = _
+>>> project._graph.immediate_consequences_of(t4)
+[]
+\end{verbatim}
+
+This recursive task of looking repeatedly for immediate consequences and
+only stopping when we arrive at tasks with no further consequences is a
+basic enough graph operation that it is supported directly by a method
+on the \texttt{Graph} class:
+
+\begin{verbatim}
+>>> from pprint import pprint
+>>> # Secretly adjust pprint to a narrower-than-usual width:
+>>> _pprint = pprint
+>>> pprint = lambda x: _pprint(x, width=40)
+>>> pprint(project._graph.recursive_consequences_of([task]))
+[parse('api.txt'),
+ render('api.txt'),
+ title_of('api.txt'),
+ render('index.txt')]
+\end{verbatim}
+
+In fact, \texttt{recursive\_consequences\_of()} tries to be a bit
+clever. If a particular task appears repeatedly as a downstream
+consequence of several other tasks, then it is careful to only mention
+it once in the output list, and to move it close to the end so that it
+appears only after the tasks that are its inputs. This intelligence is
+powered by the classic depth-first implementation of a topological sort,
+an algorithm which winds up being fairly easy to write in Python through
+a hidden recursive helper function. Check out the \texttt{graphlib.py}
+source code for the details. 
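The heart of \texttt{recursive\_consequences\_of()} can be sketched in a few lines. What follows is an illustrative reconstruction of that depth-first topological sort, not the actual \texttt{graphlib.py} code; it represents the graph as a plain dict mapping each task to a list of its immediate consequences:

```python
def recursive_consequences_of(consequences, tasks):
    """Return every task downstream of `tasks`, each exactly once,
    ordered so that a task appears only after the tasks it consumes.

    `consequences` maps each task to its immediate consequences.
    """
    seen = set()
    postorder = []

    def visit(task):
        # Depth-first: record a task only after everything downstream
        # of it has already been recorded.
        for consequence in consequences.get(task, ()):
            if consequence not in seen:
                seen.add(consequence)
                visit(consequence)
                postorder.append(consequence)

    for task in tasks:
        visit(task)
    # Reversing the postorder puts every task after its inputs.
    return list(reversed(postorder))

# The graph built in this chapter, with tasks as plain strings:
consequences = {
    "read('api.txt')": ["parse('api.txt')"],
    "parse('api.txt')": ["render('api.txt')", "title_of('api.txt')"],
    "title_of('api.txt')": ["render('index.txt')"],
}
print(recursive_consequences_of(consequences, ["read('api.txt')"]))
```

The exact order may differ from the transcript above, since several orderings satisfy the same dependencies, but every task is listed once and always after the tasks whose output it uses.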
+
+If upon detecting a change we are careful to re-run every task in the
+recursive consequences, then Contingent will be able to avoid rebuilding
+too little. Our second challenge, however, was to avoid rebuilding too
+much. Refer again to \aosafigref{500l.contingent.graph4}. We want to
+avoid rebuilding all three documents every time that
+\texttt{tutorial.txt} is changed, since most edits will probably not
+affect its title but only its body. How can this be accomplished?
+
+The solution is to make graph recomputation dependent on caching. When
+stepping forward through the recursive consequences of a change, we will
+only invoke tasks whose inputs are different than last time.
+
+This optimization will involve a final data structure. We will give the
+\texttt{Project} a \texttt{\_todo} set with which to remember every task
+for which at least one input value has changed, and that therefore
+requires re-execution. Because only tasks in \texttt{\_todo} are
+out-of-date, the build process can skip running any other tasks unless
+they appear there.
+
+Again, Python's convenient and unified design makes these features very
+easy to code. Because task objects are hashable, \texttt{\_todo} can
+simply be a set that remembers task items by identity --- guaranteeing
+that a task never appears twice --- and the \texttt{\_cache} of return
+values from previous runs can be a dict with tasks as keys.
+
+More precisely, the rebuild step must keep looping as long as
+\texttt{\_todo} is non-empty. During each loop, it should:
+
+\begin{aosaitemize}
+\item
+  Call \texttt{recursive\_consequences\_of()} and pass in every task
+  listed in \texttt{\_todo}. The return value will be a list of not only
+  the \texttt{\_todo} tasks themselves, but also every task downstream
+  of them --- every task, in other words, that could possibly need
+  re-execution if the outputs come out different this time.
+\item
+  For each task in the list, check whether it is listed in
+  \texttt{\_todo}. 
If not, then we can skip running it, because none of + the tasks that we have re-invoked upstream of it has produced a new + return value that would require the task's recomputation. +\item + But for any task that is indeed listed in \texttt{\_todo} by the time + we reach it, we need to ask it to re-run and re-compute its return + value. If the task wrapper function detects that this return value + does not match the old cached value, then its downstream tasks will be + automatically added to \texttt{\_todo} before we reach them in the + list of recursive consequences. +\end{aosaitemize} + +By the time we reach the end of the list, every task that could possibly +need to be re-run should in fact have been re-run. But just in case, we +will check \texttt{\_todo} and try again if it is not yet empty. Even +for very rapidly changing dependency trees, this should quickly settle +out. Only a cycle --- where, for example, task \emph{A} needs the output +of task \emph{B} which itself needs the output of task \emph{A} --- +could keep the builder in an infinite loop, and only if their return +values never stabilize. Fortunately, real-world build tasks are +typically without cycles. + +Let us trace the behavior of this system through an example. + +Suppose you edit \texttt{tutorial.txt} and change both the title and the +body content. We can simulate this by modifying the value in our +\texttt{filesystem} dict: + +\begin{verbatim} +>>> filesystem['tutorial.txt'] = """ +... The Coder Tutorial +... ------------------ +... This is a new and improved +... introductory paragraph. +... """ +\end{verbatim} + +Now that the contents have changed, we can ask the Project to re-run the +\texttt{read()} task by using its \texttt{cache\_off()} context manager +that temporarily disables its willingness to return its old cached +result for a given task and argument: + +\begin{verbatim} +>>> with project.cache_off(): +... 
text = read('tutorial.txt')
+\end{verbatim}
+
+The new tutorial text has now been read into the cache. How many
+downstream tasks will need to be re-executed?
+
+To help us answer this question, the \texttt{Project} class supports a
+simple tracing facility that will tell us which tasks are executed in
+the course of a rebuild. Since the above change to \texttt{tutorial.txt}
+affects both its body and its title, everything downstream will need to
+be re-computed:
+
+\begin{verbatim}
+>>> project.start_tracing()
+>>> project.rebuild()
+>>> print(project.stop_tracing())
+calling parse('tutorial.txt')
+calling render('tutorial.txt')
+calling title_of('tutorial.txt')
+calling render('api.txt')
+calling render('index.txt')
+\end{verbatim}
+
+Looking back at \aosafigref{500l.contingent.graph4}, you can see that,
+as expected, this is every task that is an immediate or downstream
+consequence of \texttt{read('tutorial.txt')}.
+
+But what if we edit it again, this time leaving the title the same?
+
+\begin{verbatim}
+>>> filesystem['tutorial.txt'] = """
+... The Coder Tutorial
+... ------------------
+... Welcome to the coder tutorial!
+... It should be read top to bottom.
+... """
+>>> with project.cache_off():
+...     text = read('tutorial.txt')
+\end{verbatim}
+
+This small, limited change should have no effect on the other documents.
+
+\begin{verbatim}
+>>> project.start_tracing()
+>>> project.rebuild()
+>>> print(project.stop_tracing())
+calling parse('tutorial.txt')
+calling render('tutorial.txt')
+calling title_of('tutorial.txt')
+\end{verbatim}
+
+Success! Only one document got rebuilt. The fact that
+\texttt{title\_of()}, given a new input document, nevertheless returned
+the same value means that all further downstream tasks were insulated
+from the change and did not get re-invoked. 
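The rebuild loop that produced these traces can be simulated end to end. Everything in this sketch is a stand-in: tasks are plain strings, the \texttt{downstream} dict plays the role of the consequences graph, and a \texttt{run} callback replaces the re-invocation of a task wrapper; none of these names come from Contingent itself.

```python
def rebuild(todo, downstream, run, cache):
    """Keep re-running stale tasks until nothing is left to do,
    skipping any task whose inputs turned out not to change."""

    def in_dependency_order(tasks):
        # Reversed depth-first postorder: the stale tasks plus
        # everything downstream, each after the tasks it consumes.
        seen, postorder = set(), []
        def visit(task):
            seen.add(task)
            for nxt in downstream.get(task, ()):
                if nxt not in seen:
                    visit(nxt)
            postorder.append(task)
        for task in tasks:
            if task not in seen:
                visit(task)
        return list(reversed(postorder))

    trace = []
    while todo:
        for task in in_dependency_order(sorted(todo)):
            if task not in todo:
                continue              # inputs unchanged: skip this task
            todo.discard(task)
            value = run(task)
            trace.append(task)
            if cache.get(task) != value:
                cache[task] = value   # output changed: downstream is now stale
                todo.update(downstream.get(task, ()))
    return trace

# A change to 'read' whose parse result comes out identical:
# 'render' never appears in the trace, because 'parse' did not change.
downstream = {'read': ['parse'], 'parse': ['render']}
cache = {'parse': ('Title', 'body'), 'render': '<page>'}
results = {'read': 'edited text', 'parse': ('Title', 'body'), 'render': '<page>'}
print(rebuild({'read'}, downstream, results.__getitem__, cache))  # ['read', 'parse']
```

The skip test and the cache comparison are exactly the two halves described above: the dependency-ordered list guarantees we never visit a task before its inputs, and the cache comparison is what decides whether its consequences join the to-do set at all.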
+
+\aosasecti{Conclusion}\label{conclusion}
+
+There exist languages and programming methodologies under which
+Contingent would be a suffocating forest of tiny classes giving useless
+and verbose names to every concept in the problem domain.
+
+When programming Contingent in Python, however, we skipped the creation
+of a dozen classes that could have existed, like \texttt{TaskArgument}
+and \texttt{CachedResult} and \texttt{ConsequenceList}. We instead drew
+upon Python's strong tradition of solving generic problems with generic
+data structures, resulting in code that repeatedly uses a small set of
+ideas from the core data structures tuple, list, set, and dict.
+
+But does this not cause a problem?
+
+Generic data structures are also, by their nature, anonymous. Our
+\texttt{project.\_todo} is a set. So is every collection of upstream
+and downstream nodes inside the \texttt{Graph}. Are we in danger of
+seeing generic \texttt{set} error messages and not knowing whether to
+look in the project or the graph implementation for the error?
+
+In fact, we are not in danger!
+
+Thanks to the careful discipline of encapsulation --- of only allowing
+\texttt{Graph} code to touch the graph's sets, and \texttt{Project} code
+to touch the project's set --- there will never be ambiguity if a set
+operation returns an error during a later phase of the project. The name
+of the innermost executing method at the moment of the error will
+necessarily direct us to exactly the class, and set, involved in the
+mistake. There is no need to create a subclass of \texttt{set} for every
+possible application of the data type, so long as we put that
+conventional underscore in front of data structure attributes and then
+are careful not to touch them from code outside of the class.
+
+Contingent demonstrates how crucial the Facade pattern, from the epochal
+\emph{Design Patterns} book, is for a well-designed Python program. 
Not +every data structure and fragment of data in a Python program gets to be +its own class. Instead, classes are used sparingly, at conceptual pivots +in the code where a big idea --- like the idea of a dependency graph --- +can be wrapped up into a Facade that hides the details of the simple +generic data structures that lie beneath it. + +Code outside of the Facade names the big concepts that it needs and the +operations that it wants to perform. Inside of the Facade, the +programmer manipulates the small and convenient moving parts of the +Python programming language to make the operations happen. + +\end{aosachapter} diff --git a/tex/ocr.tex b/tex/ocr.tex new file mode 100644 index 000000000..818e730ba --- /dev/null +++ b/tex/ocr.tex @@ -0,0 +1,851 @@ +\begin{aosachapter}{Optical Character Recognition (OCR)}{s:ocr}{Marina Samuel} + +\aosasecti{Introduction}\label{introduction} + +What if your computer could wash your dishes, do your laundry, cook you +dinner, and clean your home? I think I can safely say that most people +would be happy to get a helping hand! But what would it take for a +computer to be able to perform these tasks, in exactly the same way that +humans can? + +The famous computer scientist Alan Turing proposed the Turing Test as a +way to identify whether a machine could have intelligence +indistinguishable from that of a human being. The test involves a human +posing questions to two hidden entities, one human, and the other a +machine, and trying to identify which is which. If the interrogator is +unable to identify the machine, then the machine is considered to have +human-level intelligence. + +While there is a lot of controversy surrounding whether the Turing Test +is a valid assessment of intelligence, and whether we can build such +intelligent machines, there is no doubt that machines with some degree +of intelligence already exist. 
There is currently software that helps
+robots navigate an office and perform small tasks, or helps those
+suffering from Alzheimer's. More common examples of Artificial
+Intelligence (A.I.) are the way that Google estimates what you're
+looking for when you search for some keywords, or the way that Facebook
+decides what to put in your news feed.
+
+One well-known application of A.I. is Optical Character Recognition
+(OCR). An OCR system is a piece of software that can take images of
+handwritten characters as input and interpret them into machine-readable
+text. While you may not think twice when depositing a handwritten cheque
+into a bank machine that confirms the deposit value, there is some
+interesting work going on in the background. This chapter will examine a
+working example of a simple OCR system that recognizes numerical digits
+using an Artificial Neural Network (ANN). But first, let's establish a
+bit more context.
+
+\aosasecti{What is Artificial
+Intelligence?}\label{what-is-artificial-intelligence}
+
+\label{sec.ocr.ai} While Turing's definition of intelligence sounds
+reasonable, at the end of the day what constitutes intelligence is
+fundamentally a philosophical debate. Computer scientists have, however,
+categorized certain types of systems and algorithms into branches of AI.
+Each branch is used to solve certain sets of problems. These branches
+include the following examples, as well as
+\href{http://www-formal.stanford.edu/jmc/whatisai/node2.html}{many
+others}:
+
+\begin{aosaitemize}
+
+\item
+  Logical and probabilistic deduction and inference based on some
+  predefined knowledge of a world. e.g.
+  \href{http://www.cs.princeton.edu/courses/archive/fall07/cos436/HIDDEN/Knapp/fuzzy004.htm}{Fuzzy
+  inference} can help a thermostat decide when to turn on the air
+  conditioning when it detects that the temperature is hot and the
+  atmosphere is humid
+\item
+  Heuristic search. 
e.g.~Searching can be used to find the best possible + next move in a game of chess by searching all possible moves and + choosing the one that most improves your position +\item + Machine learning (ML) with feedback models. e.g.~Pattern-recognition + problems like OCR. +\end{aosaitemize} + +In general, ML involves using large data sets to train a system to +identify patterns. The training data sets may be labelled, meaning the +system's expected outputs are specified for given inputs, or unlabelled +meaning expected outputs are not specified. Algorithms that train +systems with unlabelled data are called \emph{unsupervised} algorithms +and those that train with labelled data are called \emph{supervised}. +Although many ML algorithms and techniques exist for creating OCR +systems, ANNs are one simple approach. + +\aosasecti{Artificial Neural Networks}\label{artificial-neural-networks} + +\aosasectii{What Are ANNs?}\label{what-are-anns} + +\label{sec.ocr.ann} An ANN is a structure consisting of interconnected +nodes that communicate with one another. The structure and its +functionality are inspired by neural networks found in a biological +brain. +\href{http://www.nbb.cornell.edu/neurobio/linster/BioNB420/hebb.pdf}{Hebbian +Theory} explains how these networks can learn to identify patterns by +physically altering their structure and link strengths. Similarly, a +typical ANN (shown in \aosafigref{500l.ocr.ann}) has connections between +nodes that have a weight which is updated as the network learns. The +nodes labelled ``+1'' are called \emph{biases}. The leftmost blue column +of nodes are \emph{input nodes}, the middle column contains \emph{hidden +nodes}, and the rightmost column contains \emph{output nodes}. There may +be many columns of hidden nodes, known as \emph{hidden layers}. 
+
+\aosafigure[360pt]{ocr-images/ann.png}{An Artificial Neural Network}{500l.ocr.ann}
+
+The values inside all of the circular nodes in \aosafigref{500l.ocr.ann}
+represent the output of the node. If we call the output of the $n$th
+node from the top in layer $L$ as $a^{(L)}_n$ and the connection between
+the $i$th node in layer $L$ and the $j$th node in layer $L+1$ as
+$w^{(L)}_{ji}$, then the output of node $a^{(2)}_2$ is:
+
+\[
+a^{(2)}_2 = f(w^{(1)}_{21}x_1 + w^{(1)}_{22}x_2 + b^{(1)}_{2})
+\]
+
+where $f(.)$ is known as the \emph{activation function} and $b$ is the
+\emph{bias}. An activation function is the decision-maker for what type
+of output a node has. A bias is an additional node with a fixed output
+of 1 that may be added to an ANN to improve its accuracy. We'll see more
+details on both of these in \aosasecref{sec.ocr.feedforward}.
+
+This type of network topology is called a feedforward neural network
+because there are no cycles in the network. ANNs with nodes whose
+outputs feed into their inputs are called recurrent neural networks.
+There are many algorithms that can be applied to train feedforward ANNs;
+one commonly used algorithm is called \emph{backpropagation}. The OCR
+system we will implement in this chapter will use backpropagation.
+
+\aosasectii{How Do We Use ANNs?}\label{how-do-we-use-anns}
+
+Like most other ML approaches, the first step for using backpropagation
+is to decide how to transform or reduce our problem into one that can be
+solved by an ANN. In other words, how can we manipulate our input data
+so we can feed it into the ANN? For the case of our OCR system, we can
+use the positions of the pixels for a given digit as input. It is worth
+noting that, oftentimes, choosing the input data format is not this
+simple. If we were analyzing large images to identify shapes in them,
+for instance, we may need to pre-process the image to identify contours
+within it. These contours would be the input. 
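The node-output formula above translates directly into a few lines of Python. This sketch assumes a sigmoid for $f(.)$, one common choice of activation function; the weights and bias in the example call are arbitrary illustration values, not part of any trained network:

```python
import math

def sigmoid(z):
    """A common choice for the activation function f(.)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(weights, inputs, bias):
    """Compute one node's output: f(sum over i of w_ji * x_i, plus b_j)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# The two-input node a_2^(2) from the formula above, with made-up
# values for w_21, w_22, x_1, x_2, and b_2:
print(node_output([0.5, -0.25], [1.0, 2.0], 0.1))
```

Because the sigmoid squashes any weighted sum into the interval $(0, 1)$, a node's output can be read as a soft yes/no decision, which is what makes it useful as a building block for classification.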
+
+Once we've decided on our input data format, what's next? Since
+backpropagation is a supervised algorithm, it will need to be trained
+with labelled data, as mentioned in \aosasecref{sec.ocr.ai}. Thus, when
+passing the pixel positions as training input, we must also pass the
+associated digit. This means that we must find or gather a large data
+set of drawn digits and associated values.
+
+The next step is to partition the data set into a training set and
+validation set. The training data is used to run the backpropagation
+algorithm to set the weights of the ANN. The validation data is used to
+make predictions using the trained network and compute its accuracy. If
+we were comparing the performance of backpropagation vs.~another
+algorithm on our data, we would
+\href{http://www-group.slac.stanford.edu/sluo/Lectures/stat_lecture_files/sluo2006lec7.pdf}{split
+the data} into 50\% for training, 25\% for comparing performance of the
+two algorithms (validation set) and the final 25\% for testing accuracy
+of the chosen algorithm (test set). Since we're not comparing
+algorithms, we can group one of the 25\% sets as part of the training
+set and use 75\% of the data to train the network and 25\% for
+validating that it was trained well.
+
+The purpose of identifying the accuracy of the ANN is twofold. The
+first is to avoid the problem of \emph{overfitting}. Overfitting occurs
+when the network has a much higher accuracy on predicting the training
+set than the validation set. Overfitting tells us that the chosen
+training data does not generalize well enough and needs to be refined.
+The second is that testing the accuracy of several different numbers of
+hidden layers and hidden nodes helps in designing an optimal ANN size.
+An optimal ANN size will have enough hidden nodes and layers to make
+accurate predictions but also as few nodes/connections as possible to
+reduce computational overhead that may slow down training and
+predictions.
Once the optimal size has been decided and the network has +been trained, it's ready to make predictions! + +\aosasecti{Design Decisions in a Simple OCR +System}\label{design-decisions-in-a-simple-ocr-system} + +\label{sec.ocr.decisions} In the last few paragraphs we've gone over +some of the basics of feedforward ANNs and how to use them. Now it's +time to talk about how we can build an OCR system. + +First off, we must decide what we want our system to be able to do. To +keep things simple, let's allow users to draw a single digit and be able +to train the OCR system with that drawn digit or to request that the +system predict what the drawn digit is. While an OCR system could run +locally on a single machine, having a client-server setup gives much +more flexibility. It makes crowd-sourced training of an ANN possible and +allows powerful servers to handle intensive computations. + +Our OCR system will consist of 5 main components, divided into 5 files. +There will be: + +\begin{aosaitemize} + +\item + a client (\texttt{ocr.js}) +\item + a server (\texttt{server.py}) +\item + a simple user interface (\texttt{ocr.html}) +\item + an ANN trained via backpropagation (\texttt{ocr.py}) +\item + an ANN design script (\texttt{neural\_network\_design.py}) +\end{aosaitemize} + +The user interface will be simple: a canvas to draw digits on and +buttons to either train the ANN or request a prediction. The client will +gather the drawn digit, translate it into an array, and pass it to the +server to be processed either as a training sample or as a prediction +request. The server will simply route the training or prediction request +by making API calls to the ANN module. The ANN module will train the +network with an existing data set on its first initialization. It will +then save the ANN weights to a file and re-load them on subsequent +startups. This module is where the core of training and prediction logic +happens. 
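As a rough sketch of the server's routing role described above (the handler name, JSON fields, and the ANN-module interface are illustrative assumptions, not the chapter's actual API):

```python
import json

class StubNetwork:
    """Stand-in for the ANN module (ocr.py), for illustration only."""
    def train(self, samples):
        self.samples = samples
    def predict(self, image):
        return 3  # a real network would return the most likely digit

def handle_request(payload, network):
    """Route a decoded JSON request to either training or prediction."""
    request = json.loads(payload)
    if request.get("train"):
        # each training sample pairs a pixel array with its digit label
        network.train(request["trainArray"])
        return {"type": "train"}
    digit = network.predict(request["image"])
    return {"type": "test", "result": digit}

response = handle_request('{"train": false, "image": [0, 1, 0]}', StubNetwork())
```

The server itself stays thin: all the interesting work happens inside the ANN module it delegates to.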
Finally, the design script is for experimenting with different
+hidden node counts and deciding what works best. Together, these pieces
+give us a simple but functional OCR system.
+
+Now that we've thought about how the system will work at a high level,
+it's time to put the concepts into code!
+
+\aosasectii{A Simple Interface
+(\texttt{ocr.html})}\label{a-simple-interface-ocr.html}
+
+As mentioned earlier, the first step is to gather data for training the
+network. We could upload a sequence of hand-written digits to the
+server, but that would be awkward. Instead, we could have users actually
+handwrite the digits on the page using an HTML canvas. We could then
+give them a couple of options to either train or test the network, where
+training the network also involves specifying what digit was drawn. This
+way we can easily crowd-source data collection by pointing people to a
+website where their drawings are gathered as input. Here's some HTML to
+get us started.
+
+\begin{verbatim}
+
+
+ + + + +