Commit 81dc23b

polish part 1; first draft of part 2
1 parent 988a8e7 commit 81dc23b

3 files changed: 2794 additions & 225 deletions

0 - Intro & Setup.ipynb

Lines changed: 60 additions & 3 deletions
@@ -1,7 +1,7 @@
 {
  "metadata": {
   "name": "",
-  "signature": "sha256:d47ab3bdbb8947837ed647f17839e988544b737b9660a40130ff49b8b17c1d91"
+  "signature": "sha256:faa812492cedf41f121b213c09df73e486bfb9dcde47a584b0596882dbc13c23"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -462,6 +462,63 @@
       "A common pattern that we'll be using is **combining the efficiency of in-memory arrays** (numpy, scipy.sparse) with the **scalability of data streaming**. Instead of processing one document at a time (slow), or all documents at once (non-scalable), we'll be reading **a chunk of documents** into RAM (= as many documents as RAM allows), processing this chunk, then throwing it away and streaming a new chunk into RAM."
      ]
     },
+    {
+     "cell_type": "heading",
+     "level": 3,
+     "metadata": {},
+     "source": [
+      "Itertools"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "A [built-in Python library](https://docs.python.org/2/library/itertools.html) for efficient work with data streams (iterables, iterators, generators):"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "import itertools\n",
+      "\n",
+      "infinite_stream = OddNumbers()  # the infinite iterable defined in an earlier cell\n",
+      "\n",
+      "# compute the first 10 items (and no more) & print them\n",
+      "print(list(itertools.islice(infinite_stream, 10)))\n",
+      "\n",
+      "# lazily concatenate streams; the result is also infinite\n",
+      "concat_stream = itertools.chain('abcde', infinite_stream)\n",
+      "print(list(itertools.islice(concat_stream, 10)))\n",
+      "\n",
+      "numbered_stream = enumerate(infinite_stream)  # also infinite\n",
+      "print(list(itertools.islice(numbered_stream, 10)))\n",
+      "\n",
+      "# etc; see the itertools docs for more examples"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "output_type": "stream",
+       "stream": "stdout",
+       "text": [
+        "[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]\n",
+        "['a', 'b', 'c', 'd', 'e', 1, 3, 5, 7, 9]\n",
+        "[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19)]\n"
+       ]
+      }
+     ],
+     "prompt_number": 17
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "The examples above show another useful pattern: take a small sample of the stream (e.g. its first ten elements) and convert it into a plain Python list, with `list(islice(stream, 10))`. To convert an entire stream into a list, simply call `list(stream)` (but watch out for RAM here, especially with infinite streams!). Nothing beats the simplicity of `list(stream)` for debugging purposes."
+     ]
+    },
     {
      "cell_type": "heading",
      "level": 2,
@@ -474,9 +531,9 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "At any point, you can save the notebook (any notebook) to disk by pressing `CTRL`+`s` or `CMD`+`s`. This will **save all your changes**, including cell outputs.\n",
+      "At any point, you can save the notebook (any notebook) to disk by pressing `CTRL`+`s` (or `CMD`+`s`). This will **save all changes you've made to the notebook**, including cell outputs, locally to your disk.\n",
       "\n",
-      "To discard your notebook changes, simply checkout the file again from git (or extract it again from the repository ZIP archive). This will reset the notebook to its original state, **losing all changes**."
+      "To discard your notebook changes, simply check out the notebook file again from git (or extract it again from the repository ZIP archive). This will reset the notebook to its original state, **losing all changes**."
      ]
     }
    ],
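
The chunked-streaming pattern described in the new markdown cell above (read a chunk of documents into RAM, process it, throw it away, stream in the next chunk) can be sketched in a few lines with `itertools.islice`. The snippet below is a minimal illustration only, not code from this commit; `iter_chunks` and the toy `documents` generator are hypothetical names:

import itertools

def iter_chunks(stream, chunk_size):
    """Yield successive lists of at most `chunk_size` items taken from `stream`."""
    iterator = iter(stream)
    while True:
        # pull the next chunk into RAM as a plain list
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return  # stream exhausted
        yield chunk

# toy document stream; in practice this could be a generator that
# lazily reads documents from disk or over the network
documents = ("document #%d" % i for i in range(10))

for chunk in iter_chunks(documents, 4):
    # only the current chunk (at most 4 documents here) is held in RAM
    print(chunk)

Each pass of the loop holds one chunk in memory and discards it before the next chunk is streamed in, which is exactly the in-memory efficiency / streaming scalability trade-off the notebook describes.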
