|
1 | 1 | {
|
2 | 2 | "metadata": {
|
3 | 3 | "name": "",
|
4 | | - "signature": "sha256:d47ab3bdbb8947837ed647f17839e988544b737b9660a40130ff49b8b17c1d91" |
| 4 | + "signature": "sha256:faa812492cedf41f121b213c09df73e486bfb9dcde47a584b0596882dbc13c23" |
5 | 5 | },
|
6 | 6 | "nbformat": 3,
|
7 | 7 | "nbformat_minor": 0,
|
|
462 | 462 | "A common pattern that we'll be using is **combining the efficiency of in-memory arrays** (numpy, scipy.sparse) with the **scalability of data streaming**. Instead of processing one document at a time (slow), or all documents at once (non-scalable), we'll be reading **a chunk of documents** into RAM (= as many documents as RAM allows), processing this chunk, then throwing it away and streaming a new chunk into RAM."
|
463 | 463 | ]
|
464 | 464 | },
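| + { |
| + "cell_type": "markdown", |
| + "metadata": {}, |
| + "source": [ |
| + "As a minimal, illustrative sketch of this pattern (not tied to any particular corpus), a small helper generator can collect up to `chunk_size` items from any stream into a plain in-memory list, yield that list for processing, and only then move on to the next chunk. The `corpus.txt` file and the `process()` call in the commented-out usage below are hypothetical placeholders:" |
| + ] |
| + }, |
| + { |
| + "cell_type": "code", |
| + "collapsed": false, |
| + "input": [ |
| + "def iter_chunks(stream, chunk_size):\n", |
| + "    \"\"\"Yield successive in-memory lists of at most `chunk_size` items from `stream`.\"\"\"\n", |
| + "    chunk = []\n", |
| + "    for item in stream:\n", |
| + "        chunk.append(item)\n", |
| + "        if len(chunk) == chunk_size:\n", |
| + "            yield chunk  # process this chunk in RAM...\n", |
| + "            chunk = []   # ...then throw it away and stream in the next one\n", |
| + "    if chunk:\n", |
| + "        yield chunk  # the final, possibly smaller, chunk\n", |
| + "\n", |
| + "print(list(iter_chunks(range(10), 4)))\n", |
| + "\n", |
| + "# hypothetical usage on a real corpus stored one document per line:\n", |
| + "# with open('corpus.txt') as fin:\n", |
| + "#     for chunk in iter_chunks(fin, 10000):\n", |
| + "#         process(chunk)  # e.g. build a numpy/scipy matrix from the chunk, then discard it" |
| + ], |
| + "language": "python", |
| + "metadata": {}, |
| + "outputs": [] |
| + }, |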
|
| 465 | + { |
| 466 | + "cell_type": "heading", |
| 467 | + "level": 3, |
| 468 | + "metadata": {}, |
| 469 | + "source": [ |
| 470 | + "Itertools" |
| 471 | + ] |
| 472 | + }, |
| 473 | + { |
| 474 | + "cell_type": "markdown", |
| 475 | + "metadata": {}, |
| 476 | + "source": [ |
| 477 | + "A [built-in Python library](https://docs.python.org/2/library/itertools.html) for efficient work data streams (iterables, iterators, generators):" |
| 478 | + ] |
| 479 | + }, |
| 480 | + { |
| 481 | + "cell_type": "code", |
| 482 | + "collapsed": false, |
| 483 | + "input": [ |
| 484 | + "import itertools\n", |
| 485 | + "\n", |
| 486 | + "infinite_stream = OddNumbers()\n", |
| 487 | + "\n", |
| 488 | + "# compute the first 10 items (and no more) & print them\n", |
| 489 | + "print(list(itertools.islice(infinite_stream, 10)))\n", |
| 490 | + "\n", |
| 491 | + "# lazily concatenate streams; the result is also infinite\n", |
| 492 | + "concat_stream = itertools.chain('abcde', infinite_stream)\n", |
| 493 | + "print(list(itertools.islice(concat_stream, 10)))\n", |
| 494 | + "\n", |
| 495 | + "numbered_stream = enumerate(infinite_stream) # also infinite\n", |
| 496 | + "print(list(itertools.islice(numbered_stream, 10)))\n", |
| 497 | + "\n", |
| 498 | + "# etc; see the itertools docs for more examples" |
| 499 | + ], |
| 500 | + "language": "python", |
| 501 | + "metadata": {}, |
| 502 | + "outputs": [ |
| 503 | + { |
| 504 | + "output_type": "stream", |
| 505 | + "stream": "stdout", |
| 506 | + "text": [ |
| 507 | + "[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]\n", |
| 508 | + "['a', 'b', 'c', 'd', 'e', 1, 3, 5, 7, 9]\n", |
| 509 | + "[(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 13), (7, 15), (8, 17), (9, 19)]\n" |
| 510 | + ] |
| 511 | + } |
| 512 | + ], |
| 513 | + "prompt_number": 17 |
| 514 | + }, |
| 515 | + { |
| 516 | + "cell_type": "markdown", |
| 517 | + "metadata": {}, |
| 518 | + "source": [ |
| 519 | + "The examples above show another useful pattern: take a small sample of the stream (e.g. the first ten elements) and convert them into plain Python list, with `list(islice(stream, 10))`. To convert an entire stream into list, simply `list(stream)` (watch out for RAM here though, especially with infinite streams!). Nothing beats the simplicity of `list(stream)` for debugging purposes." |
| 520 | + ] |
| 521 | + }, |
465 | 522 | {
|
466 | 523 | "cell_type": "heading",
|
467 | 524 | "level": 2,
|
|
474 | 531 | "cell_type": "markdown",
|
475 | 532 | "metadata": {},
|
476 | 533 | "source": [
|
477 | | - "At any point, you can save the notebook (any notebook) to disk by pressing `CTRL`+`s` or `CMD`+`s`. This will **save all your changes**, including cell outputs.\n", |
| 534 | + "At any point, you can save the notebook (any notebook) to disk by pressing `CTRL`+`s` (or `CMD`+`s`). This will **save all changes you've made to the notebook**, including cell outputs, locally to your disk.\n", |
478 | 535 | "\n",
|
479 | | - "To discard your notebook changes, simply checkout the file again from git (or extract it again from the repository ZIP archive). This will reset the notebook to its original state, **losing all changes**." |
| 536 | + "To discard your notebook changes, simply checkout the notebook file again from git (or extract it again from the repository ZIP archive). This will reset the notebook to its original state, **losing all changes changes**." |
480 | 537 | ]
|
481 | 538 | }
|
482 | 539 | ],
|