<p>Release notes from dumpster-dive (https://github.com/spencermountain/dumpster-dive/releases), 2018-12-04T01:23:27Z</p>
<p>Release 5.0.0 (2018-12-31T20:50:23Z)</p>
<h2>v5</h2>
<ul>
<li>more consistent template json, via <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#700">wtf_wikipedia@7</a></li>
<li>removal of empty <code>[]</code> results in <code>Section</code>.</li>
<li>fs fixes for node > 9</li>
</ul>
<p>Author: spencermountain</p>
<p>Release 4.0.2 (2018-10-29T21:00:54Z)</p>
<h3>v3.2.0</h3>
<ul>
<li>update to <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#310">wtf_wikipedia v4.2.0</a></li>
<li>support passing-in arbitrary functions to worker</li>
</ul>
<h3>3.3.0</h3>
<ul>
<li>bugfix for runtime parsing error</li>
</ul>
<h3>3.4.2</h3>
<ul>
<li>update deps, wtf library improvements</li>
<li>relicense as MIT</li>
<li>use latest mongo api</li>
</ul>
<h3>3.6.0</h3>
<ul>
<li><g-emoji class="g-emoji" alias="warning">⚠️</g-emoji> remove <code>.infoboxes</code> and <code>.citations</code> from top-level result. this is duplicate data. find them both in <code>section[i].templates</code></li>
<li>improve handling of redirect pages</li>
<li>refactor encoding logic</li>
</ul>
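<p>For code that used to read <code>.infoboxes</code> or <code>.citations</code> from the top level, the note above amounts to walking each section's <code>templates</code> instead. A minimal sketch, assuming the stored page keeps its sections in a <code>sections</code> array and that each template carries a <code>type</code> field; both names, the helper <code>templatesOfType</code>, and the sample document are hypothetical, so check a real record from your own database before relying on them.</p>

```javascript
// Hypothetical helper: collect templates of one kind from every section.
// The `sections` key and the `type` field are assumptions for illustration;
// inspect a stored document to see which keys your version writes.
function templatesOfType(doc, type) {
  const sections = doc.sections || [];
  return sections.flatMap((section) =>
    // A section may have no `templates` key at all, so default to [].
    (section.templates || []).filter((tmpl) => tmpl.type === type)
  );
}

// Made-up page document, shaped only for this example.
const page = {
  title: 'Example',
  sections: [
    { title: '', templates: [{ type: 'infobox', data: { name: 'Example' } }] },
    { title: 'History' },
    { title: 'References', templates: [{ type: 'citation', data: { url: 'https://example.org' } }] },
  ],
};

const infoboxes = templatesOfType(page, 'infobox');
const citations = templatesOfType(page, 'citation');
console.log(infoboxes.length, citations.length);
```

<p>Defaulting a missing <code>templates</code> key to an empty array keeps the helper safe on sections that carry no templates.</p>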
<h2>v4</h2>
<ul>
<li>major json format changes from <a href="https://github.com/spencermountain/wtf_wikipedia/pull/190" data-hovercard-type="pull_request" data-hovercard-url="/spencermountain/wtf_wikipedia/pull/190/hovercard">wtf_wikipedia v6.0.0</a></li>
<li>get <code>skip_redirects</code> actually working</li>
<li>reduce default <code>batch_size</code> even lower</li>
<li>add <code>verbose_skip</code> option, to log disambig/redirect skipping</li>
</ul>
<p>Author: spencermountain</p>
<p>Release 3.1.0 (2018-05-23T16:54:13Z)</p>
<p>some successes with getting to the end of en-wiki!</p>
<p>~11hrs</p>
<ul>
<li>fix connection time-outs & improve logging output</li>
<li>change default collection name to <code>pages</code></li>
<li>add <code>.custom()</code> function support</li>
</ul>
<p>Author: spencermountain</p>
<p>Release 3.0.0 (2018-04-28T03:32:00Z)</p>
<ul>
<li>MASSIVE SPEEDUP! full re-write by <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/devrim/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/devrim">@devrim</a> 🙏 to fix issue <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="358746759" data-permission-text="Title is private" data-url="https://github.com/spencermountain/dumpster-dive/issues/59" data-hovercard-type="issue" data-hovercard-url="/spencermountain/dumpster-dive/issues/59/hovercard" href="https://github.com/spencermountain/dumpster-dive/issues/59">#59</a></li>
<li>rename from <code>wikipedia-to-mongo</code> to <code>dumpster-dive</code></li>
<li>use wtf_wikipedia v3 (a big re-factor too!)</li>
<li>use <code>line-by-line</code>, and <code>worker-nodes</code> to run parsing in parallel</li>
</ul>
<p>Author: spencermountain</p>
<p>Release 2.0 (2017-09-20T14:06:42Z)</p>
<ul>
<li>updates to use <code>wtf_wikipedia@2.0.0</code> - a <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#200">major</a> result-format change</li>
<li>renames bin cmd to <code>wiki2mongo</code></li>
<li>supports use from cli, or use via javascript <code>require()</code></li>
<li>support <code>--plaintext</code> flag</li>
</ul>
<p>Author: spencermountain</p>