tag:github.com,2008:https://github.com/spencermountain/dumpster-dive/releases Release notes from dumpster-dive 2018-12-04T01:23:27Z tag:github.com,2008:Repository/26819289/5.0.0 2018-12-31T20:50:23Z 5.0.0 <h2>v5</h2> <ul> <li>more consistent template json, via <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#700">wtf_wikipedia@7</a></li> <li>removal of empty <code>[]</code> results in <code>Section</code>.</li> <li>fs fixes for node &gt; 9</li> </ul> spencermountain tag:github.com,2008:Repository/26819289/4.0.2 2018-10-29T21:00:54Z 4.0.2 <h3>v3.2.0</h3> <ul> <li>update to <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#310">wtf_wikipedia v4.2.0</a></li> <li>support passing-in arbitrary functions to worker</li> </ul> <h3>3.3.0</h3> <ul> <li>bugfix for runtime parsing error</li> </ul> <h3>3.4.2</h3> <ul> <li>update deps, wtf library improvements</li> <li>relicense as MIT</li> <li>use latest mongo api</li> </ul> <h3>3.6.0</h3> <ul> <li><g-emoji class="g-emoji" alias="warning">⚠️</g-emoji> remove <code>.infoboxes</code> and <code>.citations</code> from top-level result. this is duplicate data. 
find them both in <code>section[i].templates</code></li> <li>improve handling of redirect pages</li> <li>refactor encoding logic</li> </ul> <h2>v4</h2> <ul> <li>major json format changes from <a href="https://github.com/spencermountain/wtf_wikipedia/pull/190" data-hovercard-type="pull_request" data-hovercard-url="/spencermountain/wtf_wikipedia/pull/190/hovercard">wtf_wikipedia v6.0.0</a></li> <li>get skip_redirects actually working</li> <li>reduce default batch_size even lower</li> <li>add <code>verbose_skip</code> option, to log disambig/redirect skipping</li> </ul> spencermountain tag:github.com,2008:Repository/26819289/3.1.0 2018-05-23T16:54:13Z 3.1.0 <p>some successes with getting to the end of en-wiki!</p> <p>~11hrs</p> <ul> <li>fix connection time-outs &amp; improve logging output</li> <li>change default collection name to <code>pages</code></li> <li>add <code>.custom()</code> function support</li> </ul> spencermountain tag:github.com,2008:Repository/26819289/3.0.0 2018-04-28T03:32:00Z 3.0.0 <ul> <li>MASSIVE SPEEDUP! 
full re-write by <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/devrim/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/devrim">@devrim</a> 🙏 to fix issue <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="358746759" data-permission-text="Title is private" data-url="https://github.com/spencermountain/dumpster-dive/issues/59" data-hovercard-type="issue" data-hovercard-url="/spencermountain/dumpster-dive/issues/59/hovercard" href="https://github.com/spencermountain/dumpster-dive/issues/59">#59</a></li> <li>rename from <code>wikipedia-to-mongo</code> to <code>dumpster-dive</code></li> <li>use wtf_wikipedia v3 (a big re-factor too!)</li> <li>use <code>line-by-line</code> and <code>worker-nodes</code> to run parsing in parallel</li> </ul> spencermountain tag:github.com,2008:Repository/26819289/2.0.0 2017-09-20T14:06:42Z 2.0 <ul> <li>updates to use <code>wtf_wikipedia@2.0.0</code> - a <a href="https://github.com/spencermountain/wtf_wikipedia/blob/master/changelog.md#200">major</a> result-format change</li> <li>renames bin cmd to <code>wiki2mongo</code></li> <li>supports use from the cli, or from javascript via <code>require()</code></li> <li>support <code>--plaintext</code> flag</li> </ul> spencermountain