- make `sections` into an ordered array, instead of an es6 Map thing.
  - add 'depth' too
- move possibly-repeatable data into the `sections` object, like 'lists' and 'tables'
- change library export name to `wtf`
- turn `infobox` into 'infoboxes' array
  - moved 'infobox_template' to infobox.type
- change initial depth to 0
- change 'translations' property to 'interwiki'
- support {{main}} and {{wide image|}} templates
- support table '! row' row heading syntax, and other forms
- support for {{coords}} geo-coordinate parsing+conversion
- early-support for custom template-parsing
- co-ordinate parsing fix
- support longer ref tags
- smarter disambiguation for interwiki links vs pages containing ':'
- more support for various list syntaxes
- support for markdown output
- support for html output
- add page 'title' to response, where possible.
- better support for capturing the `[[link]]'s` syntax
- opt-out of citation, infobox, image ... parsing
- support a whack of date/time/age templates
- better html output tables/infoboxes
- BIG API RE-WRITE!
- move `.parse()` to main `wtf()` method
- allow repeated processes without a pre-parse of the document
- wtf.fetch() uses promises, and native `fetch()` method (when available)
- allow per-section images, lists, tables + templates
- section depth values now start at 0
- infobox values now return sentence objects
- latex output (thanks @niebert!)
- refactor shell scripts to `wtf_wikipedia Toronto --plaintext`
- use babel-preset-env cause it's new-new
- update deps
- improved .json() results
- guess a page's title based on bold formatting in first sentence
- make section.title a function
- 🚨 non-api changing, but large result-format change
- add `.wikitext()` method to Document, Section, Sentence (thanks @niebert)
- move infobox, citation parser/data to Section class
- `.templates()` are now an ordered array, instead of an object, and include infoboxes and citations
- add (early) support for 'generic' key-value template parsing
- normalize/lowercase template/infobox properties
- add loose `.get('key')` method to Infobox class
- mess-around with citation-template formatting
- beginning to support unknown template forms
- move `date` data from Sentence to Section object.
- rollback of awkward+undocumented `options` param in parser (but keep options param for output methods)
- add support for about a hundred new templates
- templates, including citations, try to be flat-text, and no-longer return Sentence objects
- remove repeated/redundant text in `.links()` results
- don't automatically titlecase link srcs anymore
- return a result or undefined for `sentences.bolds(0)`, and the like
- support dollar templates
- support `section(0).wikitext()`
- support inline {{marriage}} template
- dangling semi-colons in first-sentence parentheses
- `<gallery>` tag support in `.images()`
- support pageids again in .fetch()
- better disambiguation-page detection in english
- remove wikitext from caption titles
- support 3-level templates (whew!)
- new `Table` class and `List` classes
- improved table-parser
- generate name `col1` instead of `col-0`
- support `options.verbose_template` for debugging
- support recursive tables
- improved support for gallery tag
- more support for wiktionary grammar templates
- tweak some regexes
- make `.json()` results return proper json for tables
- add infobox html back into html output (tentative)
- redirect support in .json(), .html() output
- remove empty `[]` properties in .json() results (saves disk space!)
- keep `#` anchor data in .links()
- show links default-on in latex output, like in md and html
- render html/latex/json 'soft redirect', instead of blank pages
- support `.paragraphs()`

⚠️ major changes to output of `.json()`. cleaning-up redundant data. :warning:

- remove top-level `templates` data (found in `section`)
  - resume it with `{templates:true}`
- remove top-level `coordinates` data (found in `templates`)
  - resume it with `{coordinates:true}`
- remove top-level `citations` data (found in `section`)
  - resume it with `{citations:true}`
- return empty arrays in `.json()` again ¯_(:/)_/¯
- remove `h1` title on html output
- change ambiguous `options.title` for sections to `options.headers`
- support lists of 1
- begin removing empty references section by default
- begin support for rendering citations at the bottom of documents
- begin first-class references-parsing as objects at paragraph-level
  - use this: `.citations()` --> `.citations().map(c => c.json());`
- remove `.wikitext()` and `.reparse()` methods
  - keeping wikitext stateful caused too many issues
- turn `Image.file` into a function
- include `interwiki()` results in `.links()`
- support `follow_redirects` option to fetch
- hide object data in console.logs
- move ALL image urls from `upload.wikimedia.org/wikipedia/commons` to `wikipedia.org/wiki/Special:Redirect/file/` via 86
- image captions are now Sentence objects
- rename citation → reference internally, and in json output
- remove references inside section titles
- titlecase internal link destinations #192
- support categories in redirects
- add mongo-encoding from dumpster-dive
- support way (+20%?) more templates.
- change result-format in a lot of templates, for more consistency.
  - notably: reference format, see also, IPA, main
- support colspan/rowspan in tables (a little!)
- support implicit first-row headers for some tables
- return templates even if they have no data
- begin support for some well-used `{{foo start}}...{{foo end}}` templates
- remove empty `[]` for some more section properties in `.json()` response
- some template fixes
- add a 'number' field in sentence json, when it looks like a number
- slight change in coordinate result format, support inline coordinate text
- handle fetching a large list of titles in sequence
- support population, weatherbox templates
- improved date templates, bugfixes
- few more sports templates
- rowspan parsing fix
- no-longer include package.json in builds
- use full template-parser for image captions
- support manually setting doc.title()
- lowercase/normalize table headers
- date templates response format
- .keyValue() should return page title if exists, instead of text
- return country name for `{{BAN}}` etc templates