boilerpipe

API

GET /extractor/method?url=URL

Parameter	Description
extractor	name of the extractor to use
method	extraction method
url	the url to extract content from
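
As a minimal sketch of how a client might assemble a request for the route above: the base address `http://localhost:3000` and the helper name `build_request_url` are assumptions made for illustration, not part of this project. The point it shows is that the target address should be percent-encoded before it is passed as the `url` query parameter.

```python
from urllib.parse import quote, urlencode


def build_request_url(base_url, extractor, method, target_url):
    """Build a GET URL of the form /extractor/method?url=URL.

    base_url is hypothetical; use wherever the service is actually running.
    """
    # Encode the two path segments so unexpected characters cannot break the path.
    path = f"/{quote(extractor, safe='')}/{quote(method, safe='')}"
    # urlencode percent-encodes the target address, including ':' '/' '?' and '='.
    query = urlencode({"url": target_url})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    print(build_request_url(
        "http://localhost:3000",  # assumed host and port
        "article",
        "text",
        "https://example.com/news/story?id=1",
    ))
```

Without the encoding step, a target address that carries its own query string would be split at the first `&` and only partly reach the extractor.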

Extractor	Description
article	A full-text extractor tuned towards news articles. For news articles it achieves higher accuracy than DefaultExtractor.
keepeverything	Treats everything as "content". Useful to track down SAX parsing errors.
keepeverythingwithminkwords	-
largestcontent	Like DefaultExtractor, but only keeps the largest content block. Good for non-article style texts with only one main content block.
numwordsrules	-
canola	-
default	A quite generic full-text extractor, but usually not as good as ArticleExtractor.

Method	Description
text	Output the extracted main content as plain text
images	-
html	Output the whole HTML document and highlight the extracted main content
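
The two tables above list every accepted extractor and method name, so a client can check its arguments before sending a request. The sketch below is one way to do that; the function name `extraction_path` is hypothetical, and the client-side check is an assumption, since how the service itself responds to unknown names is not documented here.

```python
# Names copied from the extractor and method tables above.
EXTRACTORS = {
    "article",
    "keepeverything",
    "keepeverythingwithminkwords",
    "largestcontent",
    "numwordsrules",
    "canola",
    "default",
}
METHODS = {"text", "images", "html"}


def extraction_path(extractor, method):
    """Return the request path /extractor/method after validating both names."""
    if extractor not in EXTRACTORS:
        raise ValueError(f"unknown extractor: {extractor!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return f"/{extractor}/{method}"


if __name__ == "__main__":
    print(extraction_path("largestcontent", "html"))
```

Failing early with a `ValueError` keeps a typo in a name from turning into a less obvious error response from the server.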
