First, thanks for the port.
When trying to use KeepEverythingWithMinKWordsExtractor, I get the error:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    extractor = Extractor(extractor='KeepEverythingWithMinKWordsExtractor', url=url, kMin=20)
  File "/private/tmp/html_extract/venv/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 62, in __init__
    "de.l3s.boilerpipe.extractors."+extractor).INSTANCE
AttributeError: type object 'de.l3s.boilerpipe.extractors.KeepEverythingWithMin' has no attribute 'INSTANCE'
The problem is that the KeepEverythingWithMinKWordsExtractor constructor takes an argument (see the Java code), so the class has no INSTANCE singleton to look up.
To fix this, line 60 in extract/__init__.py should be replaced with:
if extractor == "KeepEverythingWithMinKWordsExtractor":
    # handle argument
    kMin = kwargs.get("kMin", 1)  # set default to 1
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors."+extractor)(kMin)
else:
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors."+extractor).INSTANCE
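For anyone who wants to check the branching without starting a JVM, here is a minimal sketch of the same dispatch logic. The helper `build_extractor`, the `load_class` parameter, and the `Fake*` classes are all hypothetical names made up for this illustration and are not part of the package; in the real wrapper `load_class` would be `jpype.JClass`.

```python
# Hypothetical helper for illustration only; not part of the boilerpipe wrapper.
JAVA_PACKAGE = "de.l3s.boilerpipe.extractors."


def build_extractor(load_class, name, **kwargs):
    """Return an extractor object for the given extractor name.

    load_class is any callable that maps a fully qualified class name to a
    class object (assumed to be jpype.JClass in the real wrapper).
    """
    cls = load_class(JAVA_PACKAGE + name)
    if name == "KeepEverythingWithMinKWordsExtractor":
        # No INSTANCE singleton here: the constructor needs kMin.
        return cls(kwargs.get("kMin", 1))
    # The other branch relies on the shared INSTANCE singleton.
    return cls.INSTANCE


# Stand-ins for the Java classes so the sketch runs without a JVM.
class FakeArticleExtractor:
    INSTANCE = "article-singleton"


class FakeMinKWordsExtractor:
    def __init__(self, k_min):
        self.k_min = k_min


FAKE_CLASSES = {
    JAVA_PACKAGE + "ArticleExtractor": FakeArticleExtractor,
    JAVA_PACKAGE + "KeepEverythingWithMinKWordsExtractor": FakeMinKWordsExtractor,
}


def fake_load(qualified_name):
    # Look up the stand-in class by its fully qualified name.
    return FAKE_CLASSES[qualified_name]


ext = build_extractor(fake_load, "KeepEverythingWithMinKWordsExtractor", kMin=20)
print(ext.k_min)
```

With the change applied to the wrapper, the call from test.py, `Extractor(extractor='KeepEverythingWithMinKWordsExtractor', url=url, kMin=20)`, should take the first branch, provided `__init__` accepts `**kwargs`.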