KeepEverythingWithMinKWordsExtractor not working #1

Open
@derlin

First, thanks for the port.

When trying to use KeepEverythingWithMinKWordsExtractor, I get the error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    extractor = Extractor(extractor='KeepEverythingWithMinKWordsExtractor', url=url, kMin=20)
  File "/private/tmp/html_extract/venv/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 62, in __init__
    "de.l3s.boilerpipe.extractors."+extractor).INSTANCE
AttributeError: type object 'de.l3s.boilerpipe.extractors.KeepEverythingWithMin' has no attribute 'INSTANCE'

The problem is that KeepEverythingWithMinKWordsExtractor has no static INSTANCE field, because its constructor takes an argument (see the Java code).

To fix this, line 60 in extract/__init__.py should be replaced with:

if extractor == "KeepEverythingWithMinKWordsExtractor":
    # handle the constructor argument
    kMin = kwargs.get("kMin", 1)  # set default to 1
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors." + extractor)(kMin)
else:
    self.extractor = jpype.JClass(
        "de.l3s.boilerpipe.extractors." + extractor).INSTANCE

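For what it's worth, here is a minimal sketch of the same dispatch as a standalone function, so it can be tried without a running JVM. The names `build_extractor`, `CONSTRUCTOR_ARGS` and `load_class` are hypothetical and not part of the library; `load_class` stands in for `jpype.JClass`, and the default of 1 for `kMin` is my assumption, as above.

```python
# Hypothetical helper, not part of the boilerpipe port.
EXTRACTOR_PACKAGE = "de.l3s.boilerpipe.extractors."

# Extractors that are built through a constructor instead of exposing a
# static INSTANCE field. Maps class name -> (keyword name, assumed default).
CONSTRUCTOR_ARGS = {
    "KeepEverythingWithMinKWordsExtractor": ("kMin", 1),
}


def build_extractor(load_class, name, **kwargs):
    """Return an extractor object for the given class name.

    load_class is any callable that maps a fully qualified Java class
    name to a class object (jpype.JClass in real use).
    """
    cls = load_class(EXTRACTOR_PACKAGE + name)
    if name in CONSTRUCTOR_ARGS:
        arg_name, default = CONSTRUCTOR_ARGS[name]
        # Call the constructor with the caller's value or the default.
        return cls(kwargs.get(arg_name, default))
    # All other extractors are assumed to expose a singleton.
    return cls.INSTANCE
```

In `__init__` this would be called as `build_extractor(jpype.JClass, extractor, **kwargs)`, assuming `__init__` already accepts `**kwargs`. Keeping the special cases in a dict means another extractor with constructor arguments only needs a new entry.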