Fix TypeError when using the -m flag (#2734)
Currently, if you attempt to use the script with the --min-article-character option, you get an error because the value is parsed as a string and the functions expect an int. This fix addresses the issue.

```
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/gensim/scripts/segment_wiki.py", line 385, in <module>
    include_interlinks=args.include_interlinks
  File "/usr/local/lib/python3.6/dist-packages/gensim/scripts/segment_wiki.py", line 141, in segment_and_write_all_articles
    for idx, article in enumerate(article_stream):
  File "/usr/local/lib/python3.6/dist-packages/gensim/scripts/segment_wiki.py", line 100, in segment_all_articles
    for article in wiki_sections_text:
  File "/usr/local/lib/python3.6/dist-packages/gensim/scripts/segment_wiki.py", line 332, in get_texts_with_sections
    if sum(len(body.strip()) for (_, body) in sections) < self.min_article_character:
TypeError: '<' not supported between instances of 'int' and 'str'
```
Tenoke authored Jan 30, 2020
1 parent 9352dad commit 8d79794
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions gensim/scripts/segment_wiki.py
@@ -376,6 +376,7 @@ def get_texts_with_sections(self):
parser.add_argument(
'-m', '--min-article-character',
help="Ignore articles with fewer characters than this (article stubs). Default: %(default)s.",
type=int,
default=200
)
parser.add_argument(
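
For illustration, here is a minimal standalone sketch of the failure and the fix. It does not import gensim: `build_parser`, `is_stub`, and the sample `sections` are hypothetical stand-ins, with only the option definition and the comparison copied from the diff and the traceback above. It assumes nothing beyond standard argparse behaviour, namely that a command-line value stays a `str` unless `type=` is given, while a non-string `default` is passed through unchanged (which is likely why the error only shows up when `-m` is passed explicitly).

```python
import argparse


def build_parser(coerce_to_int):
    """Hypothetical helper: rebuild the -m option with or without type=int."""
    parser = argparse.ArgumentParser()
    extra = {"type": int} if coerce_to_int else {}
    parser.add_argument(
        '-m', '--min-article-character',
        help="Ignore articles with fewer characters than this (article stubs). Default: %(default)s.",
        default=200,
        **extra
    )
    return parser


def is_stub(sections, min_article_character):
    """Hypothetical stand-in for the comparison shown in the traceback."""
    return sum(len(body.strip()) for (_, body) in sections) < min_article_character


# Made-up sample data: (heading, body) pairs, 28 characters of body text in total.
sections = [("Intro", "short body"), ("History", "another short body")]

# Without type=int, the value given on the command line stays a string,
# so comparing the int sum against it raises TypeError.
args = build_parser(coerce_to_int=False).parse_args(['-m', '300'])
try:
    is_stub(sections, args.min_article_character)
except TypeError as err:
    print("without type=int:", err)

# With type=int, argparse converts the value before the comparison runs.
args = build_parser(coerce_to_int=True).parse_args(['-m', '300'])
print("with type=int:", is_stub(sections, args.min_article_character))
```

When `-m` is omitted, both parsers return the integer default of 200, so the comparison works either way.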