Fix MeCab output differing between 32-bit and 64-bit versions.
There was an issue where 32-bit MeCab and 64-bit MeCab produced different output.

The issue linked below describes the same problem.
ikegami-yukino#2

I modified the code to handle both cases, so please review it.
kdrkdrkdr committed Sep 27, 2022
1 parent 54f120b commit f62d3fa
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions sengiri/sengiri.py
@@ -3,6 +3,8 @@
import emoji
import MeCab

import platform

EMOJIS = set(emoji.unicode_codes.EMOJI_DATA.keys())
DELIMITERS = set({'。', '.', '…', '・・・', '...', '!', '!', '?', '?',
'!?', '?!', '!?', '?!'})
@@ -24,6 +26,16 @@ def _analyze_by_mecab(line, mecab_args, emoji_threshold):
    tagger = MeCab.Tagger(mecab_args)
    pairs = [l.split('\t') for l in tagger.parse(line).splitlines()[:-1]]


    if platform.architecture()[0] == '64bit':
        # Python 64bit + MeCab 64bit
        pairs = [(i[0], i[4]) for i in pairs]

    else:
        # Python 32bit + MeCab 32bit
        pairs = [(i[0], i[1]) for i in pairs]


    result = [[]]
    has_delimiter_flag = False
    emoji_count = 0
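
The field selection added in this commit can be tried without MeCab installed. The following is only a minimal sketch: the helper name `select_pairs` and its `arch` parameter are hypothetical and not part of sengiri, and the sample rows are made-up placeholders rather than real MeCab output. The field indices (4 for 64-bit, 1 for 32-bit) are taken from the diff above.

```python
import platform


def select_pairs(rows, arch=None):
    """Reduce tab-split MeCab rows to (surface, feature) pairs.

    `rows` is a list of lists, one per output line already split on tabs.
    `arch` is a hypothetical override for testing; by default it is read
    from platform.architecture(), as in the commit.
    """
    if arch is None:
        arch = platform.architecture()[0]
    # Indices follow the commit: field 4 on 64-bit, field 1 on 32-bit.
    index = 4 if arch == '64bit' else 1
    return [(row[0], row[index]) for row in rows]


# Placeholder rows (not real MeCab output) with the two shapes the commit assumes.
rows_32 = [['surface', 'feature']]
rows_64 = [['surface', 'f1', 'f2', 'f3', 'feature']]

pairs_32 = select_pairs(rows_32, arch='32bit')
pairs_64 = select_pairs(rows_64, arch='64bit')
```

Passing `arch` explicitly keeps the sketch testable on any machine. Note that it inherits the commit's assumption that the number of tab-separated fields is determined by the architecture.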