Feature/conjugation preserving normalize for subword #35

t-yamamura · 2021-12-24T09:18:09Z

No description provided.

sudachitra/sudachipy_word_tokenizer.py

eiennohito · 2021-12-27T00:46:11Z

sudachitra/word_formatter.py

+        }
+        self._format = self.word_form_types[self.word_form_type]
+
+    def format(self, m: Morpheme):


At the moment the format function is not very useful, and it adds one more Python function call to the hot path; such calls are unfortunately not free. Returning a callable instead of making this a class would probably be better.

Support for processing data in parallel still needs to be added, and it will probably use the PreTokenizer from SudachiPy, so this fix can be delayed until then.
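For illustration, the "return a callable" suggestion could look roughly like the sketch below. This is not the code from this PR: `make_word_formatter`, `FakeMorpheme` and the three word form keys are hypothetical names chosen for the example. `FakeMorpheme` only stands in for SudachiPy's `Morpheme` so the snippet runs without a dictionary installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FakeMorpheme:
    """Hypothetical stand-in for sudachipy's Morpheme (assumption for this sketch)."""
    _surface: str
    _dictionary_form: str
    _normalized_form: str

    def surface(self) -> str:
        return self._surface

    def dictionary_form(self) -> str:
        return self._dictionary_form

    def normalized_form(self) -> str:
        return self._normalized_form


def make_word_formatter(word_form_type: str) -> Callable[[FakeMorpheme], str]:
    """Return the conversion function itself instead of an object wrapping it.

    The lookup happens once, here; callers then invoke the returned function
    directly, so no extra wrapper method sits between them and the conversion.
    """
    # Hypothetical word form keys; the real project may use different ones.
    word_form_types: Dict[str, Callable[[FakeMorpheme], str]] = {
        "surface": lambda m: m.surface(),
        "dictionary": lambda m: m.dictionary_form(),
        "normalized": lambda m: m.normalized_form(),
    }
    if word_form_type not in word_form_types:
        raise ValueError(f"unknown word form type: {word_form_type!r}")
    return word_form_types[word_form_type]


# Usage: resolve the formatter once, then call it per morpheme.
formatter = make_word_formatter("dictionary")
morpheme = FakeMorpheme("食べ", "食べる", "食べる")
print(formatter(morpheme))
```

Compared with a class exposing a `format` method that forwards to a stored function, this removes one call per morpheme, which is the overhead the review comment points at.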

sudachitra/word_formatter.py


eiennohito

LGTM

t-yamamura added 8 commits December 24, 2021 18:15

adapt to sudachipy 0.6.2

2717939

add WordFormatter

5641680

unpickle word_formatter

1438224

specify resources by relative path

4b080ad

fix typo

cdf3f24

use WordFormTypes as typehint

b62d272

avoid built-in name

79b1a19

rename NormalizerLeavedConjugation to ConjugationPreservingNormalizer

ca00619

t-yamamura requested a review from eiennohito December 24, 2021 11:29

t-yamamura self-assigned this Dec 24, 2021

eiennohito reviewed Dec 27, 2021


t-yamamura requested a review from katsutan December 27, 2021 03:05

katsutan reviewed Dec 27, 2021


sudachitra/word_formatter.py (outdated)

t-yamamura added 4 commits December 27, 2021 15:03

set SplitMode when creating a sudachi dictionary

07dd4c7

remove undefined POS in the sudachi dictionary

84f3c3e

make WordFormatter a function that return a function that converts a …

2ee38d2

…morpheme to the specified word form

reformat code

91d005e

eiennohito self-requested a review December 27, 2021 07:22

eiennohito reviewed Dec 27, 2021


t-yamamura merged commit 6b8be88 into main Dec 27, 2021

t-yamamura deleted the feature/conjugation_preserving_normalize_for_subword branch January 17, 2022 04:51
