Confusion about add_endings method #7

Open
qiaoyinglin19 opened this issue Aug 10, 2020 · 0 comments

Hi, MashaPo.

The data preprocessing section of your paper, Automated Word Stress Detection in Russian, states:

"if the previous word has less than three letters, we remove its word stress and concatenate it with the current word (for example, “te oblaká” [that-Pl.Nom cloud-Pl.Nom]). If the previous word has 3 or more letters, we use the last three, since Russian endings are typically 2-3 letters long and derivational morphemes are usually located on the right periphery of the word."
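As a rough illustration of the quoted rule, the left context for a word could be computed as below. This is only a sketch of the passage as I read it, not the authors' implementation: the helper name `context_for` is hypothetical, and the apostrophe as stress mark is an assumption taken from the `replace("'", "")` call in the add_endings code quoted later in this issue.

```python
STRESS_MARK = "'"  # assumption: stress is written as an apostrophe after the stressed vowel


def context_for(previous_word: str) -> str:
    """Return the left context described in the quoted passage (hypothetical helper)."""
    # Remove the stress mark from the previous word.
    bare = previous_word.replace(STRESS_MARK, "")
    # Fewer than three letters: keep the whole word; otherwise keep the last three.
    return bare if len(bare) < 3 else bare[-3:]


print(context_for("те"))       # short word, kept whole
print(context_for("газе'те"))  # longer word, last three letters after removing the mark
```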

To my understanding, the beginning of the sentence should be concatenated with _ as a marker. Besides, every word should be concatenated with its context information, no matter how many context letters there are.

So I took the sentence 'Шла речь о том что где-то в газете напечатан хороший рецепт творожника' as input and called the add_endings method.

However, the result was not consistent with the paper's description in two ways.

  • The beginning of the sentence was not concatenated with _. The index is clearly 0, but the code does not enter the 'elif i == 0' branch.

  • A previous word with fewer than 3 letters was not concatenated with the current word.

Your add_endings is defined as follows:

    def add_endings(wordlist):
        pluswords = []
        for i, word in enumerate(wordlist):
            if not bool(re.search(REG, word)):
                # won't predict, just return (less then two syllables )
                pluswords.append(word)
            elif i == 0 or wordlist[i - 1] == '_':
                pluswords.append('_' + word)
            else:
                context = wordlist[i - 1].replace("'", "")
                if len(context) < 3:
                    ending = context
                else:
                    ending = context[-3:]
                plusword = ending + '_' + word
                pluswords.append(plusword)
        return pluswords
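To make the behaviour easier to discuss, here is a self-contained reproduction. Note that `REG` is not shown in this issue, so the pattern below is an assumption: it matches any word containing at least two Russian vowels, which is how I read the "two syllables" comment in the function. The real pattern in the repository may differ, and so may the results.

```python
import re

# ASSUMPTION: stand-in for the repository's REG, which is not quoted in this issue.
# It matches words that contain at least two Russian vowels (two or more syllables).
VOWELS = "аеёиоуыэюя"
REG = re.compile(f"[{VOWELS}].*[{VOWELS}]", re.IGNORECASE)


def add_endings(wordlist):
    pluswords = []
    for i, word in enumerate(wordlist):
        if not REG.search(word):
            # Checked first: a word with fewer than two vowels is returned unchanged,
            # even when i == 0.
            pluswords.append(word)
        elif i == 0 or wordlist[i - 1] == '_':
            pluswords.append('_' + word)
        else:
            context = wordlist[i - 1].replace("'", "")
            ending = context if len(context) < 3 else context[-3:]
            pluswords.append(ending + '_' + word)
    return pluswords


sentence = "Шла речь о том что где-то в газете напечатан хороший рецепт творожника"
result = add_endings(sentence.split())
for original, plusword in zip(sentence.split(), result):
    print(f"{original} -> {plusword}")
```

Under this assumed `REG`, 'Шла' has a single vowel, so it is caught by the first branch and the 'elif i == 0' branch is never reached for it. Likewise, 'том' after 'о' is left unchanged because 'том' itself has one vowel, while 'газете' after 'в' does receive the short context. Whether this matches the intended design is exactly what I would like to confirm.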

I would appreciate your help with the problems described above.
