Numbers are inverted in MWT and sentence text #26

Open
amir-zeldes opened this issue May 24, 2021 · 0 comments
Numbers made up of multiple adjacent digits are inverted (their character order is reversed) in the MWT form and in the sentence text comment throughout the corpus. For example:

# sent_id = 5930
# text = מ5491 עד 1989 היה זה אזור אסור.
1-2	מ5491	_	_	_	_	_	_	_	_
1	מ	מ	ADP	ADP	_	2	case	_	_
2	1945	1945	NUM	NUM	_	7	nmod	_	_
3	עד	עד	ADP	ADP	_	4	case	_	_
4	1989	1989	NUM	NUM	_	7	nmod	_	_
5	היה	_	AUX	AUX	Gender=Masc|Number=Sing|Person=3|Polarity=Pos|Tense=Past|VerbType=Cop	7	cop	_	_
6	זה	זה	PRON	PRON	Gender=Masc|Number=Sing|Person=3	7	nsubj	_	_
7	אזור	אזור	NOUN	NOUN	Gender=Masc|Number=Sing	0	root	_	_
8	אסור	אסור	ADJ	ADJ	Gender=Masc|Number=Sing	7	amod	_	SpaceAfter=No
9	.	.	PUNCT	PUNCT	_	7	punct	_	_

https://github.com/UniversalDependencies/UD_Hebrew-HTB/blob/master/he_htb-ud-test.conllu#L6229-L6230

The second year number in this sentence (1989) is correct in both the tokens and the sentence text. The first year number (1945) is inverted in the MWT and in the sentence text, but not in the actual token. I suspect this happens only when there is an MWT, but it is hard to be sure for numbers that aren't obviously year numbers without having the original underlying text.
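
One way to find further candidates is to compare digit runs in each MWT form against the digit runs in the tokens it spans. The following is a minimal sketch under stated assumptions, not an existing tool: the function name `find_reversed_numbers` is hypothetical, the input is assumed to be a single tab-separated CoNLL-U sentence, and only runs of two or more ASCII digits are checked. It covers the MWT case only, so numbers reversed outside an MWT (if any exist) would not be caught.

```python
import re

# Runs of two or more ASCII digits; single digits cannot be visibly reversed.
DIGIT_RUN = re.compile(r"[0-9]{2,}")


def find_reversed_numbers(conllu_sentence):
    """Return (mwt_id, mwt_form, mwt_digits, token_digits) for suspect MWTs.

    A multiword token (MWT) is flagged when a digit run in its surface form
    is the exact reverse of the corresponding digit run in its tokens.
    """
    forms = {}  # token id (int) -> FORM column
    mwts = []   # (start, end, FORM) for range lines such as "1-2"

    for line in conllu_sentence.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            start, end = map(int, tok_id.split("-"))
            mwts.append((start, end, form))
        elif tok_id.isdigit():
            forms[int(tok_id)] = form
        # IDs such as "1.1" (empty nodes) are ignored

    suspects = []
    for start, end, form in mwts:
        token_runs = [
            run
            for i in range(start, end + 1)
            for run in DIGIT_RUN.findall(forms.get(i, ""))
        ]
        mwt_runs = DIGIT_RUN.findall(form)
        for mwt_run, tok_run in zip(mwt_runs, token_runs):
            # Palindromic numbers are equal to their reverse, so skip them.
            if mwt_run != tok_run and mwt_run == tok_run[::-1]:
                suspects.append((f"{start}-{end}", form, mwt_run, tok_run))
    return suspects
```

To scan a whole file, one could split its contents on blank lines and pass each sentence block to the function. Only digit runs are compared, rather than the full concatenated forms, because the tokens of an MWT in this treebank need not concatenate exactly to the surface form.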
