MWT and sentence text have incorrect final letters not found in tokens

Some MWTs that outwardly look like complex Hebrew word forms seem to have had final letters automatically inserted where they don't belong, for example:

```CoNLL-U
# sent_id = 3488
# text = יש ביוניוורסל תשובה לכל מה שיש בם-ג-מ, ואף יותר.
1	יש	יש	VERB	VERB	HebExistential=Yes	0	root	_	_
2-3	ביוניוורסל	_	_	_	_	_	_	_	_
2	ב	ב	ADP	ADP	_	3	case	_	_
3	יוניוורסל	יוניוורסל	PROPN	PROPN	_	1	obl	_	_
4	תשובה	תשובה	NOUN	NOUN	Gender=Fem|Number=Sing	1	nsubj	_	_
5-6	לכל	_	_	_	_	_	_	_	_
5	ל	ל	ADP	ADP	_	7	case	_	_
6	כל	כול	DET	DET	Definite=Cons	7	det	_	_
7	מה	מה	ADV	ADV	PronType=Int	4	nmod	_	_
8-9	שיש	_	_	_	_	_	_	_	_
8	ש	ש	SCONJ	SCONJ	_	9	mark	_	_
9	יש	יש	VERB	VERB	HebExistential=Yes	7	acl:relcl	_	_
10-11	בם	_	_	_	_	_	_	_	SpaceAfter=No
10	ב	ב	ADP	ADP	_	11	case	_	_
11	מ	מ	PROPN	PROPN	_	9	obl	_	_
12	-	-	PUNCT	PUNCT	_	13	punct	_	SpaceAfter=No
13	ג	ג	PROPN	PROPN	_	11	flat:name	_	SpaceAfter=No
14	-	-	PUNCT	PUNCT	_	15	punct	_	SpaceAfter=No
15	מ	מ	PROPN	PROPN	_	11	flat:name	_	SpaceAfter=No
16	,	,	PUNCT	PUNCT	_	17	punct	_	_
...
```


The token text in node 11 is correct, but the sentence text and the MWT text are wrong: this is the name of the studio MGM, so the MWT should read "במ" (regular mem), not "בם" (final mem).
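To find other cases like this one, a small script could compare each MWT surface form with the concatenation of its member tokens. The following is only a sketch under assumptions that are not part of this report: the function name `find_final_letter_mismatches` is hypothetical, the input is assumed to be tab-separated CoNLL-U text, and only the narrow case is flagged where the MWT form differs from the joined token forms solely by a final-form last letter. MWTs that legitimately differ from their parts for other reasons are deliberately ignored.

```python
# Map each Hebrew final-form letter to its regular form.
FINAL_TO_REGULAR = {"ך": "כ", "ם": "מ", "ן": "נ", "ף": "פ", "ץ": "צ"}


def find_final_letter_mismatches(conllu_text):
    """Return (mwt_id, mwt_form, joined_token_forms) for suspicious MWT lines.

    Hypothetical helper: flags an MWT only when its form equals the
    concatenated token forms except that its last letter is a final form.
    """
    # Keep data rows only: skip comments, blank lines and elisions like "...".
    rows = [line.split("\t") for line in conllu_text.splitlines()
            if line and not line.startswith("#")]
    rows = [r for r in rows if len(r) >= 2]

    # Plain word lines have a purely numeric ID; index their FORM by ID.
    forms = {r[0]: r[1] for r in rows if r[0].isdigit()}

    hits = []
    for r in rows:
        ident, mwt_form = r[0], r[1]
        if "-" not in ident:
            continue  # not an MWT range line such as "10-11"
        start, end = (int(part) for part in ident.split("-"))
        joined = "".join(forms.get(str(i), "") for i in range(start, end + 1))
        if not joined or mwt_form == joined or len(mwt_form) != len(joined):
            continue
        same_prefix = mwt_form[:-1] == joined[:-1]
        final_swapped = FINAL_TO_REGULAR.get(mwt_form[-1]) == joined[-1]
        if same_prefix and final_swapped:
            hits.append((ident, mwt_form, joined))
    return hits
```

Run on the sentence above, this check should flag only the MWT `10-11`, since `2-3`, `5-6` and `8-9` match their joined token forms exactly. The `# text` line would still need to be corrected separately.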

MWT and sentence text have incorrect final letters not found in tokens #27
