Added missing subtoken information #22

Open: wants to merge 43 commits into base: dev
Commits
6f70cd4
Added missing subtoken information
amir-zeldes Jan 29, 2018
3d05af0
update to UD V2.8
amir-zeldes Jun 20, 2021
3ae6f86
Initial revisions
amir-zeldes Jun 21, 2021
fc86f18
markdown
amir-zeldes Jun 21, 2021
2350a84
remove redundant deprel subtypes
amir-zeldes Jun 22, 2021
0121d89
document added nsubj:pass, csubj:pass
amir-zeldes Jun 22, 2021
4b80da6
Unify demonstrative lemmas
amir-zeldes Jun 22, 2021
b49731e
pronominal copulas are PRON, have no verbal morphology
amir-zeldes Jun 22, 2021
dbda960
change CCONJ to ADV, SCONJ or ADP depending on deprel
amir-zeldes Jun 22, 2021
3b0311f
Demonstrative lemma adjustment
amir-zeldes Jun 22, 2021
382e292
SCONJ כדי
amir-zeldes Jun 23, 2021
5f3e62e
number lemmas
amir-zeldes Jun 23, 2021
5b2f4bb
fix corrupt text
amir-zeldes Jun 23, 2021
363a34a
attributive number articles are det
amir-zeldes Jun 23, 2021
d411bb3
fixed expression כל אימת ש
amir-zeldes Jun 23, 2021
2cdfed4
totally mangled sentence
amir-zeldes Jun 23, 2021
9e63046
major overhaul of number tokens
amir-zeldes Jun 23, 2021
56885fc
constructions with הרבה
amir-zeldes Jun 23, 2021
f63e8b9
more corrupt tokens
amir-zeldes Jun 23, 2021
cffeb0b
even more corrupt tokens
amir-zeldes Jun 23, 2021
a9e09c4
finite zero acl is also acl:relcl
amir-zeldes Jun 23, 2021
a4afed8
Put definiteness feature on clitic possessor, not on possessed NOUN
amir-zeldes Jun 23, 2021
8dd2038
add PronType=Art to fused ADP articles
amir-zeldes Jun 24, 2021
f782910
PROPN fixes
amir-zeldes Jun 24, 2021
dba89b0
More PROPN
amir-zeldes Jun 24, 2021
a0149e8
tense for participle VERB
amir-zeldes Jun 24, 2021
10b9b9e
manual correction
amir-zeldes Jun 24, 2021
17a149f
fix lots of broken year numbers
amir-zeldes Jun 26, 2021
906b6da
more broken clitics
amir-zeldes Jun 26, 2021
1ca7d13
add nmod:tmod and obl:tmod
amir-zeldes Jun 26, 2021
c45231e
fix all remaining MWTs with subtokens not matching text
amir-zeldes Jun 28, 2021
f3e7aeb
completely valid at UD validator level 3
amir-zeldes Jun 28, 2021
ffa1899
README
amir-zeldes Jun 28, 2021
39f7d6c
set compound:affix POS based on parent
amir-zeldes Jul 2, 2021
5a442cf
remove HebExistential feat
amir-zeldes Jul 8, 2021
14741d7
auto convert impersonal modals to head+csubj
amir-zeldes Jul 9, 2021
7d211fd
error correction
amir-zeldes Jul 12, 2021
22fe4f6
remove Case=Tem
amir-zeldes Jul 22, 2021
5c2645f
Remove Poss=Yes from pronouns with של
amir-zeldes Jul 26, 2021
a2234c1
extensive lemma corrections
amir-zeldes Jul 28, 2021
caa8f5f
remove Person=3 from כך
amir-zeldes Aug 3, 2021
60a177f
Revise Number and some lemmas for numerals
amir-zeldes Aug 11, 2021
e69eb10
Merge pull request #6 from IAHLT/dev
amir-zeldes Aug 11, 2021
Added missing subtoken information
  * Some subtokens of compound tokens, such as אח in אחיו, were represented by the placeholder `__`
  * Added word form, lemma, and gender information where relevant
amir-zeldes committed Jan 29, 2018
commit 6f70cd41b5bd36bb939c73155a773f6e12a902c7
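
The placeholder subtokens described in the commit message can be located mechanically. The following is a minimal sketch, not the script used for this commit: it assumes standard tab-separated CoNLL-U input with ten columns per token line, and the names `find_placeholder_subtokens`, `PLACEHOLDER_FORM`, `RAW`, and `SAMPLE` are hypothetical. The sample rows are copied from the he-ud-dev.conllu hunk of this diff.

```python
from typing import Iterator, List, Tuple

PLACEHOLDER_FORM = "__"  # FORM value on the removed lines of this diff


def find_placeholder_subtokens(lines: List[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, token_id, surface_form) for placeholder subtokens.

    surface_form is the FORM of the enclosing multiword-token range line.
    Assumes tab-separated CoNLL-U with exactly ten columns per token line.
    """
    surface, range_end = "", 0
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            surface, range_end = "", 0  # a blank line ends the sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a token line
        token_id, form = cols[0], cols[1]
        if "-" in token_id:
            # Range line such as "6-8": remember its surface form and last ID.
            surface, range_end = form, int(token_id.split("-")[1])
        elif token_id.isdigit() and int(token_id) <= range_end and form == PLACEHOLDER_FORM:
            yield line_number, token_id, surface


# Rows copied from the diff; whitespace is converted to tabs because the
# page rendering shows the columns space-separated.
RAW = [
    "6-8 אחיו _ _ _ _ _ _ _ _",
    "6 __ _ NOUN NOUN Definite=Def|Number=Sing 2 appos _ _",
    "7 _של_ של ADP ADP _ 8 case:gen _ _",
]
SAMPLE = ["\t".join(row.split()) for row in RAW]

for hit in find_placeholder_subtokens(SAMPLE):
    print(hit)
```

Reporting the surface form of the range line alongside each hit makes it easier to work out by hand which word form and lemma the placeholder should receive.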
4 changes: 2 additions & 2 deletions he-ud-dev.conllu
@@ -14688,7 +14688,7 @@
4 כהנא כהנא PROPN PROPN _ 3 flat:name _ SpaceAfter=No
5 , , PUNCT PUNCT _ 2 punct _ _
6-8 אחיו _ _ _ _ _ _ _ _
- 6 __ _ NOUN NOUN Definite=Def|Number=Sing 2 appos _ _
+ 6 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 2 appos _ _
7 _של_ של ADP ADP _ 8 case:gen _ _
8 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 6 nmod:poss _ _
9 של של PART PART Case=Gen 10 case:gen _ _
@@ -14827,7 +14827,7 @@
3 עת עת NOUN NOUN Gender=Fem|Number=Sing 0 root _ _
4 להרוג הרג VERB VERB HebBinyan=PAAL|VerbForm=Inf|Voice=Act 3 advmod:inf _ _
5-7 רבותי _ _ _ _ _ _ _ SpaceAfter=No
- 5 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 3 nmod _ _
+ 5 רבות_ רב NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 3 nmod _ _
6 _של_ של ADP ADP _ 7 case:gen _ _
7 _אני הוא PRON PRON Case=Gen|Gender=Fem,Masc|Number=Sing|Person=1|PronType=Prs 5 nmod:poss _ _
8 . . PUNCT PUNCT _ 3 punct _ _
2 changes: 1 addition & 1 deletion he-ud-test.conllu
@@ -6763,7 +6763,7 @@
32 מאשר אישר VERB VERB Gender=Masc|HebBinyan=PIEL|Number=Sing|Person=1,2,3|VerbForm=Part|Voice=Act 30 acl:relcl _ _
33 את את PART PART Case=Acc 34 case:acc _ _
34-36 שובם _ _ _ _ _ _ _ _
- 34 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 32 obj _ _
+ 34 שוב_ שוב NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 32 obj _ _
35 _של_ של ADP ADP _ 36 case:gen _ _
36 _הם הוא PRON PRON Case=Gen|Gender=Masc|Number=Plur|Person=3|PronType=Prs 34 nmod:poss _ _
37 של של PART PART Case=Gen 39 case:gen _ _
32 changes: 16 additions & 16 deletions he-ud-train.conllu
@@ -4290,7 +4290,7 @@
24-28 ש"נטייתו _ _ _ _ _ _ _ _
24 ש _ SCONJ SCONJ _ 31 mark _ _
25 " " PUNCT PUNCT _ 31 punct _ _
- 26 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Sing 31 nsubj:cop _ _
+ 26 נטייה_ נטייה NOUN NOUN Definite=Def|Gender=Fem|Number=Sing 31 nsubj:cop _ _
27 _של_ של ADP ADP _ 28 case:gen _ _
28 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 26 nmod:poss _ _
29-30 הטבעית _ _ _ _ _ _ _ _
@@ -5659,7 +5659,7 @@
2 שיננו שינן VERB VERB Gender=Fem,Masc|HebBinyan=PIEL|Number=Plur|Person=3|Tense=Past|Voice=Act 0 root _ _
3-6 באזניהם _ _ _ _ _ _ _ _
3 ב ב ADP ADP _ 4 case _ _
- 4 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 2 obl _ _
+ 4 אוזן_ אוזן NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 2 obl _ _
5 _של_ של ADP ADP _ 6 case:gen _ _
6 _הם הוא PRON PRON Case=Gen|Gender=Masc|Number=Plur|Person=3|PronType=Prs 4 nmod:poss _ _
7-8 שהיא _ _ _ _ _ _ _ _
@@ -7943,7 +7943,7 @@
49 דואגים דאג VERB VERB Gender=Masc|HebBinyan=PAAL|Number=Plur|Person=1,2,3|VerbForm=Part|Voice=Act 45 acl:relcl _ _
50-53 לאחיהם _ _ _ _ _ _ _ SpaceAfter=No
50 ל ל ADP ADP _ 51 case _ _
- 51 __ _ NOUN NOUN Definite=Def|Number=Plur 49 iobj _ _
+ 51 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 49 iobj _ _
52 _של_ של ADP ADP _ 53 case:gen _ _
53 _הם הוא PRON PRON Case=Gen|Gender=Masc|Number=Plur|Person=3|PronType=Prs 51 nmod:poss _ _
54 . . PUNCT PUNCT _ 4 punct _ _
@@ -9844,7 +9844,7 @@
# sent_id = 753
# text = אחיו תיאר אותו כגדול הכהניסטים במשפחתו.
1-3 אחיו _ _ _ _ _ _ _ _
- 1 __ _ NOUN NOUN Definite=Def|Number=Sing 4 nsubj _ _
+ 1 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 4 nsubj _ _
2 _של_ של ADP ADP _ 3 case:gen _ _
3 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 1 nmod:poss _ _
4 תיאר תיאר VERB VERB Gender=Masc|HebBinyan=PIEL|Number=Sing|Person=3|Tense=Past|Voice=Act 0 root _ _
@@ -13124,7 +13124,7 @@
14 הכתיבה הכתיב VERB VERB Gender=Fem|HebBinyan=HIFIL|Number=Sing|Person=3|Tense=Past|Voice=Act 0 root _ _
15 את את PART PART Case=Acc 16 case:acc _ _
16-18 רשמיו _ _ _ _ _ _ _ SpaceAfter=No
- 16 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 14 obj _ _
+ 16 רושם_ רושם NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 14 obj _ _
17 _של_ של ADP ADP _ 18 case:gen _ _
18 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 16 nmod:poss _ _
19 . . PUNCT PUNCT _ 14 punct _ _
@@ -17710,7 +17710,7 @@
29 דווקא דווקא ADV ADV _ 30 advmod _ _
30 באמצעות באמצעות ADP ADP _ 31 case _ _
31-33 אחיו _ _ _ _ _ _ _ SpaceAfter=No
- 31 __ _ NOUN NOUN Definite=Def|Number=Sing 27 parataxis _ _
+ 31 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 27 parataxis _ _
32 _של_ של ADP ADP _ 33 case:gen _ _
33 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 31 nmod:poss _ _
34 , , PUNCT PUNCT _ 31 punct _ _
@@ -17940,7 +17940,7 @@
7 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 5 nmod:poss _ _
8-11 וסיסמאותיו _ _ _ _ _ _ _ _
8 ו _ CCONJ CCONJ _ 9 cc _ _
- 9 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 1 conj _ _
+ 9 סיסמא_ סיסמא NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 1 conj _ _
10 _של_ של ADP ADP _ 11 case:gen _ _
11 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 9 nmod:poss _ _
12 זכו זכה VERB VERB Gender=Fem,Masc|HebBinyan=PAAL|Number=Plur|Person=3|Tense=Past|Voice=Act 0 root _ _
@@ -38286,7 +38286,7 @@
41 נישא נישא VERB VERB Gender=Masc|HebBinyan=NIFAL|Number=Sing|Person=1,2,3|VerbForm=Part|Voice=Mid 37 acl:relcl _ _
42-45 בכליו _ _ _ _ _ _ _ _
42 ב ב ADP ADP _ 43 case _ _
- 43 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Plur 41 obl _ _
+ 43 כלי_ כלי NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 41 obl _ _
44 _של_ של ADP ADP _ 45 case:gen _ _
45 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 43 nmod:poss _ _
46 שם שם VERB VERB Gender=Masc|HebBinyan=PAAL|Number=Sing|Person=1,2,3|VerbForm=Part|Voice=Act 32 acl:relcl _ _
@@ -90651,7 +90651,7 @@
37-41 ש"לבו _ _ _ _ _ _ _ _
37 ש _ SCONJ SCONJ _ 44 mark _ _
38 " " PUNCT PUNCT _ 44 punct _ _
- 39 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 44 nsubj _ _
+ 39 לב_ לב NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 44 nsubj _ _
40 _של_ של ADP ADP _ 41 case:gen _ _
41 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 39 nmod:poss _ _
42 לא לא ADV ADV Polarity=Neg 44 advmod _ _
@@ -92143,7 +92143,7 @@
6-10 ל"זרועו _ _ _ _ _ _ _ _
6 ל ל ADP ADP _ 8 case _ _
7 " " PUNCT PUNCT _ 8 punct _ _
- 8 __ _ NOUN NOUN Definite=Def|Gender=Fem|Number=Sing 5 obl _ _
+ 8 זרוע_ זרוע NOUN NOUN Definite=Def|Gender=Fem|Number=Sing 5 obl _ _
9 _של_ של ADP ADP _ 10 case:gen _ _
10 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 8 nmod:poss _ _
11-12 המוארכת _ _ _ _ _ _ _ _
@@ -96046,7 +96046,7 @@
4 מכין הכין VERB VERB Gender=Masc|HebBinyan=HIFIL|Number=Sing|Person=1,2,3|VerbForm=Part|Voice=Act 0 root _ _
5-8 למענו _ _ _ _ _ _ _ _
5 ל ל ADP ADP _ 6 case _ _
- 6 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 4 obl _ _
+ 6 מען_ מען NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 4 obl _ _
7 _של_ של ADP ADP _ 8 case:gen _ _
8 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 6 nmod:poss _ _
9 נושאים נושא NOUN NOUN Gender=Masc|Number=Plur 4 obj _ _
@@ -96504,7 +96504,7 @@
13 צדים צד VERB VERB Gender=Masc|HebBinyan=PAAL|Number=Plur|Person=1,2,3|VerbForm=Part|Voice=Act 11 acl:relcl _ _
14-17 למענו _ _ _ _ _ _ _ _
14 ל ל ADP ADP _ 15 case _ _
- 15 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 13 obl _ _
+ 15 מען_ מען NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 13 obl _ _
16 _של_ של ADP ADP _ 17 case:gen _ _
17 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 15 nmod:poss _ _
18 זאבי זאב NOUN NOUN Definite=Cons|Gender=Masc|Number=Plur 13 nsubj _ _
@@ -121704,7 +121704,7 @@
2 נרצחה נרצח VERB VERB Gender=Fem|HebBinyan=NIFAL|Number=Sing|Person=3|Tense=Past|Voice=Mid 0 root _ _
3 בידי בידי ADP ADP _ 4 case _ _
4-6 אחיה _ _ _ _ _ _ _ SpaceAfter=No
- 4 __ _ NOUN NOUN Definite=Def|Number=Sing 2 obl _ _
+ 4 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 2 obl _ _
5 _של_ של ADP ADP _ 6 case:gen _ _
6 _היא הוא PRON PRON Case=Gen|Gender=Fem|Number=Sing|Person=3|PronType=Prs 4 nmod:poss _ _
7 . . PUNCT PUNCT _ 2 punct _ _
@@ -127154,7 +127154,7 @@
11 אורטגה אורטגה PROPN PROPN _ 10 flat:name _ _
12 ( ( PUNCT PUNCT _ 13 punct _ SpaceAfter=No
13-15 אחיו _ _ _ _ _ _ _ _
- 13 __ _ NOUN NOUN Definite=Def|Number=Sing 10 appos _ _
+ 13 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 10 appos _ _
14 _של_ של ADP ADP _ 15 case:gen _ _
15 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 13 nmod:poss _ _
16 של של PART PART Case=Gen 17 case:gen _ _
@@ -167132,7 +167132,7 @@
14 ש _ SCONJ SCONJ _ 22 mark _ _
15 " " PUNCT PUNCT _ 22 punct _ _
16 ב ב ADP ADP _ 17 case _ _
- 17 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 22 advmod _ _
+ 17 סוף_ סוף NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 22 advmod _ _
18 _של_ של ADP ADP _ 19 case:gen _ _
19 _הוא הוא PRON PRON Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs 17 nmod:poss _ _
20 של של PART PART Case=Gen 17 dep _ _
@@ -169638,7 +169638,7 @@
13 נאשמים נאשם NOUN NOUN Gender=Masc|Number=Plur 7 nmod _ _
14-17 לסניגוריהם _ _ _ _ _ _ _ _
14 ל ל ADP ADP _ 15 case _ _
- 15 __ _ NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 7 nmod _ _
+ 15 סניגור_ סניגור NOUN NOUN Definite=Def|Gender=Masc|Number=Plur 7 nmod _ _
16 _של_ של ADP ADP _ 17 case:gen _ _
17 _הם הוא PRON PRON Case=Gen|Gender=Masc|Number=Plur|Person=3|PronType=Prs 15 nmod:poss _ _
18-19 וגובשה _ _ _ _ _ _ _ _
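
Several hunks in this diff also add a `Gender` feature to NOUN subtokens that previously had none. A check for that condition could look roughly like the sketch below. It assumes tab-separated, ten-column CoNLL-U token lines; `parse_feats`, `nouns_missing_feature`, `OLD`, and `NEW` are hypothetical names introduced for illustration, and the two sample rows are the removed and added lines of the first he-ud-dev.conllu hunk.

```python
from typing import Dict, List


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a FEATS column such as 'Definite=Def|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def nouns_missing_feature(lines: List[str], feature: str = "Gender") -> List[str]:
    """Return the IDs of NOUN tokens whose FEATS column lacks `feature`."""
    missing = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments, blank lines, range lines ("6-8"), and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == "NOUN" and feature not in parse_feats(cols[5]):
            missing.append(cols[0])
    return missing


# Whitespace is converted to tabs because the page rendering shows the
# columns space-separated.
OLD = "\t".join("6 __ _ NOUN NOUN Definite=Def|Number=Sing 2 appos _ _".split())
NEW = "\t".join("6 אח_ אח NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 2 appos _ _".split())

print(nouns_missing_feature([OLD]))
print(nouns_missing_feature([NEW]))
```

Whether every NOUN should carry `Gender` is a treebank-specific annotation decision, so results from a check like this are candidates for review rather than confirmed errors.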