Added missing subtoken information #22

Open
wants to merge 43 commits into
base: dev
6f70cd4
Added missing subtoken information
amir-zeldes Jan 29, 2018
3d05af0
update to UD V2.8
amir-zeldes Jun 20, 2021
3ae6f86
Initial revisions
amir-zeldes Jun 21, 2021
fc86f18
markdown
amir-zeldes Jun 21, 2021
2350a84
remove redundant deprel subtypes
amir-zeldes Jun 22, 2021
0121d89
document added nsubj:pass, csubj:pass
amir-zeldes Jun 22, 2021
4b80da6
Unify demonstrative lemmas
amir-zeldes Jun 22, 2021
b49731e
pronominal copulas are PRON, have no verbal morphology
amir-zeldes Jun 22, 2021
dbda960
change CCONJ to ADV, SCONJ or ADP depending on deprel
amir-zeldes Jun 22, 2021
3b0311f
Demonstrative lemma adjustment
amir-zeldes Jun 22, 2021
382e292
SCONJ כדי
amir-zeldes Jun 23, 2021
5f3e62e
number lemmas
amir-zeldes Jun 23, 2021
5b2f4bb
fix corrupt text
amir-zeldes Jun 23, 2021
363a34a
attributive number articles are det
amir-zeldes Jun 23, 2021
d411bb3
fixed expression כל אימת ש
amir-zeldes Jun 23, 2021
2cdfed4
totally mangled sentence
amir-zeldes Jun 23, 2021
9e63046
major overhaul of number tokens
amir-zeldes Jun 23, 2021
56885fc
constructions with הרבה
amir-zeldes Jun 23, 2021
f63e8b9
more corrupt tokens
amir-zeldes Jun 23, 2021
cffeb0b
even more corrupt tokens
amir-zeldes Jun 23, 2021
a9e09c4
finite zero acl is also acl:relcl
amir-zeldes Jun 23, 2021
a4afed8
Put definiteness feature on clitic possessor, not on possessed NOUN
amir-zeldes Jun 23, 2021
8dd2038
add PronType=Art to fused ADP articles
amir-zeldes Jun 24, 2021
f782910
PROPN fixes
amir-zeldes Jun 24, 2021
dba89b0
More PROPN
amir-zeldes Jun 24, 2021
a0149e8
tense for participle VERB
amir-zeldes Jun 24, 2021
10b9b9e
manual correction
amir-zeldes Jun 24, 2021
17a149f
fix lots of broken year numbers
amir-zeldes Jun 26, 2021
906b6da
more broken clitics
amir-zeldes Jun 26, 2021
1ca7d13
add nmod:tmod and obl:tmod
amir-zeldes Jun 26, 2021
c45231e
fix all remaining MWTs with subtokens not matching text
amir-zeldes Jun 28, 2021
f3e7aeb
completely valid at UD validator level 3
amir-zeldes Jun 28, 2021
ffa1899
README
amir-zeldes Jun 28, 2021
39f7d6c
set compound:affix POS based on parent
amir-zeldes Jul 2, 2021
5a442cf
remove HebExistential feat
amir-zeldes Jul 8, 2021
14741d7
auto convert impersonal modals to head+csubj
amir-zeldes Jul 9, 2021
7d211fd
error correction
amir-zeldes Jul 12, 2021
22fe4f6
remove Case=Tem
amir-zeldes Jul 22, 2021
5c2645f
Remove Poss=Yes from pronouns with של
amir-zeldes Jul 26, 2021
a2234c1
extensive lemma corrections
amir-zeldes Jul 28, 2021
caa8f5f
remove Person=3 from כך
amir-zeldes Aug 3, 2021
60a177f
Revise Number and some lemmas for numerals
amir-zeldes Aug 11, 2021
e69eb10
Merge pull request #6 from IAHLT/dev
amir-zeldes Aug 11, 2021
constructions with הרבה
amir-zeldes committed Jun 23, 2021
commit 56885fc779f2f200e1f9dfb9b1bae88aee9a6666
34 changes: 17 additions & 17 deletions he_htb-ud-train.conllu
@@ -62453,19 +62453,19 @@
2 זאת זה PRON PRON Gender=Fem|Number=Sing|Person=3|PronType=Dem 4 obl _ SpaceAfter=No
3 , , PUNCT PUNCT _ 2 punct _ _
4 יש יש VERB VERB HebExistential=Yes 0 root _ _
- 5 הרבה הרבה DET DET Definite=Cons 7 dep _ _
- 6 מאוד מאוד ADV ADV _ 7 advmod _ _
+ 5 הרבה הרבה DET DET _ 7 det _ _
+ 6 מאוד מאוד ADV ADV _ 5 advmod _ _
7 מוצרים מוצר NOUN NOUN Gender=Masc|Number=Plur 4 nsubj _ _
8-10 שמחירם _ _ _ _ _ _ _ _
8 ש ש SCONJ SCONJ _ 11 mark _ _
9 מחיר מחיר NOUN NOUN Definite=Def|Gender=Masc|Number=Sing 11 nsubj _ _
10 ם הוא PRON PRON Case=Gen|Gender=Masc|Number=Plur|Person=3|Poss=Yes|PronType=Prs 9 nmod:poss _ _
11 עלה עלה VERB VERB Gender=Masc|HebBinyan=PAAL|Number=Sing|Person=3|Tense=Past|Voice=Act 7 acl:relcl _ _
- 12 פחות פחות ADV ADV _ 15 advmod _ HebSource=ConvUncertainHead
+ 12 פחות פחות ADV ADV _ 11 advmod _ _
13-15 מהממוצע _ _ _ _ _ _ _ SpaceAfter=No
- 13 מ מ ADP ADP _ 12 fixed _ _
+ 13 מ מ ADP ADP _ 15 case _ _
14 ה ה DET DET Definite=Def|PronType=Art 15 det _ _
- 15 ממוצע ממוצע NOUN NOUN Gender=Masc|Number=Sing 11 dep _ _
+ 15 ממוצע ממוצע NOUN NOUN Gender=Masc|Number=Sing 12 obl _ _
16 . . PUNCT PUNCT _ 4 punct _ _

# sent_id = 2183
@@ -169574,27 +169574,27 @@
# text = בבנק ישראל אמרו שאפשר לקיים גירעון הרבה יותר גדול בתקופה זו של עלייה מאסיווית, כאשר אין תעסוקה מלאה במשק, ואילו באוצר התנגדו להגדלת הגירעון מעבר לקו מסוים.
1-2 בבנק _ _ _ _ _ _ _ _
1 ב ב ADP ADP _ 2 case _ _
- 2 בנק בנק NOUN NOUN Definite=Cons|Gender=Masc|Number=Sing 0 root _ _
- 3 ישראל ישראל PROPN PROPN _ 2 flat _ _
- 4 אמרו אמר VERB VERB Gender=Fem,Masc|HebBinyan=PAAL|Number=Plur|Person=3|Tense=Past|Voice=Act 2 dep _ _
+ 2 בנק בנק PROPN PROPN Definite=Cons|Gender=Masc|Number=Sing 4 obl _ _
+ 3 ישראל ישראל PROPN PROPN _ 2 compound _ _
+ 4 אמרו אמר VERB VERB Gender=Fem,Masc|HebBinyan=PAAL|Number=Plur|Person=3|Tense=Past|Voice=Act 0 root _ _
5-6 שאפשר _ _ _ _ _ _ _ _
- 5 ש ש SCONJ SCONJ _ 7 mark _ _
- 6 אפשר אפשר AUX AUX VerbType=Mod 7 aux _ HebSource=ConvUncertainHead
- 7 לקיים קיים VERB VERB HebBinyan=PIEL|VerbForm=Inf|Voice=Act 2 dep _ _
+ 5 ש ש SCONJ SCONJ _ 6 mark _ _
+ 6 אפשר אפשר AUX AUX VerbType=Mod 4 ccomp _ HebSource=ConvUncertainHead
+ 7 לקיים קיים VERB VERB HebBinyan=PIEL|VerbForm=Inf|Voice=Act 6 csubj _ _
8 גירעון גירעון NOUN NOUN Gender=Masc|Number=Sing 7 obj _ _
- 9 הרבה הרבה DET DET Definite=Cons 11 dep _ _
+ 9 הרבה הרבה ADV ADV _ 10 advmod _ _
10 יותר יותר ADV ADV _ 11 advmod _ _
11 גדול גדול ADJ ADJ Gender=Masc|Number=Sing 8 amod _ _
12-13 בתקופה _ _ _ _ _ _ _ _
12 ב ב ADP ADP _ 13 case _ _
13 תקופה תקופה NOUN NOUN Gender=Fem|Number=Sing 7 obl _ _
14 זו זה PRON PRON Gender=Fem|Number=Sing|Person=3|PronType=Dem 13 det _ _
15 של של ADP ADP Case=Gen 16 case _ _
- 16 עלייה עלייה NOUN NOUN Gender=Fem|Number=Sing 13 nmod _ _
+ 16 עלייה עלייה NOUN NOUN Gender=Fem|Number=Sing 13 nmod:poss _ _
17 מאסיווית מסיבי ADJ ADJ Gender=Fem|Number=Sing 16 amod _ SpaceAfter=No
18 , , PUNCT PUNCT _ 20 punct _ _
19 כאשר כאשר SCONJ SCONJ _ 20 mark _ _
- 20 אין אין VERB VERB HebExistential=Yes 13 acl:relcl _ _
+ 20 אין אין VERB VERB HebExistential=Yes 6 advcl _ _
21 תעסוקה תעסוקה NOUN NOUN Gender=Fem|Number=Sing 20 nsubj _ _
22 מלאה מלא ADJ ADJ Gender=Fem|Number=Sing 21 amod _ _
23-24 במשק _ _ _ _ _ _ _ SpaceAfter=No
@@ -169603,11 +169603,11 @@
25 , , PUNCT PUNCT _ 30 punct _ _
26-27 ואילו _ _ _ _ _ _ _ _
26 ו ו CCONJ CCONJ _ 30 cc _ _
- 27 אילו אילו ADV ADV _ 30 advmod _ _
+ 27 אילו אילו SCONJ ADV _ 26 fixed _ _
28-29 באוצר _ _ _ _ _ _ _ _
28 ב ב ADP ADP Definite=Def 29 case _ _
29 אוצר אוצר NOUN NOUN Gender=Masc|Number=Sing 30 obl _ _
- 30 התנגדו התנגד VERB VERB Gender=Fem,Masc|HebBinyan=HITPAEL|Number=Plur|Person=3|Tense=Past 2 conj _ _
+ 30 התנגדו התנגד VERB VERB Gender=Fem,Masc|HebBinyan=HITPAEL|Number=Plur|Person=3|Tense=Past 4 conj _ _
31-32 להגדלת _ _ _ _ _ _ _ _
31 ל ל ADP ADP _ 32 case _ _
32 הגדלת הגדלה NOUN NOUN Definite=Cons|Gender=Fem|Number=Sing 30 obl _ _
@@ -169619,7 +169619,7 @@
36 ל ל ADP ADP _ 35 fixed _ _
37 קו קו NOUN NOUN Gender=Masc|Number=Sing 32 nmod _ _
38 מסוים מסוים ADJ ADJ Gender=Masc|Number=Sing 37 amod _ SpaceAfter=No
- 39 . . PUNCT PUNCT _ 2 punct _ _
+ 39 . . PUNCT PUNCT _ 4 punct _ _

# sent_id = 5373
# text = באוצר התחילו לדון ברמה עקרונית על הגדלת מסי הקנייה לפני שבועות אחדים, אבל שום הצעה פורמאלית עדיין לא גובשה.
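
Note for reviewers: this commit replaces the underspecified `dep` relation on הרבה with `det` or `advmod`. A check along the following lines might help find any remaining cases. It is only a minimal sketch and not part of the commit. It assumes standard tab-separated, ten-column CoNLL-U input. The function name `flag_underspecified`, the helper `row`, and the sentence ids `before` and `after` are hypothetical and used purely for illustration.

```python
def flag_underspecified(conllu_text, lemma="הרבה"):
    """Return (sent_id, token_id, deprel) for tokens of `lemma` attached as `dep`.

    Hypothetical helper: assumes tab-separated, 10-column CoNLL-U lines.
    """
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            sent_id = None  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, tok_lemma, deprel = cols[0], cols[2], cols[7]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if tok_lemma == lemma and deprel == "dep":
            hits.append((sent_id, tok_id, deprel))
    return hits


def row(*cols):
    # Join columns with tabs, as CoNLL-U requires.
    return "\t".join(cols)


# Token 5 of the first hunk, before and after this commit (sentence ids are made up).
SAMPLE = "\n".join([
    "# sent_id = before",
    row("5", "הרבה", "הרבה", "DET", "DET", "Definite=Cons", "7", "dep", "_", "_"),
    "",
    "# sent_id = after",
    row("5", "הרבה", "הרבה", "DET", "DET", "_", "7", "det", "_", "_"),
])

print(flag_underspecified(SAMPLE))
```

Only the pre-commit version of the token is reported. The same loop could be pointed at other lemmas or at the `HebSource=ConvUncertainHead` marker in the MISC column.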