-
Notifications
You must be signed in to change notification settings - Fork 6
/
howto_lemmatizing.dita
141 lines (141 loc) · 10.4 KB
/
howto_lemmatizing.dita
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="concept_i2s_qwf_mr">
<title>Lemmatizing</title>
<conbody>
<p>Lemmatization is one of the most important steps in the BTS text editing process and is used
in almost all queries and analyses done in the database, e.g. if you want to find references
for a specific lemma, make collocation analyses, or search for word combinations etc. It is
therefore essential that you lemmatize your text correctly.</p>
<p><b>Opening and activating the lemmatizer</b></p>
<p>To lemmatize a transliterated text you need two tabs open: <xref
href="ui_texteditor_e10.dita">Sign Text Editor</xref> (default location: top center) and
<xref href="ui_lemmatizer.dita">Lemmatizer</xref> (default location: bottom center; note
that the title of this tab changes as soon as you select a database object in the Navigator
tab). Only non-grey tokens in the "Sign Text Editor" are lemmata and can be lemmatized. </p>
<p><image href="Images/STE.png" id="image_erk_k3s_2x"/></p>
<p/>
<p>Several elements are displayed in the <image href="Images/icons/full/icon16/lemmatizer.png"
id="image_dhr_pqv_nr"/> "Lemmatizer" tab:</p>
<p><image href="Images/new%20screenshots/BTS%20Screenshots/Lemmatizer_marked.png"
id="image_iht_myx_st"/></p>
<p>
<ol id="ol_xdz_4xy_qt">
<li>"Activate" button: activates/deactivates the Lemmatizer (grey = deactivated; blue =
activated).<p>NOTE: You should always deactivate the Lemmatizer after you have finished
the lemmatization to avoid unintentional lemmatizations.</p></li>
<li>"Selected Word" displays the transliteration of the lemma selected in the "Sign Text
Editor". In the example above, <i>wbn</i> has already been lemmatized so the other fields
are filled out. If you find out that you have made an error in the transliteration, you
should modify it in the "Selected Word" field; the change will be updated in the
Sign-Text-Editor and the Transliteration tab automatically. Note that the lemmatizer will
search for lemma according to your transliterated token, ignoring any text after a full
stop ".", which means you may have to manually search for tokens that contain this
character (e.g. ḥm.t-nṯr).<p>NOTE: If you make changes to an already lemmatized token in
the text editor, the token might loose its lemmatization, especially if you add a
character to the beginning or the end of the word.</p></li>
<li>The Lemma field shows the results of the system's search based on your transliterated
token; it shows the lemma number (WCN) from the word list (WL) and the transliteration of
the lemma which has been selected in the below list. To lemmatize a token, select the
appropriate suggestion from the list below. The default setting in
Preferences/Preferences/Lemmatizer allows you to activate the function "automatically
select the first lemma proposal". This feature enables you to navigate through the
proposals via the arrow keys on your keyboard.</li>
<li>In the field "Flexion" you can record the grammatical form of this word by entering the
appropriate code (see <xref href="grammar_inflection.dita"/>).</li>
<li>In the field "Translation" you can choose one or several translations from the lower
field; to choose several left-click and hold CTRL at the same time. If none of the options
seem suitable, you can enter your own translation instead. Make sure you have selected the
right language in the drop-down menu on the right; the only choices currently available
are German and English. </li>
</ol>
</p>
<p><b>Organization of the lemma list</b></p>
<p>The Lemmatizer provides a list of lemmas based on the parameters of your search (i.e. your
transliteration). The entries are sorted alphabetically according to their Egyptological
transliteration. Sublemmata are displayed as child-entries of their related (parent) lemmata.
Each entry includes the transliteration, translation, word class, and bibliographical
references of that lemma. The list is divided into two parts: at the beginning it displays the
entries in which your search term occurs in the first position; entries in which your search
term occurs in another position (e.g. composita) are shown in the latter part of the list.</p>
<p>NOTE: The list of results is limited to 500 items. If you cannot find your lemma in the list,
specify or alter your search parameters; you can use the TLA to find the word's lemma ID,
which you can type in the section below. </p>
<p>There are two kinds of icons in the list:</p>
<p>
<ul id="ul_cbf_syf_fx">
<li><image href="Images/Icons_updated/full/icon16/lemma_checked.png" id="image_mhb_jmm_fx"/>
This lemma has been revised and can be used for lemmatization.</li>
<li><image href="Images/icons/full/icon16/lemma-uncertain.png" id="image_ckk_1zf_fx"/> This
lemma has not been revised yet. This applies in particular to personal names from Ranke's
"Personennamen". If you would like to use this lemma, please contact the BTS team in
Berlin (aegypt1@bbaw.de).</li>
</ul>
</p>
<p><b>Finding the correct lemma</b></p>
<p>The lemmatizer automatically suggests lemmata whose transliteration exactly matches your
lemma. If your lemma is not listed, click the magnifying glass <image
href="Images/icons/full/icon16/search.png" id="image_zyn_pqp_nt"/> and search for it in the
lemma list by entering the WCN (i.e. the ID), the transliteration, or the translation of the
lemma. You can also filter the search results by using "Search for IDs" or "Search for Names
only". Additionally, you can add wild cards / quotation marks by clicking the provided buttons
in the search pop-up. For more details on the search function see <xref
href="howto_searching.dita"/>. </p>
<p><i>Search for ID (WCN number)</i></p>
<p>To filter for a specific WCN you have to activate the check-box "Search for ID", as the
(default) "Full text search" does not search for IDs. (This does not apply to the lemmata
imported from the previous BTS version, which have the WCN set an external ID and which will
thus show up in a "Full text search result"). </p>
<p><i>Search for transliteration</i></p>
<p>If you use the filter "Search for Names only", BTS does not execute a full text search, but
only a search in the names of the database objects. This does not mean names as in "Personal
Names", but is the standard description for what a database object is called when created,
i.e. in this case it means the transliteration of the lemma (dictionary form). This box is
activated by default, because otherwise BTS will make a full text search in the lemma list
which would yield too many results. </p>
<p><b>Confirming the selected lemma</b></p>
<p>When you have found the correct lemma in the list, select it by left-clicking on it. Now
select the specific translation for your text and click on <image
href="Images/icons/full/icon16/lemma-ok.png" id="image_ktj_cqf_fx"/> "Confirm current lemma
editing and continue to next unlemmatized word". The program will automatically move to the
next unlemmatized word.</p>
<p>NOTE: Due to technical reasons the confirmation button (<image
href="Images/icons/full/icon16/lemma-ok.png" id="image_i5c_52x_dx"/>) does not always work.
If this happens just press it a second time. Sometimes clicking this button <image
href="Images/icons/full/icon16/lemma-ok.png" id="image_fsj_4qf_fx"/> moves the program to
the first unlemmatized word in the text, rather than to the next unlemmatized word.</p>
<p><b>Skipping lemmatization and navigating in the Sign Text Editor</b></p>
<p>If you do not want to lemmatize a word, e.g. because the lemma does not exist in the list or
you are not yet sure about its identification, you can skip the lemmatization by either
clicking on the next word in the Sign Text Editor or by using the ">" button ("move selection
to next word"). You can also use these buttons to navigate through the text without
lemmatizing. The button ">" brings you to the next word, ">|" to the end of the line, and >||
to the end of the text; the buttons "<", "|<" and "||<" work vice versa.</p>
<p><b>Encoding inflection</b></p>
<p>We recommend lemmatizing a text and then adding codes in the field "Flexion". If you decide
to encode the inflection during the lemmatization process, type the appropriate code into the
field "Flexion" either before or after you have selected the lemma, and then confirm with
<image href="Images/icons/full/icon16/lemma-ok.png" id="image_zt4_fvf_fx"/>. For details see
<xref href="howto_encoding.dita"/>.</p>
<p><b>Remove all lemma information</b></p>
<p>If you want to undo the lemmatization of a word, select the word and click on the button
<image href="Images/Icons_updated/full/icon16/lemmatizer-delete.png" id="image_l33_m2g_ms"/>
"Remove lemma information" in the Lemmatizer toolbar. This will remove both the lemmatization
and the encoded inflection.</p>
<p><b>Lemma does not exist</b></p>
<p>In addition to the "Wörterbuch der ägyptischen Sprache", the BTS lemma list contains words
from other sources. If you have a new lemma, which is not yet present in the BTS lemma list,
please contact the BTS team in Berlin: aegypt1@bbaw.de. </p>
<p>Before you send your proposal, please make sure that the lemma does not exist:</p>
<p>
<ul id="ul_vyl_yks_2x">
<li>The transliteration of the lemma in the BTS might be different from your own (e.g.
differentiation of s and z, weak consonants etc.).</li>
<li>Beware of punctuation and structural signs.</li>
<li>The word might be part of a composite-word.</li>
<li>Your word might correspond to an already existing lemma with a different meaning.</li>
<li>Your word list might not be indexed properly.</li>
</ul>
</p>
</conbody>
</concept>