Latin default package doesn't usually lemmatize words starting with a capital letter


The default Latin package (ITTB) usually fails to lemmatize words that start with a capital letter. This seems to be the case whether the word is a proper noun that is normally capitalised (e.g. "Iacobi"), a common word that is unusually capitalised, or a word capitalised out of devotion (e.g. "Deo"). The problem appears systematic but not universal: in the example below, "Erat" is lemmatized to "sum". I have not dug into what might cause this behaviour.

**To Reproduce**
See the code below.

**Environment (please complete the following information):**
 - OS: Ubuntu
 - Python version: Conda Python 3.10.9
 - Stanza version: 1.6.1


```python
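import stanza

# Default Latin package (ITTB), restricted to the processors needed for lemmatization
latindefault = stanza.Pipeline('la', processors='tokenize,pos,lemma')

sent = "Quod Erat Demonstrandum"

print(latindefault(sent))

#### Token 3: part of speech is diagnosed correctly, but the lemma is just the capitalised surface form.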
 # {
 #      "id": 3,
 #      "text": "Demonstrandum",
 #      "lemma": "Demonstrandum",
 #      "upos": "VERB",
 #      "xpos": "J2|modO|grp1|casA|gen3",
 #      "feats": "Aspect=Prosp|Case=Nom|Gender=Neut|InflClass=LatA|InflClass[nominal]=IndEurO|Number=Sing|VerbForm=Part|Voice=Pass",
 #      "start_char": 10,
 #      "end_char": 23
 #    }

print(latindefault(sent.lower()))
#### Correctly diagnoses parts of speech and lemmatizes.

# {
#       "id": 3,
#       "text": "demonstrandum",
#       "lemma": "demonstro",
#       "upos": "VERB",
#       "xpos": "J2|modO|grp1|casA|gen3",
#       "feats": "Aspect=Prosp|Case=Nom|Gender=Neut|InflClass=LatA|InflClass[nominal]=IndEurO|Number=Sing|VerbForm=Part|Voice=Pass",
#       "start_char": 10,
#       "end_char": 23
#     }
```
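To check how widespread the problem is in a larger text, something like the following might help. It is only a rough sketch: `unlemmatized_capitalised` is a hypothetical helper, not part of Stanza, and it works on plain dicts shaped like the per-word entries printed above (only the `text` and `lemma` keys are used). It flags every capitalised word whose lemma is identical to its surface form, so it will also flag proper nouns whose lemma legitimately equals the form (nominative names, for instance).

```python
def unlemmatized_capitalised(tokens):
    """Return the tokens whose lemma is identical to their capitalised surface form.

    `tokens` is a list of dicts with at least a "text" key and, normally,
    a "lemma" key, shaped like the per-word entries printed above.
    Entries without a lemma are skipped.
    """
    return [
        tok for tok in tokens
        if tok.get("lemma") is not None
        and tok["text"][:1].isupper()
        and tok["lemma"] == tok["text"]
    ]


# Hand-copied from the two outputs above; only "text" and "lemma" are kept.
tokens = [
    {"id": 3, "text": "Demonstrandum", "lemma": "Demonstrandum"},
    {"id": 3, "text": "demonstrandum", "lemma": "demonstro"},
]

for tok in unlemmatized_capitalised(tokens):
    print(tok["text"], "->", tok["lemma"])
```

With a real pipeline, the dicts could presumably be taken from `latindefault(sent).to_dict()`, which I understand to return one list of such dicts per sentence.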


Latin default package doesn't usually lemmatize words starting with a capital letter #1330
