Resolving adposition spelling variants #51

Closed
nschneid opened this issue Jun 22, 2018 · 4 comments

Each adposition has a citation form and may have variants. For example, the citation form for all possessives is 's, and possessive pronouns need to be linked to this. "Toward"/"towards" and "out of"/"outta" may be considered conventionalized variant spellings. Moreover, annotated sentences may have adpositions with nonstandard spellings or capitalization. The p Markdown macro thus needs to be able to link to an adposition whose canonical name is different from the one used in the sentence.

Proposed solution:

  1. Extend the p macro to include a display spelling that differs from the canonical lemma: [p my en/'s] or [p my en/'s Possessor]. [p en/'s] would continue to work and be equivalent to [p 's en/'s].

  2. To avoid verbosity in the Markdown for standard possessive pronouns and other spelling variants, [p en/my] and similar should, in the absence of a matching citation form, search for a match in the other_forms field. If exactly one is found, the link will point to that adposition.
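The fallback lookup in item 2 could be sketched roughly as follows. This is only an illustration: the `Adposition` dataclass, the `resolve` function, and the sample entries are hypothetical stand-ins, not the project's actual models, and only the `other_forms` field name comes from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Adposition:
    """Hypothetical stand-in for an adposition record."""
    lang: str
    name: str  # citation form, e.g. "'s"
    other_forms: List[str] = field(default_factory=list)


def resolve(adps: List[Adposition], lang: str, spelling: str) -> Optional[Adposition]:
    """Return the adposition that `spelling` should link to, or None."""
    candidates = [a for a in adps if a.lang == lang]
    # A matching citation form takes priority over any variant.
    for a in candidates:
        if a.name == spelling:
            return a
    # Otherwise search other_forms and accept only a unique match.
    matches = [a for a in candidates if spelling in a.other_forms]
    return matches[0] if len(matches) == 1 else None


# Illustrative data only.
ADPS = [
    Adposition("en", "'s", ["my", "your", "his", "her", "its", "our", "their"]),
    Adposition("en", "toward", ["towards"]),
    Adposition("en", "out of", ["outta"]),
]
```

With this data, `resolve(ADPS, "en", "my")` returns the `'s` entry, while a spelling that matches no entry, or that appears in the `other_forms` of more than one entry, returns `None` so that the macro can report the problem instead of guessing.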

ablodge added a commit that referenced this issue Jul 2, 2018

ablodge commented Jul 2, 2018

    1. is accomplished by the macro pspecial, as in [pspecial en/'s my Possessor]. I added a new macro to avoid further overloading p.
    2. is a bit harder, but is currently implemented as the function Adposition.normalized_adp() with hardcoded lists of standard spelling variants. (I have an implementation of the same function using other_forms in comments.) The trouble is that the list of spelling variants has to be hardcoded somewhere before Adpositions and PTokenAnnotations can be imported. Otherwise the script won't know where to send each Adposition foreign key.
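For illustration only, a hardcoded variant table of the kind described in the second point might look like the sketch below. The table contents, the `STANDARD_VARIANTS` name, the lowercasing step, and the standalone function signature are all assumptions; the actual `Adposition.normalized_adp()` method is not shown in this thread.

```python
# Hypothetical table: language code -> {variant spelling: citation form}.
STANDARD_VARIANTS = {
    "en": {
        "my": "'s", "your": "'s", "his": "'s", "her": "'s",
        "its": "'s", "our": "'s", "their": "'s",
        "towards": "toward",
        "outta": "out of",
    },
}


def normalized_adp(lang: str, spelling: str) -> str:
    """Map a spelling to its citation form.

    Spellings that are not in the table are returned lowercased and
    otherwise unchanged (an assumed policy for capitalization variants).
    """
    key = spelling.lower()
    return STANDARD_VARIANTS.get(lang, {}).get(key, key)
```

Because the table is a plain module-level dictionary, it is available before any database rows exist, which is the constraint on import order mentioned above.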

ablodge closed this as completed Jul 2, 2018

nschneid commented Jul 3, 2018

OK, I guess my assumption was that adpositions with other_forms would be manually created before importing any tokens. Should we support deleting an adposition after it's been created and merging its tokens with another adposition?

ablodge commented Jul 3, 2018

I can do it by creating them manually. Is that convenient for Hebrew and other languages where adpositions have more than one variant?

nschneid commented Jul 3, 2018

If you're talking about pronominal inflections in Hebrew, those are not reflected in the lemma, so we're OK as long as the text has been morphologically processed.

In English we can't rely on lemmas for possessives because the current UD policy is super weird: UniversalDependencies/docs#517
