LUCENE-9776: Hunspell: allow to inflect the last part of COMPOUNDRULE compounds #2397

Merged on Feb 19, 2021 (1 commit)

Conversation

donnerpeter
Contributor

Description

Support Dutch compounds such as "15-urige", the inflected form of "15-urig".

Solution

Introduce another compound word context, COMPOUND_RULE_END, and use it for the last part of a COMPOUNDRULE compound. In this context affixes are allowed and ONLYINCOMPOUND is honored, but COMPOUNDFLAG/COMPOUNDEND are not.
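A minimal sketch of how the new context could differ from the existing ones. This is not the Lucene source: the class name CompoundContextSketch and both static helpers are hypothetical, and SIMPLE_WORD is assumed to be the non-compound context. Only COMPOUND_BEGIN, COMPOUND_MIDDLE, COMPOUND_END, COMPOUND_RULE_END and isCompound() appear in this PR.

```java
// Hypothetical illustration only; names other than the WordContext constants
// and isCompound() are assumptions made for this sketch.
final class CompoundContextSketch {

  enum WordContext {
    SIMPLE_WORD, // assumed name for "not part of a compound"
    COMPOUND_BEGIN,
    COMPOUND_MIDDLE,
    COMPOUND_END,
    COMPOUND_RULE_END; // last part of a COMPOUNDRULE compound

    boolean isCompound() {
      return this != SIMPLE_WORD;
    }
  }

  /** An ONLYINCOMPOUND entry is acceptable only inside some compound. */
  static boolean acceptsOnlyInCompoundEntry(WordContext context) {
    return context.isCompound();
  }

  /**
   * Whether the part must carry a positional compound flag
   * (COMPOUNDFLAG/COMPOUNDBEGIN/COMPOUNDMIDDLE/COMPOUNDEND).
   * In this sketch the rule-end context does not require one.
   */
  static boolean requiresCompoundFlag(WordContext context) {
    switch (context) {
      case COMPOUND_BEGIN:
      case COMPOUND_MIDDLE:
      case COMPOUND_END:
        return true;
      default:
        return false;
    }
  }
}
```

Under these assumptions, the last part of a compound rule is treated as a compound part for ONLYINCOMPOUND purposes while skipping the positional flag requirement.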

Tests

Expanded the compoundrule4 test.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Ref Guide (for Solr changes only).

@@ -397,8 +398,7 @@ private boolean checkCompoundRules(
if (forms != null) {
words.add(forms);

if (dictionary.compoundRules != null

Contributor Author

The null check is done by the caller of this method.

IntsRef forms = dictionary.lookupWord(wordChars, start, length);
if (forms == null) return false;
IntsRef ref = new IntsRef(new int[1], 0, 1);
words.add(ref);

Contributor Author

Reuse the "forms" object at the top of the stack, changing its contents for each root candidate.

// we can't add this form, it only belongs inside a compound word
if (!context.isCompound() && dictionary.hasFlag(entryId, dictionary.onlyincompound)) {
continue;
if ((context == WordContext.COMPOUND_BEGIN || context == WordContext.COMPOUND_MIDDLE)

Contributor Author

The "compoundforbid" check is extracted here (with the contexts enumerated explicitly); everything else moved into a separate method, since applyAffix needs the same logic.

@@ -540,8 +533,8 @@ private boolean isAffixCompatible(
if (!isPrefix && dictionary.hasFlag(append, dictionary.compoundForbid)) {
return false;
}
WordContext allowed = isPrefix ? WordContext.COMPOUND_BEGIN : WordContext.COMPOUND_END;
if (context != allowed && !dictionary.hasFlag(append, dictionary.compoundPermit)) {
if (!context.isAffixAllowedWithoutSpecialPermit(isPrefix)

Contributor Author

This check becomes a bit more complicated with the addition of a new context, so it's extracted to a method
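To illustrate why the check grew, here is a sketch comparing the removed comparison against a possible body for the extracted method. The method name isAffixAllowedWithoutSpecialPermit comes from the diff above, but its body is an assumption inferred from the removed lines, not the committed code. AffixPermitSketch, oldAllowed and SIMPLE_WORD are hypothetical names for this example. The COMPOUNDPERMIT flag test stays at the call site and is not modeled here.

```java
// Hypothetical illustration only; the helper body is inferred, not copied
// from Lucene.
final class AffixPermitSketch {

  enum WordContext {
    SIMPLE_WORD, // assumed name for "not part of a compound"
    COMPOUND_BEGIN,
    COMPOUND_MIDDLE,
    COMPOUND_END,
    COMPOUND_RULE_END;

    /** Assumed new rule: suffixes are also fine on the last part of a compound rule. */
    boolean isAffixAllowedWithoutSpecialPermit(boolean isPrefix) {
      if (isPrefix) {
        return this == COMPOUND_BEGIN;
      }
      return this == COMPOUND_END || this == COMPOUND_RULE_END;
    }
  }

  /** The previous inline check, restated from the removed diff lines. */
  static boolean oldAllowed(WordContext context, boolean isPrefix) {
    WordContext allowed = isPrefix ? WordContext.COMPOUND_BEGIN : WordContext.COMPOUND_END;
    return context == allowed;
  }
}
```

In this sketch the two checks agree for every context and affix kind except a suffix in COMPOUND_RULE_END, which is the case needed for inflected forms like "15-urige".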

@@ -550,18 +543,17 @@ private boolean isAffixCompatible(
&& dictionary.hasFlag(append, dictionary.onlyincompound)) {
return false;
}
} else if (dictionary.hasFlag(append, dictionary.onlyincompound)) {

Contributor Author

Two similar checks from below are now united here
