-
Notifications
You must be signed in to change notification settings - Fork 639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Aligned FrenchStemmer.cs with Java code #654
fix: Aligned FrenchStemmer.cs with Java code #654
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you did find an issue, just not the one you thought 😁. It might be cleaner to close #654 and #655 and open a new PR (or PRs as noted below) fixing the order of precedence issue throughout the solution as well as switching to the extension methods in J2N noted below.
Since all of the stemmers (and other files in Lucene.Net.Analysis.Common
) were ported at the same time using an automated Java to C# converter, it wouldn't surprise me if this same precedence issue exists across all of the .Remove(startIndex, length)
calls in the entire project.
Also since the time Lucene.Net.Analysis.Common
was ported, we have added extension methods for Replace()
and Delete()
in J2N that take into account very subtle differences in behavior. Be aware that they use .NET semantics startIndex
and length
, since it would be unwieldly to convert between 2 conventions all the time in the same application.
A type of review you could do is to track down as many of the methods as you can that have to make this conversion from start/end to startIndex/length, check their order of precedence conversion, and change their usage to use the methods in J2N, if appropriate. The best place to start is to compare all of the String
and StringBuilder
method parameters between .NET and Java, since they are the most critical for Lucene.NET. If you prefer to submit 1 PR per method review across the entire project, that is fine.
And if it is sensible to add a new extension method, you may submit a PR to J2N with full documentation (generally copied from .NET) and tests ported from JavaSE 8. Note that the API should match similar methods in .NET, but the behavior should match JavaSE 8.
I've now looked for other occurrences of this error but was unable to find any. Therefore, I think the replacement of the methods with the ones found in J2N might be a better candidate for creating an issue with a link to this PR and then we can open PR's targeting the replacement specifically 😄 I've also pushed a commit with the parentheses fixes to this PR. And just to be sure then the idea would be to replace:
|
Yes, those conversions are correct. Although, we should be checking against the lucene 4.8.0 or 4.8.1 source to be sure. |
[File: FrenchStemmer.cs]
Java reference: https://github.com/apache/lucene/blob/f01152a5909fa6059f4f1d4aeb4e14968ef1d8c2/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchStemmer.java#L450-L477
I'm unsure if this is covered by and unit tests.
Detected on 3 issues here: https://sonarcloud.io/project/issues?fileUuids=AYPAuMLmhbfJOGLOoZHT&resolved=false&severities=MAJOR&types=BUG&id=nikcio_lucenenet&open=AYPAuPYNhbfJOGLOobEI