Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Aligned FrenchStemmer.cs with Java code #654

Merged
merged 2 commits into from
Oct 14, 2022

Conversation

@nikcio nikcio changed the title fix: Aligned DeleteButSuffixFromElseReplace with Java code fix: Aligned DeleteButSuffixFromElseReplace with Java code (FrenchStemmer.cs) Oct 14, 2022
Copy link
Contributor

@NightOwl888 NightOwl888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you did find an issue, just not the one you thought 😁. It might be cleaner to close #654 and #655 and open a new PR (or PRs as noted below) fixing the order of precedence issue throughout the solution as well as switching to the extension methods in J2N noted below.

Since all of the stemmers (and other files in Lucene.Net.Analysis.Common) were ported at the same time using an automated Java to C# converter, it wouldn't surprise me if this same precedence issue exists across all of the .Remove(startIndex, length) calls in the entire project.

Also since the time Lucene.Net.Analysis.Common was ported, we have added extension methods for Replace() and Delete() in J2N that take into account very subtle differences in behavior. Be aware that they use .NET semantics startIndex and length, since it would be unwieldly to convert between 2 conventions all the time in the same application.

A type of review you could do is to track down as many of the methods as you can that have to make this conversion from start/end to startIndex/length, check their order of precedence conversion, and change their usage to use the methods in J2N, if appropriate. The best place to start is to compare all of the String and StringBuilder method parameters between .NET and Java, since they are the most critical for Lucene.NET. If you prefer to submit 1 PR per method review across the entire project, that is fine.

And if it is sensible to add a new extension method, you may submit a PR to J2N with full documentation (generally copied from .NET) and tests ported from JavaSE 8. Note that the API should match similar methods in .NET, but the behavior should match JavaSE 8.

@nikcio
Copy link
Contributor Author

nikcio commented Oct 14, 2022

Well, you did find an issue, just not the one you thought 😁. It might be cleaner to close #654 and #655 and open a new PR (or PRs as noted below) fixing the order of precedence issue throughout the solution as well as switching to the extension methods in J2N noted below.

Since all of the stemmers (and other files in Lucene.Net.Analysis.Common) were ported at the same time using an automated Java to C# converter, it wouldn't surprise me if this same precedence issue exists across all of the .Remove(startIndex, length) calls in the entire project.

Also since the time Lucene.Net.Analysis.Common was ported, we have added extension methods for Replace() and Delete() in J2N that take into account very subtle differences in behavior. Be aware that they use .NET semantics startIndex and length, since it would be unwieldly to convert between 2 conventions all the time in the same application.

A type of review you could do is to track down as many of the methods as you can that have to make this conversion from start/end to startIndex/length, check their order of precedence conversion, and change their usage to use the methods in J2N, if appropriate. The best place to start is to compare all of the String and StringBuilder method parameters between .NET and Java, since they are the most critical for Lucene.NET. If you prefer to submit 1 PR per method review across the entire project, that is fine.

And if it is sensible to add a new extension method, you may submit a PR to J2N with full documentation (generally copied from .NET) and tests ported from JavaSE 8. Note that the API should match similar methods in .NET, but the behavior should match JavaSE 8.

I've now looked for other occurrences of this error but was unable to find any. Therefore, I think the replacement of the methods with the ones found in J2N might be a better candidate for creating an issue with a link to this PR and then we can open PR's targeting the replacement specifically 😄

I've also pushed a commit with the parentheses fixes to this PR.

And just to be sure then the idea would be to replace:

  • StringBuilder.Remove replaced with StringBuilder.Delete (From J2N)
  • StringBuilder.Remove followed by StringBuilder.Insert replaced with StringBuilder.Replace (From J2N)

@nikcio nikcio changed the title fix: Aligned DeleteButSuffixFromElseReplace with Java code (FrenchStemmer.cs) fix: Aligned FrenchStemmer.cs with Java code Oct 14, 2022
@NightOwl888
Copy link
Contributor

NightOwl888 commented Oct 14, 2022

Well, you did find an issue, just not the one you thought 😁. It might be cleaner to close #654 and #655 and open a new PR (or PRs as noted below) fixing the order of precedence issue throughout the solution as well as switching to the extension methods in J2N noted below.
Since all of the stemmers (and other files in Lucene.Net.Analysis.Common) were ported at the same time using an automated Java to C# converter, it wouldn't surprise me if this same precedence issue exists across all of the .Remove(startIndex, length) calls in the entire project.
Also since the time Lucene.Net.Analysis.Common was ported, we have added extension methods for Replace() and Delete() in J2N that take into account very subtle differences in behavior. Be aware that they use .NET semantics startIndex and length, since it would be unwieldly to convert between 2 conventions all the time in the same application.
A type of review you could do is to track down as many of the methods as you can that have to make this conversion from start/end to startIndex/length, check their order of precedence conversion, and change their usage to use the methods in J2N, if appropriate. The best place to start is to compare all of the String and StringBuilder method parameters between .NET and Java, since they are the most critical for Lucene.NET. If you prefer to submit 1 PR per method review across the entire project, that is fine.
And if it is sensible to add a new extension method, you may submit a PR to J2N with full documentation (generally copied from .NET) and tests ported from JavaSE 8. Note that the API should match similar methods in .NET, but the behavior should match JavaSE 8.

I've now looked for other occurrences of this error but was unable to find any. Therefore, I think the replacement of the methods with the ones found in J2N might be a better candidate for creating an issue with a link to this PR and then we can open PR's targeting the replacement specifically 😄

I've also pushed a commit with the parentheses fixes to this PR.

And just to be sure then the idea would be to replace:

  • StringBuilder.Remove replaced with StringBuilder.Delete (From J2N)
  • StringBuilder.Remove followed by StringBuilder.Insert replaced with StringBuilder.Replace (From J2N)

Yes, those conversions are correct. Although, we should be checking against the lucene 4.8.0 or 4.8.1 source to be sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants