-
-
Notifications
You must be signed in to change notification settings - Fork 16
Investigate alignment deuterocanonical books #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #284 +/- ##
==========================================
+ Coverage 70.50% 70.53% +0.02%
==========================================
Files 386 386
Lines 32311 32311
Branches 4546 4546
==========================================
+ Hits 22782 22791 +9
+ Misses 8479 8471 -8
+ Partials 1050 1049 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick scan of the PR looking more for coding issues, but I hope to look at the Google Doc and look over the issue and see if I have any big picture feedback. Thanks, Mudi!
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
}; Versification.Table.Implementation.RemoveAllUnknownVersifications();
Why is this line needed?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
if (!string.IsNullOrEmpty(mismatchReport)) { TestContext.WriteLine("Mismatches found:");
Shouldn't this test fail if there are mismatches? Could we pass the mismatchReport
variable as the message in the exception that's being thrown?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
public void ValidateSourceAndTargetReferencesWithBackup(string bookId, string verseRef) { string sourceCorpusFile = CorporaTestHelpers.TestDataPath + "/Source - LAT.zip";
It's in general not a good idea to commit the binary zip files. We've been moving toward checking in the unzipped Paratext projects. You can just use ParatextTextCorpus
instead of ParatextBackUpTextCorpus
.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1427 at r1 (raw file):
); ParallelTextCorpus parallelCorpus = new ParallelTextCorpus(sourceCorpus, targetCorpus, alignments)
Why not use the AlignRows
function?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1512 at r1 (raw file):
public void ValidateReferencesWithAllVersifications(ScrVers versification, string bookId, string verseRef) { string sourceCorpusFile = CorporaTestHelpers.TestDataPath + "/Source - LAT.zip";
Maybe some of this could be moved to a SetUp
function if it's consistent across all the tests?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1596 at r1 (raw file):
AllTargetRows = true }; ParallelTextRow[] rows = parallelCorpus.ToArray();
You should be able to just do parallelCorpus.GetRows(new string[] {bookId})
instead of using the Where
statement.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1710 at r1 (raw file):
ScrVers versification = versificationType switch { ScrVersType.Original => ScrVers.Original,
I think you have a helper function below that does this, right?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1800 at r1 (raw file):
Dictionary<string, string> expectedMappings = new Dictionary<string, string> { // { "S3Y 1:1-29", "DAG 3:24-52" },
Is there a reason to leave these commented out rows in?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1897 at r1 (raw file):
// Clean and normalize content for comparison string[] unwanted = { "÷" };
Why remove these before comparison? Aren't the source and target the same?
tests/SIL.Machine.Tests/Corpora/TestData/Source - LAT.zip
line 0 at r1 (raw file):
❗ If this is the source text I sent you (and not the one Peter offered to construct), it's not public domain and all record of it needs to be removed from the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It's in general not a good idea to commit the binary zip files. We've been moving toward checking in the unzipped Paratext projects. You can just use
ParatextTextCorpus
instead ofParatextBackUpTextCorpus
.
Noted. Will need a little help on that.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1512 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Maybe some of this could be moved to a
SetUp
function if it's consistent across all the tests?
Noted. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1596 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You should be able to just do
parallelCorpus.GetRows(new string[] {bookId})
instead of using theWhere
statement.
Noted. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1710 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I think you have a helper function below that does this, right?
Yes. Will refactor
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1800 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is there a reason to leave these commented out rows in?
Yes. They are also mappings found in the vrs file. Well, I will take them out. One mapping is sufficient for the test.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1897 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why remove these before comparison? Aren't the source and target the same?
No they are not the same due to the presence of that character in the target files
tests/SIL.Machine.Tests/Corpora/TestData/Source - LAT.zip
line at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
❗ If this is the source text I sent you (and not the one Peter offered to construct), it's not public domain and all record of it needs to be removed from the repo.
Noted. Will do that
Previously, mudiagaobrikisil wrote…
You should be able to just extract the binary like a normal .zip file and put it in the repo. See here for an example of the unzipped project being used. If that doesn't clear it up, let me know & we can chat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1411 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You should be able to just extract the binary like a normal .zip file and put it in the repo. See here for an example of the unzipped project being used. If that doesn't clear it up, let me know & we can chat.
I understand that part. However, since you said the files are not for public domain yet, that is why I need to help and some explanations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 11 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1916 at r2 (raw file):
Versification = versification }; ParatextBackupTextCorpus targetCorpus = new ParatextBackupTextCorpus(sourceCorpusFile)
You use sourceCorpusFile for both sourceCorpus and targetCorpus, so shouldn't they be the same below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 4 files reviewed, 11 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Shouldn't this test fail if there are mismatches? Could we pass the
mismatchReport
variable as the message in the exception that's being thrown?
@johnml1135 suggested I log the issues out and let the tests pass gracefully, since we are not immediately addressing the issues and it would affect the CI/CD pipeline if the tests fail.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1427 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why not use the
AlignRows
function?
Done
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1916 at r2 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You use sourceCorpusFile for both sourceCorpus and targetCorpus, so shouldn't they be the same below?
Refactored to use one Corpora
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r2, 30 of 30 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why is this line needed?
So why include this line?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, mudiagaobrikisil wrote…
@johnml1135 suggested I log the issues out and let the tests pass gracefully, since we are not immediately addressing the issues and it would affect the CI/CD pipeline if the tests fail.
OK, sounds good
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 81 at r3 (raw file):
/// </summary> /// <returns>A tuple containing the source corpus (first) and target corpus (second).</returns> public static (ParatextTextCorpus sourceCorpus, ParatextTextCorpus targetCorpus) GetDeuterocanonicalCorpora()
Is this used/necessary?
Previously, Enkidu93 (Eli C. Lowry) wrote…
I may have misspoken - if there is a mismatch, we should fail and address the underlying issue. If it becomes too difficult, we can create a new issue to document the issue, but no one will ever look at the text output here. Please make the tests fail and if there are failures, we should understand the issue fully (and document in new issues) before knowledgeably removing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 29 of 35 files reviewed, 3 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 81 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Is this used/necessary?
It is not necessary. It has been removed
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1273 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
So why include this line?
I think it's not needed. I have removed it.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I may have misspoken - if there is a mismatch, we should fail and address the underlying issue. If it becomes too difficult, we can create a new issue to document the issue, but no one will ever look at the text output here. Please make the tests fail and if there are failures, we should understand the issue fully (and document in new issues) before knowledgeably removing tests.
Actually the expected behavior of the test is that it should fail. The mismatch is deliberate and failing is the right response if there is a mismatch. I don't think there is an underlying issue here. There would be if the test does not catch the mismatch. However, if I let the test fail, since it is a test that should fail, would it not affect the CI/CD?
Previously, mudiagaobrikisil wrote…
If a mismatch is the expected behavior, the test should only pass if there is a mismatch. Ideally, the main boundary matches and mismatches would be asserted in these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r4, all commit messages.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
If a mismatch is the expected behavior, the test should only pass if there is a mismatch. Ideally, the main boundary matches and mismatches would be asserted in these tests.
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
^
@johnml1135 . Thank you. I have edited the test to ensure that it passes only if it catches the matches and the mismatches correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 1 unresolved discussion (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
"vers.txt", ScrVers.English, "customVersification"
Why change the name?
Previously, mudiagaobrikisil wrote…
Great - I look forward to seeing the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 34 of 35 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @mudiagaobrikisil)
83fc7a3
to
e193846
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 35 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Great - I look forward to seeing the changes.
Changes pushed.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Why change the name?
This was an old change just to debug what the error was. I have refactored it to handle the error properly. The error was that it was looping through the books I specified and then trying to load a versification with the same key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 30 of 35 files reviewed, 2 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1280 at r4 (raw file):
Previously, mudiagaobrikisil wrote…
This was an old change just to debug what the error was. I have refactored it to handle the error properly. The error was that it was looping through the books I specified and then trying to load a versification with the same key.
OK, thank you 👍
Fixes sillsdev/serval#478 |
Previously, mudiagaobrikisil wrote…
Thanks for the updates Mudi! As I look at them, I notice that you are just counting the number of mismatches, not verifying each proper match or mismatch. Can you verify each one in a way that makes it clear what each match or mismatch is intending to verify (either self-documented or with comments as needed)? |
Eli - just for my sake, can you place a comment next to each test that verifies each "issue" from the word doc - that is, the ones that need to be verified: https://docs.google.com/document/d/1lV3Sa0IDxFkx8wcQ73xeBg8hs9A60OstzvqvOCOPw9o/edit?tab=t.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. @johnml1135! Issue 1, I think is covered at a book level indirectly many places but more directly in GetVersesInVersification_ButNotInSourceOrTarget
and Issue 2 is covered by GetVersesInSourceOrTarget_ButNotInVersification
. Is that right, Mudi? Without digging into the specific USFM files, it's hard to tell. Could you verify that that test covers both with specific examples.
Issues 3 and 4 don't require any action in the code. Issue 5, I believe, is already addressed in existing tests outside this PR indirectly. Would you like a test specific to the deuterocanon related to this, @johnml1135?
Issue 6: I'm not sure that this is covered; I'm also not 100% sure which cases this is meant to cover. Are you concerned about multiple verses mapping to the same verse?
Sorry about all the new items to address - I seemed to have missed some of the changes the first time through / forgot to get back to them. Thanks, Mudi!
Reviewable status: 30 of 35 files reviewed, 11 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
new MemoryText("Baruch", new[] { TextRow("Baruch", 5, "target segment 5 .") }), new MemoryText("1Maccabees", new[] { TextRow("1Maccabees", 6, "target segment 6 .") }), new MemoryText("2Maccabees", new[] { TextRow("2Maccabees", 7, "target segment 7 .") })
I don't believe that this is testing what you think it's testing since you're just passing a number as the row ref, not the actual refs that would exist for these books' rows.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1193 at r5 (raw file):
[Test] public void GetRows_AllDeuterocanonicalBooks_WithAlignments()
If you have this test, I don't think you really need the previous one at all.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1332 at r5 (raw file):
AllTargetRows = true }; ParallelTextRow[] rows = parallelCorpus.ToArray();
You can just iterate through the parallelCorpus - no need to call ToArray
here, I don't think.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
[TestCase("SUS", "SUS 1:1", Description = "Validate SUS Source and Target References")] [TestCase("BEL", "BEL 1:1", Description = "Validate BEL Source and Target References")] public void ValidateSourceAndTargetReferencesForDeuterocanonicals(string bookId, string verseRef)
It's conceivable that this could get messed up during alignment but more so in the case when either the target or source is missing. I think you could remove this test and put more specific asserts in the previous test to better cover this.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1499 at r5 (raw file):
} if (issues.Any())
Like above, this test should fail if there's an issue. You should have specific asserts.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
} Assert.Pass("Test completed");
Same as above. Have specific asserts, not just a report. I think there was some miscommunication about this, so no worries :). It should make things a lot simpler too!
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 141 at r5 (raw file):
} public static (string Book, int Chapter, int StartVerse, int EndVerse, bool IsSingleVerse) ParseRange(string range)
This can be made private
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
/// Removes unwanted characters in a corpus string. /// </summary> public static string CleanString(string input, string[] unwanted)
Remind me why this is needed? You only use this once to remove the division symbol, but since you're removing it from both source and target, couldn't you just as well leave it in and the comparison would still be the same?
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
/// Replace multiple spaces with a single space. /// </summary> public static string NormalizeSpaces(string input)
Same as above - not sure this is really needed although I like the idea of cleaning up the data. Also, even if you did want to clean the data, why not just normalize the data in the files instead of renormalizing it each time you test it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 35 files reviewed, 11 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 141 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
This can be made private
Done
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Remind me why this is needed? You only use this once to remove the division symbol, but since you're removing it from both source and target, couldn't you just as well leave it in and the comparison would still be the same?
I removed it in the source in case it appeared in the source (just playing safe), but in actual sense, it appeared only in the target. That's why I couldn't leave it in, as that would mean the comparison would not be the same
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Same as above - not sure this is really needed although I like the idea of cleaning up the data. Also, even if you did want to clean the data, why not just normalize the data in the files instead of renormalizing it each time you test it?
I didn't know if it was right for me to do that in the SFM files, which was why I just handled it in the code. What I think would be better is making the NormalizeSpaces part of the CleanString since the normalization occurs the same places I needed to clean the string. What do you think @Enkidu93 ?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1388 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Thanks for the updates Mudi! As I look at them, I notice that you are just counting the number of mismatches, not verifying each proper match or mismatch. Can you verify each one in a way that makes it clear what each match or mismatch is intending to verify (either self-documented or with comments as needed)?
Made some changes
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I don't believe that this is testing what you think it's testing since you're just passing a number as the row ref, not the actual refs that would exist for these books' rows.
This test has been removed
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1193 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
If you have this test, I don't think you really need the previous one at all.
I agree.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1332 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You can just iterate through the parallelCorpus - no need to call
ToArray
here, I don't think.
You are correct. Thanks
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
It's conceivable that this could get messed up during alignment but more so in the case when either the target or source is missing. I think you could remove this test and put more specific asserts in the previous test to better cover this.
I have removed this and I am refactoring the above test to better cover this
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1499 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Like above, this test should fail if there's an issue. You should have specific asserts.
Done
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Same as above. Have specific asserts, not just a report. I think there was some miscommunication about this, so no worries :). It should make things a lot simpler too!
Done
Thanks Eli for doing the review. Issues 1-5 I am satisfied with your response on. For Issue 6 (mapping onto existing verses), I would like at least one test to verify the actual behavior (whatever it is) and to make sure that it does not crash. An example test is in the doc for Option B - Gen 1:3-4 mapping onto Gen 1:1-2. |
Previously, mudiagaobrikisil wrote…
Thank you for the updates, but I would really like the mismatches spelled out that are expected, not just counted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi John. Thanks. I will add a test for that.
Reviewable status: 28 of 35 files reviewed, 11 unresolved discussions (waiting on @Enkidu93)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 162 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I removed it in the source in case it appeared in the source (just playing safe), but in actual sense, it appeared only in the target. That's why I couldn't leave it in, as that would mean the comparison would not be the same
See comment below.
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I didn't know if it was right for me to do that in the SFM files, which was why I just handled it in the code. What I think would be better is making the NormalizeSpaces part of the CleanString since the normalization occurs the same places I needed to clean the string. What do you think @Enkidu93 ?
I think it's reasonable to combine those two functions. I'm still not convinced that removing the division signs needs to happen at all. If the source and target in the test are the same corpus, then they will both have the division symbols, so there's no reason to remove them. If they're different, then even after you remove the division symbols, they'll still be different. What am I missing?
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1179 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
This test has been removed
Thank you! Great.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1414 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
I have removed this and I am refactoring the above test to better cover this
Sounds good.
tests/SIL.Machine.Tests/Corpora/ParallelTextCorpusTests.cs
line 1741 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
Done
Like John was saying above, specific asserts about the mappings is what we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I think it's reasonable to combine those two functions. I'm still not convinced that removing the division signs needs to happen at all. If the source and target in the test are the same corpus, then they will both have the division symbols, so there's no reason to remove them. If they're different, then even after you remove the division symbols, they'll still be different. What am I missing?
Yes you are correct. The issue was from the initial source and target files Peter sent. Removing it made the test fail as only the target had the division sign. For the new files he sent, just tested and there was no need to remove the signs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 28 of 35 files reviewed, 7 unresolved discussions (waiting on @mudiagaobrikisil)
tests/SIL.Machine.Tests/Corpora/CorporaTestHelpers.cs
line 174 at r5 (raw file):
Previously, mudiagaobrikisil wrote…
Yes you are correct. The issue was from the initial source and target files Peter sent. Removing it made the test fail as only the target had the division sign. For the new files he sent, just tested and there was no need to remove the signs.
Great! Thank you - sounds good.
Why has this not been merged, @mudiagaobrikisil? Anything I can do to help you wrap this up? The tests will also then need to be ported to machine.py, correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of us will need to take over this PR.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93, @johnml1135, and @mudiagaobrikisil)
src/SIL.Machine/Scripture/ScVersExtensions.cs
line 10 at r9 (raw file):
Previously, mudiagaobrikisil wrote…
Done
This should be moved to the Corpora
directory where it is used.
I can wrap this up if you'd like since I was advising Mudi along the way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93, @johnml1135, and @mudiagaobrikisil)
src/SIL.Machine/Scripture/ScVersExtensions.cs
line 10 at r9 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
I can wrap this up if you'd like since I was advising Mudi along the way.
Thanks. That would be great.
d78e315
to
6bad0e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 36 of 38 files reviewed, 4 unresolved discussions (waiting on @ddaspit, @johnml1135, and @mudiagaobrikisil)
src/SIL.Machine/Scripture/ScVersExtensions.cs
line 10 at r9 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Thanks. That would be great.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't even tell in Reviewable which comments are blocking this. If you could review it when you get a chance, @ddaspit, we can wrap this up.
Reviewable status: 36 of 38 files reviewed, 4 unresolved discussions (waiting on @ddaspit, @johnml1135, and @mudiagaobrikisil)
I think this is still blocked awaiting your approval, @ddaspit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 26 of 30 files at r3, 3 of 5 files at r10, 2 of 3 files at r11, 1 of 1 files at r12, 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
src/SIL.Machine/Scripture/ScVersExtensions.cs
line 10 at r9 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Done.
This file needs to be moved to tests/SIL.Machine.Tests/Corpora
.
tests/SIL.Machine.Tests/ScrVersExtensions.cs
line 3 at r13 (raw file):
using SIL.Scripture; namespace SIL.Machine.Tests;
The namespace for the test assembly is actually SIL.Machine
, not SIL.Machine.Tests
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 36 of 38 files reviewed, 5 unresolved discussions (waiting on @ddaspit, @johnml1135, and @mudiagaobrikisil)
src/SIL.Machine/Scripture/ScVersExtensions.cs
line 10 at r9 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
This file needs to be moved to
tests/SIL.Machine.Tests/Corpora
.
Done - thank you for catching this - I really thought I had put it there
tests/SIL.Machine.Tests/ScrVersExtensions.cs
line 3 at r13 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
The namespace for the test assembly is actually
SIL.Machine
, notSIL.Machine.Tests
.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r14, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
tests/SIL.Machine.Tests/ScrVersExtensions.cs
line 3 at r13 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
Done
Since you moved the file to the Corpora
directory, the namespace should be updated to reflect that, i.e. SIL.Machine.Corpora
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 37 of 38 files reviewed, 4 unresolved discussions (waiting on @ddaspit, @johnml1135, and @mudiagaobrikisil)
tests/SIL.Machine.Tests/ScrVersExtensions.cs
line 3 at r13 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Since you moved the file to the
Corpora
directory, the namespace should be updated to reflect that, i.e.SIL.Machine.Corpora
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r15, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
investigate possible alignment and versification issues with deuterocanonical books
5505115
to
bbf14af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r13, 1 of 2 files at r14, 1 of 1 files at r15, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
@ddaspit, can you merge even though John is blocking? I don't have permissions, and I don't see a way to dismiss him from the review. |
I have reviewed this PR and it addresses all comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can dismiss John's comments.
Reviewed all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Enkidu93 There are two remaining comments that you added, so you will need to resolve them.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @johnml1135 and @mudiagaobrikisil)
Contains new tests to check for possible issues with alignment, mapping and versification of deuterocanonical books.
The tests are as follows:
It also contains some helper methods.
This change is