Re-enable rspec of failing tests #26

Open. Wants to merge 1 commit into base: main.

Conversation

ronaldtse
Contributor

For #23.

@ronaldtse
Contributor Author

To run the test suite, run bundle exec rake.

@ronaldtse ronaldtse mentioned this pull request Aug 20, 2021
ronaldtse added a commit that referenced this pull request Aug 20, 2021
@ronaldtse ronaldtse force-pushed the rt-fix-reconcile-rspec branch from 7cefa04 to 7b82682 Compare August 20, 2021 07:10
@ronaldtse
Contributor Author

ronaldtse commented Aug 20, 2021

UPDATED failures:

1) Rababa::Diacritizer diacriticizes قطر
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "قَطَر"
         got: "قَطُر"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

2) Rababa::Diacritizer diacriticizes وقال ادخلوا مصر إن شاء الله آمنين
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "وَقَالَ ادْخُلُوا مِصْرَ إِن شَاءَ اللَّهُ آمِنِينَ"
         got: "وَقَالَ اُدْخُلُوا مِصْرَ إِنْ شَاءَ اللَّهُ آمِنِيْن"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

3) Rababa::Diacritizer diacriticizes يذهب المسلمون كل عام إلى المملكة العربية السعودية لأداء مناسك الحج
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "يَذْهِبُ الْمُسْلِمُونَ كُلَّ عَامٍ إِلَى الْمَمْلَكَةِ الْعَرَبِيَّةِ السُّعُودِيَّةِ لِأَدَاءِ مَنَاسِكِ الْحَجِّ"
         got: "يُذْهَبُ المُسْلِمُونَ كُلَّ عَامٍ إِلَى المَمْلَكَةِ العَرَبِيَّةِ السُّعُودِيَّةِ لِأَدَاءِ مَنَاسِكِ الحَج"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

4) Rababa::Diacritizer diacriticizes لقد كان في يوسف وإخوته آيات للسائلين
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "لَقَدْ كَانَ فِي يُوسُفَ وَإِخْوَتِهِ آيَاتٌ لِلسَّائِلِينَ"
         got: "لَقَدْ كَانَ فِي يُوسُفَ وَإِخْوَتِهِ آيَاتٌ لِلسَّائِلِين"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

5) Rababa::Diacritizer diacriticizes الحمد لله رب العالمين
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"
         got: "الحَمْدُ لِلَّهِ رَبِّ العَالَمِين"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

6) Rababa::Diacritizer diacriticizes وما كان الله ليعذبهم وأنت فيهم
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "وَمَا كَانَ اللَّهُ لِيُعَذِّبَهُمْ وَأَنْتَ فِيهِمْ"
         got: "وَمَا كَانَ اللَّهُ لِيُعَذِّبَهُمْ وَأَنْتَ فِيهِم"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

7) Rababa::Diacritizer diacriticizes نحن نقص عليك أحسن القصص
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "نَحْنُ نَقُصُّ عَلَيْكَ أَحْسَنَ الْقَصَصِ"
         got: "نَحْنُ نَقُصّ عَلَيْكَ أَحْسَنَ القَصَص"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

8) Rababa::Diacritizer diacriticizes سأذهب إلى برج eiffel
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "سَأُذْهِبُ إِلَى بُرْجِ eiffel"
         got: "سَأَذْهَبُ إِلَى بُرْج eiffel"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

9) Rababa::Diacritizer diacriticizes # گيله پسمير الجديد 34
  Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

    expected: "# گيَلِهُ پسُمِيْرٌ الجَدِيدُ 34"
         got: "# گيَلِهُ پسُمِيْرٌ الجَدِيد 34"

    (compared using ==)
  # ./spec/rababa/diacritizer_spec.rb:82:in `block (3 levels) in <top (required)>'

@ronaldtse
Contributor Author

ronaldtse commented Aug 20, 2021

One transformation that seems to fail many examples is this one:

1. It converts "اللَّهُ" to "الللَّهُ": it appears that one extra character (ل) is inserted in the middle of the word.
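
A quick way to catch this class of bug is to check that the output, with diacritics stripped, still has exactly the same base letters as the input. The sketch below is not part of Rababa; the helper names `strip_diacritics` and `letters_preserved?` are hypothetical, and it assumes the diacritics of interest fall in the Unicode range U+064B to U+0652 (tanwin, fatha, damma, kasra, shadda, sukun).

```ruby
# Arabic combining diacritics (assumed range): tanwin, fatha, damma,
# kasra, shadda, sukun.
ARABIC_DIACRITICS = /[\u064B-\u0652]/

# Remove diacritics so that only the base letters remain.
def strip_diacritics(text)
  text.gsub(ARABIC_DIACRITICS, "")
end

# A diacritizer should only add marks; it should never add or drop letters.
def letters_preserved?(source, output)
  strip_diacritics(source) == strip_diacritics(output)
end

source = "\u0627\u0644\u0644\u0647"                              # undiacritized input
good   = "\u0627\u0644\u0644\u0651\u064E\u0647\u064F"            # two lams, as expected
bad    = "\u0627\u0644\u0644\u0644\u0651\u064E\u0647\u064F"      # three lams, as reported

puts letters_preserved?(source, good) # true
puts letters_preserved?(source, bad)  # false
```

Such a check could be added as a generic assertion in the spec, independent of the exact diacritics the model predicts.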

@ronaldtse
Contributor Author

  2. The other common failure is a missing diacritic on the last character of a sentence.
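
To separate this failure mode from the others, one could test whether the actual output equals the expected output with only its trailing diacritics removed. This is a minimal sketch, not code from the repository; `missing_only_final_diacritic?` is a hypothetical helper, and the diacritic range U+064B to U+0652 is an assumption.

```ruby
# One or more Arabic diacritics at the very end of the string (assumed range).
TRAILING_DIACRITICS = /[\u064B-\u0652]+\z/

# True when `got` differs from `expected` only by lacking the diacritics
# on the final character.
def missing_only_final_diacritic?(expected, got)
  expected != got && expected.sub(TRAILING_DIACRITICS, "") == got
end

expected = "\u0641\u0650\u064A\u0647\u0650\u0645\u0652" # last word of failure 6, with final sukun
got      = "\u0641\u0650\u064A\u0647\u0650\u0645"       # same word without the final sukun

puts missing_only_final_diacritic?(expected, got)
```

Failures that return true here share a single cause, so they can be tracked separately from genuine mispredictions such as the Qatar example.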

ronaldtse added a commit that referenced this pull request Aug 21, 2021
@ronaldtse ronaldtse force-pushed the rt-fix-reconcile-rspec branch from 7b82682 to 5be9033 Compare August 21, 2021 04:16
@ronaldtse
Contributor Author

One transformation that seems to fail many examples is this one:

  1. It converts "اللَّهُ" to "الللَّهُ": it appears that one extra character (ل) is inserted in the middle of the word.

Confirmed that this issue is fixed in 312ff4a.

ronaldtse added a commit that referenced this pull request Aug 23, 2021
@ronaldtse ronaldtse force-pushed the rt-fix-reconcile-rspec branch from 5be9033 to 1604730 Compare August 23, 2021 06:31
@ronaldtse ronaldtse force-pushed the rt-fix-reconcile-rspec branch from 1604730 to af6a9ec Compare August 23, 2021 06:44
@ronaldtse
Contributor Author

@AhMohsen46 On closer inspection, some minor failures remain. Can you comment on them individually? Do you think more training by @gilgameshjw could solve these issues?

  • Qatar:
    expected: "قَطَر"
    got: "قَطُر"

  • missing last diacritic: (7 tests)
    expected: "وَقَالَ ادْخُلُوا مِصْرَ إِن شَاءَ اللَّهُ آمِنِينَ"
    got: "وَقَالَ اُدْخُلُوا مِصْرَ إِنْ شَاءَ اللَّهُ آمِنِيْن"

  • missing diacritic in the middle of sentence (last char of word) (1 test)
    expected: "نَحْنُ نَقُصُّ عَلَيْكَ أَحْسَنَ الْقَصَصِ"
    got: "نَحْنُ نَقُصّ عَلَيْكَ أَحْسَنَ القَصَص"
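
To comment on the failures individually, it can help to list only the words that differ between the expected and actual strings. The following is a rough sketch under the assumption that both strings contain the same number of whitespace-separated words; `differing_words` is a hypothetical helper, not part of the spec.

```ruby
# Pair up words by position and keep only the pairs that differ.
# Assumes both strings have the same number of whitespace-separated words;
# extra words in `got` are ignored, missing ones show up as nil.
def differing_words(expected, got)
  expected.split.zip(got.split).reject do |expected_word, got_word|
    expected_word == got_word
  end
end

# First two words after the pronoun in the mid-sentence example,
# written with escapes to make the code points explicit.
expected = "\u0646\u064E\u0642\u064F\u0635\u0651\u064F \u0639\u064E\u0644\u064E\u064A\u0652\u0643\u064E"
got      = "\u0646\u064E\u0642\u064F\u0635\u0651 \u0639\u064E\u0644\u064E\u064A\u0652\u0643\u064E"

differing_words(expected, got).each do |expected_word, got_word|
  puts "expected #{expected_word.inspect}, got #{got_word.inspect}"
end
```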

@gilgameshjw
Contributor

@AhMohsen46 On closer inspection, some minor failures remain. Can you comment on them individually? Do you think more training by @gilgameshjw could solve these issues?

* Qatar:
  expected: "قَطَر"
  got: "قَطُر"

* missing last diacritic: (7 tests)
  expected: "وَقَالَ ادْخُلُوا مِصْرَ إِن شَاءَ اللَّهُ آمِنِينَ"
  got: "وَقَالَ اُدْخُلُوا مِصْرَ إِنْ شَاءَ اللَّهُ آمِنِيْن"

As a solution, I have proposed building word dictionaries from our geodata and from the training dataset.

We could then match each diacritized word to the closest real-word candidate, which should increase accuracy, IMO.
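
The dictionary idea could be sketched roughly as follows. This is only an illustration of the proposal, not an existing Rababa feature: the helper names (`build_dictionary`, `closest_known_word`, `edit_distance`) are hypothetical, the word list stands in for the geodata and training data, and "closest" is assumed to mean the smallest Levenshtein distance among known words with the same undiacritized letters.

```ruby
# Arabic combining diacritics (assumed range U+064B to U+0652).
ARABIC_DIACRITICS = /[\u064B-\u0652]/

def strip_diacritics(word)
  word.gsub(ARABIC_DIACRITICS, "")
end

# Levenshtein distance over characters; each diacritic is its own character.
def edit_distance(a, b)
  a_chars = a.chars
  b_chars = b.chars
  previous = (0..b_chars.length).to_a
  a_chars.each_with_index do |char_a, i|
    current = [i + 1]
    b_chars.each_with_index do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j + 1] + 1, current[j] + 1, previous[j] + cost].min
    end
    previous = current
  end
  previous.last
end

# Index known diacritized words by their undiacritized letters.
def build_dictionary(diacritized_words)
  diacritized_words.group_by { |word| strip_diacritics(word) }
end

# Return the known word closest to the prediction, or the prediction
# itself when the dictionary has no word with the same letters.
def closest_known_word(predicted, dictionary)
  candidates = dictionary.fetch(strip_diacritics(predicted), [])
  candidates.min_by { |candidate| edit_distance(candidate, predicted) } || predicted
end

# Hypothetical dictionary with the expected form of "Qatar" from the failing test.
dictionary = build_dictionary(["\u0642\u064E\u0637\u064E\u0631"])
predicted  = "\u0642\u064E\u0637\u064F\u0631" # model output with damma instead of fatha
puts closest_known_word(predicted, dictionary)
```

A lookup like this only helps for words whose correct form is context-independent, such as place names; case endings that depend on grammar would still need to come from the model.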
