Finds common phrases between Quran ayahs using Python's difflib.SequenceMatcher.
Download the results from the Releases page.
Steps:
- Import quran-simple.sql (from the tanzil.net project) into a MySQL/MariaDB database named 'quran'.
- Run removeBesmelah.sql
- Edit the database connection config in findCommonPhrases.py if needed.
- Run findCommonPhrases.py
Result columns:
- id: Row id.
- a_ayah: First ayah number.
- a_surah: First surah number.
- a_text: First ayah text to compare.
- b_ayah: Second ayah number.
- b_surah: Second surah number.
- b_text: Second ayah text to compare.
- issame: Whether the two ayahs are identical (1 = same, 0 = not same).
- matchingblock: The phrase common to both ayahs.
- a_place: Location of the common phrase in the first ayah.
- b_place: Location of the common phrase in the second ayah.
- length: Length of the common phrase (in words).
- ratio: Similarity ratio of the phrase (a number between 0 and 1; 1 = same).
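
Example:
The following is a minimal sketch of how difflib.SequenceMatcher can produce fields like the ones above. It is not the code from findCommonPhrases.py. The function name longest_common_phrase is hypothetical, and the sketch assumes that texts are compared word by word, that a_place and b_place are 0-based word offsets, and that ratio is computed over the two whole texts. The actual script may differ on any of these points.

```python
from difflib import SequenceMatcher


def longest_common_phrase(a_text, b_text):
    """Return the longest run of words shared by two texts (illustrative only)."""
    # Assumption: compare word sequences, not characters.
    a_words = a_text.split()
    b_words = b_text.split()

    # autojunk=False disables the "popular element" heuristic for long inputs.
    matcher = SequenceMatcher(None, a_words, b_words, autojunk=False)
    match = matcher.find_longest_match(0, len(a_words), 0, len(b_words))

    return {
        "issame": int(a_words == b_words),
        "matchingblock": " ".join(a_words[match.a:match.a + match.size]),
        "a_place": match.a,    # assumption: 0-based word offset in the first text
        "b_place": match.b,    # assumption: 0-based word offset in the second text
        "length": match.size,  # number of words in the common phrase
        "ratio": matcher.ratio(),  # assumption: similarity of the whole texts
    }


result = longest_common_phrase(
    "the quick brown fox jumps",
    "a quick brown fox sleeps",
)
print(result)
```

For the sample strings, the common phrase is "quick brown fox", three words long, starting at word offset 1 in both texts.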