Tree spell checker #119
Fix that HumanTypo gives correct words
Comment out rspec tests Move across HumanTypo files Pass first minitests Pass initial tests Align TreeSpellChecker with SpellChecker api Fix issue with array initialisation Working why easy cases are returning [] Pass all tests and first_times Refactor tests Working on human typo boundary cases Add test_word_error First working version, ATP Working on print out Working on re-arranging files to have tree_spell suffix Add combined to table Add explore test for HumanTypo Add separator argument to TSP Tidy up Remove pry Reconcile Gemfile
I have developed a prototype of the
I have used three criteria for the comparison: the percentage of first suggestions that were correct, the mean number of suggestions offered, and the percentage of failures, i.e. cases where no correct suggestion was returned. The code for the

It can be seen that the Tree spell checker normally offers just one suggestion, and that suggestion is correct about 92% of the time. The Standard spell checker is correct 99% of the time but on average offers twice as many suggestions.

At present the failure rate of the Tree spell checker is high, about 8%. I see evidence that much of this failure rate is due to bugs in my code. However, part of it is due to a design decision I have made, which is to assume that the input has valid tree syntax. Some errors will violate this assumption, e.g. an error that deletes a separator, providing an input such as

So at this point, I would appreciate feedback on the following points
Thank you. I will try some enhancements to the algorithm, improve test coverage and refactor the code. Then I will push it here for a detailed review.
Thanks @obromios for all the work on compiling this PR! I haven't taken a deep dive into it yet but I think this looks very promising. I think the entire change is in good shape and there are only a few changes needed to merge this PR (maybe just removing

I'm going to take a week off for my open source projects in the week of April 7th and I should be able to take a look at this around that timeframe. Thank you again for such a wonderful proposal and even implementing it!
…_backup into tree_spell_checker
I agree with the two changes you have suggested. I will also be adding more extensive tests and there are some bugs I need to track down. I will push a new version for you to review by April 7.
Add test for too tiny input to HumanTypo Add rspec directory Add test for spec directory Eliminate correct word errors Correct special cases Choose only option if only one option Add extra words to special_words Measure execution speed Surprised to see tree is more than 10 time faster than standard Add comments to temp.md Add augment: true option tree_spell_checker.new Add tests for this and leafless word Change base_name to find_leaves Move HumanTypo Refactor HumanTypo Add substitution to HumanTypo Add subtitution to HumanTypo Tidy explore execution time Move test files into tree_spell folder move TreeSpellChecker into DidYouMean module Remove .DS_Store Test different lambdas for accuracy Tidy up explore and did_you_mean.rb Refactor check_element Refactor parse Refactor tree_spell_checker Complete first cut of documentation Update documentation Remove pry Tidy code
I have refactored the code, improved test coverage, done a more extensive analysis of accuracy, and measured the execution time. The results appear to be promising, for a typical

More information on the algorithm and execution times can be found in

These performance tests are done with

I was not sure how this project handles detailed API documentation, so I placed the API documentation for
I just realised that the documentation files were not included in my last push. The reason is that the /doc directory is in .gitignore. I am not sure why this is, but rather than change that, I have moved the three documentation files (file type *.md) to the test/tree_spell directory, pending a decision on the correct place to store them. Here is a summary of the performance of the augmented
It can be seen that the overall error rate is reduced by a factor of 5, and the mean number of suggestions is reduced by a factor of 4. The execution speeds for the above were
so the execution time is also reduced by a factor of almost 10. Please let me know if you have any questions.
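For readers following the comparison, the three criteria used in this thread (percentage of correct first suggestions, mean number of suggestions, and percentage of failures) could be computed roughly as below. This is a minimal sketch rather than the code in this PR: the `summarise` helper, the trial format, and the sample data are all hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper, not part of this PR.
# Each trial is [expected_word, suggestions], where `suggestions` is the
# array a checker returned for one misspelled input.
def summarise(trials)
  total = trials.size.to_f
  first_correct = trials.count { |expected, suggestions| suggestions.first == expected }
  # A failure means the correct word does not appear among the suggestions at all.
  failures = trials.count { |expected, suggestions| !suggestions.include?(expected) }
  mean_suggestions = trials.sum { |_, suggestions| suggestions.size } / total

  {
    first_correct_pct: 100.0 * first_correct / total,
    mean_suggestions: mean_suggestions,
    failure_pct: 100.0 * failures / total
  }
end

# Made-up results, for illustration only.
trials = [
  ["db:migrate", ["db:migrate"]],
  ["db:rollback", ["db:reset", "db:rollback"]],
  ["db:setup", []],
  ["db:seed", ["db:seed"]]
]

p summarise(trials)
```

With this definition, a trial in which the correct word appears second counts against the first-suggestion percentage but not as a failure.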
This PR develops a spell checker that exploits a dictionary with a tree structure. Example applications could be rake tasks, e.g. rake db:migrate, and rspec tests, e.g. rspec spec/models/user_model_spec.rb.
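The general idea of checking against a tree-structured dictionary can be sketched as follows: split the input on the separator, then correct one path element at a time, keeping only the dictionary entries that are still consistent with the elements chosen so far. This is an illustrative sketch and not the implementation in this PR. The names `levenshtein`, `closest`, `tree_correct` and the `max_distance` threshold are assumptions, plain Levenshtein distance is used as a stand-in similarity measure, and every task name other than `db:migrate` is made up.

```ruby
# Illustrative sketch only; the class in this PR may work differently.

# Plain Levenshtein edit distance (stand-in similarity measure).
def levenshtein(a, b)
  prev = (0..b.size).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Candidates nearest to `word`, provided they are within `max_distance`
# edits (an assumed threshold).
def closest(word, candidates, max_distance: 2)
  scored = candidates.map { |c| [levenshtein(word, c), c] }
  best = scored.map(&:first).min
  return [] if best.nil? || best > max_distance

  scored.select { |distance, _| distance == best }.map(&:last)
end

# Correct `input` element by element against a separator-delimited dictionary.
# Assumes the input has valid tree syntax (the right number of separators).
def tree_correct(input, dictionary, separator: "/")
  parts = input.split(separator)
  entries = dictionary.map { |entry| entry.split(separator) }
                      .select { |entry| entry.size == parts.size }
  parts.each_with_index do |part, depth|
    options = entries.map { |entry| entry[depth] }.uniq
    keep = closest(part, options)
    entries = entries.select { |entry| keep.include?(entry[depth]) }
    return [] if entries.empty?
  end
  entries.map { |entry| entry.join(separator) }
end

tasks = %w[db:migrate db:rollback db:seed test:models test:system]
p tree_correct("db:migrat", tasks, separator: ":") # => ["db:migrate"]
p tree_correct("dbmigrate", tasks, separator: ":") # => []
```

The second call shows the limitation discussed in the conversation: once a separator is deleted, the input no longer has valid tree syntax, so a checker built on that assumption finds no match.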