Pre-release improvement #2

Open
saaj opened this issue Jun 9, 2014 · 1 comment

saaj commented Jun 9, 2014

There are some changes that I think should be addressed before the package is released on the Cheese Shop.

Result ranking

SQLite has no out-of-the-box ranking function for full-text result sets. I think it makes sense to extend the scope of the package to cover ranking, since it clearly falls under both "sqlite" and "fts".

All the code is already out there. There's the article; even though it's about peewee, an MIT-licensed package, the code can be easily extracted. Here's a gist with a module and a test case for it.

Because BM25 is a general, language-independent ranking function, its presence would make the package more complete.
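
To make the BM25 point concrete, here is a rough sketch of the scoring formula in plain Python. It is not the code from the gist or the article: it works on pre-tokenized documents instead of SQLite's matchinfo data, and the function name, the sample documents, and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against query_terms with BM25.

    Illustrative helper, not part of any package. Assumes docs is a
    non-empty list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "1 +" keeps it non-negative for common terms.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "sqlite full text search".split(),
    "python tokenizer binding for sqlite".split(),
    "cheese shop".split(),
]
scores = bm25_scores(["sqlite", "tokenizer"], docs)
```

Nothing here depends on the language of the tokens, only on counts and lengths, which is why the same function works for any tokenizer plugged in front of it.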

Minimum documentation

The README should give an overview and cover the basics. I can assist with it.

Also, recipes for integrating with tokenizers for major domains (CJK, Cyrillic, etc.) are a good idea.

Minor

An underscore is undesirable in a Python module name. I suggest renaming sqlite_tokenizer.py. The "sqlite" part is obvious from context. tokenizer.py is better but still not good: it's not informative, since the module doesn't provide a real tokenizer per se, only a binding to register one. binding.py may be a better name, though you can try to coin a better one.

Make user symbols available from __init__.py so import sqlitefts is sufficient.

setup.py: url points to another package. "Operating System :: POSIX :: Linux" seems redundant with "Operating System :: OS Independent".
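
For the __init__.py suggestion above, the usual pattern is to re-export the user-facing names from the implementation module. The sketch below builds a throwaway package in a temporary directory to show it end to end; the package name demo_fts and the function register_tokenizer are placeholders, not the project's actual names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: demo_fts and register_tokenizer are placeholder names.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_fts"
pkg.mkdir()

# The implementation module (what the issue suggests calling binding.py).
(pkg / "binding.py").write_text(
    "def register_tokenizer(conn, name, tokenizer):\n"
    "    return (name, tokenizer)\n"
)

# __init__.py re-exports the user-facing names and lists them in __all__.
(pkg / "__init__.py").write_text(
    "from .binding import register_tokenizer\n"
    "__all__ = ['register_tokenizer']\n"
)

# Importing the package alone is now enough to reach the function.
sys.path.insert(0, str(root))
demo_fts = importlib.import_module("demo_fts")
result = demo_fts.register_tokenizer(None, "simple", object)
```

With this in place the module can be renamed later without breaking users, as long as __init__.py keeps exporting the same names.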

@hideaki-t
Owner

Thanks,

Ranking: I'll merge your implementation from the gist. I'll add some test cases for CJK and other scoring functions.
Documentation: yes, I know I need to write it. I'll finish it.
Minor: You're right, sqlitefts.sqlite_tokenizer is redundant. Let me think about it.
The URL was copied from another package; I totally forgot to change it...

hideaki-t added a commit that referenced this issue Jun 15, 2014
added minimum document
updated packaging stuff