Ranks documents for a query using the BM25 score in the document ranking phase and the Rocchio algorithm in the query expansion phase.
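For orientation, here is a minimal sketch of a BM25 scorer that includes the query-side `k2` term. It is not the code in `score.py`: the function name, its signature, and the defaults (`k1=1.2`, `k2=100`, `b=0.75`) are assumptions chosen for illustration, and the IDF uses the common non-negative variant with `+ 1` inside the logarithm.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len,
               k1=1.2, k2=100.0, b=0.75):
    """Score one tokenised document against one tokenised query (illustrative)."""
    doc_tf = Counter(doc_terms)      # term frequency inside the document
    query_tf = Counter(query_terms)  # term frequency inside the query
    # Length normalisation: documents longer than average get a larger K.
    K = k1 * ((1 - b) + b * len(doc_terms) / avg_doc_len)
    score = 0.0
    for term, qf in query_tf.items():
        f = doc_tf.get(term, 0)
        if f == 0:
            continue  # terms absent from the document contribute nothing
        n = doc_freq.get(term, 0)  # number of documents containing the term
        idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
        doc_part = (k1 + 1) * f / (K + f)
        # k2 controls how quickly repeated query terms saturate.
        query_part = (k2 + 1) * qf / (k2 + qf)
        score += idf * doc_part * query_part
    return score
```

With a larger `k2`, a term that appears several times in the query keeps gaining weight almost linearly, which tends to matter once an expanded query repeats terms.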
- Create a folder named `data` and put the query and document text files in the `./data` folder.
- Run `EE448.ipynb` to visualize the output.
- The ranked documents are written to `./data/bm25_score.txt`.
Python >= 3.0
You can get the dataset here or use your own data.
- `./data/query.txt`: query_id \t query_text
- `./data/doc.txt`: document_id \t document_text
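The tab-separated layout above could be read with a small helper like the following. This is a sketch only: `load_tsv` is a hypothetical name, not a function from this repository, and it assumes UTF-8 files with one record per line.

```python
def load_tsv(path):
    """Read 'id<TAB>text' lines into a dict mapping id -> text."""
    records = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only, so the text may itself contain tabs.
            item_id, _, text = line.partition("\t")
            records[item_id] = text
    return records

# Example usage (paths as described in this README):
# queries = load_tsv("./data/query.txt")
# documents = load_tsv("./data/doc.txt")
```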
- Set the number of expansion words in `util.py/findNewQuery/loopRange` to a different value. If the documents are short, set `loopRange` to a smaller value.
- Set `k2` in `score.py/bm25` to a larger value.
- Set `GAMMA` to 0.15 to enable both positive and negative feedback, or to 0 to use positive feedback only.
- You may try a different score function, such as TF-IDF, to rank documents in `score.py`.
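To make the role of `GAMMA` and the number of expansion words concrete, here is a rough sketch of Rocchio-style query expansion. It does not reproduce `util.py`: the function names, the `ALPHA` and `BETA` values, and the `n_terms` argument (standing in for what this README calls `loopRange`) are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Illustrative weights only; GAMMA = 0 switches negative feedback off.
ALPHA, BETA, GAMMA = 1.0, 0.75, 0.15

def centroid(docs):
    """Average term-frequency vector of a list of tokenised documents."""
    if not docs:
        return {}
    total = Counter()
    for terms in docs:
        total.update(terms)
    return {term: count / len(docs) for term, count in total.items()}

def rocchio(query_terms, relevant_docs, non_relevant_docs,
            alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Term weights of the modified query: alpha*q + beta*rel - gamma*nonrel."""
    weights = defaultdict(float)
    for term, count in Counter(query_terms).items():
        weights[term] += alpha * count
    for term, w in centroid(relevant_docs).items():
        weights[term] += beta * w   # positive feedback
    for term, w in centroid(non_relevant_docs).items():
        weights[term] -= gamma * w  # negative feedback
    # Terms whose weight is not positive are dropped.
    return {term: w for term, w in weights.items() if w > 0}

def expansion_terms(query_terms, new_weights, n_terms):
    """Pick the n_terms highest-weighted terms that are not already in the query."""
    original = set(query_terms)
    candidates = [t for t in new_weights if t not in original]
    candidates.sort(key=lambda t: (-new_weights[t], t))  # ties broken alphabetically
    return candidates[:n_terms]
```

In this sketch a smaller `n_terms` adds fewer words to the query, which is usually the safer choice when the feedback documents are short and offer few reliable candidate terms.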