Commit
Revert previous commit, winnow is correct as "narrow down"
insop committed Jul 25, 2021
1 parent 1be5992 commit 398e88c
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion vsm_01_distributional.ipynb
@@ -180,7 +180,7 @@
"1. Scan through your corpus building a dictionary $d$ mapping word-pairs to co-occurrence values. Every time a pair of words $w$ and $w'$ occurs in the same context (as you defined it in 1), increment $d[(w, w')]$ by whatever value is determined by your weighting scheme. You'd increment by $1$ with the weighting scheme that simply counts co-occurrences.\n",
"\n",
"1. Using the count dictionary $d$ that you collected in 3, establish your full vocabulary $V$, an ordered list of word types. \n",
-    " 1. For large collections of documents, $|V|$ will typically be huge. You will probably want to window the vocabulary at this point. \n",
+    " 1. For large collections of documents, $|V|$ will typically be huge. You will probably want to winnow (narrow down) the vocabulary at this point. \n",
" 1. You might do this by filtering to a specific subset, or just imposing a minimum count threshold. \n",
" 1. You might impose a minimum count threshold even if $|V|$ is small — for words with very low counts, you simply don't have enough evidence to support good representations.\n",
" 1. For words outside the vocabulary you choose, you could ignore them entirely or accumulate all their values into a designated _UNK_ vector.\n",
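The counting-and-winnowing procedure these steps describe can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the function name, the sentence-as-context assumption, the `min_count` default, and the `<UNK>` token are all illustrative choices.

```python
from collections import Counter
from itertools import combinations

UNK = "<UNK>"  # designated entry that accumulates out-of-vocabulary counts

def build_counts(corpus, min_count=2):
    """corpus: iterable of tokenized contexts (here, each list is one context).
    Returns (vocab, d): vocab is the winnowed, ordered list of word types,
    and d maps (w, w') pairs to co-occurrence counts, with words below the
    minimum count threshold accumulated into the UNK entry."""
    word_counts = Counter(w for context in corpus for w in context)
    # Winnow (narrow down) the vocabulary with a minimum count threshold.
    vocab = sorted(w for w, c in word_counts.items() if c >= min_count)
    keep = set(vocab)
    d = Counter()
    for context in corpus:
        # Map out-of-vocabulary words to UNK so their evidence is not lost.
        toks = [w if w in keep else UNK for w in context]
        # Increment d[(w, w')] by 1 per co-occurrence (the simple counting
        # weighting scheme), symmetrically in both orderings.
        for w, w2 in combinations(toks, 2):
            if w != w2:
                d[(w, w2)] += 1
                d[(w2, w)] += 1
    return vocab + [UNK], d

corpus = [
    ["gnarly", "wicked", "gnarly"],
    ["wicked", "gnarly", "awesome"],
    ["terrible", "awesome"],
]
vocab, d = build_counts(corpus, min_count=2)
# "terrible" occurs only once, so it falls below the threshold and its
# single co-occurrence with "awesome" is credited to the UNK entry.
```

A minimum count threshold of 2 already drops "terrible" here; in a real corpus the threshold (or a fixed-size vocabulary cap) would be tuned to the collection.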
