Scrubber sample code now correctly handles tokens that also appear on their own.

When a token such as "two" is added to the scrubber list and that token also occurs as an individual term, not only together with other tokens as in "two sentences", the original code failed: the while loop consumed the whole span when the span consisted of just that one token. Fixed.
0dB authored Aug 6, 2023
1 parent 7cc079b commit 9990a1e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion examples/sample.ipynb
@@ -556,7 +556,7 @@
 "@spacy.registry.misc(\"prefix_scrubber\")\n",
 "def prefix_scrubber():\n",
 "\tdef scrubber_func(span: Span) -> str:\n",
-"\t\twhile span[0].text in (\"a\", \"the\", \"their\", \"every\", \"other\"):\n",
+"\t\twhile len(span) > 1 and span[0].text in (\"a\", \"the\", \"their\", \"every\", \"other\"):\n",
 "\t\t\tspan = span[1:]\n",
 "\t\treturn span.text\n",
 "\treturn scrubber_func\n",
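
The effect of the added length check can be reproduced without spaCy. The following is only a minimal sketch under stated assumptions: a plain list of token strings stands in for the spaCy `Span`, and the names `PREFIXES`, `scrub_original`, and `scrub_fixed` are hypothetical and do not appear in the notebook. "two" is added to the prefix list to match the scenario in the commit message. In this stand-in the old loop fails with an `IndexError`; the exact failure inside spaCy may look different.

```python
from typing import List

# Hypothetical stand-in for the notebook's scrubber: a list of token strings
# plays the role of a spaCy Span so the loop logic can run without spaCy.
# "two" is added here to mirror the case described in the commit message.
PREFIXES = ("a", "the", "their", "every", "other", "two")


def scrub_original(tokens: List[str]) -> str:
    """Loop as it was before the fix: strips prefixes with no length check."""
    while tokens[0] in PREFIXES:
        tokens = tokens[1:]
    return " ".join(tokens)


def scrub_fixed(tokens: List[str]) -> str:
    """Loop after the fix: never strips the last remaining token."""
    while len(tokens) > 1 and tokens[0] in PREFIXES:
        tokens = tokens[1:]
    return " ".join(tokens)


if __name__ == "__main__":
    print(scrub_fixed(["two", "sentences"]))  # prefix stripped
    print(scrub_fixed(["two"]))               # single token kept
    try:
        scrub_original(["two"])
    except IndexError:
        print("original loop fails on a single-token span")
```

The `len(...) > 1` guard means a span that consists only of prefix tokens keeps its last token instead of being reduced to nothing.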