Closed
Description
Description of Problem:
Currently, some of our tokenizers support the option case_sensitive. If the user sets this option to False, all featurizers receive the lowercased tokens. This might not be ideal. For example, some of the features in the LexicalSyntacticFeaturizer do not work if all tokens are lowercased.
It might be better to move the option case_sensitive to the featurizers themselves, so that each featurizer can be configured separately.
Overview of the Solution:
Remove the option case_sensitive from all tokenizers.
Add the option case_sensitive to featurizers for which it makes sense.
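
The proposal can be illustrated with a minimal sketch. This is not the actual implementation: tokenize, CountFeaturizer, and ShapeFeaturizer are hypothetical names made up for illustration, and only the case_sensitive option comes from the issue itself. The assumption is that the tokenizer always preserves casing and each featurizer decides on its own whether to lowercase.

```python
from dataclasses import dataclass
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: splits on whitespace and never changes case."""
    return text.split()


@dataclass
class CountFeaturizer:
    """Hypothetical bag-of-words featurizer with its own case_sensitive option."""

    case_sensitive: bool = True

    def featurize(self, tokens: List[str]) -> Dict[str, int]:
        counts: Dict[str, int] = {}
        for token in tokens:
            # Lowercasing happens here, per featurizer, not in the tokenizer.
            key = token if self.case_sensitive else token.lower()
            counts[key] = counts.get(key, 0) + 1
        return counts


class ShapeFeaturizer:
    """Hypothetical featurizer whose features depend on the original casing."""

    def featurize(self, tokens: List[str]) -> List[Dict[str, bool]]:
        return [{"title": t.istitle(), "upper": t.isupper()} for t in tokens]


tokens = tokenize("Book a flight to Berlin with KLM")

# The same token sequence feeds both featurizers.
counts = CountFeaturizer(case_sensitive=False).featurize(tokens)
shapes = ShapeFeaturizer().featurize(tokens)
```

With this split, the case-insensitive featurizer merges "Berlin" and "berlin" into one entry, while the casing-dependent featurizer still sees the original tokens and can tell that "Berlin" is title-cased and "KLM" is upper-cased.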