The following snippet splits "year-end" on the hyphen in 3.9.2, but no longer does so in 4.4.0, even though splitHyphenated=true is set in tokenize.options. Is this expected behavior?
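    import java.util.List;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class TokenizeHyphenRepro {
        public static void main(String[] args) {
            String text = "year-end";
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit");
            props.setProperty("tokenize.language", "en");
            props.setProperty("tokenize.options", "splitHyphenated=true,invertible,ptb3Escaping=true");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            Annotation ann = new Annotation(text);
            pipeline.annotate(ann);
            // Print the original text of each token produced by the tokenizer.
            List<CoreLabel> tokens = ann.get(CoreAnnotations.TokensAnnotation.class);
            System.out.println(tokens.stream().map(CoreLabel::originalText).collect(Collectors.toList()));
        }
    }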
Output with 3.9.2: [year, -, end]
Output with 4.4.0: [year-end]
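
To make the expected result concrete without depending on either CoreNLP version, here is a minimal plain-Java sketch of the split I expect for this input. It is only an illustration of the 3.9.2 output shape, not how the CoreNLP tokenizer actually works; the class HyphenSplitSketch and the method splitHyphenated are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenSplitSketch {
    // Hypothetical helper (not a CoreNLP API): splits one word on ASCII
    // hyphens and keeps each hyphen as its own token.
    static List<String> splitHyphenated(String word) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (c == '-') {
                // Flush the characters collected so far, then emit the hyphen.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                tokens.add("-");
            } else {
                current.append(c);
            }
        }
        // Flush whatever follows the last hyphen.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitHyphenated("year-end")); // prints [year, -, end]
    }
}
```

The same expected list could be compared against the tokens returned by the pipeline to check each version.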