Unexpected error thrown on tokenize

Description:
`edu.stanford.nlp.pipeline.StanfordCoreNLP` throws an error when tokenizing a string that contains every possible `Char` value separated by spaces (`"... a b c d ..."`). It may also be worth mentioning that the same characters without separating spaces (`"...abcd..."`) are tokenized successfully.

Prerequisites:
- Java `openjdk 17.0.2 2022-01-18`
- Scala `2.13.8`
- library `ivy"edu.stanford.nlp:stanford-corenlp:4.5.0"`

Minimal example:
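```scala
import edu.stanford.nlp.pipeline.StanfordCoreNLP
import java.util.Properties

// Pipeline with only the tokenizer enabled
val pipeline = {
    val props = new Properties()
    props.setProperty("annotators", "tokenize")
    new StanfordCoreNLP(props)
}

// Every Char value (U+0000 to U+FFFF), separated by single spaces
val text = (Char.MinValue to Char.MaxValue).mkString(" ")

// Throws java.lang.Error: "Error: could not match input"
pipeline.processToCoreDocument(text)
```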

Error:
```
java.lang.Error: Error: could not match input
  at edu.stanford.nlp.process.PTBLexer.zzScanError(PTBLexer.java:61605)
  at edu.stanford.nlp.process.PTBLexer.next(PTBLexer.java:63479)
  at edu.stanford.nlp.process.PTBTokenizer.getNext(PTBTokenizer.java:301)
  at edu.stanford.nlp.process.PTBTokenizer.getNext(PTBTokenizer.java:185)
  at edu.stanford.nlp.process.AbstractTokenizer.hasNext(AbstractTokenizer.java:69)
  at edu.stanford.nlp.process.AbstractTokenizer.tokenize(AbstractTokenizer.java:111)
  at edu.stanford.nlp.pipeline.TokenizerAnnotator.annotate(TokenizerAnnotator.java:420)
  at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:744)
  at edu.stanford.nlp.pipeline.StanfordCoreNLP.process(StanfordCoreNLP.java:793)
  ...
```
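To narrow down which characters trigger the error, one option is to tokenize each character on its own and record the ones that throw. The sketch below assumes a hypothetical helper `failingChars` (not part of CoreNLP) that takes the tokenizer as a plain `String => Unit` function, so it can run without CoreNLP. Each character is wrapped in spaces to mirror the failing `mkString(" ")` input. If the failure depends on neighbouring characters rather than on a single one, this per-character check may not reproduce it.

```scala
import scala.util.control.NonFatal

object FindFailingChars {
  // Hypothetical diagnostic helper (not a CoreNLP API). Each character is
  // tokenized on its own, surrounded by spaces to mirror the failing
  // `mkString(" ")` input, and kept if `tokenize` throws. NonFatal matches a
  // plain java.lang.Error such as the one PTBLexer raises, but not fatal
  // errors like OutOfMemoryError.
  def failingChars(chars: Seq[Char], tokenize: String => Unit): Seq[Char] =
    chars.filter { c =>
      try { tokenize(s" $c "); false }
      catch { case NonFatal(_) => true }
    }

  def main(args: Array[String]): Unit = {
    // Stand-in tokenizer, for demonstration only: it rejects surrogate code
    // units. It says nothing about which characters actually trip PTBLexer.
    val fakeTokenize: String => Unit = s =>
      if (s.exists(ch => Character.isSurrogate(ch)))
        throw new Error("Error: could not match input")

    val bad = failingChars(Char.MinValue to Char.MaxValue, fakeTokenize)
    // prints "2048 failing chars, first: U+D800"
    println(s"${bad.size} failing chars, first: U+${bad.head.toInt.toHexString.toUpperCase}")
  }
}
```

With the real pipeline from the minimal example, the stand-in would be replaced by something like `s => { pipeline.processToCoreDocument(s); () }`. Expect this to be slow, since it runs the pipeline once per `Char` value.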

Unexpected error thrown on tokenize #1298