Description
I split a large English corpus into several subsets and ran multiple CoreNLP commands simultaneously, but the following error always occurs after some time:
"""
Exception in thread "main" java.lang.RuntimeException: Error making document
at edu.stanford.nlp.coref.CorefSystem.annotate(CorefSystem.java:55)
at edu.stanford.nlp.pipeline.CorefAnnotator.annotate(CorefAnnotator.java:160)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:641)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:651)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1249)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1083)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1366)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1418)
Caused by: java.lang.IllegalArgumentException
at edu.stanford.nlp.semgraph.SemanticGraph.parentPairs(SemanticGraph.java:730)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.advance(GraphRelation.java:325)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.initialize(GraphRelation.java:1103)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.<init>(GraphRelation.java:1084)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.<init>(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT.searchNodeIterator(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChildIter(NodePattern.java:337)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.<init>(NodePattern.java:332)
at edu.stanford.nlp.semgraph.semgrex.NodePattern.matcher(NodePattern.java:293)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.<init>(CoordinationPattern.java:146)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern.matcher(CoordinationPattern.java:120)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChild(NodePattern.java:356)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.goToNextNodeMatch(NodePattern.java:455)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matches(NodePattern.java:572)
at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.find(SemgrexMatcher.java:193)
at edu.stanford.nlp.coref.data.Mention.findDependentVerb(Mention.java:1099)
at edu.stanford.nlp.coref.data.Mention.setDiscourse(Mention.java:318)
at edu.stanford.nlp.coref.data.Mention.process(Mention.java:235)
at edu.stanford.nlp.coref.data.Mention.process(Mention.java:241)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.fillMentionInfo(DocumentPreprocessor.java:341)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.initializeMentions(DocumentPreprocessor.java:169)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.preprocess(DocumentPreprocessor.java:62)
at edu.stanford.nlp.coref.data.DocumentMaker.makeDocument(DocumentMaker.java:92)
at edu.stanford.nlp.coref.data.DocumentMaker.makeDocument(DocumentMaker.java:64)
at edu.stanford.nlp.coref.CorefSystem.annotate(CorefSystem.java:53)
... 8 more
"""
Is this due to memory constraints?
My parameter setting is:
"java -mx64g -cp "$DATA/corenlp/stanford-corenlp-4.1.0/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $*"
and my command is:
sh ./corenlp.sh -fileList
-outputDirectory $DATA/output -outputFormat json
-annotators tokenize,ssplit,pos,lemma,ner,depparse,parse,coref
Also, what should I set the -mx parameter to?
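
A setup of this kind (one corpus split into subsets, one CoreNLP process per subset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact script used here: the function names split_filelist and run_chunks, the master list all_files.txt, and the chunk prefix are hypothetical placeholders. Each chunk gets its own log file, which may make it easier to see which subset was being processed when the exception was thrown.

```shell
#!/bin/sh
# Sketch: split one master file list into N chunks and run one command
# per chunk in parallel. All names here are hypothetical placeholders.

# split_filelist MASTER N PREFIX
# Writes PREFIX.0.txt ... PREFIX.(N-1).txt, distributing lines round-robin.
split_filelist() {
  master=$1; n=$2; prefix=$3
  awk -v n="$n" -v p="$prefix" \
    '{ print > (p "." ((NR - 1) % n) ".txt") }' "$master"
}

# run_chunks PREFIX COMMAND [ARGS...]
# Starts COMMAND once per chunk in the background, appending
# "-fileList <chunk>" to its arguments, then waits for all of them.
# Output of each process goes to PREFIX.<i>.log.
run_chunks() {
  prefix=$1; shift
  for chunk in "$prefix".*.txt; do
    "$@" -fileList "$chunk" > "${chunk%.txt}.log" 2>&1 &
  done
  wait
}

# Example invocation (assumes corenlp.sh and $DATA as described above):
#   split_filelist all_files.txt 4 chunk
#   run_chunks chunk sh ./corenlp.sh \
#     -outputDirectory "$DATA/output" -outputFormat json \
#     -annotators tokenize,ssplit,pos,lemma,ner,depparse,parse,coref
```

Note that with this layout every background process starts its own JVM, so the -mx value applies to each process separately rather than to the whole run.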