Description
When I run the following Python code:
import stanza
from stanza.server import CoreNLPClient

text = "中国是一个伟大的国家。"
print(text)

with CoreNLPClient(
        properties='chinese',
        classpath=r'F:\StanfordCoreNLP\stanford-corenlp-4.2.2*',
        strict=False,
        start_server=stanza.server.StartServer.TRY_START,
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse'],
        timeout=30000,
        memory='16G') as client:
    pattern = 'NP'
    matches = client.tregex(text, pattern)
    # You can access matches similarly
    print(matches['sentences'][0]['0']['match'])
I got:
中国是一个伟大的国家。
2021-12-08 19:35:10 INFO: Using CoreNLP default properties for: chinese. Make sure to have chinese models jar (available for download here: https://stanfordnlp.github.io/CoreNLP/) in CLASSPATH
2021-12-08 19:35:10 INFO: Connecting to existing CoreNLP server at localhost:9000
2021-12-08 19:35:10 INFO: Connecting to existing CoreNLP server at localhost:9000
(NP (NNP �й���һ��ΰ��Ĺ���) (SYM ��))
Process finished with exit code 0
Any idea what is causing the garbage characters in the tregex match?
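
One observation that may help narrow this down: the garbled match looks like what you get when the input sentence is encoded as GBK (the usual ANSI code page on Chinese-language Windows) and those bytes are then decoded as UTF-8. That is only an assumption about the cause, nothing in the log above confirms it. The sketch below reproduces the symptom in plain Python without CoreNLP:

```python
# Assumption (not confirmed by the logs): somewhere between the CoreNLP
# server and the Python client, the text is encoded as GBK and then
# decoded as UTF-8.
text = "中国是一个伟大的国家"

gbk_bytes = text.encode("gbk")

# Invalid UTF-8 sequences become U+FFFD, the replacement character.
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled)

# A few GBK byte pairs happen to be valid UTF-8 and survive as letters.
survivors = "".join(ch for ch in garbled if ch != "\ufffd")
print(survivors)  # → йһΰĹ
```

The surviving letters (й, һ, ΰ, Ĺ) appear in the same order in the reported match, which is what suggests an encoding mismatch rather than a parsing problem. Note also that the log says the client connected to an existing server at localhost:9000, so the encoding in effect would be that of the already-running Java process. If that is the cause, starting the server with the JVM option -Dfile.encoding=UTF-8 is a common thing to try, though this has not been verified for this setup.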