Garbage characters in printed result #894

Closed
@jiangweiatgithub

When I run the following Python code:

import stanza
from stanza.server import CoreNLPClient

text = "中国是一个伟大的国家。"
print(text)
with CoreNLPClient(
        properties='chinese',
        classpath=r'F:\StanfordCoreNLP\stanford-corenlp-4.2.2*',
        strict=False,
        start_server=stanza.server.StartServer.TRY_START,
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse', 'depparse'],
        timeout=30000,
        memory='16G') as client:
    pattern = 'NP'
    matches = client.tregex(text, pattern)
    # Print the first NP match found in the first sentence
    print(matches['sentences'][0]['0']['match'])

I got:
中国是一个伟大的国家。
2021-12-08 19:35:10 INFO: Using CoreNLP default properties for: chinese. Make sure to have chinese models jar (available for download here: https://stanfordnlp.github.io/CoreNLP/) in CLASSPATH
2021-12-08 19:35:10 INFO: Connecting to existing CoreNLP server at localhost:9000
2021-12-08 19:35:10 INFO: Connecting to existing CoreNLP server at localhost:9000
(NP (NNP �й���һ��ΰ��Ĺ���) (SYM ��))

Process finished with exit code 0

Any idea what is causing the garbage characters?
