[problem] macOS version becomes unresponsive when a large batch of txt files is uploaded at once #41
Comments
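Hello! Thanks for the feedback. If you want to process the files with your own code, you can install NeoSCA from source and call its API directly, which bypasses this check. NeoSCA 0.1.0+ is no longer published to PyPI, so it has to be installed from GitHub with Git.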
pip3 install git+https://github.com/tanloong/neosca
import stanza
from neosca import STANZA_MODEL_DIR
stanza.download("en", model_dir=str(STANZA_MODEL_DIR), resources_url="stanfordnlp")
import csv
import io
from neosca.ns_io import Ns_IO
from neosca.ns_sca.ns_sca import Ns_SCA
from neosca.ns_lca.ns_lca import Ns_LCA
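sca_kwargs = {
    # All available measures: ["W", "S", "VP", "C", "T", "DC", "CT", "CP", "CN", "MLS", "MLT", "MLC", "C/S", "VP/T", "C/T", "DC/C", "DC/T", "T/S", "CT/T", "CP/T", "CP/C", "CN/T", "CN/C"]
    # If this argument is omitted, all available measures are computed.
    "selected_measures": ["MLS", "MLC", "MLT", "C/S"],
    # Cache intermediate files to save time on later runs over the same files.
    # The cache is stored in ns_data/cache under the neosca installation path.
    "is_cache": True,
    # Whether to reuse an existing cache. When True, the cache is used only if the
    # cache file is non-empty and was last modified later than the corresponding input file.
    "is_use_cache": True,
}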
sca_analyzer = Ns_SCA(**sca_kwargs)
lca_kwargs = {
    # There is no selected_measures option yet; all available measures are computed.
    "wordlist": "bnc",  # or "anc"
    "tagset": "ud",  # or "ptb"
    "is_cache": True,
    "is_use_cache": True,
}
lca_analyzer = Ns_LCA(**lca_kwargs)
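# get_verified_ifile_list collects every file of a type NeoSCA supports (txt/docx/odt)
# from the given folder and all of its nested subfolders. Move unrelated files of these
# types out of the folder first, otherwise they will be analyzed too.
# This function does not check for file name conflicts.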
file_paths = Ns_IO.get_verified_ifile_list(["./files"])
sname_value_map = {}
lname_value_map = {}
with io.StringIO() as sca_output, io.StringIO() as lca_output:
    for file_path in file_paths:
        sca_counter = sca_analyzer.run_on_file_or_subfiles(file_path)
        sname_value_map: dict[str, str] = sca_counter.get_all_values(precision=4)
        sca_values = sname_value_map.values()
        sca_writer = csv.writer(sca_output)
        sca_writer.writerow(sca_values)
        # Save the matches; this clears any existing files in ./sca_matches
        sca_counter.dump_matches("./sca_matches")

        lca_counter = lca_analyzer.run_on_file_or_subfiles(file_path)
        lname_value_map: dict[str, str] = lca_counter.get_all_values(precision=4)
        lca_values = lname_value_map.values()
        lca_writer = csv.writer(lca_output)
        lca_writer.writerow(lca_values)
        # Save the matches; this likewise clears any existing files in ./lca_matches
        lca_counter.dump_matches("./lca_matches")

    # The buffers must be read before the outer `with` closes them.
    # newline="" keeps the csv module's line endings from being translated again.
    with open("./neosca_sca_results.csv", "w", newline="") as f:
        sca_writer = csv.writer(f)
        sca_writer.writerow(sname_value_map.keys())  # column names
        f.write(sca_output.getvalue())
    with open("./neosca_lca_results.csv", "w", newline="") as f:
        lca_writer = csv.writer(f)
        lca_writer.writerow(lname_value_map.keys())  # column names
        f.write(lca_output.getvalue())

Alternatively, analyze the files from the terminal with NeoSCA's command-line interface:

python3 -m neosca sca ./files
python3 -m neosca lca ./files
# Use --help to see the available options
# python3 -m neosca --help
# python3 -m neosca sca --help
# python3 -m neosca lca --help
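Since the comment above notes that get_verified_ifile_list does not check for file name conflicts, it may help to scan the input folder before starting a long run. The following is a minimal sketch that uses only the standard library and does not call NeoSCA. The helper name find_name_conflicts is hypothetical, the suffix set mirrors the file types listed above, and treating "conflict" as "same file name without the extension" is an assumption rather than NeoSCA's own definition.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: mirrors the file types mentioned above (txt/docx/odt).
SUPPORTED_SUFFIXES = {".txt", ".docx", ".odt"}


def find_name_conflicts(root: str) -> dict[str, list[Path]]:
    """Group supported files under `root` (recursively) by stem and
    return only the stems that occur more than once.

    Hypothetical helper; "conflict" is assumed to mean an identical
    file name once the extension is removed.
    """
    by_stem: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            by_stem[path.stem].append(path)
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}
```

Calling find_name_conflicts("./files") before the analysis loop and renaming whatever it reports should avoid having two inputs share a name in the results.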
Hello, I have successfully installed the latest neosca library and it runs fine. Thank you very much for your help!
…een added fix performance issue in #41
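Hello, and thank you very much for developing and continuously updating NeoSCA, which makes analyzing the lexical and syntactic complexity of English texts much easier. I have run into a small problem while using it: I downloaded the macOS version of the NeoSCA app, but when I upload many txt files at once, the app stops responding for a long time. I suspect a memory issue is the cause, so I would like to ask whether you could provide the earlier Python package version so that I can write my own code to process the files. Thanks again for your hard work!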