12820000-Groups-Chinese-Korean-Parallel-Corpus-Data

Description

12,820,000 pairs of Chinese-Korean parallel translation corpus data, stored in TXT files. The corpus covers many fields, including spoken language, travel, news, and finance. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a foundational text corpus and can also be used for machine translation.

For more details, please refer to the link: https://www.nexdata.ai/datasets/nlu/1200?source=Github

Specifications

Storage format

TXT

Data content

Chinese-Korean Parallel Corpus Data

Data size

12.82 million Chinese-Korean sentence pairs. The Chinese sentences average 25.7 characters in length.
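The internal layout of the TXT files is not documented here. The sketch below shows how the pair count and the average Chinese sentence length could be computed, assuming (hypothetically) one pair per line in the form `Chinese<TAB>Korean`. It counts Unicode code points, which may not match the exact definition of "character" used for the 25.7 figure (for example, whether punctuation is included).

```python
from typing import Iterable


def corpus_stats(lines: Iterable[str]) -> tuple[int, float]:
    """Return (number of pairs, mean Chinese sentence length in characters).

    Hypothetical layout: each line is "Chinese<TAB>Korean".
    Lines without a tab are skipped as not matching that layout.
    """
    pairs = 0
    total_chars = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # not in the assumed Chinese<TAB>Korean layout
        zh, _ko = line.split("\t", 1)
        pairs += 1
        total_chars += len(zh)  # counts Unicode code points
    mean = total_chars / pairs if pairs else 0.0
    return pairs, mean
```

Usage, with `zh-ko.txt` as a hypothetical file name: `with open("zh-ko.txt", encoding="utf-8") as f: print(corpus_stats(f))`. Taking an iterable of lines rather than a path keeps memory use flat on a file of this size and makes the function easy to test on small samples.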

Language

Chinese, Korean

Accuracy rate

90%

Application scenario

machine translation

Licensing Information

Commercial License
