1980000-Groups-Chinese-Polish-Parallel-Corpus-Data

Description

This dataset contains 1,980,000 Chinese-Polish parallel translation pairs, stored as TXT files. The data has been cleaned, desensitized, and quality-checked, and can serve as a base corpus for text data analysis and for applications such as machine translation.

For more details, please refer to the link: https://www.nexdata.ai/datasets/nlu/1337?source=Github

Storage format

TXT
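
The exact line layout of the TXT files is not specified here. As a minimal sketch, assuming one pair per line with the Chinese and Polish text separated by a tab (an assumption, not a documented property of this dataset), pairs could be read like this. The function name `read_pairs` and the sample strings are hypothetical and used only for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(handle: TextIO, sep: str = "\t") -> Iterator[Tuple[str, str]]:
    """Yield (chinese, polish) pairs from a text stream.

    Assumes one pair per line, split by `sep`. This layout is an
    assumption; adjust `sep` to match the actual files.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(sep)
        if len(parts) != 2:
            continue  # skip lines that do not hold exactly two fields
        zh, pl = parts[0].strip(), parts[1].strip()
        if zh and pl:
            yield zh, pl


# Illustrative in-memory sample, not taken from the dataset.
sample = io.StringIO("你好\tCześć\n\n谢谢\tDziękuję\nmalformed line\n")
pairs = list(read_pairs(sample))
```

For a real file, the same function could be called on a handle opened with `open(path, encoding="utf-8")`, assuming the files are UTF-8 encoded.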

Data content

Chinese-Polish parallel corpus data. The content has been preliminarily categorized and covers the technology, healthcare, tourism, spoken language, news, and military domains.

Data size

1.98 million pairs of Chinese-Polish parallel corpus data.

Language

Chinese, Polish

Application scenario

Machine translation

Licensing Information

Commercial License
