This package integrates many basic Chinese NLP functions, making Python-based Chinese word processing and information extraction simple and convenient.

chenmingxiang110/SimpleChinese

SimpleChinese

!!! This project is DEPRECATED. See 2nd edition at: SimpleChinese2.

Chinese text processing, representation, and visualization.

This package integrates many basic Chinese NLP functions, making Python-based Chinese word processing and information extraction simple and convenient.

Installation

To install SimpleChinese, run this command in your terminal:

$ pip install simplechinese

This is the preferred method to install SimpleChinese, as it will always install the most recent stable release.

If you don't have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for SimpleChinese can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/chenmingxiang110/simplechinese

Or download the tarball:

$ curl -OJL https://github.com/chenmingxiang110/simplechinese/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Features

  1. Read the data from a CSV file.
import pandas as pd
df = pd.read_csv("test.csv")

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/raw.png

  2. Clean the data (sc is assumed here to be the package, imported with import simplechinese as sc).
sc.clean(df)

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/clean.png

The clean function does the following:

fillna(): Fill the N/A values in a pandas.DataFrame with empty strings.

toLower(): Convert alphabetic characters to lowercase.

remove_punctuations(): Remove all punctuation from a string or a pandas.DataFrame.

remove_space(): Remove all spaces from a string or a pandas.DataFrame.
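
The four steps listed above can be illustrated on plain strings. This is only a minimal sketch using the Python standard library, not the package's own implementation: the helper name clean_text is hypothetical, and treating "punctuation" as every character in a Unicode P* category is an assumption.

```python
import unicodedata


def clean_text(value):
    """Hypothetical stand-in for the four cleaning steps, applied to one cell."""
    # fillna: treat None or NaN (NaN is the only value not equal to itself)
    # as an empty string.
    if value is None or value != value:
        return ""
    text = str(value)
    # toLower: lowercase Latin letters; Chinese characters are unaffected.
    text = text.lower()
    # remove_punctuations: drop characters whose Unicode category starts
    # with "P", which covers ASCII and full-width Chinese punctuation.
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # remove_space: drop all whitespace, including the full-width space U+3000.
    text = "".join(text.split())
    return text


rows = ["Hello, 世界！", None, "  Python 很 好用。 "]
cleaned = [clean_text(r) for r in rows]
print(cleaned)
```

With pandas, a helper like this could be applied cell by cell to a DataFrame; sc.clean(df) is described as doing the equivalent work in one call.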

  3. Extract words from the data.
sc.extract_words(sc.clean(df))

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/extract_words.png
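
Chinese text has no spaces between words, so extracting words requires a segmentation step. How sc.extract_words segments text is not stated here; the sketch below uses forward maximum matching against a small dictionary purely as an illustration of the idea. The function name, the vocabulary, and the max_len parameter are all hypothetical.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy dictionary-based segmentation (illustrative only).

    At each position, take the longest substring (up to max_len characters)
    found in the vocabulary; fall back to a single character otherwise.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


vocabulary = {"自然", "语言", "自然语言", "处理", "很", "有趣"}
words = forward_max_match("自然语言处理很有趣", vocabulary)
print(words)
```

Greedy matching is simple but can mis-segment ambiguous text, which is why practical segmenters usually rely on statistical models.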

  4. Vectorization
sc.pca(sc.tfidf(sc.clean(df).iloc[:,0]))

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/vectorization.png
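
To show what a TF-IDF vector contains, here is a minimal standard-library sketch. It assumes the text is already split into tokens and uses the unsmoothed textbook weighting tf * log(N / df); the exact weighting used by sc.tfidf is not documented here and may differ, and the PCA step is left out.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, matrix) for a list of token lists.

    Hypothetical helper: matrix[i][j] is the TF-IDF weight of
    vocabulary[j] in documents[i].
    """
    vocabulary = sorted({tok for doc in documents for tok in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    matrix = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        row = [
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix


docs = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"], ["我", "讨厌", "香蕉"]]
vocabulary, matrix = tfidf(docs)
```

A token that occurs in every document (here "我") gets weight 0, while a token unique to one document gets the largest weight. Reducing such vectors to two dimensions, as sc.pca is used for above, makes them easy to plot.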

  5. Word cloud
sc.wordcloud(sc.clean(df).iloc[:,0], font_path="yahei.ttc")

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/wordcloud.png

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
