Getting Started • Table of Contents • About • Acknowledgment • FAQ
Made by ximing Xing • 🌌 https://ximingxing.github.io/
Machine-Learning-in-Action is based on Peter Harrington's book "Machine Learning in Action". The machine learning algorithms and case studies from the book are presented using scikit-learn's code organization.
Beyond the algorithms themselves, what matters more is where they are applied, so this repository also provides hands-on machine learning cases, including text classification on a million English news articles.
- Using PyCharm with the conda plugin makes getting started easier.
- Check out the project from version control.
- Choose Git.
- Run python setup.py develop
├── LICENSE
├── README.md
├── data
│ └── 20news-bydate_py3.pkz
├── examples
│ └── 20newsgroup_in_action
├── mlic
│ ├── cluster
│ ├── linear_model
│ ├── metrics
│ ├── naive_bayes
│ ├── neighbors
│ ├── neural_network
│ ├── svm
│ ├── tree
│ └── utils
├── requestments.txt
├── setup.py
└── tests
  ├── Bayes
  ├── KNN
  └── Linear
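The package layout above mirrors scikit-learn's module names. As a rough illustration of what an estimator in that style could look like, here is a minimal k-nearest-neighbors sketch with `fit`, `predict`, and `score` methods. The class below is hypothetical and written only for this README; it is not the code in `mlic/neighbors`, and it assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


class KNeighborsClassifier:
    """Hypothetical k-nearest-neighbors classifier with a scikit-learn style API."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # KNN is a lazy learner: fitting only stores the training data.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        return [self._predict_one(row) for row in X]

    def _predict_one(self, row):
        # Rank training points by Euclidean distance to the query point.
        order = sorted(range(len(self.X_)), key=lambda i: math.dist(row, self.X_[i]))
        # Majority vote among the n_neighbors closest training labels.
        votes = Counter(self.y_[i] for i in order[: self.n_neighbors])
        return votes.most_common(1)[0][0]

    def score(self, X, y):
        # Mean accuracy on the given data.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Toy data chosen for illustration only.
X_train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
y_train = ["A", "A", "B", "B"]

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
predictions = clf.predict([[0.1, 0.1], [0.9, 1.0]])
print(predictions)
```

Returning `self` from `fit` allows the chained `KNeighborsClassifier(...).fit(...)` call, which is the convention scikit-learn estimators follow.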
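The complete data mining process:
- Network Crawler
  - Crawl English news from Global Times by category (handles static pages as well as content that needs JS rendering)
  - Scrapy-Splash based crawler that collects information from globaltimes.cn
  - CNN Crawler
  - BBC Crawler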
- Text classification in action
  - DataLoader: 20newsgroup
  - Data preprocessing
    - Data Cleaning: regular expression
    - Data Cleaning: stop words
    - Normalization: lemmatization
  - Extracting Features: Word Dict and TF-IDF
  - Model: bayes and svm
  - Evaluation
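The cleaning and feature-extraction steps listed above can be sketched in plain Python. This is a simplified illustration, not the repository's implementation: the stop-word list, the sample sentences, and the function names are all made up for the example, lemmatization is left out because it needs an external library, and the IDF used here is the unsmoothed `ln(N / df)`, which differs from scikit-learn's smoothed default.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real run would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def clean(text):
    """Lowercase, keep alphabetic tokens via a regular expression, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def build_vocabulary(docs):
    """Map every distinct token to a column index (the word dict)."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}


def tfidf(docs, vocabulary):
    """Return one TF-IDF vector per document, with idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocabulary)
        for tok, count in Counter(doc).items():
            tf = count / len(doc)  # term frequency within this document
            vec[vocabulary[tok]] = tf * math.log(n_docs / doc_freq[tok])
        vectors.append(vec)
    return vectors


# Made-up sentences standing in for newsgroup posts.
raw_docs = [
    "The rocket launch is on Friday!",
    "The hockey game is on Friday.",
    "Rocket engines and rocket fuel.",
]
docs = [clean(text) for text in raw_docs]
vocabulary = build_vocabulary(docs)
vectors = tfidf(docs, vocabulary)
```

Words that appear in only one document, such as "launch", get a higher weight than words shared across documents, such as "friday", which is what makes the vectors useful as input to a Bayes or SVM classifier.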
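You don't need to worry about the datasets used by the cases in examples/ and tests/, because they are all downloaded automatically.
If you run into any problems, please point them out.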
- The code organization follows the structure used in scikit-learn.