Data Science Toolbox

This repo contains code snippets, each with a little theory and explanation, that can be handy for beginning data scientists. It was created while I was attending the Galvanize Data Science Immersive program in Seattle. The code is in Python, specifically in IPython notebooks, because they are easy to view on GitHub.

Pull requests with updates are welcome!

Theory

Hypothesis testing - t-test, z-test, A/B testing. hypothesis_testing.ipynb
Probability and statistics - combinatorics, conditional probability, statistical distributions. probability_and_statistics.ipynb
Gradient descent - gradient_descent.ipynb
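
As a quick taste of the gradient descent topic listed above, here is a minimal sketch in plain Python. It is not taken from gradient_descent.ipynb; the function name `gradient_descent`, its parameters, and the example objective f(x) = (x - 3)^2 are assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Minimize a one-dimensional function given its gradient.

    grad: callable returning the derivative at x (hypothetical helper).
    x0: starting point.
    """
    x = x0
    for _ in range(n_steps):
        # Step against the gradient, scaled by the learning rate.
        x -= learning_rate * grad(x)
    return x


# Illustrative objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)
```

With this learning rate, each step shrinks the distance to the minimizer x = 3 by a constant factor, so the result lands very close to 3. A learning rate that is too large would make the iterates diverge instead.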

Machine Learning

ML algorithms - linear regression, logistic regression, decision trees, random forests, gradient boosting, PCA, SVD, NMF and more. ml_algorithms.ipynb
Recommenders - different GraphLab recommenders. recommenders.ipynb

NLP

Doc2Vec - document similarity search using gensim. doc2vec.ipynb
Text summarization - text summarization and keyword extraction using gensim. text_summarization.ipynb
NearPy - locality-sensitive hashing (LSH) for approximate nearest neighbor search. nearpy.ipynb
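
To give a feel for the locality-sensitive hashing idea behind the NearPy entry above, here is a small pure-Python sketch of random-hyperplane hashing. It does not use NearPy's API and is not taken from nearpy.ipynb; the helper names (`make_hyperplanes`, `lsh_key`, `build_index`, `query`) and the toy vectors are assumptions made for illustration.

```python
import random


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw random hyperplane normals from a standard Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]


def lsh_key(vector, hyperplanes):
    """Hash a vector to a bit string: one bit per hyperplane side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def build_index(vectors, hyperplanes):
    """Group vector names into buckets keyed by their hash."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(lsh_key(vec, hyperplanes), []).append(name)
    return index


def query(index, vector, hyperplanes):
    """Return candidate neighbors: everything in the query's bucket."""
    return index.get(lsh_key(vector, hyperplanes), [])


planes = make_hyperplanes(n_planes=8, dim=3)
vectors = {"a": [1.0, 0.5, -0.2], "b": [-1.0, -0.5, 0.2]}
index = build_index(vectors, planes)

# A scaled copy of "a" points in the same direction, so it shares a bucket.
candidates = query(index, [2.0, 1.0, -0.4], planes)
print(candidates)
```

Vectors separated by a small angle tend to fall on the same side of most hyperplanes and so share a bucket, which is why only a small candidate set needs an exact distance check. Real libraries typically combine several such hash tables to improve recall.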

Development

Pipeline - pipeline, feature union, grid search. pipeline.ipynb
Map Reduce - Hadoop, Spark. map_reduce.ipynb
Scraping and MongoDB - requests, BeautifulSoup, pymongo. scraping_mongo.ipynb
AWS Deployment - setting up an EC2 instance with PostgreSQL on it, running a Flask app. aws.md
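
For the Map Reduce entry in the list above, the programming model can be sketched in plain Python as a word count with separate map, shuffle, and reduce phases. This is only an illustration of the pattern, not Hadoop or Spark code and not taken from map_reduce.ipynb; the function names and sample documents are assumptions.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to get the final counts."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input documents.
documents = ["spark and hadoop", "hadoop streaming", "spark"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

In a real cluster the map and reduce functions run in parallel on different machines, and the framework performs the shuffle step over the network.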



luckamolkova/data_science
