Dr. Tirthajyoti Sarkar, Fremont, CA (Please feel free to add me on LinkedIn here)
- Python 3.5+
- NumPy (
$ pip install numpy
) - Pandas (
$ pip install pandas
) - Scikit-learn (
$ pip install scikit-learn
) - SciPy (
$ pip install scipy
) - Statsmodels (
$ pip install statsmodels
) - MatplotLib (
$ pip install matplotlib
) - Seaborn (
$ pip install seaborn
) - Sympy (
$ pip install sympy
)
You can start with this article that I wrote in Heartbeat magazine (on Medium platform):
Jupyter notebooks covering a wide range of functions and operations on the topics of NumPy, Pandans, Seaborn, matplotlib etc.
Tutorial-type notebooks covering regression, classification, clustering, dimensionality reduction, and some basic neural network algorithms
- Simple linear regression with t-statistic generation
-
Multiple ways to perform linear regression in Python and their speed comparison (check the article I wrote on freeCodeCamp)
-
Polynomial regression using scikit-learn pipeline feature (check the article I wrote on Towards Data Science)
-
Decision trees and Random Forest regression (showing how the Random Forest works as a robust/regularized meta-estimator rejecting overfitting)
-
Detailed visual analytics and goodness-of-fit diagnostic tests for a linear regression problem
- Logistic regression/classification
- k-nearest neighbor classification
- Decision trees and Random Forest Classification
- Support vector machine classification (check the article I wrote in Towards Data Science on SVM and sorting algorithm)
- Naive Bayes classification
- K-means clustering
- Affinity propagation (showing its time complexity and the effect of damping factor)
- Mean-shift technique (showing its time complexity and the effect of noise on cluster discovery)
- DBSCAN (showing how it can generically detect areas of high density irrespective of cluster shapes, which the k-means fails to do)
- Hierarchical clustering with Dendograms showing how to choose optimal number of clusters
- Principal component analysis
- Demo notebook to illustrate the superiority of deep neural network for complex nonlinear function approximation task
- Step-by-step building of 1-hidden-layer and 2-hidden-layer dense network using basic TensorFlow methods
-
How to use Sympy package to generate random datasets using symbolic mathematical expressions.
-
Here is my article on Medium on this topic: Random regression and classification problem generation with symbolic expression
-
Serving a linear regression model through a simple HTTP server interface. User needs to request predictions by executing a Python script. Uses
Flask
andGunicorn
. -
Serving a recurrent neural network (RNN) through a HTTP webpage, complete with a web form, where users can input parameters and click a button to generate text based on the pre-trained RNN model. Uses
Flask
,Jinja
,Keras
/TensorFlow
,WTForms
.