- Linear Regression - Simple and Multiple
- Regularization - Ridge (L2), Lasso (L1)
- Nearest neighbors and kernel regression
- Gradient Descent
- Coordinate Descent
- Loss function
- Bias-Variance Trade off
- Cross Validation
- Sparsity
- Overfitting
- Model selection
- Feature selection
Topic | Module |
---|---|
Simple Regression | Module 1 |
Multiple Regression | Module 2 |
Assessing Performance | Module 3 |
Ridge Regression | Module 4 |
Feature Selection & Lasso | Module 5 |
Nearest Neighbor & Kernel Regression | Module 6 |
- One input; fit a line to the data (intercept and slope coefficients).
- Residual sum of squares (RSS) - the sum of the squared differences between the observed values and the predicted values.
- Use RSS to assess different fits of the model.
- Choose the fit that minimizes RSS over the intercept and slope on the training data.
- Iterative algorithm that moves in the direction of the negative gradient.
- For convex functions it converges to the optimum (see the sketch below).
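A minimal sketch of the idea, assuming NumPy; the toy data, `step_size`, tolerance, and iteration cap are illustrative choices, not from the course:

```python
import numpy as np

def simple_regression_gd(x, y, step_size=1e-3, tol=1e-6, max_iter=10000):
    w0, w1 = 0.0, 0.0                       # intercept, slope
    for _ in range(max_iter):
        y_hat = w0 + w1 * x
        err = y - y_hat
        grad_w0 = -2.0 * err.sum()          # dRSS/d(intercept)
        grad_w1 = -2.0 * (err * x).sum()    # dRSS/d(slope)
        w0 -= step_size * grad_w0           # step against the gradient
        w1 -= step_size * grad_w1
        if np.sqrt(grad_w0 ** 2 + grad_w1 ** 2) < tol:
            break
    return w0, w1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(simple_regression_gd(x, y))           # approaches the least-squares fit
```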
- Allows fitting more complicated relationships between a single input and the output, e.g. polynomial regression, seasonality, etc.
- It also incorporates multiple inputs and features, using all of them to compute the prediction.
- The model is a weighted sum of features h_j of the inputs x_i plus epsilon (error / noise term): y_i = sum_j w_j h_j(x_i) + epsilon_i.
- RSS of the coefficients -> sum of the squared differences between the outputs and the predicted values.
- Predicted value = transpose of the feature vector times the coefficients (y-hat_i = h(x_i)^T w, or y-hat = Hw in matrix form).
- Setting the gradient to zero also yields the closed-form solution w-hat = (H^T H)^(-1) H^T y. Complexity of the inverse: O(D^3), where D = #features.
- Gradient descent uses the gradient of the RSS, -2 H^T (y - Hw).
- Requires a step size (see the sketch below).
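A minimal sketch, assuming NumPy; the feature matrix `H`, the true weights, and the noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 3))               # N x D feature matrix
w_true = np.array([1.0, -2.0, 0.5])
y = H @ w_true + 0.1 * rng.normal(size=100)

# Closed form: solve (H^T H) w = H^T y; inverting H^T H costs O(D^3).
w_closed = np.linalg.solve(H.T @ H, H.T @ y)

# Gradient of the RSS at w, used by gradient descent: -2 H^T (y - H w).
def rss_gradient(w, H, y):
    return -2.0 * H.T @ (y - H @ w)

print(w_closed)                             # close to w_true
print(rss_gradient(w_closed, H, y))         # ~0 at the minimizer
```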
- Various measures to assess how well the model fit performs.
- A loss function is a measure of how well the fit is performing.
- It is the cost of using the estimated parameters w-hat at x when y is the true value.
- Absolute error - symmetric error - Absolute difference between true and predicted values.
- Squared error - symmetric error - Squared difference between the actual and predicted values.
- Training error - average loss over the training dataset. Not a good measure of the model's predictive performance.
- Generalization / true error - average loss over every possible observation, weighted by how likely it is; it cannot be computed in practice.
- Test error - evaluates the fit (learned on the training data) on the held-out test set. It is a noisy approximation of the generalization error (see the sketch below).
- Training error - decreases with model complexity.
- Generalization error - decreases and then increases with model complexity.
- Test error - a noisy approximation of the true error.
- Overfitting: the training error keeps decreasing while the true error starts to increase.
- At this point the magnitude of the coefficients typically blows up.
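A minimal sketch, assuming NumPy; the synthetic data, the even/odd train-test split, and the polynomial `degree` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=40)
x_train, y_train = x[::2], y[::2]                # split into train / test halves
x_test, y_test = x[1::2], y[1::2]

degree = 5
coeffs = np.polyfit(x_train, y_train, degree)    # fit on the training data only

def avg_squared_error(coeffs, x, y):
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

print("training error:", avg_squared_error(coeffs, x_train, y_train))
print("test error:    ", avg_squared_error(coeffs, x_test, y_test))
# Increasing `degree` drives the training error toward 0 while the test error
# (a noisy proxy for the generalization error) eventually rises.
```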
- Noise - inherent to the data-generating process; it cannot be controlled (irreducible error).
- Bias - measure of how well the model captures the true relationship when averaged over all possible training datasets.
- Variance - measure of how much the fitted function varies from one training set to another (of the same size).
- Require low bias and low variance to have good predictive performance.
- Model complexity increases -> bias decreases and variance increases.
- Mean squared error (MSE) captures the bias-variance tradeoff: MSE = bias^2 + variance (see the sketch below).
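One way to make the decomposition concrete is to refit the same model on many simulated training sets and look at the average and spread of its prediction at one point. A minimal sketch, assuming NumPy; the "true" function, noise level, and polynomial degree are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * np.pi * x)         # assumed "true" relationship
x0, degree, n, trials = 0.3, 2, 30, 500
preds = np.empty(trials)

for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + 0.3 * rng.normal(size=n)     # fresh training set each trial
    coeffs = np.polyfit(x, y, degree)
    preds[t] = np.polyval(coeffs, x0)       # fitted prediction at x0

bias_sq = (preds.mean() - f(x0)) ** 2       # (average fit - truth)^2
variance = preds.var()                      # spread of fits across datasets
print(bias_sq, variance)                    # low degree: higher bias, lower variance
```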
- Fit the model on the training data set.
- Select between different models on the validation set.
- Test the performance on the test data (see the split sketch below).
- As model complexity increases, the models become overfit.
- Symptom of overfitting -> magnitude of coefficients increases.
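A minimal sketch of the train / validation / test workflow, assuming scikit-learn's `train_test_split` is available; picking a polynomial degree stands in for "selecting between models", and the split sizes are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)

# 60% train, 20% validation, 20% test
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.4, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.5, random_state=0)

def val_error(degree):
    coeffs = np.polyfit(x_train, y_train, degree)        # fit on training data
    return np.mean((y_val - np.polyval(coeffs, x_val)) ** 2)

best_degree = min(range(1, 10), key=val_error)           # select on validation data
coeffs = np.polyfit(x_train, y_train, best_degree)
test_err = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
print(best_degree, test_err)                             # report on held-out test data
```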
- Ridge regression trades off between bias and variance.
- Ridge total cost = measure of fit(RSS on training data) + measure of the magnitude of the coefficients.
- The magnitude term is an L2 penalty, so the ridge cost is RSS(w) + lambda * ||w||_2^2.
- The magnitude of the coefficients decreases as the tuning parameter lambda increases (see the sketch below).
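A minimal sketch, assuming NumPy, of the ridge solution w = (H^T H + lambda I)^(-1) H^T y; the data and lambda grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.normal(size=(50, 5))
y = H @ np.array([3.0, -2.0, 0.0, 1.5, 0.5]) + 0.1 * rng.normal(size=50)

def ridge(H, y, lam):
    D = H.shape[1]
    # Solve (H^T H + lambda I) w = H^T y
    return np.linalg.solve(H.T @ H + lam * np.eye(D), H.T @ y)

for lam in [0.0, 1.0, 100.0]:
    w = ridge(H, y, lam)
    print(lam, np.linalg.norm(w))           # the L2 norm shrinks as lambda grows
```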
- When there is insufficient data to form a separate validation set,
- perform k-fold cross validation.
- The training set is divided into K blocks, and each block in turn is treated as the validation set.
- Training blocks -> the parameters / coefficients are estimated.
- Validation block -> the error is computed.
- The average error across all validation blocks is computed (see the sketch below).
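A minimal sketch, assuming NumPy and reusing the `ridge()` helper and the `H`, `y` arrays from the ridge sketch above; the fold count and lambda grid are illustrative:

```python
import numpy as np

def k_fold_cv_error(H, y, lam, k=5):
    folds = np.array_split(np.arange(H.shape[0]), k)
    errors = []
    for i in range(k):
        val = folds[i]                                     # validation block
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge(H[train], y[train], lam)                 # estimate on training blocks
        errors.append(np.mean((y[val] - H[val] @ w) ** 2)) # error on validation block
    return np.mean(errors)                                 # average validation error

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam = min(lambdas, key=lambda lam: k_fold_cv_error(H, y, lam))
print(best_lam)
```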
Various methods to search over models with different numbers of features.
- All subsets - exhaustive approach; the feature combination with the least RSS is chosen.
- Greedy algorithm (forward selection) - a suboptimal solution, but much more efficient, and it eventually yields the desired set of models (see the sketch below).
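A minimal sketch of forward selection, assuming NumPy: at each step, add the single feature that most reduces the RSS of a least-squares fit. The data, the stopping rule (`max_features`), and the helper names are illustrative:

```python
import numpy as np

def rss_of_subset(H, y, subset):
    Hs = H[:, subset]
    w, *_ = np.linalg.lstsq(Hs, y, rcond=None)    # least-squares fit on the subset
    return np.sum((y - Hs @ w) ** 2)

def forward_selection(H, y, max_features):
    selected, remaining = [], list(range(H.shape[1]))
    while remaining and len(selected) < max_features:
        # Greedily add the feature that gives the lowest RSS when included
        best = min(remaining, key=lambda j: rss_of_subset(H, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(5)
H = rng.normal(size=(80, 8))
y = 2.0 * H[:, 1] - 3.0 * H[:, 4] + 0.1 * rng.normal(size=80)
print(forward_selection(H, y, max_features=2))    # likely picks features 1 and 4
```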
- Lasso leads to sparse solutions.
- Lasso (L1) cost = RSS(w) + lambda * ||w||_1.
- The coefficient path becomes sparser as lambda increases, which provides better feature selection.
- The absolute value is not differentiable at zero, so the gradient cannot be used directly; sub-gradients are needed, and the alternative is coordinate descent.
- Coordinate descent iterates through the different dimensions of the objective, i.e. the different features of the regression model.
- The lasso coordinate update is based on "soft-thresholding", which produces sparse solutions (see the sketch below).
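A minimal sketch, assuming NumPy and feature columns normalized to unit norm (the form of the update assumes that normalization); the data, lambda, and iteration count are illustrative:

```python
import numpy as np

def soft_threshold(rho, lam):
    if rho < -lam / 2.0:
        return rho + lam / 2.0
    if rho > lam / 2.0:
        return rho - lam / 2.0
    return 0.0                                    # coefficient set exactly to zero

def lasso_coordinate_descent(H, y, lam, n_iter=100):
    H = H / np.linalg.norm(H, axis=0)             # normalize feature columns
    w = np.zeros(H.shape[1])
    for _ in range(n_iter):
        for j in range(H.shape[1]):               # cycle through coordinates
            residual_j = y - H @ w + H[:, j] * w[j]   # residual ignoring feature j
            rho_j = H[:, j] @ residual_j
            w[j] = soft_threshold(rho_j, lam)
    return w

rng = np.random.default_rng(6)
H = rng.normal(size=(100, 6))
y = 4.0 * H[:, 0] - 2.0 * H[:, 3] + 0.1 * rng.normal(size=100)
print(lasso_coordinate_descent(H, y, lam=5.0))    # sparse: most entries are exactly 0
```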
- Look for the most similar observations in the dataset and base the prediction on them.
- Weigh the more similar observations more heavily than the less similar ones in the list of k nearest neighbors.
- Average the outputs of the k nearest neighbors to form the prediction (see the sketch below).
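A minimal sketch of (unweighted) k-NN regression for a single query point, assuming NumPy; the data and k are illustrative:

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    dists = np.abs(x_train - x_query)         # distance to every observation
    nearest = np.argsort(dists)[:k]           # indices of the k most similar points
    return y_train[nearest].mean()            # average their outputs

rng = np.random.default_rng(8)
x_train = rng.uniform(0, 1, 60)
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=60)
print(knn_predict(x_train, y_train, x_query=0.25, k=5))
```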
- Kernel regression weights all the points, rather than just the nearest neighbors.
- The kernel has a bandwidth lambda; observations outside the bandwidth get weight 0, and within the bandwidth the weights can decay with distance from the target point.
- It leads to locally constant fits (see the sketch below).
- Parametric fits -> global constant fits (a single fit over all the data).
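A minimal sketch of kernel regression at a single query point, assuming NumPy; the Epanechnikov kernel, the bandwidth, and the data are illustrative choices:

```python
import numpy as np

def epanechnikov(dist, lam):
    u = dist / lam
    # Weight is 0 outside the bandwidth and decays with distance inside it
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_regression(x_train, y_train, x_query, lam):
    weights = epanechnikov(np.abs(x_train - x_query), lam)
    if weights.sum() == 0.0:                  # no observations within the bandwidth
        return np.nan
    return np.sum(weights * y_train) / np.sum(weights)   # locally constant fit

rng = np.random.default_rng(7)
x_train = np.sort(rng.uniform(0, 1, 60))
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.normal(size=60)
print(kernel_regression(x_train, y_train, x_query=0.25, lam=0.1))  # near sin(pi/2) = 1
```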