Similarity and Distance Measures

These measures quantify how similar two users, or two items, are to each other.

Assume an n x m matrix whose rows are users (U1, U2, ..., Un) and whose columns are items (I1, I2, ..., Im).

Row u then gives the set of items (Iu) purchased by user u, and column i gives the set of users (Ui) who purchased item i.
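As a minimal sketch of this layout, the matrix can be held as a list of rows. The data and the helper names `items_of_user` and `users_of_item` are hypothetical, chosen only for illustration.

```python
# Hypothetical 3 x 4 purchase matrix: rows are users U1..U3, columns are items I1..I4.
# A 1 means the user purchased the item, a 0 means they did not.
purchases = [
    [1, 0, 1, 1],  # U1
    [1, 1, 0, 0],  # U2
    [0, 0, 1, 1],  # U3
]


def items_of_user(matrix, u):
    """Return the set of item (column) indices purchased by user row u."""
    return {i for i, value in enumerate(matrix[u]) if value}


def users_of_item(matrix, i):
    """Return the set of user (row) indices that purchased item column i."""
    return {u for u, row in enumerate(matrix) if row[i]}
```

Indices here are zero-based, so `items_of_user(purchases, 0)` describes U1 and `users_of_item(purchases, 0)` describes I1.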

Euclidean Distance

D(User1, User2) = sqrt(sum over all items of (difference of values between the rows)^2)

smaller distance = users have more similar purchase behaviors

larger distance = users have less similar purchase behaviors
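The distance between two user rows could be computed roughly as follows. This is a sketch that assumes both rows have the same length and hold numeric values; the function name `euclidean_distance` is an assumption.

```python
import math


def euclidean_distance(row_a, row_b):
    """Square root of the summed squared differences between two rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same items")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))
```

Identical rows give a distance of 0, and every item on which the two users differ pushes the distance up.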

Jaccard Similarity

J(A,B) = |A intersection B| / |A union B|

J(A,B) = # of common elements / # of unique elements in either set

maximum of 1 if two users purchased exactly the same set of items

minimum of 0 if two users purchased completely disjoint sets of items

comparing rows (users) = user-based recommendations

comparing columns (items) = item-based recommendations
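A minimal sketch of the Jaccard formula on Python sets is shown below. The function name `jaccard_similarity` is an assumption, and so is returning 0 when both sets are empty, since the formula itself is undefined in that case.

```python
def jaccard_similarity(set_a, set_b):
    """|A intersection B| / |A union B| for two sets."""
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as having similarity 0.
        return 0.0
    return len(set_a & set_b) / len(union)
```

Passing two users' item sets gives a user-based similarity; passing two items' user sets gives an item-based one.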

Cosine Similarity

cos(theta) = 1 (when theta = 0 degrees) means two items were rated by the same users and they all agree

cos(theta) = 0 (when theta = 90 degrees) means two items were rated by different sets of users

cos(theta) = -1 (when theta = 180 degrees) means two items were rated by the same users but they completely disagree
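Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below assumes equal-length numeric vectors; the name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions. Negative results are only possible when the vectors contain signed values, for example ratings centered around each user's mean.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Dot product of the vectors divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector has no direction, so report 0.
        return 0.0
    return dot / (norm_a * norm_b)
```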

Pearson Distance

r = covariance(A,B) / (std_dev(A) * std_dev(B))

r = 1: perfect positive correlation

r = -1: perfect negative correlation

r = 0: no correlation

measures how well two variables move together

adjusts for mean and variance differences between users

more robust to rating scale differences than cosine similarity
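The formula for r can be sketched as below, assuming two equal-length rating vectors with at least two entries each. The name `pearson_correlation` is an assumption. The 1/n factors in the covariance and the standard deviations cancel, so the code works with plain sums.

```python
import math


def pearson_correlation(a, b):
    """covariance(a, b) / (std_dev(a) * std_dev(b)) for two rating vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two vectors of equal length, at least 2 entries")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sums of centered products and squares; the shared 1/n factor cancels out.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)
```

Because each vector is centered on its own mean first, a user who rates everything three points higher than another user still correlates perfectly with them, which is the scale robustness noted above.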
