
A Machine Learning project: Matrix Factorization Distributed Stochastic Gradient Descent in pySpark


williamqzc/Distributed-Matrix-Factorization

 
 


Distributed-Matrix-Factorization

A machine learning project: a distributed stochastic gradient descent (DSGD) method for matrix factorization in PySpark, by Sudev Bohra

How to Run

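  • Run the script with eight positional arguments, where $(SPARK) is the Spark launcher (spark-submit in the example below):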
$ $(SPARK) dsgd_mf.py $(NUM_FACTOR) $(NUM_WORKER) $(NUM_ITER) $(BETA) $(LAMBDA) $(TRAINV) $(OUTPUTW) $(OUTPUTH)  

Example

$ spark-submit dsgd_mf.py 20 5 100 0.9 1.0 test.csv w.csv h.csv  
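The eight positional arguments can be read as in the following minimal sketch. It is not taken from dsgd_mf.py: the function name parse_args is hypothetical, and the argument types (integers for the first three, floats for BETA and LAMBDA, file paths for the rest) are assumptions inferred from the example command above.

```python
import argparse


def parse_args(argv):
    """Parse the eight positional arguments in the order the README lists them.

    Types are assumptions inferred from the example invocation,
    not confirmed against the actual script.
    """
    parser = argparse.ArgumentParser(prog="dsgd_mf.py")
    parser.add_argument("num_factor", type=int)  # NUM_FACTOR
    parser.add_argument("num_worker", type=int)  # NUM_WORKER
    parser.add_argument("num_iter", type=int)    # NUM_ITER
    parser.add_argument("beta", type=float)      # BETA
    # "lambda" is a reserved word in Python, hence the trailing underscore.
    parser.add_argument("lambda_", type=float)   # LAMBDA
    parser.add_argument("trainv")                # TRAINV: input file for V
    parser.add_argument("outputw")               # OUTPUTW: output file for W
    parser.add_argument("outputh")               # OUTPUTH: output file for H
    return parser.parse_args(argv)


# The argument list from the example command above.
args = parse_args("20 5 100 0.9 1.0 test.csv w.csv h.csv".split())
print(args.num_factor, args.num_worker, args.num_iter)
print(args.trainv, args.outputw, args.outputh)
```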

Input

Takes the input matrix V in sparse format: one row,col,value triple per line. For example, the 2x2 identity matrix would look like:

0,0,1
1,1,1
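To make the format concrete, here is a minimal sketch that parses such lines and expands them into a dense matrix. The helper names parse_sparse and to_dense are hypothetical and not part of the project. The sketch assumes zero-based indices, as the identity example suggests, and fills missing entries with zero purely for display; the factorization itself may treat missing entries as unobserved.

```python
def parse_sparse(lines):
    """Turn 'row,col,value' lines into a {(row, col): value} dictionary."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row, col, value = line.split(",")
        entries[(int(row), int(col))] = float(value)
    return entries


def to_dense(entries):
    """Expand the sparse entries into a list of rows (assumes zero-based indices)."""
    n_rows = max(r for r, _ in entries) + 1
    n_cols = max(c for _, c in entries) + 1
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), v in entries.items():
        dense[r][c] = v
    return dense


# The two lines of the 2x2 identity example above.
identity = to_dense(parse_sparse(["0,0,1", "1,1,1"]))
print(identity)  # [[1.0, 0.0], [0.0, 1.0]]
```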

Output

Writes the two output files, w.csv and h.csv, in dense matrix format.
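One way to check a result is to load both files and multiply them back together. The sketch below is only an illustration under stated assumptions: that each file holds one comma-separated matrix row per line, that W has shape (rows of V) x NUM_FACTOR, and that H has shape NUM_FACTOR x (columns of V), so that V is approximated by W times H. None of this is confirmed by the README. The helpers read_dense and matmul are hypothetical, and in-memory strings with toy values stand in for the real files.

```python
import csv
import io


def read_dense(handle):
    """Read a dense matrix with one comma-separated row per line (assumed layout)."""
    return [[float(x) for x in row] for row in csv.reader(handle) if row]


def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Toy stand-ins for w.csv and h.csv; a real run would use open("w.csv") and open("h.csv").
w = read_dense(io.StringIO("1,2\n3,4\n"))
h = read_dense(io.StringIO("1,0\n0,1\n"))

# Under the assumed shapes, the reconstruction of V is the product W * H.
v_hat = matmul(w, h)
print(v_hat)  # [[1.0, 2.0], [3.0, 4.0]]
```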

