
aosama/MachineLearningSamples


MachineLearningSamples

This repo hosts a variety of examples based on Apache Spark MLlib.

Databricks Notebooks

Scala IDE Based Examples

A vanilla decision tree example.

How to get a stratified sample so that the train and test datasets are each sampled across all possible values.

How to index and encode categorical features.

How to handle multiple categorical and continuous features on a real-life data set. Uses the Census Income data set.

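The stratified sampling and categorical encoding steps listed above can be sketched roughly as follows. This is a minimal illustration and not code taken from this repo: it assumes Spark 3.x (where `OneHotEncoder` is an estimator with `setInputCols`/`setOutputCols`), and the toy rows, the column names (`id`, `workclass`, `label`), the 0.7 fraction, and the object name are all made up for the example.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SparkSession

// Illustrative sketch only; assumes Spark 3.x on the classpath.
object StratifiedEncodeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StratifiedEncodeSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy rows: (id, workclass, label).
    val data = Seq(
      (1, "Private", "<=50K"), (2, "Private", "<=50K"),
      (3, "State-gov", "<=50K"), (4, "Private", "<=50K"),
      (5, "Self-emp", "<=50K"), (6, "State-gov", "<=50K"),
      (7, "Private", ">50K"), (8, "Self-emp", ">50K"),
      (9, "Private", ">50K"), (10, "State-gov", ">50K")
    ).toDF("id", "workclass", "label")

    // Stratified training sample: draw roughly 70% of the rows
    // for every value of the label column.
    val fractions = Map("<=50K" -> 0.7, ">50K" -> 0.7)
    val train = data.stat.sampleBy("label", fractions, 42L)

    // Rows that were not drawn for training form the test set.
    val test = data.join(train.select("id"), Seq("id"), "left_anti")

    // Index the string category, then one-hot encode the index.
    // "keep" maps categories unseen during fit to an extra index
    // instead of failing at transform time.
    val indexer = new StringIndexer()
      .setInputCol("workclass")
      .setOutputCol("workclassIndex")
      .setHandleInvalid("keep")
    val encoder = new OneHotEncoder()
      .setInputCols(Array("workclassIndex"))
      .setOutputCols(Array("workclassVec"))

    // Fit on the training split only, then apply to the test split.
    val model = new Pipeline().setStages(Array(indexer, encoder)).fit(train)
    model.transform(test).show(false)

    spark.stop()
  }
}
```

Note that `sampleBy` samples each row independently, so the per-value fractions are approximate rather than exact, especially on small inputs like this one.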

Data Sets References

The first line of the adult.test file was removed so that the file can be loaded into Spark.

Census Income data set citation: Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

About

Machine learning examples using Spark MLlib and Databricks
