ETL to train a model using Apache Spark

AngeloM15/mlSpark

mlSpark

An Apache Spark application that serves as an end-to-end use case, covering data acquisition, transformation, model training, and deployment.

Procedure

  1. setting.sh: Pull a dataset of accelerometer data from here.
  2. input.py: Load the dataset and store it as a DataFrame.
  3. transfrom.py: Transform the data into Parquet files.
  4. train.py: Train the model with PySpark ML and save the model file.
  5. deploy.py: Deploy the model to IBM Watson on IBM Cloud.
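
The steps above could be chained with a small driver script. The following is only a minimal sketch: it assumes each script is run from the repository root without arguments, which this README does not state, and `run_pipeline` and `STEPS` are hypothetical names that are not part of the repository.

```python
import subprocess
import sys

# Step order follows the procedure list; file names are spelled as in this README.
# Assumption: each script runs standalone, with no command-line arguments.
STEPS = [
    ["bash", "setting.sh"],
    [sys.executable, "input.py"],
    [sys.executable, "transfrom.py"],
    [sys.executable, "train.py"],
    [sys.executable, "deploy.py"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first failure."""
    for cmd in steps:
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError
        # when a step exits with a non-zero status.
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would run all five steps. Stopping on the first failure avoids training or deploying on data that an earlier step did not produce.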

Errors

Exception: Java gateway process exited before sending its port number

  1. Make sure Java 8 is installed (macOS):
     brew tap adoptopenjdk/openjdk
     brew install --cask adoptopenjdk8
  2. Find your Java 8 home directory, then add these two lines to your script before Spark is started:
     import os
     os.environ['JAVA_HOME'] = "/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"
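
The hard-coded path depends on where the JDK was installed, so it may differ between machines. As a rough sketch, the Java 8 home could instead be looked up with the macOS `/usr/libexec/java_home` utility. The helpers `find_java8_home` and `set_java_home` are hypothetical names, not part of this repository, and the `fallback` parameter is an assumption for machines where the lookup fails.

```python
import os
import subprocess


def find_java8_home():
    """Return the Java 8 home reported by macOS, or None if the lookup fails."""
    try:
        # /usr/libexec/java_home is macOS-only; "-v 1.8" asks for Java 8.
        result = subprocess.run(
            ["/usr/libexec/java_home", "-v", "1.8"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Utility missing (not macOS) or no matching JDK installed.
        return None
    return result.stdout.strip() or None


def set_java_home(fallback=None):
    """Set JAVA_HOME from the lookup, or from the fallback path if given."""
    home = find_java8_home() or fallback
    if home:
        os.environ["JAVA_HOME"] = home
    return home
```

Calling `set_java_home()` before the Spark session is created should have the same effect as the two lines above, without tying the script to one install location.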
