
Tutorial: Build Your First ML Model

Zhang Pengshan (David) edited this page Nov 29, 2016 · 42 revisions

Pipeline in Shifu

TODO (Add Shifu Pipeline Image)

How to Install Shifu

  • Get the latest Shifu build from the Shifu download page.

    Two versions are supported: shifu--cdh-20.tar.gz for Hadoop 1, and shifu--hdp-yarn.tar.gz for Hadoop 2 (the YARN platform), which has been tested on Hadoop 2.2.x through 2.7.x.

  • Or build a new package from the source code (https://github.com/ShifuML/shifu): after downloading the source, run 'mvn clean install'.

  • Unzip the Shifu package and then set the environment variables:

    export SHIFU_HOME=<folder where you unzipped the Shifu package>

    export PATH=${SHIFU_HOME}/bin:$PATH

  • Validate your installation:

    shifu version

    The Shifu version and build information are printed to the console.
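The environment setup and validation steps above can be wrapped in a small shell function. This is a minimal sketch assuming a POSIX shell: `setup_shifu_env` is a hypothetical helper name, and the folder you pass is wherever you unzipped the package. It exports SHIFU_HOME, prepends its bin folder to PATH, and checks that the `shifu` launcher now resolves, so `shifu version` should work afterwards.

```shell
# Hypothetical helper: point SHIFU_HOME at an unzipped Shifu package and
# confirm that the `shifu` launcher is reachable on PATH.
setup_shifu_env() {
    if [ ! -d "$1/bin" ]; then
        echo "error: $1/bin not found; is this the unzipped Shifu folder?" >&2
        return 1
    fi
    SHIFU_HOME=$1
    PATH="${SHIFU_HOME}/bin:$PATH"
    export SHIFU_HOME PATH
    # `command -v` succeeds only if `shifu` can now be found on PATH
    if ! command -v shifu >/dev/null 2>&1; then
        echo "error: shifu not found in ${SHIFU_HOME}/bin" >&2
        return 1
    fi
}

# Usage (the path is only an example):
#   setup_shifu_env "$HOME/shifu"
#   shifu version
```

Checking for the bin folder first means a wrong path fails fast without modifying PATH. To make the settings permanent, the same two exports can go in your shell profile.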

How to Run Shifu Pipeline

  • shifu new <dataset name>

    This command creates a new dataset folder for training. In the new folder you will find these auto-created files:

    1. ModelConfig.json
    2. columns/categorical.column.names
    3. columns/Eval1score.meta.column.names
    4. columns/forceremove.column.names
    5. columns/forceselect.column.names
    6. columns/meta.column.names

  • cd <dataset name>; shifu init

  • shifu stats

  • shifu norm

  • shifu varsel

  • shifu train

  • shifu eval
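Because the pipeline commands above always run in the same order, they lend themselves to a script. The sketch below assumes a POSIX shell, `shifu` already on PATH, and that each `shifu` step exits with a non-zero status when it fails. The hypothetical `run_shifu_pipeline` function calls `shifu new`, enters the dataset folder, then runs the remaining steps and stops at the first failure. In practice you will likely want to review or edit ModelConfig.json (for example, to point it at your own data) before `shifu init`; this sketch skips that.

```shell
# Hypothetical helper: run the Shifu steps from this page in order,
# stopping at the first step that fails.
run_shifu_pipeline() {
    if [ -z "$1" ]; then
        echo "usage: run_shifu_pipeline <dataset name>" >&2
        return 2
    fi
    shifu new "$1" || return 1
    # The subshell keeps the caller's working directory unchanged;
    # its exit status becomes the function's return status.
    (
        cd "$1" || exit 1
        for step in init stats norm varsel train eval; do
            echo ">>> shifu $step"
            shifu "$step" || { echo "shifu $step failed" >&2; exit 1; }
        done
    )
}

# Usage (dataset name is only an example):
#   run_shifu_pipeline my_first_model
```

Looping over the step names instead of chaining them with `&&` keeps the order in one place, and the `>>>` markers show which step was running when a failure occurred.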
