PyTorch on Angel currently supports a series of recommendation algorithms.
In detail, the following methods are implemented:
- FM from Steffen Rendle: Factorization Machines
- DeepFM from Huifeng Guo et al.: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
- AttentionFM from Jun Xiao et al.: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
- DCN from Ruoxi Wang et al.: Deep & Cross Network for Ad Click Predictions
- DeepAndWide from Heng-Tze Cheng et al.: Wide & Deep Learning for Recommender Systems
- PNN from Yanru Qu et al.: Product-based Neural Networks for User Response Prediction
- XDeepFM from Jianxun Lian et al.: xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
We use DeepFM as an example to illustrate the detailed process of running an algorithm. The procedure is similar for the other algorithms.
- **Generate the PyTorch script model**

First, go to the directory `python/recommendation` and execute the following command:

```
python deepfm.py --input_dim 148 --n_fields 13 --embedding_dim 10 --fc_dims 10 5 1
```
Some explanations of the parameters:
- input_dim: the feature dimension of the data
- n_fields: the number of fields in the data
- embedding_dim: the dimension of the embedding layer
- fc_dims: the dimensions of the fully connected (fc) layers in DeepFM. "10 5 1" indicates a two-layer MLP composed of one 10x5 layer and one 5x1 layer.
This Python script will generate a TorchScript model file named `deepfm.pt`, which contains the dataflow graph of DeepFM.
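As a hedged sketch of how the `fc_dims` argument is typically interpreted (this is not the actual `deepfm.py` code, and the helper name is our own), consecutive dimensions are paired into the shapes of the fully connected layers:

```python
# Hypothetical helper illustrating how an fc_dims list such as [10, 5, 1]
# maps to MLP layer shapes. Not taken from deepfm.py.
def fc_layer_shapes(fc_dims):
    """Pair consecutive dims: [10, 5, 1] -> [(10, 5), (5, 1)]."""
    return list(zip(fc_dims[:-1], fc_dims[1:]))

print(fc_layer_shapes([10, 5, 1]))  # -> [(10, 5), (5, 1)]
```

So `--fc_dims 10 5 1` yields one 10x5 layer followed by one 5x1 layer, matching the description above.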
- **Prepare the input data**

The input data of DeepFM should be in libsvm or libffm format. Each line of the input data represents one sample:

```
label feature1:value1 feature2:value2
```
In PyTorch on Angel, multi-hot fields are allowed, which means a field may appear multiple times in one sample:

```
label field1:feature1:value1 field2:feature2:value2
```
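To make the two line layouts concrete, here is a small illustrative parser. The function names are hypothetical and only demonstrate the formats described above; PyTorch on Angel performs its own parsing on the Spark side.

```python
# Illustrative parsers for the libsvm and libffm line formats shown above.
def parse_libsvm(line):
    """'label f1:v1 f2:v2' -> (label, [(feature, value), ...])"""
    parts = line.split()
    label = float(parts[0])
    feats = [(int(f), float(v)) for f, v in (p.split(":") for p in parts[1:])]
    return label, feats

def parse_libffm(line):
    """'label field:f:v ...' -> (label, [(field, feature, value), ...])"""
    parts = line.split()
    label = float(parts[0])
    feats = [(int(k), int(f), float(v))
             for k, f, v in (p.split(":") for p in parts[1:])]
    return label, feats

print(parse_libsvm("1 3:1.0 7:0.5"))
# A multi-hot example: field 1 appears twice in the same sample.
print(parse_libffm("0 1:3:1.0 1:7:0.5"))
```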
- **Train the model**

After obtaining the model file (deepfm.pt) and the input data, we can submit a task through Spark on Angel to train the model. The command is:
```
source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --conf spark.ps.instances=5 \
  --conf spark.ps.cores=1 \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.memory=5g \
  --conf spark.ps.log.level=INFO \
  --conf spark.driver.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/angel_libtorch \
  --conf spark.executor.extraJavaOptions=-Djava.library.path=$JAVA_LIBRARY_PATH:.:./torch/angel_libtorch \
  --conf spark.executor.extraLibraryPath=./torch/angel_libtorch \
  --conf spark.driver.extraLibraryPath=./torch/angel_libtorch \
  --conf spark.executorEnv.OMP_NUM_THREADS=2 \
  --conf spark.executorEnv.MKL_NUM_THREADS=2 \
  --queue $queue \
  --name "deepfm on angel" \
  --jars $SONA_SPARK_JARS \
  --archives angel_libtorch.zip#torch \
  --files deepfm.pt \
  --driver-memory 5g \
  --num-executors 5 \
  --executor-cores 1 \
  --executor-memory 5g \
  --class com.tencent.angel.pytorch.examples.supervised.RecommendationExample \
  ./pytorch-on-angel-*.jar \
  trainInput:$input batchSize:128 torchModelPath:deepfm.pt \
  stepSize:0.001 numEpoch:10 testRatio:0.1 \
  angelModelOutputPath:$output
```

Here `angel_libtorch.zip` is the archive of the C++ library files, `deepfm.pt` is the PyTorch script model generated above, and `pytorch-on-angel-*.jar` is the jar produced by compiling the java submodule.
Description of the parameters:
- trainInput: the input path (HDFS) of the training data
- batchSize: the batch size for each optimization step
- torchModelPath: the name of the generated torch model
- stepSize: the learning rate
- numEpoch: the number of epochs for the training process
- testRatio: the fraction of training examples held out for testing
- angelModelOutputPath: the output path (HDFS) for the trained model
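For intuition on how `batchSize`, `numEpoch`, and `testRatio` interact, here is a back-of-the-envelope calculation. The dataset size is a made-up example value, not something stated in this document:

```python
import math

# Illustrative arithmetic for the submit parameters above.
# n_examples is a hypothetical dataset size chosen for the example.
n_examples = 1_000_000
test_ratio = 0.1        # testRatio:0.1
batch_size = 128        # batchSize:128
num_epoch  = 10         # numEpoch:10

n_test  = int(n_examples * test_ratio)   # examples held out for testing
n_train = n_examples - n_test            # examples used for training
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * num_epoch

print(n_train, n_test)     # 900000 100000
print(steps_per_epoch)     # 7032
print(total_steps)         # 70320
```

Larger batch sizes mean fewer optimization steps per epoch; raising `numEpoch` scales the total number of steps proportionally.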