This repo is an end-to-end example that shows how to combine MuleSoft and TensorFlow to create an ML pipeline and prediction API. It is based on a presentation I gave at the 2018 Symposium on Data Science and Statistics. See the Presentation, which also includes a complete video and demo.
The churn prediction model is an adaptation of the example on the RStudio blog post Deep Learning With Keras To Predict Customer Churn. The data used for churn in this example comes from IBM Watson, described on the Watson blog Using Customer Behavior Data to Improve Customer Retention.
To run the Mule Applications, you first need to install some prerequisites; a quick verification script follows the list. Given the nature of this demo, there are a lot of moving parts.
- Clone the sdss-mule repo.
- Download and install Anypoint Studio 7.1.
- Download and install Java 1.8 or newer and the Scala build tool sbt.
- Download and install R and optionally RStudio.
- Download and install Postgres.
- Set up a Salesforce Development Account.
- Create a new S3 bucket on AWS, and install the AWS CLI.
- Set up a trial Anypoint Platform account.
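Once everything is installed, a sketch like the following can confirm the core command-line tools are on your PATH (it only checks version flags; it doesn't validate the Salesforce, AWS, or Anypoint accounts):

```sh
# Print the versions of the locally installed prerequisites; a command
# that errors out here points at a missing or broken install.
java -version
sbt --version      # older sbt releases use: sbt sbtVersion
Rscript --version
psql --version
aws --version
```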
The main idea of this demonstration is that Mule can accelerate data science by making it easy to pull data from different systems. To demonstrate this, we will first split the data into three groups: customer data, billing data, and transaction data, which will be imported into Salesforce, Postgres, and S3, respectively. The demonstration will then show how to fetch data from these systems using Mule as part of an ML pipeline.
First, run the R script to prepare the data and install the required R packages. Set BASE_DIR to the base directory of this repo, then run the following in the terminal:
```sh
BASE_DIR="$HOME/sdss-mule"
Rscript -e "source('$BASE_DIR/model/prepare_training_data.R')"
```
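If the script ran successfully, it should have written the three CSV splits used in the steps below. A quick sanity check (the output location is an assumption; adjust the paths if prepare_training_data.R writes elsewhere):

```sh
# The three splits referenced in the following steps; the output directory
# is assumed to be the repo root -- adjust if the script writes elsewhere.
ls "$BASE_DIR"/customer_fields.csv "$BASE_DIR"/billing_fields.csv "$BASE_DIR"/transactions_fields.csv
```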
Second, upload customer_fields.csv to Salesforce. Add the new fields to the Account object, which you can find under Setup > Object Manager.
| Name | Type |
|---|---|
| customerID | Text(10) (External ID) |
| Dependents | Checkbox |
| gender | Picklist with Male, Female |
| SeniorCitizen | Checkbox |
| Partner | Checkbox |
| tenure | Number(4, 0) |
Then, import the data using the Import Data Wizard, which you can find under Setup > Data. If you run into trouble, you can use Setup > Data > Mass Delete Records to remove the imported accounts and try again.
Third, import billing_fields.csv into Postgres. For convenience, we've created a SQL script to do the heavy lifting. Run the following in the terminal to import the data. You may need to adapt the command depending on where your Postgres DB is running; if Postgres isn't local, you'll need to copy billing_fields.csv to a location on the server and edit the SQL script accordingly.
```sh
psql -f ./pipeline/dbschema.sql
```
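To confirm the import worked, you can count the rows that landed in the billing table. The table name here is an assumption; check pipeline/dbschema.sql for the name it actually creates:

```sh
# Row-count sanity check; "billing" is an assumed table name -- substitute
# whatever table dbschema.sql actually creates.
psql -c "SELECT COUNT(*) FROM billing;"
```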
Last, copy transactions_fields.csv to your S3 bucket using the AWS CLI. Follow the instructions for configuring the CLI, then run the following command in the terminal, substituting your own S3 bucket name:
```sh
aws s3 cp transactions_fields.csv s3://rasamule/
```
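You can verify the upload by listing the bucket contents (again, substitute your own bucket name for rasamule):

```sh
# List the bucket to confirm the object landed
aws s3 ls s3://rasamule/
```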