Churn Prediction API and Model

This repo is an end-to-end example that shows how to combine MuleSoft and TensorFlow to create an ML pipeline and prediction API. It is based on a presentation I gave at the 2018 Symposium on Data Science and Statistics. See the Presentation, which also includes a complete video and demo.

Data and Complete Example in R

The churn prediction model is an adaptation of the example in the RStudio blog post Deep Learning With Keras To Predict Customer Churn. The churn data comes from IBM Watson and is described in the Watson blog post Using Customer Behavior Data to Improve Customer Retention.
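That post builds a small multilayer perceptron with the keras package. As a point of reference, a minimal sketch of a comparable model in R follows; the layer sizes and dropout rates mirror the blog post, and x_train is assumed to be the preprocessed training matrix:

library(keras)

# Two hidden layers with dropout, sigmoid output for the binary churn label
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = ncol(x_train)) %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(rate = 0.1) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)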

Install the prerequisites

To run the Mule applications, you first need to install some prerequisites. Given the nature of this demo, there are a lot of moving parts.

  1. Clone the sdss-mule repo.
  2. Download and install Anypoint Studio 7.1.
  3. Download and install Java 1.8 or newer and Scala sbt.
  4. Download and install R and, optionally, RStudio.
  5. Download and install Postgres.
  6. Set up a Salesforce Developer account.
  7. Create a new bucket on AWS, and install the AWS CLI.
  8. Set up a trial Anypoint Platform account.

Loading the data

The main idea of this demonstration is that Mule can accelerate data science work by making it easy to pull data from different systems. To demonstrate this, we will first split the data into three groups: customer data, billing data, and transaction data, which will be imported into Salesforce, Postgres, and S3, respectively. The demonstration will then show how to fetch data from these systems using Mule as part of an ML pipeline.
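The actual split is performed by model/prepare_training_data.R in the step below; the following is only a sketch of the idea. The input file name is hypothetical, the customer columns match the Salesforce fields listed later in this section, and which of the remaining IBM Telco columns land in the billing and transaction files is an assumption:

library(readr)
library(dplyr)

telco <- read_csv("telco_churn.csv")  # hypothetical input file name

# Customer attributes go to Salesforce
customer <- telco %>%
  select(customerID, gender, SeniorCitizen, Partner, Dependents, tenure)

# Billing attributes go to Postgres (assumed column selection)
billing <- telco %>%
  select(customerID, Contract, PaperlessBilling, PaymentMethod,
         MonthlyCharges, TotalCharges)

# Service usage and the churn label go to S3 (assumed column selection)
transactions <- telco %>%
  select(customerID, PhoneService, MultipleLines, InternetService, Churn)

write_csv(customer, "customer_fields.csv")
write_csv(billing, "billing_fields.csv")
write_csv(transactions, "transactions_fields.csv")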

First, run the R function to prepare the data and install the required R packages. Set BASE_DIR to the base directory of this repo, then run the following in the terminal:

BASE_DIR="$HOME/sdss-mule"
Rscript -e "source('$BASE_DIR/model/prepare_training_data.R')"

Second, upload customer_fields.csv to Salesforce. Start by adding the new custom fields to the Account object, which you can find under Setup > Object Manager:

Name           Type
customerID     Text(10) (External ID)
Dependents     Checkbox
gender         Picklist with values Male, Female
SeniorCitizen  Checkbox
Partner        Checkbox
tenure         Number(4, 0)

Then import the file using the Import Data Wizard, which you can find under Setup > Data. If you run into trouble, you can use Setup > Data > Mass Delete Records to remove the imported accounts and try again.

Third, import billing_fields.csv into Postgres. For convenience, we've created a SQL script to do the heavy lifting. Run the following in the terminal to import the data into Postgres. You may need to adapt the command depending on where your Postgres database is running. If Postgres isn't local, you'll need to copy billing_fields.csv to a location on the server and edit the SQL script accordingly.

psql -f ./pipeline/dbschema.sql
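If you would rather load the file from R than shell out to psql, a rough equivalent using DBI with RPostgres is sketched below. The connection details and the billing table name are assumptions; see pipeline/dbschema.sql for the actual schema:

library(DBI)

# Adjust dbname/host/user/password for your Postgres instance
con <- dbConnect(RPostgres::Postgres(), dbname = "postgres")

# Read the CSV and write it as a table (table name is an assumption)
billing <- read.csv("billing_fields.csv")
dbWriteTable(con, "billing", billing, overwrite = TRUE)

dbDisconnect(con)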

Last, copy transactions_fields.csv to your S3 bucket using the AWS CLI. Follow the instructions for configuring the CLI, then run the command below in the terminal, modifying it to use your S3 bucket name.

aws s3 cp transactions_fields.csv s3://rasamule/
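The same upload can be done from R with the cloudyr aws.s3 package, if you prefer to keep everything in one script. This is a sketch, not part of the repo; replace the bucket name with your own, and note that credentials are read from the usual AWS environment variables:

library(aws.s3)

# Upload the transactions CSV to the bucket
put_object(
  file = "transactions_fields.csv",
  object = "transactions_fields.csv",
  bucket = "rasamule"
)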
