Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R model deployment #195

Merged
merged 44 commits into from
Jun 23, 2017
Merged

R model deployment #195

merged 44 commits into from
Jun 23, 2017

Conversation

dubeyabhi07
Copy link
Contributor

@dubeyabhi07 dubeyabhi07 commented Jun 2, 2017

This pull request will integrate R model deployment with clipper.

dcrankshaw and others added 11 commits October 27, 2016 10:30
The container halts with following exception:
"Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration."
Given maximum memory as 512m to solve the problem.
This dockerfile creates image with   R and python runtime supporting RPy2
to integrate start_R_model()
this file helps in scoring deployed R model
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@dubeyabhi07
Copy link
Contributor Author

Hi Dan
Will you be able to review it in next couple of days?

@dcrankshaw
Copy link
Contributor

Yeah I'll be able to review it today. Sorry about the delay, we were busy getting some stuff ready for Spark Summit but that's all done now.

Copy link
Contributor

@dcrankshaw dcrankshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start. I wasn't able to run the code because you're using an old version of pandas and some of the Python package names moved around. Once you update the code and write an integration test I'll review again.

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
from pandas import *
import pandas.rpy.common as com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module no longer exists (see the warning at the top of this page https://pandas.pydata.org/pandas-docs/stable/r_interface.html).

from numpy import *
import scipy as sp
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you filtering this warning?

@@ -58,20 +58,32 @@
EXTERNALLY_MANAGED_MODEL = "EXTERNAL"


from numpy import *
import scipy as sp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to import numpy and scipy here? If you do, don't change the numpy import to import numpy as np instead of importing *.

from rpy2.robjects.packages import importr
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
stats = importr('stats')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import the R specific imports at the beginning of the call to deploy_R_model. See the Spark method as an example.

class ClipperManagerException(Exception):
pass


class Clipper:
"""
Connection to a Clipper instance for administrative purposes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't delete all of these extra lines.


def predict_strings(self, inputs):
#print(inputs)
TESTDATA=StringIO(inputs[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What input schema does this method expect?

The name to assign this model.
version : int
The version to assign this model.
model_data : str or BaseEstimator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you expect the type of model_data to be?

model_path = os.environ["CLIPPER_MODEL_PATH"]

rds_names=[
l for l in os.listdir(model_path) if os.path.splitext(l)[-1] == ".rds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line doing?

@@ -0,0 +1,67 @@
FROM clipper/py-rpc:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line to the bin/build_docker_images.sh script to build this container

@@ -0,0 +1,240 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you turn this tutorial into an integration test? Check out the PySpark integration test for an example. Also, add a README.md in the containers/R directory that discusses how to deploy an R model, any restrictions on the model inputs or outputs, and the dependencies needed (both Python dependencies like rpy2 and R dependencies.

@dcrankshaw
Copy link
Contributor

jenkins test this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/464/
Test FAILed.

@dubeyabhi07
Copy link
Contributor Author

dubeyabhi07 commented Jun 22, 2017

It is showing PEP8 format violations for the code that I haven't even changed.

m-C02S6BY1G8WN:clipper_admin a0d00l9$ pep8 --first clipper_manager.py
clipper_manager.py:83:80: E501 line too long (93 > 79 characters)
clipper_manager.py:98:87: W291 trailing whitespace
clipper_manager.py:1153:39: E711 comparison to None should be 'if cond is None:'

Am I going in right direction for fixing error? If yes, then should I fix the format violations for only that part of code which is changed by me?

@Corey-Zumar
Copy link
Contributor

Corey-Zumar commented Jun 22, 2017

@dubeyabhi07 Your concerns regarding R data frames of different sizes resulting in variable latencies are correct. I've essentially reverted deploy_R_models.py to its previous behavior. The test now splits each large data frame into smaller data frames, each consisting of a single row. Each smaller frame is then csv-encoded. This way, the Rpy2 container still expects csv-encoded dataframes as inputs. Please let me know what you think.

@dubeyabhi07
Copy link
Contributor Author

This is all right with me @Corey-Zumar . Can you please go through my last comment regarding the failure of test.
@dcrankshaw

@Corey-Zumar
Copy link
Contributor

jenkins test this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/468/
Test FAILed.

@Corey-Zumar
Copy link
Contributor

jenkins test this please

Copy link
Contributor

@dcrankshaw dcrankshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! It's almost ready to go. I just added a couple cleanup comments (fixing typos, cleaning up imports, fixing method arguments).

In addition to the requirements of running clipper,

1. R must be installed (version:latest , >=3.4)
2. Python version must be >=2.7.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will only support Python 2.

# Create an R model with an RPy2 reference
model_RPy2 = ro.r('model_R <- lm(formula,data=dataset)')
```
- A previously trained and saved model (in .rds format) can also be loaded as RPy2 obeject :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "object"

Once a data frame has been string encoded, we can pass it to the container via the `requests.post()` method and obtain batched predictions for each data frame row.

This process is illustrated in the `predict_R_model()` method of
<clipper-root>/integration-tests/deploy_R_containers.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an actual markdown link here. Here's a help post on relative links in markdown: https://help.github.com/articles/about-readmes/#relative-links-and-image-paths-in-readme-files

@@ -0,0 +1,86 @@
from __future__ import print_function
from sklearn.externals import joblib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need joblib?

import rpc
import os
import numpy as np
from pandas import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to import pandas as pd

"http://localhost:1337/%s/predict" % app_name,
headers=headers,
data=json.dumps({
'uid': 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the uid field here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shows error in json parsing if I dont give uid field. Why so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're likely using an old Docker container. Try pulling the clipper/query_frontend docker container from docker hub again.

def call_predictions(query_string):
default = 0
url = "http://localhost:1337/%s/predict" % app_name
req_json = json.dumps({'uid': 0, 'input': query_string})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete uid field here.

name,
version,
model_data,
container_name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete the container_name argument.

version,
model_data,
container_name,
input_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete the input type argument (it's always strings).

import rpy2.robjects as ro
from rpy2.robjects.packages import importr
base = importr('base')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you removed the container_name and input_type arguments from the method, set the variables here:

container_name = "clipper/r_python_container"
input_type = "strings"

Also make sure to update the examples and tests in the rest of the PR.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/469/
Test PASSed.

Copy link
Contributor

@Corey-Zumar Corey-Zumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dcrankshaw
Copy link
Contributor

jenkins ok to test

Copy link
Contributor

@dcrankshaw dcrankshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

container_name : str
The Docker container image to use to run this model container.
input_type : str
"strings" (from which model specific dataframes can be derived for carrying out predictions).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete container_name and input_type parameter descriptions


```py
Clipper.deploy_R_model(
"example_model",1,model_RPy2,"strings"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this example to match the new signature of deploy_R_model (remove the strings argument. Also, please add spaces between the arguments.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/470/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/471/
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/472/
Test PASSed.

@dcrankshaw
Copy link
Contributor

LGTM

@dcrankshaw dcrankshaw merged commit 7811207 into ucbrise:develop Jun 23, 2017
dcrankshaw pushed a commit that referenced this pull request Jun 27, 2017
* Initial commit

* heap size for clipper/spark-scala-container.

The container halts with following exception:
"Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration."
Given maximum memory as 512m to solve the problem.

* created dockerfile

This dockerfile creates image with   R and python runtime supporting RPy2

* update clipper manager

to integrate start_R_model()

* Update clipper_manager.py

* Add files via upload

* Create R_model_support.py

this file helps in scoring deployed R model

* tutorial added

* removed license

* Delete R_model_deployment_tutorial-checkpoint.ipynb

* added fresh tutorial

* Delete R_model_deployment_tutorial.ipynb

* Update clipper_manager.py

* made R directory

* added test and updated admin file

* xyz

* deleted tutorials

* added readme

* prefinal

* final

* solved space issues due to text editor

* deleted undesired .Rhistory and other files

* Create clipper_manager.py

* Update README

* Fix container prediction behavior, simplify csv encoding

* Format code

* Update r_python_container.py

* modified test

* Revert to row-by-row prediction style using dataframe splitting

* Remove unused line. format code

* Format code

* typo and method-args fix

* fix

* removed whitespaces

(cherry picked from commit 7811207)
feynmanliang added a commit to feynmanliang/clipper that referenced this pull request Jul 3, 2017
* develop:
  Wording fix (ucbrise#234)
  RPC container content fix (ucbrise#232)
  [CLIPPER-227] Fix EWMA behavior for meters (ucbrise#228)
  One-line app registration and model deployment (ucbrise#223)
  R model deployment (ucbrise#195)
  Allow model versions to be strings (ucbrise#197)
  Base64 decoding for JSON byte data (ucbrise#214)
  Restarting on containers is no longer the default behavior (ucbrise#213)
  fixed backslash escape issue for removing remote containers (ucbrise#210)
  removed pip install findspark from run_unittests.sh (ucbrise#211)
  Fix example code in README (ucbrise#205)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants