R model deployment #195

dubeyabhi07 · 2017-06-02T11:29:53Z

This pull request integrates R model deployment with Clipper.

The clipper/spark-scala-container halts with the following exception: "Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration." Setting the maximum memory to 512m solves the problem.

The Dockerfile creates an image with R and Python runtimes supporting RPy2.

The Clipper manager is updated to integrate start_R_model().

R_model_support.py helps in scoring the deployed R model.

AmplabJenkins · 2017-06-02T11:30:04Z

Can one of the admins verify this patch?

dubeyabhi07 · 2017-06-07T06:38:39Z

Hi Dan,
Will you be able to review it in the next couple of days?

dcrankshaw · 2017-06-07T16:20:29Z

Yeah I'll be able to review it today. Sorry about the delay, we were busy getting some stuff ready for Spark Summit but that's all done now.

dcrankshaw

This is a good start. I wasn't able to run the code because you're using an old version of pandas and some of the Python package names moved around. Once you update the code and write an integration test I'll review again.

dcrankshaw · 2017-06-08T21:38:41Z

clipper_admin/clipper_manager.py

+import warnings
+warnings.filterwarnings("ignore", category=FutureWarning)
+from pandas import *
+import pandas.rpy.common as com


This module no longer exists (see the warning at the top of this page https://pandas.pydata.org/pandas-docs/stable/r_interface.html).

dcrankshaw · 2017-06-08T21:38:56Z

clipper_admin/clipper_manager.py

+from numpy import *
+import scipy as sp
+import warnings
+warnings.filterwarnings("ignore", category=FutureWarning)


Why are you filtering this warning?

dcrankshaw · 2017-06-08T21:40:27Z

clipper_admin/clipper_manager.py

@@ -58,20 +58,32 @@
 EXTERNALLY_MANAGED_MODEL = "EXTERNAL"


+from numpy import *
+import scipy as sp


Do you need to import numpy and scipy here? If you do, change the numpy import to import numpy as np instead of importing *.

dcrankshaw · 2017-06-08T21:42:30Z

clipper_admin/clipper_manager.py

+from rpy2.robjects.packages import importr
+import rpy2.robjects as ro
+from rpy2.robjects.packages import importr
+stats = importr('stats')


Import the R specific imports at the beginning of the call to deploy_R_model. See the Spark method as an example.
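
One way to read this suggestion, sketched under assumptions (Python 3): the rpy2 imports move from module level into the body of the deploy method, so that importing clipper_manager does not require rpy2 unless an R model is actually deployed. The function name `deploy_r_model_sketch` and its return value are hypothetical; only the two rpy2 imports come from the diff above.

```python
def deploy_r_model_sketch(name, version, model_data):
    """Hypothetical stand-in for deploy_R_model; not Clipper's implementation."""
    # Import the R-specific dependencies only when this function is called,
    # so that loading the module itself does not require rpy2.
    try:
        import rpy2.robjects as ro  # noqa: F401
        from rpy2.robjects.packages import importr  # noqa: F401
    except ImportError as exc:
        raise RuntimeError("deploying an R model requires rpy2") from exc

    # Placeholder for the real deployment steps, which are not shown here.
    return {"name": name, "version": version}
```

With this layout, users who never deploy R models are not forced to install rpy2.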

dcrankshaw · 2017-06-08T21:42:56Z

clipper_admin/clipper_manager.py

 class ClipperManagerException(Exception):
    pass


 class Clipper:
    """
    Connection to a Clipper instance for administrative purposes.
-


Please don't delete all of these extra lines.

dcrankshaw · 2017-06-08T21:53:16Z

containers/R_Python/r.py

+
+    def predict_strings(self, inputs):
+        #print(inputs)
+        TESTDATA=StringIO(inputs[0])


What input schema does this method expect?

dcrankshaw · 2017-06-08T21:53:41Z

clipper_admin/clipper_manager.py

+            The name to assign this model.
+        version : int
+            The version to assign this model.
+        model_data : str or BaseEstimator


What do you expect the type of model_data to be?

dcrankshaw · 2017-06-08T21:57:41Z

containers/R_Python/r.py

+    model_path = os.environ["CLIPPER_MODEL_PATH"]
+
+    rds_names=[
+           l for l in os.listdir(model_path) if os.path.splitext(l)[-1] == ".rds"


What is this line doing?

dcrankshaw · 2017-06-08T22:01:50Z

containers/R_Python/Dockerfile

@@ -0,0 +1,67 @@
+FROM clipper/py-rpc:latest


Add a line to the bin/build_docker_images.sh script to build this container

dcrankshaw · 2017-06-08T22:02:59Z

examples/tutorial_for_R/R_model_tutorial.ipynb

@@ -0,0 +1,240 @@
+{


Can you turn this tutorial into an integration test? Check out the PySpark integration test for an example. Also, add a README.md in the containers/R directory that discusses how to deploy an R model, any restrictions on the model inputs or outputs, and the dependencies needed (both Python dependencies like rpy2 and R dependencies).

dcrankshaw · 2017-06-22T18:06:21Z

jenkins test this please

AmplabJenkins · 2017-06-22T18:10:28Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/464/
Test FAILed.

dubeyabhi07 · 2017-06-22T19:30:46Z

It is showing PEP8 format violations for the code that I haven't even changed.

m-C02S6BY1G8WN:clipper_admin a0d00l9$ pep8 --first clipper_manager.py
clipper_manager.py:83:80: E501 line too long (93 > 79 characters)
clipper_manager.py:98:87: W291 trailing whitespace
clipper_manager.py:1153:39: E711 comparison to None should be 'if cond is None:'

Am I going in the right direction for fixing the error? If yes, should I fix the format violations only in the part of the code that I changed?

Corey-Zumar · 2017-06-22T23:54:10Z

@dubeyabhi07 Your concerns regarding R data frames of different sizes resulting in variable latencies are correct. I've essentially reverted deploy_R_models.py to its previous behavior. The test now splits each large data frame into smaller data frames, each consisting of a single row. Each smaller frame is then csv-encoded. This way, the Rpy2 container still expects csv-encoded dataframes as inputs. Please let me know what you think.
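
A minimal sketch of the splitting step described above, assuming Python 3 and using only the standard library. A header plus a list of rows stands in for the data frame, and the hypothetical helper `split_into_csv_rows` emits one csv-encoded, single-row frame (header line plus one data line) per input row. The actual test in this PR may encode frames differently.

```python
import csv
import io


def split_into_csv_rows(header, rows):
    """Return one csv string per row, each carrying the header line.

    `header` and `rows` are a hypothetical stand-in for a data frame.
    """
    encoded = []
    for row in rows:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)  # every single-row frame keeps the column names
        writer.writerow(row)
        encoded.append(buf.getvalue())
    return encoded


frames = split_into_csv_rows(["x", "y"], [[1, 2.5], [3, 4.5]])
```

Each element of `frames` could then be sent as the input of a separate prediction request, which keeps the per-request payload size uniform.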

dubeyabhi07 · 2017-06-23T10:49:58Z

This is all right with me, @Corey-Zumar. Can you please go through my last comment regarding the test failure?
@dcrankshaw

Corey-Zumar · 2017-06-23T17:47:15Z

jenkins test this please

AmplabJenkins · 2017-06-23T17:50:32Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/468/
Test FAILed.

Corey-Zumar · 2017-06-23T18:11:15Z

jenkins test this please

dcrankshaw

This is great! It's almost ready to go. I just added a couple cleanup comments (fixing typos, cleaning up imports, fixing method arguments).

dcrankshaw · 2017-06-23T18:14:41Z

containers/R/README.md

+In addition to the requirements of running clipper, 
+
+1. R must be installed (version:latest , >=3.4)
+2. Python version must be >=2.7. 


Note that this will only support Python 2.

dcrankshaw · 2017-06-23T18:21:48Z

containers/R/README.md

+# Create an R model with an RPy2 reference
+model_RPy2 = ro.r('model_R <- lm(formula,data=dataset)') 
+```
+- A previously trained and saved model (in .rds format) can also be loaded as RPy2 obeject :


Typo in "object"

dcrankshaw · 2017-06-23T18:25:03Z

containers/R/README.md

+Once a data frame has been string encoded, we can pass it to the container via the `requests.post()` method and obtain batched predictions for each data frame row.
+
+This process is illustrated in the `predict_R_model()` method of 
+<clipper-root>/integration-tests/deploy_R_containers.py


Create an actual markdown link here. Here's a help post on relative links in markdown: https://help.github.com/articles/about-readmes/#relative-links-and-image-paths-in-readme-files

dcrankshaw · 2017-06-23T18:25:25Z

containers/R/r_python_container.py

@@ -0,0 +1,86 @@
+from __future__ import print_function
+from sklearn.externals import joblib


Do you need joblib?

dcrankshaw · 2017-06-23T18:26:06Z

containers/R/r_python_container.py

+import rpc
+import os
+import numpy as np
+from pandas import *


Change this to import pandas as pd.

dcrankshaw · 2017-06-23T18:31:44Z

integration-tests/deploy_R_models.py

+                "http://localhost:1337/%s/predict" % app_name,
+                headers=headers,
+                data=json.dumps({
+                    'uid': 0,


Remove the uid field here.

It shows an error in JSON parsing if I don't give the uid field. Why so?

You're likely using an old Docker container. Try pulling the clipper/query_frontend docker container from docker hub again.

dcrankshaw · 2017-06-23T18:33:57Z

integration-tests/deploy_R_models.py

+def call_predictions(query_string):
+    default = 0
+    url = "http://localhost:1337/%s/predict" % app_name
+    req_json = json.dumps({'uid': 0, 'input': query_string})


Delete uid field here.
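
A sketch of the request once the uid field is dropped, assuming Python 3. The helper name `build_predict_request`, the example application name, and the Content-type header value are hypothetical; the URL pattern and the 'input' key come from the diff above. The sketch only builds the request and does not send it.

```python
import json


def build_predict_request(app_name, query_string):
    """Build the URL, headers, and JSON body for a prediction request."""
    url = "http://localhost:1337/%s/predict" % app_name
    headers = {"Content-type": "application/json"}  # assumed header value
    # Only the input is sent; the 'uid' field is no longer included.
    body = json.dumps({"input": query_string})
    return url, headers, body


url, headers, body = build_predict_request("example_app", "x,y\n1,2.5\n")
```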

dcrankshaw · 2017-06-23T18:35:21Z

clipper_admin/clipper_manager.py

+                       name,
+                       version,
+                       model_data,
+                       container_name,


Delete the container_name argument.

dcrankshaw · 2017-06-23T18:35:40Z

clipper_admin/clipper_manager.py

+                       version,
+                       model_data,
+                       container_name,
+                       input_type,


Delete the input_type argument (it's always strings).

dcrankshaw · 2017-06-23T18:37:47Z

clipper_admin/clipper_manager.py

+        import rpy2.robjects as ro
+        from rpy2.robjects.packages import importr
+        base = importr('base')
+


Because you removed the container_name and input_type arguments from the method, set the variables here:

container_name = "clipper/r_python_container"
input_type = "strings"

Also make sure to update the examples and tests in the rest of the PR.
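
A sketch of what the narrowed signature might look like, assuming Python 3. `ClipperSketch` is a hypothetical stand-in class rather than the real Clipper class, and the returned dictionary is illustrative only; the two fixed values are the ones given in the comment above.

```python
class ClipperSketch:
    """Hypothetical stand-in for the Clipper class; illustration only."""

    def deploy_R_model(self, name, version, model_data):
        # These used to be caller-supplied arguments; they are now fixed
        # inside the method, as requested in the review.
        container_name = "clipper/r_python_container"
        input_type = "strings"
        return {
            "name": name,
            "version": version,
            "container_name": container_name,
            "input_type": input_type,
        }


deployment = ClipperSketch().deploy_R_model("example_model", 1, model_data=None)
```

Callers then pass only the model name, version, and model data, which is the shape the README example and the tests would need to follow.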

AmplabJenkins · 2017-06-23T18:49:08Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/469/
Test PASSed.

Corey-Zumar

LGTM

dcrankshaw · 2017-06-23T19:55:47Z

jenkins ok to test

dcrankshaw

Almost there

dcrankshaw · 2017-06-23T19:58:54Z

clipper_admin/clipper_manager.py

+        container_name : str
+            The Docker container image to use to run this model container.
+        input_type : str
+            "strings" (from which model specific dataframes can be derived for carrying out predictions). 


Delete container_name and input_type parameter descriptions

dcrankshaw · 2017-06-23T19:59:48Z

containers/R/README.md

+
+```py
+Clipper.deploy_R_model(
+   "example_model",1,model_RPy2,"strings"


Fix this example to match the new signature of deploy_R_model (remove the strings argument). Also, please add spaces between the arguments.

AmplabJenkins · 2017-06-23T20:00:30Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/470/
Test FAILed.

AmplabJenkins · 2017-06-23T20:20:20Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/471/
Test FAILed.

AmplabJenkins · 2017-06-23T21:31:48Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/472/
Test PASSed.

dcrankshaw · 2017-06-23T21:35:29Z

LGTM

* Initial commit
* heap size for clipper/spark-scala-container. The container halts with following exception: "Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration." Given maximum memory as 512m to solve the problem.
* created dockerfile This dockerfile creates image with R and python runtime supporting RPy2
* update clipper manager to integrate start_R_model()
* Update clipper_manager.py
* Add files via upload
* Create R_model_support.py this file helps in scoring deployed R model
* tutorial added
* removed license
* Delete R_model_deployment_tutorial-checkpoint.ipynb
* added fresh tutorial
* Delete R_model_deployment_tutorial.ipynb
* Update clipper_manager.py
* made R directory
* added test and updated admin file
* xyz
* deleted tutorials
* added readme
* prefinal
* final
* solved space issues due to text editor
* deleted undesired .Rhistory and other files
* Create clipper_manager.py
* Update README
* Fix container prediction behavior, simplify csv encoding
* Format code
* Update r_python_container.py
* modified test
* Revert to row-by-row prediction style using dataframe splitting
* Remove unused line. format code
* Format code
* typo and method-args fix
* fix
* removed whitespaces

(cherry picked from commit 7811207)

* develop:
  Wording fix (ucbrise#234)
  RPC container content fix (ucbrise#232)
  [CLIPPER-227] Fix EWMA behavior for meters (ucbrise#228)
  One-line app registration and model deployment (ucbrise#223)
  R model deployment (ucbrise#195)
  Allow model versions to be strings (ucbrise#197)
  Base64 decoding for JSON byte data (ucbrise#214)
  Restarting on containers is no longer the default behavior (ucbrise#213)
  fixed backslash escape issue for removing remote containers (ucbrise#210)
  removed pip install findspark from run_unittests.sh (ucbrise#211)
  Fix example code in README (ucbrise#205)

dcrankshaw and others added 11 commits October 27, 2016 10:30

Initial commit

80ca026

Merge branch 'develop' into develop

99f5d33

created dockerfile

cb7756e

This dockerfile creates image with R and python runtime supporting RPy2

update clipper manager

199ca85

to integrate start_R_model()

Update clipper_manager.py

1b01809

Add files via upload

4240a5b

Create R_model_support.py

ff3580c

this file helps in scoring deployed R model

tutorial added

d3d6ca9

Merge remote-tracking branch 'upstream/master' into develop

b9d651b

Merge remote-tracking branch 'upstream/develop' into develop

b413b58

Abhishek Dubey and others added 4 commits June 2, 2017 17:19

removed license

daecade

Delete R_model_deployment_tutorial-checkpoint.ipynb

d8f1d13

added fresh tutorial

1d0abcf

Delete R_model_deployment_tutorial.ipynb

e81f0ba

dcrankshaw assigned dubeyabhi07 Jun 2, 2017

dcrankshaw self-requested a review June 2, 2017 16:14

dcrankshaw added status: needs review type: enhancement labels Jun 2, 2017

dubeyabhi07 added 3 commits June 5, 2017 09:58

Merge branch 'develop' into develop

d9eef9e

Update clipper_manager.py

f72141d

Merge branch 'develop' into develop

9cba595

dcrankshaw requested changes Jun 8, 2017

dcrankshaw added component: model container status: needs revision and removed status: needs review labels Jun 8, 2017

made R directory

c32d6a3

Corey-Zumar added 2 commits June 22, 2017 16:45

Revert to row-by-row prediction style using dataframe splitting

1289396

Remove unused line. format code

6a69b3a

Corey-Zumar added 2 commits June 23, 2017 11:09

Format code

db520e5

Merge branch 'develop' into develop

35fd78e

dcrankshaw requested changes Jun 23, 2017

typo and method-args fix

fa82010

Corey-Zumar approved these changes Jun 23, 2017

dcrankshaw requested changes Jun 23, 2017

fix

f044be7

removed whitespaces

343e98b

dcrankshaw approved these changes Jun 23, 2017

dcrankshaw merged commit 7811207 into ucbrise:develop Jun 23, 2017

dcrankshaw mentioned this pull request Jun 30, 2017

First class support for deploying R models #242

Closed

feynmanliang mentioned this pull request Jul 3, 2017

Deploy containers to Kubernetes #206

Closed
