-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R model deployment #195
R model deployment #195
Conversation
The container halts with following exception: "Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration." Given maximum memory as 512m to solve the problem.
This dockerfile creates image with R and python runtime supporting RPy2
to integrate start_R_model()
this file helps in scoring deployed R model
Can one of the admins verify this patch? |
Hi Dan |
Yeah I'll be able to review it today. Sorry about the delay, we were busy getting some stuff ready for Spark Summit but that's all done now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good start. I wasn't able to run the code because you're using an old version of pandas and some of the Python package names moved around. Once you update the code and write an integration test I'll review again.
clipper_admin/clipper_manager.py
Outdated
import warnings | ||
warnings.filterwarnings("ignore", category=FutureWarning) | ||
from pandas import * | ||
import pandas.rpy.common as com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module no longer exists (see the warning at the top of this page https://pandas.pydata.org/pandas-docs/stable/r_interface.html).
clipper_admin/clipper_manager.py
Outdated
from numpy import * | ||
import scipy as sp | ||
import warnings | ||
warnings.filterwarnings("ignore", category=FutureWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you filtering this warning?
clipper_admin/clipper_manager.py
Outdated
@@ -58,20 +58,32 @@ | |||
EXTERNALLY_MANAGED_MODEL = "EXTERNAL" | |||
|
|||
|
|||
from numpy import * | |||
import scipy as sp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to import numpy and scipy here? If you do, don't change the numpy import to import numpy as np
instead of importing *
.
clipper_admin/clipper_manager.py
Outdated
from rpy2.robjects.packages import importr | ||
import rpy2.robjects as ro | ||
from rpy2.robjects.packages import importr | ||
stats = importr('stats') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import the R specific imports at the beginning of the call to deploy_R_model
. See the Spark method as an example.
clipper_admin/clipper_manager.py
Outdated
class ClipperManagerException(Exception): | ||
pass | ||
|
||
|
||
class Clipper: | ||
""" | ||
Connection to a Clipper instance for administrative purposes. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't delete all of these extra lines.
containers/R_Python/r.py
Outdated
|
||
def predict_strings(self, inputs): | ||
#print(inputs) | ||
TESTDATA=StringIO(inputs[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What input schema does this method expect?
clipper_admin/clipper_manager.py
Outdated
The name to assign this model. | ||
version : int | ||
The version to assign this model. | ||
model_data : str or BaseEstimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you expect the type of model_data
to be?
containers/R_Python/r.py
Outdated
model_path = os.environ["CLIPPER_MODEL_PATH"] | ||
|
||
rds_names=[ | ||
l for l in os.listdir(model_path) if os.path.splitext(l)[-1] == ".rds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this line doing?
containers/R_Python/Dockerfile
Outdated
@@ -0,0 +1,67 @@ | |||
FROM clipper/py-rpc:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a line to the bin/build_docker_images.sh
script to build this container
@@ -0,0 +1,240 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you turn this tutorial into an integration test? Check out the PySpark integration test for an example. Also, add a README.md in the containers/R
directory that discusses how to deploy an R model, any restrictions on the model inputs or outputs, and the dependencies needed (both Python dependencies like rpy2
and R dependencies.
jenkins test this please |
Test FAILed. |
It is showing PEP8 format violations for the code that I haven't even changed.
Am I going in right direction for fixing error? If yes, then should I fix the format violations for only that part of code which is changed by me? |
@dubeyabhi07 Your concerns regarding R data frames of different sizes resulting in variable latencies are correct. I've essentially reverted |
This is all right with me @Corey-Zumar . Can you please go through my last comment regarding the failure of test. |
jenkins test this please |
Test FAILed. |
jenkins test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! It's almost ready to go. I just added a couple cleanup comments (fixing typos, cleaning up imports, fixing method arguments).
containers/R/README.md
Outdated
In addition to the requirements of running clipper, | ||
|
||
1. R must be installed (version:latest , >=3.4) | ||
2. Python version must be >=2.7. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this will only support Python 2.
containers/R/README.md
Outdated
# Create an R model with an RPy2 reference | ||
model_RPy2 = ro.r('model_R <- lm(formula,data=dataset)') | ||
``` | ||
- A previously trained and saved model (in .rds format) can also be loaded as RPy2 obeject : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in "object"
containers/R/README.md
Outdated
Once a data frame has been string encoded, we can pass it to the container via the `requests.post()` method and obtain batched predictions for each data frame row. | ||
|
||
This process is illustrated in the `predict_R_model()` method of | ||
<clipper-root>/integration-tests/deploy_R_containers.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create an actual markdown link here. Here's a help post on relative links in markdown: https://help.github.com/articles/about-readmes/#relative-links-and-image-paths-in-readme-files
containers/R/r_python_container.py
Outdated
@@ -0,0 +1,86 @@ | |||
from __future__ import print_function | |||
from sklearn.externals import joblib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need joblib?
containers/R/r_python_container.py
Outdated
import rpc | ||
import os | ||
import numpy as np | ||
from pandas import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this to import pandas as pd
integration-tests/deploy_R_models.py
Outdated
"http://localhost:1337/%s/predict" % app_name, | ||
headers=headers, | ||
data=json.dumps({ | ||
'uid': 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the uid
field here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows error in json parsing if I dont give uid
field. Why so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're likely using an old Docker container. Try pulling the clipper/query_frontend
docker container from docker hub again.
integration-tests/deploy_R_models.py
Outdated
def call_predictions(query_string): | ||
default = 0 | ||
url = "http://localhost:1337/%s/predict" % app_name | ||
req_json = json.dumps({'uid': 0, 'input': query_string}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete uid
field here.
clipper_admin/clipper_manager.py
Outdated
name, | ||
version, | ||
model_data, | ||
container_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete the container_name argument.
clipper_admin/clipper_manager.py
Outdated
version, | ||
model_data, | ||
container_name, | ||
input_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete the input type argument (it's always strings).
import rpy2.robjects as ro | ||
from rpy2.robjects.packages import importr | ||
base = importr('base') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you removed the container_name and input_type arguments from the method, set the variables here:
container_name = "clipper/r_python_container"
input_type = "strings"
Also make sure to update the examples and tests in the rest of the PR.
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
jenkins ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there
clipper_admin/clipper_manager.py
Outdated
container_name : str | ||
The Docker container image to use to run this model container. | ||
input_type : str | ||
"strings" (from which model specific dataframes can be derived for carrying out predictions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete container_name and input_type parameter descriptions
containers/R/README.md
Outdated
|
||
```py | ||
Clipper.deploy_R_model( | ||
"example_model",1,model_RPy2,"strings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix this example to match the new signature of deploy_R_model
(remove the strings argument. Also, please add spaces between the arguments.
Test FAILed. |
Test FAILed. |
Test PASSed. |
LGTM |
* Initial commit * heap size for clipper/spark-scala-container. The container halts with following exception: "Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration." Given maximum memory as 512m to solve the problem. * created dockerfile This dockerfile creates image with R and python runtime supporting RPy2 * update clipper manager to integrate start_R_model() * Update clipper_manager.py * Add files via upload * Create R_model_support.py this file helps in scoring deployed R model * tutorial added * removed license * Delete R_model_deployment_tutorial-checkpoint.ipynb * added fresh tutorial * Delete R_model_deployment_tutorial.ipynb * Update clipper_manager.py * made R directory * added test and updated admin file * xyz * deleted tutorials * added readme * prefinal * final * solved space issues due to text editor * deleted undesired .Rhistory and other files * Create clipper_manager.py * Update README * Fix container prediction behavior, simplify csv encoding * Format code * Update r_python_container.py * modified test * Revert to row-by-row prediction style using dataframe splitting * Remove unused line. format code * Format code * typo and method-args fix * fix * removed whitespaces (cherry picked from commit 7811207)
* develop: Wording fix (ucbrise#234) RPC container content fix (ucbrise#232) [CLIPPER-227] Fix EWMA behavior for meters (ucbrise#228) One-line app registration and model deployment (ucbrise#223) R model deployment (ucbrise#195) Allow model versions to be strings (ucbrise#197) Base64 decoding for JSON byte data (ucbrise#214) Restarting on containers is no longer the default behavior (ucbrise#213) fixed backslash escape issue for removing remote containers (ucbrise#210) removed pip install findspark from run_unittests.sh (ucbrise#211) Fix example code in README (ucbrise#205)
This pull request will integrate R model deployment with clipper.