Closed
Description
Is your feature request related to a problem? Please describe.
When you deploy a Vetiver model to Connect that uses a "custom" object in its pipeline, the deployment itself succeeds, but the API fails when you open it.
Describe the solution you'd like
I would like to be able to deploy a Vetiver model that uses custom sklearn transformers.
Describe alternatives you've considered
- You could package the custom transformer as a Python package and import it in your model deployment code. Then, when vetiver deploys to Connect, it will install the custom package and have access to the transformer. However, this has major downsides: users need to know how to make a Python package, and they need to publish it somewhere that both their development environment and Connect can reach. Posit Package Manager serves this use case, but many users will not have access to it.
- Maybe you could define the custom transformer in another file (e.g. `transformer.py`) and upload that file to Connect as one of the extra files, so that it can be imported? I think it will not work, though, because vetiver writes the `api.py` file for you.
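For reference, here is a minimal sketch of why the separate-file idea could work in principle, assuming the model is serialized with pickle (or joblib, which builds on it). It uses a plain-Python stand-in for the transformer and a temporary directory in place of the deployed bundle; the module name `transformer` comes from the bullet above, everything else is hypothetical. Whether vetiver's generated API file would actually find the uploaded module on Connect is not something this sketch can confirm.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Stand-in for the custom transformer, kept free of sklearn for this demo.
MODULE_SOURCE = '''
class FeatureSelector:
    def __init__(self, columns):
        self.columns = columns

    def transform(self, rows):
        return [{c: row[c] for c in self.columns} for row in rows]
'''

# A temporary directory plays the role of the deployed bundle.
workdir = tempfile.mkdtemp()
Path(workdir, "transformer.py").write_text(MODULE_SOURCE)
sys.path.insert(0, workdir)
importlib.invalidate_caches()

# "Training" side: import the class from the module and pickle an instance.
transformer = importlib.import_module("transformer")
payload = pickle.dumps(transformer.FeatureSelector(["fico"]))

# "Serving" side: forget the module, as a fresh process would have.
del sys.modules["transformer"]
restored = pickle.loads(payload)  # pickle re-imports `transformer` by name
result = restored.transform([{"fico": 700, "upb": 10}])
```

The point is that the pickle only records `transformer.FeatureSelector` as a name, so loading succeeds as long as a module with that name is importable in the serving process.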
I am not sure what the "best" solution is. I would love to hear what you have seen other users do, or how you would approach this :)
Additional context
Here is an example script:
# %% [markdown]
# # Initial Model Fit
# %% [markdown]
# In this notebook we fit a simple machine learning model to predict prepayments for student loans. Towards this end we use the **scikit-learn** package. Once our model is fit we deploy it to Posit Connect using the **vetiver** package.
# %% [markdown]
# ## Initial Setup
# %% [markdown]
# Let's begin by loading some packages that we will need.
# %%
import pandas as pd
import sklearn
import pins
import vetiver
# %% [markdown]
# Next, let's read in the `CONNECT_SERVER` and `CONNECT_API_KEY` environment variables.
# %%
import os
import dotenv
dotenv.load_dotenv(override=True)
rsc_server = os.environ['CONNECT_SERVER']
rsc_key = os.environ['CONNECT_API_KEY']
# %% [markdown]
# ## Reading In Training Data
# %% [markdown]
# We can now read in our training data.
# %%
df_train = pd.read_csv('data/student-loan-2022-12-01.csv')
df_train
# %% [markdown]
# Let's separate features and labels.
# %%
df_X = df_train.drop(columns=['paid_label'])
df_y = df_train[['paid_label']]
# %% [markdown]
# ## Defining the Modeling Pipeline
# %% [markdown]
# Next, we identify the columns of `df_train` that we would like to use as predictors. We are going to ignore `trade_date` because it is only there to tell us which month the data comes from. We are also going to ignore `mos_to_repay` because it is zero for all but a few observations.
# %%
features = ['loan_age', 'cosign', 'income_annual', 'upb', 'monthly_payment',
            'fico', 'origbalance', 'repay_status', 'mos_to_balln']
# %% [markdown]
# In order to select these columns inside the pipeline, we define a custom transformer.
# %%
from sklearn.base import BaseEstimator, TransformerMixin
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return X[self.columns]
# %%
FeatureSelector(features).fit_transform(df_train).head()
# %%
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
    ('feature_selector', FeatureSelector(features)),
    ('decision_tree', DecisionTreeClassifier())
])
# %% [markdown]
# ## Fit the Model
# %%
model.fit(df_X, df_y)
# %% [markdown]
# ## Vetiver
# %% [markdown]
# ### Create a **vetiver** Model
# %%
from vetiver import VetiverModel
meta = {'training_data': df_train['trade_date'][0]}
v = VetiverModel(
    model,
    model_name = "user.name/student_loan_python",
    #prototype_data = df_X,
    metadata = meta,
)
v
# %% [markdown]
# ### Pin (Store and Version) the Model
# %%
from vetiver import vetiver_pin_write
model_board = pins.board_rsconnect(server_url=rsc_server, api_key=rsc_key, allow_pickle_read=True)
vetiver_pin_write(model_board, v)
# %%
model_board.pin_versions('user.name/student_loan_python')
# %% [markdown]
# ### Create a REST API
# %%
from rsconnect.api import RSConnectServer
connect_server = RSConnectServer(url=rsc_server, api_key=rsc_key)
vetiver.deploy_rsconnect(
    connect_server=connect_server,
    board=model_board,
    pin_name="user.name/student_loan_python",
    version=model_board.pin_versions('user.name/student_loan_python').tail(1)['version'].iloc[0],
    #app_id='d42d839a-0672-4747-9773-174d73eff647', # <-- how would I know this for the initial deployment?
    title="Student Loan - Model - FastAPI",
    extra_files=['requirements.txt'],
)
# %%
The relevant code chunk is this:
from sklearn.base import BaseEstimator, TransformerMixin
class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X, y=None):
        return X[self.columns]
# %%
FeatureSelector(features).fit_transform(df_train).head()
# %%
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
model = Pipeline(steps=[
    ('feature_selector', FeatureSelector(features)),
    ('decision_tree', DecisionTreeClassifier())
])
When you deploy this model to Connect, Connect does not know what FeatureSelector is, and will fail to start the API.
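My understanding of the failure, as a minimal sketch: pickle-based serialization (which `allow_pickle_read=True` in the script suggests is in play) stores a class by its module and name, not by its source code. The stand-in class below is plain Python rather than the real sklearn transformer, and deleting it is only a way to imitate a process that never defined it, so treat this as an illustration under those assumptions rather than a trace of what Connect does.

```python
import pickle


class FeatureSelector:
    """Plain-Python stand-in for the custom transformer (no sklearn needed)."""

    def __init__(self, columns):
        self.columns = columns


payload = pickle.dumps(FeatureSelector(["loan_age", "fico"]))

# The payload carries the class name and instance state, but no class source.
stores_name_only = b"FeatureSelector" in payload and b"def __init__" not in payload

# Imitate a process that never defined the class, such as a generated API file.
del FeatureSelector
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    # pickle cannot resolve the class name in the current module.
    load_error = exc
```

Since the class in the notebook is defined in the training script itself, the served API has no module to import it from, which matches the startup failure described above.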