I ran the scan on a test dataset of 100 samples and ~3700 features, and an OOM (out-of-memory) error occurred.
I am using a pipeline with data transformers, featurewiz feature selection, and an XGBoost model.
I have run the library in 2 other use cases with fewer than 100 features, and it ran smoothly without any issue, so I suspect the problem is related to the large number of features.
Running on 48 GB of RAM.
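For scale, here is a rough back-of-the-envelope estimate of the raw data size. It assumes every feature is stored as a dense 8-byte float (`float64`), which the report does not confirm, and the helper name `dense_size_bytes` is purely illustrative. Under that assumption the table itself is only a few megabytes, which would suggest the memory is consumed during the scan rather than by the data.

```python
def dense_size_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Size of a dense numeric table, assuming `itemsize` bytes per value."""
    return n_rows * n_cols * itemsize


# Figures from the report: 100 samples, roughly 3700 features.
# itemsize=8 is an assumption (float64); object/string columns would differ.
raw_bytes = dense_size_bytes(100, 3700)
print(f"{raw_bytes} bytes = {raw_bytes / 1e6:.2f} MB")
```

On a real DataFrame, `df.memory_usage(deep=True).sum()` gives the measured figure instead of this estimate.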
Standalone code OR list down the steps to reproduce the issue
import pandas as pd
import numpy as np
from giskard import Dataset, Model, scan
# Class to create the model and Dataset
class VulnerabilityDetection:
    def __init__(self, df: pd.DataFrame, model_instance):
        self.model_instance = model_instance
        self.df = df

    def gisk_dataset(self):
        CATEGORICAL_COLUMNS = list(self.df[self.df.columns[self.df.dtypes == 'object']].columns)
        giskard_dataset = Dataset(
            df=self.df,
            target="Target",
            name="",
            cat_columns=CATEGORICAL_COLUMNS,
        )
        return giskard_dataset

    def gisk_model(self):
        model_inst = self.model_instance

        def prediction_function(df: pd.DataFrame) -> np.ndarray:
            return model_inst.predict_proba(df)

        giskard_model = Model(
            model=prediction_function,
            model_type="classification",
            name="Vulnerability Detection Model",
            classification_labels=model_inst.classes_,
            feature_names=self.df.columns
        )
        return giskard_model

# Execution
import pickle
df = pd.read_csv("MyData")
with open("XGBoost_pipeline.pkl", 'rb') as file:
    xg_pipeline = pickle.load(file)
vd = VulnerabilityDetection(df, xg_pipeline)
gisk_dataset = vd.gisk_dataset()
gisk_model = vd.gisk_model()
Relevant log output
Actually, the notebook in VSCode crashed with an OOM error, so there is no log output to share.
The dataset I am using comes from radiomics (medical imaging), where all the features contribute to the model's decision, so I cannot isolate specific features. Updating the logic behind the scan might be beneficial.
For instance, for a large number of features, the scan could proceed in batches and merge the per-batch results into a single report at the end.
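The batching idea above can be sketched in plain Python. This is only an illustration of the suggestion, not how Giskard works internally: `feature_batches`, the `feat_*` column names, and the batch size of 500 are all hypothetical. The sketch covers just the splitting step; how each batch would be scanned and how the results would be merged is left open.

```python
from typing import Iterator, List, Sequence


def feature_batches(feature_names: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `feature_names`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(feature_names), batch_size):
        yield list(feature_names[start:start + batch_size])


# Hypothetical column names standing in for the ~3700 radiomics features.
names = [f"feat_{i}" for i in range(3700)]
batches = list(feature_batches(names, batch_size=500))

# Each batch is a candidate subset of features to analyse in one pass.
# The model would still need the full set of columns to make predictions;
# only the set of features being examined would be restricted per pass.
for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} features, first={batch[0]}, last={batch[-1]}")
```

If the installed Giskard version lets `scan` be restricted to a subset of features (worth checking in its documentation; this is an assumption here), each batch could be passed to a separate scan call so that peak memory depends on the batch size rather than on the total number of features.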
Hey @dzaridis, are you still facing this issue?
Which error did you get? Indeed, it seems to be a problem with the large number of features, especially when using XGBoost.
Issue Type
Bug
Source
source
Giskard Library Version
2.14.0
Giskard Hub Version
2.14.0
OS Platform and Distribution
Linux Ubuntu 20.04
Python version
3.9
Installed python packages