
Py 1: Loading the Data

Joshua Levy edited this page Dec 4, 2019 · 3 revisions

First, we load the necessary software dependencies:

# Data handling and serialization
import pandas as pd, numpy as np, pickle
import scipy
# InteractionTransformer and its SHAP helper
from interactiontransformer.InteractionTransformer import InteractionTransformer, run_shap
# Models, data splitting, and evaluation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.ensemble import BalancedRandomForestClassifier
# Plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Silence all warnings to keep the output readable
import warnings
warnings.filterwarnings("ignore")
# High-resolution figures with a plain white style and small fonts
mpl.rcParams['figure.dpi'] = 300
sns.set(style='white',font_scale=0.5)

Then, let's load some test data, treating every column except the last as a predictor and the last column as the outcome:

df=pd.read_csv('../test_data/epistasis.test.csv')
X,y=df.iloc[:,:-1],df.iloc[:,-1]
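To make the slicing above concrete, here is a minimal sketch on a small made-up table. The column names (`snp_a`, `snp_b`, `snp_c`, `outcome`) and values are hypothetical and are not taken from `epistasis.test.csv`; the only assumption carried over is that the outcome sits in the last column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: three predictor columns followed by a binary outcome.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 3, size=(8, 3)),
                   columns=["snp_a", "snp_b", "snp_c"])
toy["outcome"] = [0, 1, 0, 1, 0, 1, 0, 1]

# Same slicing as above: all columns but the last are predictors,
# the last column is the outcome.
X_toy, y_toy = toy.iloc[:, :-1], toy.iloc[:, -1]

print(X_toy.shape)                    # rows x predictor columns
print(y_toy.value_counts().to_dict()) # class counts of the outcome
```

Checking the shape and the class counts right after loading is a cheap way to confirm that the outcome column is where you expect it to be.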

Now that our data is loaded, we can split it into training and test sets. If you are performing traditional statistical analyses rather than evaluating predictive performance, you do not need this split:

X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=42,stratify=y,shuffle=True)
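As a rough illustration of what `stratify=y` does in the call above, the sketch below runs the same split on synthetic, imbalanced data and compares class proportions. The data, the variable names (`X_demo`, `y_demo`, and so on), and the 20% positive rate are assumptions made for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 samples, 4 predictors, 20% positive outcomes.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.integers(0, 3, size=(200, 4)),
                      columns=[f"feature_{i}" for i in range(4)])
y_demo = pd.Series(np.r_[np.ones(40, dtype=int), np.zeros(160, dtype=int)],
                   name="outcome")

# Same arguments as above; test_size is left at scikit-learn's default (25%).
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=42, stratify=y_demo, shuffle=True)

# With stratification, both subsets keep roughly the original positive rate.
print(len(Xd_train), len(Xd_test))
print(round(yd_train.mean(), 3), round(yd_test.mean(), 3))
```

Stratifying matters most when the outcome is imbalanced, since an unstratified split can leave the test set with too few cases of the rarer class.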

Now, we will fit the interaction transformer!
