Commit ffba201

Split data into training and testing sets with stratification
1 parent 786c3e6 commit ffba201

1 file changed, 11 additions and 2 deletions

CW2 (2).py

Lines changed: 11 additions & 2 deletions
@@ -59,8 +59,17 @@ def statistics_data(data):
 def split_data(data, test_size=0.3, random_state=1):
     x_train, x_test, y_train, y_test=None, None, None, None
     np.random.seed(1)
-    # Insert your code here for task 4
-
+    # Split the data into labels and features. X : Features , Y : Labels
+    X = data.iloc[:, :-1]  # Select all columns except the last one as features
+    y = data.iloc[:, -1]  # Select the last column as the label
+
+    # Split the data into training and testing sets, ensuring stratification
+    x_train, x_test, y_train, y_test = train_test_split(
+        X, y,
+        test_size=test_size,
+        random_state=random_state,
+        stratify=y
+    )
     return x_train, x_test, y_train, y_test

 # Task 5 [10 marks]: Train a decision tree model with cost complexity parameter of 0
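
The effect of `stratify=y` in the change above can be checked with a small standalone sketch. The toy DataFrame, its column names (`f1`, `f2`, `label`) and the 70/30 class balance are hypothetical and not taken from the coursework data. The import of `train_test_split` from `sklearn.model_selection` is also an assumption, since the hunk does not show the file's imports.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two feature columns, label in the last column,
# with 70 samples of class 0 and 30 samples of class 1.
data = pd.DataFrame({
    "f1": range(100),
    "f2": range(100, 200),
    "label": [0] * 70 + [1] * 30,
})

# Same feature/label selection as in the diff: everything but the
# last column is a feature, the last column is the label.
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Same call shape as in the diff, using the function's default values.
x_train, x_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=1,
    stratify=y,
)

# Share of class 1 in each subset; with stratification both should
# stay close to the 30% share in the full data set.
train_share = y_train.mean()
test_share = y_test.mean()
print(len(x_train), len(x_test), train_share, test_share)
```

Without `stratify=y`, the class shares in the two subsets would depend on the shuffle and could drift away from the overall balance, which matters most for small or imbalanced data sets.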
