Commit 84133e9

update
1 parent 1cdeb46 commit 84133e9

1 file changed: +10 -6 lines changed

nlp_class/spam2.py

Lines changed: 10 additions & 6 deletions
@@ -35,15 +35,19 @@
 df['b_labels'] = df['labels'].map({'ham': 0, 'spam': 1})
 Y = df['b_labels'].values
 
+# split up the data
+df_train, df_test, Ytrain, Ytest = train_test_split(df['data'], Y, test_size=0.33)
+
 # try multiple ways of calculating features
-# tfidf = TfidfVectorizer(decode_error='ignore')
-# X = tfidf.fit_transform(df['data'])
+tfidf = TfidfVectorizer(decode_error='ignore')
+Xtrain = tfidf.fit_transform(df_train)
+Xtest = tfidf.transform(df_test)
+
+# count_vectorizer = CountVectorizer(decode_error='ignore')
+# Xtrain = count_vectorizer.fit_transform(df_train)
+# Xtest = count_vectorizer.transform(df_test)
 
-count_vectorizer = CountVectorizer(decode_error='ignore')
-X = count_vectorizer.fit_transform(df['data'])
 
-# split up the data
-Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.33)
 
 # create the model, train it, print scores
 model = MultinomialNB()
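
Note on the change: the diff moves train_test_split ahead of vectorization, so the vectorizer is fitted on the training texts only and the test texts are encoded with that same fitted vocabulary. The sketch below illustrates the idea without scikit-learn. The helpers `fit_vocab` and `transform` and the toy documents are hypothetical stand-ins for CountVectorizer's fit_transform/transform, not code from the repository.

```python
from collections import Counter


def fit_vocab(docs):
    """Build a word -> column index mapping from the given documents only."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}


def transform(docs, vocab):
    """Count-encode docs against a fixed vocabulary; unseen words are dropped."""
    rows = []
    for doc in docs:
        counts = Counter(w for w in doc.lower().split() if w in vocab)
        rows.append([counts.get(w, 0) for w in vocab])
    return rows


# Hypothetical toy data, already split into train and test.
train_docs = ["win cash now", "see you at lunch"]
test_docs = ["win a free prize now"]

vocab = fit_vocab(train_docs)           # fit on the training split only
X_train = transform(train_docs, vocab)  # analogous to fit_transform(df_train)
X_test = transform(test_docs, vocab)    # analogous to transform(df_test)

print(sorted(vocab))
print(X_test)
```

Words that occur only in the test split ("a", "free", "prize") never enter the vocabulary, so the features, and for TF-IDF the document-frequency weights, carry no information from the held-out data.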
