Why are some fundamental algorithms like LR, DT, and RF comparable with DES methods on my dataset? #259
Comments
Hello, it is impossible to say why without knowing more about the data and all the methodological steps used to run the algorithms. Did you normalize all your data before applying dynamic selection? Did you try different approaches, such as DES based on clustering, to see if that would give you better performance?
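The normalization question matters because techniques such as META-DES estimate a region of competence with nearest neighbours, so features on very different scales can distort the neighbourhoods. A minimal sketch of one way to scale the three splits follows. It assumes synthetic stand-in data rather than the dataset from this issue, and the choice of `StandardScaler` is an assumption, not a recommendation from the maintainers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the dataset from this issue).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(300, 5))
y = (X[:, 0] + X[:, 1] > 100.0).astype(int)

# Three-way split: training set for the pool, DSEL for dynamic selection, test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X_train, y_train, test_size=0.50, random_state=0)

# Fit the scaler on the training split only, then reuse it for DSEL and test
# so that no statistics leak from the held-out data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_dsel_s = scaler.transform(X_dsel)
X_test_s = scaler.transform(X_test)
```

The scaled arrays would then replace the raw ones when fitting the pool (`X_train_s`), fitting the DS method (`X_dsel_s`), and scoring (`X_test_s`).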
Dataset: http://bit.ly/xMLdataset (a binary classification task). I ran logistic regression (from sklearn) on this dataset and compared it with DES methods (code copied from the documentation). I applied no normalization or any other preprocessing; the original dataset was only split into training and test sets. I found no obvious performance improvement from using DES methods.
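import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from deslib.des.meta_des import METADES
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def AUC_plot(algorithmName, test_y, pred_y_prob):
    # Plot the ROC curve for one algorithm and show its AUC in the title.
    fpr, tpr, thresholds = roc_curve(test_y, pred_y_prob)
    auc = roc_auc_score(test_y, pred_y_prob)
    plt.plot(fpr, tpr)
    plt.title(algorithmName+" AUC=%.4f" % (auc))
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.fill_between(fpr, tpr, where=(tpr > 0), color='green', alpha=0.5)
    plt.show()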

# Print the performance of an algorithm
def print_performance(algorithm_name, test_y, pred_y, pred_y_prob):
    # TP (True Positive):  predicted 1, actual 1
    # FN (False Negative): predicted 0, actual 1
    # FP (False Positive): predicted 1, actual 0
    # TN (True Negative):  predicted 0, actual 0
    TP = []
    FN = []
    FP = []
    TN = []
    for i in range(len(pred_y)):
        if pred_y[i] == 1 and test_y[i] == 1:
            TP.append(i)
        elif pred_y[i] == 0 and test_y[i] == 1:
            FN.append(i)
        elif pred_y[i] == 1 and test_y[i] == 0:
            FP.append(i)
        elif pred_y[i] == 0 and test_y[i] == 0:
            TN.append(i)
    accuracy = (len(TP)+len(TN))/(len(TP)+len(FP)+len(TN)+len(FN))
    precision = len(TP) / (len(TP) + len(FP))
    recall = len(TP) / (len(TP) + len(FN))
    F1_score = 2 * ((precision*recall)/(precision+recall))
    print(algorithm_name, ':')
    print('Accuracy:', accuracy)
    print('Precision:', precision)
    print('Recall:', recall)
    print('F1-SCORE:', F1_score)
    AUC_plot(algorithm_name, test_y, pred_y_prob)
    print('\n')

if __name__ == '__main__':
    dataset = pd.read_csv('data/heloc_dataset_v2.csv')
    X_train, X_test, y_train, y_test = train_test_split(dataset.drop(['target'],axis=1), dataset['target'], test_size=0.30, random_state=666)
    # Baseline: logistic regression trained on the full training split
    com_lr = LogisticRegression(max_iter=10000)
    com_lr.fit(X_train, y_train)
    print_performance('LR compare', np.array(y_test), com_lr.predict(X_test), com_lr.predict_proba(X_test)[:,1])
    # Note: in scikit-learn >= 1.2 the `base_estimator` parameter is named `estimator`
    pool_classifiers = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                                         n_estimators=100,
                                         random_state=666)
    # Split the training data again: half trains the pool, half becomes DSEL
    X_train, X_dsel, y_train, y_dsel = train_test_split(X_train, y_train,
                                                        test_size=0.50,
                                                        random_state=666)
    pool_classifiers.fit(X_train, y_train)
    meta = METADES(pool_classifiers, random_state=666)
    names = ['META-DES']
    methods = [meta]
    # Fit the DS techniques
    scores = []
    for method, name in zip(methods, names):
        method.fit(X_dsel, y_dsel)
        scores.append(method.score(X_test, y_test))
        print_performance(name, np.array(y_test), method.predict(X_test), method.predict_proba(X_test)[:,1])

As you can see from the picture above, LR is the logistic regression from sklearn, and META-DES is worse than logistic regression on nearly every performance metric. How could this happen?
I mean that, on my dataset, the DES method does not improve the metrics and is sometimes even worse.
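When comparing metrics like these, the hand-counted values in `print_performance` can be cross-checked against scikit-learn's own metric functions. The sketch below uses small hypothetical labels, not results from this dataset, and passes `zero_division=0` so that a model predicting no positives yields 0 rather than the `ZeroDivisionError` the manual formula would raise.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Toy labels (hypothetical, for illustration only): 2 TP, 1 FN, 1 FP, 2 TN.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 makes the score 0 when the denominator would be zero.
precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1-SCORE:', f1)
```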