Hello!
I'm trying to use the HVDM metric with KNN from sklearn, but it keeps raising errors during fitting. I've tried printing values inside the code to understand what is going on, but that only raised new questions.
I'm working on the "heart" dataset and doing some feature selection: I want to see the accuracy of the model with an increasing number of features.
So, I initialize the metric as follows:
hvdm_metric = hvdm(np.concatenate((X[:,features_subset],np.array([Y]).T),axis=1),[len(features_subset)], ind_map, nan_equivalents = [nan_eqv])
(So I first concatenate the data and the output, since HVDM needs the output in the data; ind_map is just the mapping of the categorical features to the feature subset.)
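To make the shapes concrete, here is a minimal NumPy-only sketch of that concatenation. The toy values, and the variable names data and output_col, are made up for illustration and are not part of the package; the sketch only shows that the output ends up as the last column, at index len(features_subset).

```python
import numpy as np

# Hypothetical toy data: 5 samples, 4 features, binary output.
X = np.array([[63.0, 1.0, 145.0, 3.0],
              [37.0, 0.0, 130.0, 2.0],
              [41.0, 1.0, 120.0, 1.0],
              [56.0, 0.0, 140.0, 3.0],
              [57.0, 1.0, 150.0, 2.0]])
Y = np.array([1, 0, 0, 1, 1])
features_subset = [0, 2, 3]

# Same construction as in the call above: the selected columns,
# followed by Y reshaped into a single column.
data = np.concatenate((X[:, features_subset], np.array([Y]).T), axis=1)

# Position of the output column inside the combined array.
output_col = len(features_subset)

print(data.shape)           # (n_samples, len(features_subset) + 1)
print(data[:, output_col])  # the output values, stored as floats
```

Note that the array passed to the metric has one more column than the array later passed to nn.fit, which only receives the selected features.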
Then I get an error when calling nn.fit(X_T[:,features_subset],Y_T): it reports a division by zero. From everything I had tried before, I thought it came from a numerical feature wrongly marked as categorical, but that is not the cause. The code tries to compute the distance from each of my outputs to the mean of the outputs, but the output is categorical, so that is really odd. I guess it comes from the fact that my output Y is included in the data when I initialize the metric...
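One way to narrow this down: in the usual definition of HVDM, the distance on a numerical attribute is |x - y| / (4 * sigma), so a column with zero standard deviation would divide by zero. Whether the package computes it exactly this way is an assumption. The helper below, zero_spread_columns, is a hypothetical name and uses only NumPy; it lists the columns treated as numerical that are constant. It does not handle missing values or NaN equivalents.

```python
import numpy as np

def zero_spread_columns(data, numeric_cols):
    """Return the indices in numeric_cols whose standard deviation is zero."""
    data = np.asarray(data, dtype=float)
    # Standard deviation of each column that is treated as numerical.
    stds = np.std(data[:, numeric_cols], axis=0)
    return [col for col, s in zip(numeric_cols, stds) if s == 0.0]

# Hypothetical toy array: column 1 is constant, column 2 plays the output.
data = np.array([[63.0, 5.0, 1.0],
                 [37.0, 5.0, 0.0],
                 [41.0, 5.0, 1.0]])

constant = zero_spread_columns(data, numeric_cols=[0, 1])
print(constant)
```

Running the same check on the rows actually used for fitting, with every column index that is not declared categorical, would show whether a constant column (possibly the output column itself) is involved.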
Do you have a working example of HVDM used with KNN? Or any idea how to avoid this error?
Thanks a lot for your package, and for introducing me to these metrics!