A Python implementation of the Laplacian score, written because the existing code on GitHub is either too complicated or unavailable.
laplacian_score(
    df_arr: numpy array,
    label=None,
    **kwargs
)
Arguments:
df_arr: A numpy array representing the data.
label: Default=None. The labels, if the data is supervised.
k_nearest: Default=5. If the data is unsupervised, k_nearest is used to find the edges of the distance graph. This parameter has no effect when the label parameter is given.
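To make the role of k_nearest more concrete, here is a minimal sketch of how an unsupervised Laplacian score can be computed from a k-nearest-neighbour graph. It follows the standard formulation (heat-kernel edge weights, graph Laplacian L = D - S) and is not taken from this package: the function name `laplacian_score_sketch` and the bandwidth parameter `t` are assumptions made for illustration, and it assumes k_nearest is smaller than the number of samples.

```python
import numpy as np


def laplacian_score_sketch(X, k_nearest=5, t=1.0):
    """Unsupervised Laplacian score per feature (illustrative, not this package's code)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Indices of the k nearest neighbours of each sample, excluding itself.
    masked = d2.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1)[:, :k_nearest]

    # Heat-kernel weights on the k-NN edges; t is an assumed bandwidth.
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), nn.shape[1])
    cols = nn.ravel()
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)  # keep an edge if either point is a neighbour of the other

    d = S.sum(axis=1)   # degree of each node
    L = np.diag(d) - S  # graph Laplacian

    # Remove the degree-weighted mean of every feature.
    Xc = X - (d @ X) / d.sum()

    numerator = np.sum(Xc * (L @ Xc), axis=0)
    denominator = np.sum(Xc * (d[:, None] * Xc), axis=0)
    return numerator / denominator
```

The function returns one score per column. Its values need not match the outputs shown in this README, because the edge weighting used here is an assumption.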
from sklearn import datasets
import pandas as pd
import numpy as np
iris = datasets.load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])
target = iris["target"]
print(df.head())
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 |
4.9 | 3.0 | 1.4 | 0.2 |
4.7 | 3.2 | 1.3 | 0.2 |
>>> laplacian_score(np.array(df), k_nearest=3)
array([0.04132189, 0.17426192, 0.01415396, 0.03982228])
The values above are the Laplacian scores of the individual features. The smaller the score, the more likely the feature is to be selected.
>>> laplacian_score(np.array(df), label=target)
array([0.60421927, 0.78376157, 0.1138573 , 0.10572175])
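Because smaller scores are preferred, the returned array can be turned into a feature ranking with `np.argsort`. The sketch below reuses the unsupervised scores printed above; the variable names and the choice of `top_k = 2` are illustrative assumptions.

```python
import numpy as np

feature_names = [
    "sepal length (cm)", "sepal width (cm)",
    "petal length (cm)", "petal width (cm)",
]
# Scores copied from the k_nearest=3 example above.
scores = np.array([0.04132189, 0.17426192, 0.01415396, 0.03982228])

# argsort is ascending, so the first index is the feature with the smallest score.
ranking = np.argsort(scores)

top_k = 2  # hypothetical number of features to keep
selected = [feature_names[i] for i in ranking[:top_k]]
print(selected)  # ['petal length (cm)', 'petal width (cm)']
```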