Update .gitignore from local repository
Unnecessary directories were added to the .gitignore file to reduce
clutter in the remote repo.

Author: ljvmiranda921
Committed: Aug 4, 2017
Parent: 8be5e6b · Commit: 87b819f
6 changed files with 1,063 additions and 0 deletions.
.gitignore
@@ -1,3 +1,6 @@
/pyswarms.rst
/pyswarms.*.rst
/modules.rst
/examples/.ipynb_checkpoints
.editorconfig
.vscode
@@ -0,0 +1,298 @@
Feature Subset Selection
========================

In this example, we'll be using the optimizer
``pyswarms.discrete.BinaryPSO`` to perform feature subset selection to
improve classifier performance. But before we jump into the code, let's
first go over some relevant concepts.

A short primer on feature selection
-----------------------------------

The idea behind feature subset selection is to find the features best
suited to the classification task. Not all features are created equal:
some may be more relevant than others. Given an array of features, how
do we find the optimal subset? (Yup, this is a rhetorical question!)

For Binary PSO, the position of a particle is expressed in two terms:
``1`` or ``0`` (or on and off). If we have a particle :math:`x` in
:math:`d` dimensions, then its position can be defined as:

.. math:: x = [x_1, x_2, x_3, \dots, x_d] ~~~\text{where}~~~ x_i \in \{0,1\}

In this case, the position of the particle in each dimension can be
seen as a simple matter of on and off.
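
To make this concrete, here is a minimal sketch (an illustration added
here, not part of the original notebook) that draws a random binary
swarm of 10 particles in 5 dimensions using NumPy:

.. code:: python

    import numpy as np

    # Each row is one particle; each entry is drawn from {0, 1},
    # matching the position definition above.
    np.random.seed(42)
    swarm = np.random.randint(2, size=(10, 5))
    print(swarm[0])  # one particle's binary position, e.g. [0 1 0 0 0]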

Feature selection and the objective function
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, suppose that we're given a dataset with :math:`d` features. What
we'll do is *assign each feature to a dimension of a particle*. Hence,
once we've implemented Binary PSO and obtained the best position, we
can interpret the binary array (as seen in the equation above) simply
as turning each feature on or off.

As an example, suppose we have a dataset with 5 features, and the final
best position of the PSO is:

.. code:: python

    >>> optimizer.best_pos
    np.array([0, 1, 1, 1, 0])
    >>> optimizer.best_cost
    0.00
This means that the second, third, and fourth features (indices 1, 2,
and 3 in zero-based indexing) are turned on; these are the selected
features for the dataset. We can then train our classifier using only
these features while dropping the others. How do we then define our
objective function? (Yes, another rhetorical question!) We can design
our own, but for now we'll take an equation from the work of `Vieira,
Mendonça, Sousa, et al.
(2013) <http://www.sciencedirect.com/science/article/pii/S1568494613001361>`__:

.. math:: f(X) = \alpha(1-P) + (1-\alpha) \left(1 - \dfrac{N_f}{N_t}\right)

where :math:`\alpha` is a hyperparameter that decides the tradeoff
between the classifier performance :math:`P` and the size of the
feature subset :math:`N_f` with respect to the total number of features
:math:`N_t`. The classifier performance can be the accuracy, F-score,
precision, and so on.
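
As a quick numerical sanity check (illustrative values, not from the
original notebook): with :math:`\alpha = 0.5`, classifier performance
:math:`P = 0.9`, and 3 of 5 features selected, the objective evaluates
to

.. math:: f(X) = 0.5(1 - 0.9) + 0.5\left(1 - \dfrac{3}{5}\right) = 0.05 + 0.20 = 0.25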

.. code:: ipython3

    import sys
    sys.path.append('../')

.. code:: ipython3

    # Import modules
    import numpy as np
    import seaborn as sns
    import pandas as pd

    # Import PySwarms
    import pyswarms as ps

    # Some more magic so that the notebook will reload external python
    # modules; see
    # http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2
    %matplotlib inline
Generating a toy dataset using scikit-learn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We'll be using ``sklearn.datasets.make_classification`` to generate a
100-sample, 15-dimensional dataset with three classes. We will then
plot the distribution of the features to give us a qualitative
assessment of the feature space.

For our toy dataset, we will be rigging some parameters a bit. Out of
the 15 features, only 4 are informative, 1 is redundant, and 2 are
repeated; the rest are noise. Hopefully, Binary PSO will select the
informative features and prune those that are redundant or repeated.

.. code:: ipython3

    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                               n_informative=4, n_redundant=1, n_repeated=2,
                               random_state=1)

.. code:: ipython3

    # Plot toy dataset per feature
    df = pd.DataFrame(X)
    df['labels'] = pd.Series(y)
    sns.pairplot(df, hue='labels');
.. image:: output_6_0.png

As we can see, some features cause the classes to overlap with one
another. These might be features that are better off unselected. On the
other hand, we can see some feature combinations where the classes are
clearly separated. These features can hopefully be retained and
selected by the binary PSO algorithm.

We will then use a simple logistic regression technique using
``sklearn.linear_model.LogisticRegression`` to perform classification.
A simple test of accuracy will be used to assess the performance of the
classifier.
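
As a point of comparison (this baseline is not in the original
notebook; the names ``baseline_clf`` and ``full_set_performance`` are
introduced here), we can first fit the classifier on all 15 features
and measure its training accuracy:

.. code:: python

    from sklearn import linear_model

    # Fit logistic regression on the full feature set as a baseline.
    baseline_clf = linear_model.LogisticRegression()
    baseline_clf.fit(X, y)

    # Training accuracy using all 15 features; compare this against
    # the subset performance computed later in this example.
    full_set_performance = (baseline_clf.predict(X) == y).mean()
    print('Full-set performance: %.3f' % full_set_performance)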

Writing the custom objective function
-------------------------------------

As seen above, we can write our objective function by simply taking the
performance of the classifier (in this case, the accuracy) and the size
of the feature subset divided by the total number of features (here,
15), and combining them to return an error value. We'll now write our
custom objective function.

.. code:: ipython3

    from sklearn import linear_model

    # Create an instance of the classifier
    classifier = linear_model.LogisticRegression()

    # Define objective function
    def f_per_particle(m, alpha):
        """Computes the objective function per particle

        Inputs
        ------
        m : numpy.ndarray
            Binary mask that can be obtained from BinaryPSO, will
            be used to mask features.
        alpha : float
            Constant weight for trading-off classifier performance
            and number of features

        Returns
        -------
        float
            Computed objective function value
        """
        total_features = 15
        # Get the subset of the features from the binary mask
        if np.count_nonzero(m) == 0:
            X_subset = X
        else:
            X_subset = X[:, m == 1]
        # Perform classification and store performance in P
        classifier.fit(X_subset, y)
        P = (classifier.predict(X_subset) == y).mean()
        # Compute the objective function
        j = (alpha * (1.0 - P)
             + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features)))
        return j

.. code:: ipython3

    def f(x, alpha=0.88):
        """Higher-level method to do classification in the
        whole swarm.

        Inputs
        ------
        x : numpy.ndarray of shape (n_particles, dims)
            The swarm that will perform the search
        alpha : float (default is 0.88)
            Constant weight passed on to ``f_per_particle``

        Returns
        -------
        numpy.ndarray of shape (n_particles, )
            The computed loss for each particle
        """
        n_particles = x.shape[0]
        j = [f_per_particle(x[i], alpha) for i in range(n_particles)]
        return np.array(j)
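
Before handing ``f`` to the optimizer, we can sanity-check it on a
random binary swarm (a quick check added here, not part of the original
notebook):

.. code:: python

    # Evaluate the objective on a random binary swarm of 30 particles,
    # one dimension per feature; expect one loss value per particle.
    test_swarm = np.random.randint(2, size=(30, 15))
    print(f(test_swarm).shape)  # (30,)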
Using Binary PSO
----------------

With everything set up, we can now use Binary PSO to perform feature
selection. For now, we'll be doing a global-best search by setting the
number of neighbors equal to the number of particles. The
hyperparameters are also set arbitrarily. Moreover, we'll set the
distance-metric parameter ``p`` to 2 (in truth, it's not really
relevant here because every particle can see every other particle).

.. code:: ipython3

    # Initialize swarm options, arbitrary
    options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 30, 'p': 2}

    # Call instance of PSO
    dims = 15  # dimensions should be the number of features
    optimizer = ps.discrete.BinaryPSO(n_particles=30, dims=dims, **options)
    optimizer.reset()

    # Perform optimization
    cost, pos = optimizer.optimize(f, print_step=100, iters=1000, verbose=2)

.. parsed-literal::

    Iteration 1/1000, cost: 0.2776
    Iteration 101/1000, cost: 0.2792
    Iteration 201/1000, cost: 0.2624
    Iteration 301/1000, cost: 0.2632
    Iteration 401/1000, cost: 0.2544
    Iteration 501/1000, cost: 0.3208
    Iteration 601/1000, cost: 0.2376
    Iteration 701/1000, cost: 0.2944
    Iteration 801/1000, cost: 0.3224
    Iteration 901/1000, cost: 0.3464
    ================================
    Optimization finished!
    Final cost: 0.0000
    Best value: 0.000000 1.000000 0.000000 1.000000 0.000000 1.000000 ...
We can then train the classifier using the positions found by running
another instance of logistic regression. We can compare the performance
against using the full set of features.

.. code:: ipython3

    # Create another instance of LogisticRegression
    classifier = linear_model.LogisticRegression()

    # Get the selected features from the final positions
    X_selected_features = X[:, pos == 1]  # subset

    # Perform classification and store performance in P
    classifier.fit(X_selected_features, y)

    # Compute performance
    subset_performance = (classifier.predict(X_selected_features) == y).mean()

    print('Subset performance: %.3f' % (subset_performance))

.. parsed-literal::

    Subset performance: 0.680
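
To see how much the dimensionality was reduced (a small addition, not
in the original notebook), we can count the selected features directly:

.. code:: python

    # Count how many of the 15 features the best position keeps.
    n_selected = int(np.count_nonzero(pos))
    print('Selected %d out of %d features' % (n_selected, len(pos)))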
Another important advantage is that we were able to reduce the number
of features in our data (that is, perform dimensionality reduction).
This can save us from the `curse of
dimensionality <http://www.stat.ucla.edu/~sabatti/statarray/textr/node5.html>`__,
and may in fact speed up our classification.

Let's plot the feature subset that we have:

.. code:: ipython3

    # Plot toy dataset per feature
    df1 = pd.DataFrame(X_selected_features)
    df1['labels'] = pd.Series(y)
    sns.pairplot(df1, hue='labels')

.. parsed-literal::

    <seaborn.axisgrid.PairGrid at 0x1975d4da400>

.. image:: output_17_1.png