@@ -0,0 +1,361 @@
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Introduction\n", | ||
"\n", | ||
"$\\newcommand{\\G}{\\mathcal{G}}$\n", | ||
"$\\newcommand{\\V}{\\mathcal{V}}$\n", | ||
"$\\newcommand{\\E}{\\mathcal{E}}$\n", | ||
"$\\newcommand{\\R}{\\mathbb{R}}$\n", | ||
"\n", | ||
"This notebook shows how to apply our graph ConvNet ([paper] & [code]), or any other, to your structured or unstructured data. For this example, we assume that we have $n$ samples $x_i \\in \\R^{d_x}$ arranged in a data matrix $$X = [x_1, ..., x_n]^T \\in \\R^{n \\times d_x}.$$ Each sample $x_i$ is associated with a vector $y_i \\in \\R^{d_y}$ for a regression task or a label $y_i \\in \\{0,\\ldots,C\\}$ for a classification task.\n", | ||
"\n", | ||
"[paper]: https://arxiv.org/abs/1606.09375\n", | ||
"[code]: https://github.com/mdeff/cnn_graph\n", | ||
"\n", | ||
"From there, we'll structure our data with a graph $\\G = (\\V, \\E, A)$ where $\\V$ is the set of $d_x = |\\V|$ vertices, $\\E$ is the set of edges and $A \\in \\R^{d_x \\times d_x}$ is the adjacency matrix. That matrix represents the weight of each edge, i.e. $A_{i,j}$ is the weight of the edge connecting $v_i \\in \\V$ to $v_j \\in \\V$. The weights of that feature graph thus represent pairwise relationships between features $i$ and $j$. We call that regime **signal classification / regression**, as the samples $x_i$ to be classified or regressed are graph signals.\n", | ||
"\n", | ||
"Some applications of that regime:\n", | ||
"* Task classification for task fMRI: the graph is a functional or anatomical connectome. Graph signals are activations measured by fMRI while the subject is performing some task. The goal is to find which task.\n", | ||
"* Anomaly detection: given a transportation, energy or communication network and some traffic measures, predict whether something is going wrong.\n", | ||
"* Text classification: each document is modeled as a graph signal (bag-of-words or TF-IDF) and each node represents a word. The graph represents the vocabulary, where edge weights indicate the similarity between words." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Other modelling possibilities include:\n", | ||
"1. Using a data graph, i.e. an adjacency matrix $A \\in \\R^{n \\times n}$ which represents pairwise relationships between samples $x_i \\in \\R^{d_x}$. The problem is here to predict a graph signal $y \\in \\R^{n \\times d_y}$ given a graph characterized by $A$ and some graph signals $X \\in \\R^{n \\times d_x}$. We call that regime **node classification / regression**, as we classify or regress nodes instead of signals.\n", | ||
" 1. Example application: text classification where each document is modeled as a node and each hyperlink or citation is an edge between them. The graph signals may be bag-of-words or TF-IDF representations of the documents.\n", | ||
" 2. [Kipf & Weiling (2016)][kipf_weiling] uses a first-order approximation of our spectral graph convolution for semi-supervised node classification.\n", | ||
"2. Another problem of interest is whole graph classification, with or without signals on top. We'll call that third regime **graph classification / regression**. The problem here is to classify or regress a whole graph $A_i \\in \\R^{n \\times n}$ (with or without an associated data matrix $X_i \\in \\R^{n \\times d_x}$) into $y_i \\in \\R^{d_y}$. In case we have no signal, we can use a constant vector $X_i = 1_n$ of size $n$.\n", | ||
" 1. Example application: predict some characteristic of a chemical compound given its arrangement.\n", | ||
"\n", | ||
"[kipf_weiling]: https://arxiv.org/abs/1609.02907" | ||
] | ||
}, | ||
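{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook focuses on the first regime, **signal classification** on a feature graph. To make the difference between a feature graph and a data (sample) graph concrete, the self-contained sketch below builds both from the same toy data with plain NumPy and SciPy. The Gaussian kernel, the value of $k$ and the symmetrization are illustrative choices only, not necessarily those made by the `graph` module used later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: contrast a feature graph (d x d) with a sample graph (n x n).\n",
"import numpy as np\n",
"import scipy.spatial.distance\n",
"\n",
"n_toy, d_toy = 50, 10\n",
"X_toy = np.random.normal(size=(n_toy, d_toy))\n",
"\n",
"def knn_similarity_graph(Z, k=4):\n",
"    # Dense kNN similarity graph between the rows of Z, with a Gaussian kernel.\n",
"    dist = scipy.spatial.distance.cdist(Z, Z)      # pairwise Euclidean distances\n",
"    sigma = np.mean(np.sort(dist, axis=1)[:, k])   # kernel width from the k-th neighbor\n",
"    W = np.exp(-dist**2 / sigma**2)\n",
"    np.fill_diagonal(W, 0)                         # no self-loops\n",
"    W[W < np.sort(W, axis=1)[:, [-k]]] = 0         # keep the k strongest edges per node\n",
"    return np.maximum(W, W.T)                      # symmetrize\n",
"\n",
"A_features = knn_similarity_graph(X_toy.T)  # d x d graph: signal classification regime\n",
"A_samples = knn_similarity_graph(X_toy)     # n x n graph: node classification regime\n",
"print(A_features.shape, A_samples.shape)"
]
},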
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"#import graph, coarsening, utils\n", | ||
"%run -n models.ipynb # import models\n", | ||
"import numpy as np\n", | ||
"import matplotlib.pyplot as plt\n", | ||
"import shutil\n", | ||
"%matplotlib inline\n", | ||
"\n", | ||
"\n", | ||
"%load_ext autoreload\n", | ||
"%autoreload 1\n", | ||
"%aimport graph\n", | ||
"%aimport coarsening\n", | ||
"%aimport utils" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 1 Data\n", | ||
"\n", | ||
"For the purpose of the demo, let's create a random data matrix $X \\in \\R^{n \\times d_x}$ and somehow infer a label $y_i = f(x_i)$." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"d = 100 # Dimensionality.\n", | ||
"n = 10000 # Number of samples.\n", | ||
"c = 5 # Number of feature communities.\n", | ||
"\n", | ||
"# Data matrix, structured in communities (feature-wise).\n", | ||
"X = np.random.normal(0, 1, (n, d)).astype(np.float32)\n", | ||
"X += np.linspace(0, 1, c).repeat(d // c)\n", | ||
"\n", | ||
"# Noisy non-linear target.\n", | ||
"w = np.random.normal(0, .02, d)\n", | ||
"t = X.dot(w) + np.random.normal(0, .001, n)\n", | ||
"t = np.tanh(t)\n", | ||
"plt.figure(figsize=(15, 5))\n", | ||
"plt.plot(t, '.')\n", | ||
"\n", | ||
"# Classification.\n", | ||
"y = np.ones(t.shape, dtype=np.uint8)\n", | ||
"y[t > t.mean() + 0.4 * t.std()] = 0\n", | ||
"y[t < t.mean() - 0.4 * t.std()] = 2\n", | ||
"print('Class imbalance: ', np.unique(y, return_counts=True)[1])" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Then split this dataset into training, validation and testing sets." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"n_train = n // 2\n", | ||
"n_val = n // 10\n", | ||
"\n", | ||
"X_train = X[:n_train]\n", | ||
"X_val = X[n_train:n_train+n_val]\n", | ||
"X_test = X[n_train+n_val:]\n", | ||
"\n", | ||
"y_train = y[:n_train]\n", | ||
"y_val = y[n_train:n_train+n_val]\n", | ||
"y_test = y[n_train+n_val:]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 2 Graph\n", | ||
"\n", | ||
"The second thing we need is a **graph between features**, i.e. an adjacency matrix $A \\in \\mathbb{R}^{d_x \\times d_x}$.\n", | ||
"Structuring data with graphs is very flexible: it can accomodate both structured and unstructured data.\n", | ||
"1. **Structured data**.\n", | ||
" 1. The data is structured by an Euclidean domain, e.g. $x_i$ represents an image, a sound or a video. We can use a classical ConvNet with 1D, 2D or 3D convolutions or a graph ConvNet with a line or grid graph (however losing the orientation).\n", | ||
" 2. The data is structured by a graph, e.g. the data lies on a transportation, energy, brain or social network.\n", | ||
"2. **Unstructured data**. We could use a fully connected network, but the learning and computational complexities are gonna be large. An alternative is to construct a sparse similarity graph between features (or between samples) and use a graph ConvNet, effectively structuring the data and drastically reducing the number of parameters through weight sharing. As for classical ConvNets, the number of parameters are independent of the input size.\n", | ||
"\n", | ||
"There are many ways, supervised or unsupervised, to construct a graph given some data. And better the graph, better the performance ! For this example we'll define the adjacency matrix as a simple similarity measure between features. Below are the choices one has to make when constructing such a graph.\n", | ||
"1. The distance function. We'll use the Euclidean distance $d_{ij} = \\|x_i - x_j\\|_2$.\n", | ||
"2. The kernel. We'll use the Gaussian kernel $a_{ij} = \\exp(d_{ij}^2 / \\sigma^2)$.\n", | ||
"3. The type of graph. We'll use a $k$ nearest neigbors (kNN) graph." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"dist, idx = graph.distance_scipy_spatial(X_train.T, k=10, metric='euclidean')\n", | ||
"A = graph.adjacency(dist, idx).astype(np.float32)\n", | ||
"\n", | ||
"assert A.shape == (d, d)\n", | ||
"print('d = |V| = {}, k|V| < |E| = {}'.format(d, A.nnz))\n", | ||
"plt.spy(A, markersize=2, color='black');" | ||
] | ||
}, | ||
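{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the cell below redoes the same construction from scratch with NumPy and SciPy only, to make the three choices above explicit. Setting $\\sigma^2$ to the squared mean distance to the $k$-th neighbor is an assumption made for this sketch; it is not necessarily identical to what `graph.adjacency` does internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: the kNN feature graph built by hand (the model below uses A, not this).\n",
"import scipy.sparse\n",
"import scipy.spatial.distance\n",
"\n",
"k = 10\n",
"dist_full = scipy.spatial.distance.squareform(\n",
"    scipy.spatial.distance.pdist(X_train.T, metric='euclidean'))  # d x d distances\n",
"d_x = dist_full.shape[0]\n",
"idx_knn = np.argsort(dist_full, axis=1)[:, 1:k+1]                 # k nearest neighbors, self excluded\n",
"dist_knn = dist_full[np.arange(d_x)[:, np.newaxis], idx_knn]\n",
"sigma2 = np.mean(dist_knn[:, -1])**2                              # kernel width (assumption)\n",
"weights = np.exp(-dist_knn**2 / sigma2)                           # a_ij = exp(-d_ij^2 / sigma^2)\n",
"\n",
"rows = np.repeat(np.arange(d_x), k)\n",
"A_manual = scipy.sparse.coo_matrix((weights.ravel(), (rows, idx_knn.ravel())), shape=(d_x, d_x)).tocsr()\n",
"A_manual = A_manual.maximum(A_manual.T)                           # symmetrize\n",
"print('by hand: {} edges, library: {} edges'.format(A_manual.nnz, A.nnz))"
]
},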
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To be able to pool graph signals, we need first to coarsen the graph, i.e. to find which vertices to group together. At the end we'll have multiple graphs, like a pyramid, each at one level of resolution. The finest graph is where the input data lies, the coarsest graph is where the data at the output of the graph convolutional layers lie. That data, of reduced spatial dimensionality, can then be fed to a fully connected layer.\n", | ||
"\n", | ||
"The parameter here is the number of times to coarsen the graph. Each coarsening approximately reduces the size of the graph by a factor two. Thus if you want a pooling of size 4 in the first layer followed by a pooling of size 2 in the second, you'll need to coarsen $\\log_2(4+2) = 3$ times.\n", | ||
"\n", | ||
"After coarsening we rearrange the vertices (and add fake vertices) such that pooling a graph signal is analog to pooling a 1D signal. See the [paper] for details.\n", | ||
"\n", | ||
"[paper]: https://arxiv.org/abs/1606.09375" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"graphs, perm = coarsening.coarsen(A, levels=3, self_connections=False)\n", | ||
"\n", | ||
"X_train = coarsening.perm_data(X_train, perm)\n", | ||
"X_val = coarsening.perm_data(X_val, perm)\n", | ||
"X_test = coarsening.perm_data(X_test, perm)" | ||
] | ||
}, | ||
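{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why this reordering makes graph pooling analogous to 1D pooling, the sketch below pools the permuted training signals by hand with a simple reshape. Max pooling of size 4 is used purely for illustration; the actual pooling is performed inside the model by the block selected with `params['pool']` further down."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: after coarsening.perm_data, vertices that get pooled together are\n",
"# consecutive, so pooling a graph signal reduces to pooling a 1D signal.\n",
"p = 4                                # pooling size of the first layer\n",
"n_vertices = X_train.shape[1]        # includes the added fake vertices\n",
"assert n_vertices % p == 0\n",
"X_pooled = X_train.reshape(-1, n_vertices // p, p).max(axis=2)\n",
"print('{} -> {}'.format(X_train.shape, X_pooled.shape))"
]
},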
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We finally need to compute the graph Laplacian $L$ for each of our graphs (the original and the coarsened versions), defined by their adjacency matrices $A$. The sole parameter here is the type of Laplacian, e.g. the combinatorial Laplacian, the normalized Laplacian or the random walk Laplacian." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"L = [graph.laplacian(A, normalized=True) for A in graphs]\n", | ||
"graph.plot_spectrum(L)" | ||
] | ||
}, | ||
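{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, the normalized Laplacian is $L = I - D^{-1/2} A D^{-1/2}$, where $D$ is the diagonal degree matrix, and its eigenvalues lie in $[0, 2]$, consistent with the spectrum plotted above. The sketch below builds that matrix by hand; setting the degree term of isolated (fake) vertices to zero is a choice made here for illustration and may differ from what `graph.laplacian` does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: the normalized Laplacian L = I - D^{-1/2} A D^{-1/2} built by hand.\n",
"import scipy.sparse\n",
"\n",
"A_0 = graphs[0]                                  # adjacency matrix of the finest graph\n",
"degrees = np.asarray(A_0.sum(axis=1)).squeeze()\n",
"with np.errstate(divide='ignore'):\n",
"    d_inv_sqrt = 1.0 / np.sqrt(degrees)\n",
"d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0             # isolated (fake) vertices\n",
"D_inv_sqrt = scipy.sparse.diags(d_inv_sqrt)\n",
"L_manual = scipy.sparse.identity(A_0.shape[0]) - D_inv_sqrt.dot(A_0).dot(D_inv_sqrt)\n",
"print(L_manual.shape)"
]
},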
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 3 Graph ConvNet\n", | ||
"\n", | ||
"Here we apply the graph convolutional neural network to signals lying on graphs. After designing the architecture and setting the hyper-parameters, the model takes as inputs the data matrix $X$, the target $y$ and a list of graph Laplacians $L$, one per coarsening level.\n", | ||
"\n", | ||
"The data, architecture and hyper-parameters are absolutely *not engineered to showcase performance*. Its sole purpose is to illustrate usage and functionality." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Clean up the data generated by TensorFlow.\n", | ||
"shutil.rmtree('summaries/demo', ignore_errors=True)\n", | ||
"shutil.rmtree('checkpoints/demo', ignore_errors=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"params = dict()\n", | ||
"params['dir_name'] = 'demo'\n", | ||
"params['num_epochs'] = 40\n", | ||
"params['batch_size'] = 100\n", | ||
"params['eval_frequency'] = 200\n", | ||
"\n", | ||
"# Building blocks.\n", | ||
"params['filter'] = 'chebyshev5'\n", | ||
"params['brelu'] = 'b1relu'\n", | ||
"params['pool'] = 'apool1'\n", | ||
"\n", | ||
"# Number of classes.\n", | ||
"C = y.max() + 1\n", | ||
"assert C == np.unique(y).size\n", | ||
"\n", | ||
"# Architecture.\n", | ||
"params['F'] = [32, 64] # Number of graph convolutional filters.\n", | ||
"params['K'] = [20, 20] # Polynomial orders.\n", | ||
"params['p'] = [4, 2] # Pooling sizes.\n", | ||
"params['M'] = [512, C] # Output dimensionality of fully connected layers.\n", | ||
"\n", | ||
"# Optimization.\n", | ||
"params['regularization'] = 5e-4\n", | ||
"params['dropout'] = 1\n", | ||
"params['learning_rate'] = 1e-3\n", | ||
"params['decay_rate'] = 0.95\n", | ||
"params['momentum'] = 0.9\n", | ||
"params['decay_steps'] = n_train / params['batch_size']" | ||
] | ||
}, | ||
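{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `'chebyshev5'` filter approximates spectral graph convolutions by a Chebyshev polynomial of order $K$ in the rescaled Laplacian $\\tilde{L} = 2 L / \\lambda_{max} - I$, i.e. $y = \\sum_{k=0}^{K-1} \\theta_k T_k(\\tilde{L}) x$ with $T_0(x) = x$, $T_1(x) = \\tilde{L} x$ and $T_k(x) = 2 \\tilde{L} T_{k-1}(x) - T_{k-2}(x)$ (see the [paper]). The sketch below evaluates that recurrence on a single graph signal with random, untrained coefficients, purely to illustrate what `params['K']` controls; the model learns the coefficients in TensorFlow.\n",
"\n",
"[paper]: https://arxiv.org/abs/1606.09375"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Illustration only: Chebyshev filtering of one graph signal,\n",
"# y = sum_k theta_k T_k(L_tilde) x, with random (untrained) coefficients.\n",
"import scipy.sparse\n",
"import scipy.sparse.linalg\n",
"\n",
"K_demo = params['K'][0]                 # polynomial order of the first layer\n",
"L_0 = L[0]                              # Laplacian of the finest graph\n",
"lmax = scipy.sparse.linalg.eigsh(L_0, k=1, which='LM', return_eigenvectors=False)[0]\n",
"L_tilde = 2 * L_0 / lmax - scipy.sparse.identity(L_0.shape[0], dtype=L_0.dtype)\n",
"\n",
"x = X_train[0]                          # one graph signal\n",
"theta = np.random.normal(0, 1, K_demo)  # the model learns these, per filter and per layer\n",
"\n",
"T_prev, T_curr = x, L_tilde.dot(x)      # T_0(x) and T_1(x)\n",
"y_filtered = theta[0] * T_prev + theta[1] * T_curr\n",
"for k in range(2, K_demo):\n",
"    T_prev, T_curr = T_curr, 2 * L_tilde.dot(T_curr) - T_prev  # Chebyshev recurrence\n",
"    y_filtered += theta[k] * T_curr\n",
"print(y_filtered.shape)"
]
},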
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"model = cgcnn(L, **params)\n", | ||
"accuracy, loss, t_step = model.fit(X_train, y_train, X_val, y_val)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 4 Evaluation\n", | ||
"\n", | ||
"We often want to monitor:\n", | ||
"1. The convergence, i.e. the training loss and the classification accuracy on the validation set.\n", | ||
"2. The performance, i.e. the classification accuracy on the testing set (to be compared with the training set accuracy to spot overfitting).\n", | ||
"\n", | ||
"The `model_perf` class in [utils.py](utils.py) can be used to compactly evaluate multiple models." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"fig, ax1 = plt.subplots(figsize=(15, 5))\n", | ||
"ax1.plot(accuracy, 'b.-')\n", | ||
"ax1.set_ylabel('validation accuracy', color='b')\n", | ||
"ax2 = ax1.twinx()\n", | ||
"ax2.plot(loss, 'g.-')\n", | ||
"ax2.set_ylabel('training loss', color='g')\n", | ||
"plt.show()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"print('Time per step: {:.2f} ms'.format(t_step*1000))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"res = model.evaluate(X_test, y_test)\n", | ||
"print(res[0])" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.4.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |