Created using Colaboratory

Machine-Learning-Tokyo · Apr 26, 2020 · d0e54e1 · d0e54e1
1 parent 0c9b9ad
commit d0e54e1
Showing 1 changed file with 593 additions and 0 deletions.
diff --git a/Implementations/ShuffleNet/ShuffleNet_implementation.ipynb b/Implementations/ShuffleNet/ShuffleNet_implementation.ipynb
@@ -0,0 +1,593 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "ShuffleNet_implementation.ipynb",
+      "provenance": [],
+      "collapsed_sections": [],
+      "authorship_tag": "ABX9TyOrutzNgJDcisuQT8AODeZf",
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/Machine-Learning-Tokyo/CNN-Architectures/blob/master/Implementations/ShuffleNet/ShuffleNet_implementation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "d8rFfy-yy5YZ",
+        "colab_type": "text"
+      },
+      "source": [
+        "# Implementation of ShuffleNet\n",
+        "\n",
+        "We will use the [tensorflow.keras Functional API](https://www.tensorflow.org/guide/keras/functional) to build ShuffleNet from the original paper: “[ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083)” by Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun.\n",
+        "\n",
+        "---\n",
+        "\n",
+        "In the paper we can read:\n",
+        "\n",
+        ">**[i]** “The first building block in each stage is applied with stride = 2. Other hyper-parameters within a stage stay the same, and for the next stage the output channels are doubled”.\n",
+        ">\n",
+        ">**[ii]** “Similar to [9], we set the number of bottleneck channels to 1/4 of the output channels for each ShuffleNet unit\"\n",
+        ">\n",
+        ">**[iii]** \"we add a Batch Normalization layer [15] after each of the convolutions to make end-to-end training easier.\"\n",
+        ">\n",
+        ">**[iv]** \"Note that for Stage 2, we do not apply group convolution on the first pointwise layer because the number of input channels is relatively small.\"\n",
+        "\n",
+        "<br>\n",
+        "\n",
+        "We will also make use of the following Table **[v]**:\n",
+        "\n",
+        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/ShuffleNet/ShuffleNet.png width=\"600\">\n",
+        "\n",
+        "<br>\n",
+        "\n",
+        "as well the following Diagrams **[vi]**\n",
+        "\n",
+        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/ShuffleNet/ShuffleNet_diagram_1.png width=\"600\">\n",
+        "\n",
+        "<sub>Figure 2. ShuffleNet Units. a) bottleneck unit [9] with depthwise convolution (DWConv) [3, 12]; b) ShuffleNet unit with pointwise group convolution (GConv) and channel shuffle; c) ShuffleNet unit with stride = 2.</sub>\n",
+        "\n",
+        "and **[vii]**\n",
+        "\n",
+        "<img src=https://raw.githubusercontent.com/Machine-Learning-Tokyo/DL-workshop-series/master/Part%20I%20-%20Convolution%20Operations/images/ShuffleNet/ShuffleNet_diagram_2.png width=\"600\">\n",
+        "\n",
+        "<sub>Figure 1. Channel shuffle with two stacked group convolutions. GConv stands for group convolution. a) two stacked convolution layers with the same number of groups. Each output channel only relates to the input channels within the group. No cross talk; b) input and output channels are fully related when GConv2 takes data from different groups after GConv1; c) an equivalent implementation to b) using channel shuffle.</sub>\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## Network architecture\n",
+        "Based on **[v]** the model starts with a stem of Convolution-Max Pool and continues with a number of **Stages** before the final Global Pool-Fully Connected layers.\n",
+        "\n",
+        "Each **Stage** consists of two parts:\n",
+        "1. One **Shufflenet block** with strides 2\n",
+        "2. a number of repeated **Shufflenet blocks** with strides 1\n",
+        "\n",
+        "Each one of the right most columns of **[v]** corresponds to a model architecture with different number of internal groups (g). In our case we are going to implement the \"*g = 8*\" model, however the code will be general enough to support any other combination of number of:\n",
+        "- groups\n",
+        "- stages\n",
+        "- repetitions per stage\n",
+        "\n",
+        "### Shufflenet block\n",
+        "The Shufflenet block is the building block of this network. Similar to the ResNet block there are two variations of the block based on whether the spatial dimensions of the input tensor change (strides = 2) or not (strides = 1).\n",
+        "\n",
+        "In the first case we apply a 3x3 Average Pool with strides 2 at the shortcut connection as depicted at **[vi]**.\n",
+        "\n",
+        "The main branch of the block consists of:\n",
+        "1. 1x1 **Group Convolution** with 1/4 filters (GConv) followed by Batch Normalization and ReLU\n",
+        "2. **Channel Shuffle** operation (**[ii]**)\n",
+        "3. 3x3 DepthWise Convolution (with or w/o strides=2) followed by Batch Normalizaion\n",
+        "4. 1x1 **Group Convolution** followed by Batch Normalizaion\n",
+        "\n",
+        "The tensors of the main branch and the shortcut connection are then concatenated and a ReLU activation is applied to the output.\n",
+        "\n",
+        "### Group Convolution\n",
+        "The idea of *Group Convolution* is to separate the input tensor to g sub-tensors each one with $1/g$ distinct channels of the initial tesnsor. Then we apply a 1x1 Convolution to each sub-tensor and finally we concatenate all the sub-tensors together (**[vii]**).\n",
+        "\n",
+        "\n",
+        "### Channel Shuffle\n",
+        "Channel shuffle is an operation of shuffling the channels of the input tensor as shown at **[7b,c]**.\n",
+        "In order to shuffle the channels we\n",
+        "1. reshape the input tensor:\n",
+        ">from: `width x height x channels`\n",
+        ">\n",
+        ">to: `width x height x groups x (channels/groups)`\n",
+        "\n",
+        "2. prermute the last two dimensions\n",
+        "3. reshape the tensor to the original shape\n",
+        "\n",
+        "A simple example of the results of this operation can be seen at the following application of the operation on a 6-element array\n",
+        "\n",
+        "$$\n",
+        "\\begin{matrix} 1 & 2 & 3 & 4 & 5 & 6\n",
+        "\\end{matrix}\n",
+        "$$\n",
+        "1. reshape to $groups \\times \\frac{n}{groups} (groups=2)$\n",
+        "<br>\n",
+        "<br>\n",
+        "$$\n",
+        "\\begin{matrix} \n",
+        "1 & 2 & 3 \\\\\n",
+        "4 & 5 & 6\n",
+        "\\end{matrix}\n",
+        "$$\n",
+        "<br>\n",
+        "2. prermute the dimensions\n",
+        "$$\n",
+        "\\begin{matrix} \n",
+        "1 & 4 \\\\\n",
+        "2 & 5 \\\\\n",
+        "3 & 6\n",
+        "\\end{matrix}\n",
+        "$$\n",
+        "<br>\n",
+        "3. reshape to the original shape\n",
+        "$$\n",
+        "\\begin{matrix} 1 & 4 & 2 & 5 & 3 & 6\n",
+        "\\end{matrix}\n",
+        "$$\n",
+        "\n",
+        "---\n",
+        "\n",
+        "## Workflow\n",
+        "We will:\n",
+        "1. import the neccesary layers\n",
+        "2. write a helper function for the **Stage**\n",
+        "3. write a helper function for the **Shufflenet block**\n",
+        "4. write a helper function for the **Group Convolution**\n",
+        "5. write a helper function for the **Channel Shuffle**\n",
+        "6. write the stem of the model\n",
+        "7. use the helper function to write the main part of the model\n",
+        "8. write the last part of the model and build it\n",
+        "\n",
+        "---\n",
+        "\n",
+        "### 1. Imports"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "vQswnhr_y0yd",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensorflow.keras.layers import Input, Conv2D, DepthwiseConv2D, \\\n",
+        "     Dense, Concatenate, Add, ReLU, BatchNormalization, AvgPool2D, \\\n",
+        "     MaxPool2D, GlobalAvgPool2D, Reshape, Permute, Lambda"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "5aXdPws21LDo",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 2. Stage\n",
+        "The Stage function will:\n",
+        "- take as inputs:\n",
+        "  - a tensor (**`x`**)\n",
+        "  - the number of channels (also called filters) (**`channels`**)\n",
+        "  - the number of repetitions of the second part of the stage (**`repetitions`**)\n",
+        "  - the number of groups for the Group Convolution blocks (**`groups`**)\n",
+        "- run:\n",
+        "  - apply a Shufflenet block with strides=2\n",
+        "  - apply **`repetitions`** times a Shufflenet block with strides=1\n",
+        "- return the tensor"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "6RgA6Qry1KK5",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "def stage(x, channels, repetitions, groups):\n",
+        "    x = shufflenet_block(x, channels=channels, strides=2, groups=groups)\n",
+        "    for i in range(repetitions):\n",
+        "        x = shufflenet_block(x, channels=channels, strides=1, groups=groups)\n",
+        "    return x"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "n_meO4Dj1RIw",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 3. Shufflenet block\n",
+        "The Shufflenet block will:\n",
+        "- take as inputs:\n",
+        "  - a tensor (**`tensor`**)\n",
+        "  - the number of channels (**`channels`**)\n",
+        "  - the strides (**`strides`**)\n",
+        "  - the number of groups for the Group Convolution blocks (**`groups`**)\n",
+        "- run:\n",
+        "  - apply a Group Convolution block with 1/4 **`channels`** channels followed by *Batch Normalizaion-ReLU*\n",
+        "  - apply **`Channel Shuffle`** to this tensor\n",
+        "  - apply a *Depthwise Convolution* layer followed by *Batch Normalizaion*\n",
+        "  - if **`strides`** is 2:\n",
+        "    - subtract from **`channels`** the number of channels of **`tensor`** so that after the concatenation the output tensor will have **`channels`** channels\n",
+        "  - apply a Group Convolution block with **`channels`** channels  followed by *Batch Normalizaion*\n",
+        "  - if **`strides`** is 1:\n",
+        "    - *add* this tensor with the input **`tensor`**\n",
+        "  - else:\n",
+        "    - apply a 3x3 *Average Pool* with strides 2 (**[vi]**) to the input **`tensor`** and *concatenate* it with this tensor\n",
+        "  - apply *ReLU* activation to the tensor\n",
+        "- return the tensor\n",
+        "\n",
+        "Note that according to **[iv]** we should not apply Group Convolution to the first inupt (24 channels) and apply only the Convolution operation instead which we can code with a simple `if-else` statement. However, for the sake of clarity of the code we ommit it."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "MidGWtI01QNG",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "def shufflenet_block(tensor, channels, strides, groups):\n",
+        "    x = gconv(tensor, channels=channels // 4, groups=groups)\n",
+        "    x = BatchNormalization()(x)\n",
+        "    x = ReLU()(x)\n",
+        "\n",
+        "    x = channel_shuffle(x, groups)\n",
+        "    x = DepthwiseConv2D(kernel_size=3, strides=strides, padding='same')(x)\n",
+        "    x = BatchNormalization()(x)\n",
+        "\n",
+        "    if strides == 2:\n",
+        "        channels = channels - tensor.get_shape().as_list()[-1]\n",
+        "    x = gconv(x, channels=channels, groups=groups)\n",
+        "    x = BatchNormalization()(x)\n",
+        "\n",
+        "    if strides == 1:\n",
+        "        x = Add()([tensor, x])\n",
+        "    else:\n",
+        "        avg = AvgPool2D(pool_size=3, strides=2, padding='same')(tensor)\n",
+        "        x = Concatenate()([avg, x])\n",
+        "\n",
+        "    output = ReLU()(x)\n",
+        "    return output"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "LKeGLmw61ZH3",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 4. Group Convolution\n",
+        "The Group Convolution function will:\n",
+        "- take as inputs:\n",
+        "  - a tensor (**`tensor`**)\n",
+        "  - the number of channels of the output tensor (**`channels`**)\n",
+        "  - the number of groups (**`groups`**)\n",
+        "- run:\n",
+        "  - get the number of channels (**`input_ch`**) of the input tensor using the get_shape() method\n",
+        "  - calculate the number of channels per group (**`group_ch`**) by dividing **`input_ch`** by **`groups`**\n",
+        "  - calculate how many channels will have each group after the Convolution layer. It will be equal to **`channels`** divided by **`groups`**\n",
+        "  - for every group:\n",
+        "    - get the **`group_tensor`** which will be a sub-tensor of **`tensor`** with specific channels\n",
+        "    - apply a 1x1 Convolution layer with **`output_ch`** channels\n",
+        "    - add the tensor to a list (**`groups_list`**)\n",
+        "  - *Concatenate* all the tensors of **`groups_list`** to one tensor\n",
+        "- return the tensor\n",
+        "\n",
+        "Note that there is a commented line in the code bellow. One can get a slice of a tensor by using the simple slicing notation `a[:, b:c, d:e]` but the code takes too long to run (as it is in the case of tensorflow.slice()). By using a Lambda layer and applying it on the tensor we have the same result but much faster."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ZuQR_0RA1X6B",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "def gconv(tensor, channels, groups):\n",
+        "    input_ch = tensor.get_shape().as_list()[-1]\n",
+        "    group_ch = input_ch // groups\n",
+        "    output_ch = channels // groups\n",
+        "    groups_list = []\n",
+        "\n",
+        "    for i in range(groups):\n",
+        "        group_tensor = tensor[:, :, :, i * group_ch: (i+1) * group_ch]\n",
+        "        # group_tensor = Lambda(lambda x: x[:, :, :, i * group_ch: (i+1) * group_ch])(tensor)\n",
+        "        group_tensor = Conv2D(output_ch, 1)(group_tensor)\n",
+        "        groups_list.append(group_tensor)\n",
+        "\n",
+        "    output = Concatenate()(groups_list)\n",
+        "    return output"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "8MBbbsKc1gRp",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 5. Channel Shuffle\n",
+        "The Channel Shuffle function will:\n",
+        "- take as inputs:\n",
+        "  - a tensor (**`x`**)\n",
+        "  - the number of groups (**`groups`**)\n",
+        "- run:\n",
+        "  - get the dimensions (**`width, height, channels`**) of the input tensor. Note that the first number of `x.get_shape().as_list()` will be the batch size.\n",
+        "  - calculate the number of channels per group (**`group_ch`**)\n",
+        "  - reshape **`x`** to **`width`** x **`height`** x **`group_ch`** x **`groups`**\n",
+        "  - permute the last two dimensions of the tensor (**`group_ch`** x **`groups`** -> **`groups`** x **`group_ch`**)\n",
+        "  - reshape **`x`** to its original shape (**`width`** x **`height`** x **`channels`**)\n",
+        "- return the tensor"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "-BB7EX861e5Y",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "def channel_shuffle(x, groups):  \n",
+        "    _, width, height, channels = x.get_shape().as_list()\n",
+        "    group_ch = channels // groups\n",
+        "\n",
+        "    x = Reshape([width, height, group_ch, groups])(x)\n",
+        "    x = Permute([1, 2, 4, 3])(x)\n",
+        "    x = Reshape([width, height, channels])(x)\n",
+        "    return x"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "j1eWD3pG1oxV",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 6. Stem of the model\n",
+        "Now we can start coding the model. We will start with the model's stem. According to **[v]** the first layer of the model is a 3x3 Convolution layer with 24 filters followed by (**[iii]**) a BatchNormalization and a ReLU activation.\n",
+        "\n",
+        "The next layer is a 3x3 Max Pool with strides 2."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ewhpWVNU1mEZ",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "input = Input([224, 224, 3])\n",
+        "x = Conv2D(filters=24, kernel_size=3, strides=2, padding='same')(input)\n",
+        "x = BatchNormalization()(x)\n",
+        "x = ReLU()(x)\n",
+        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ekh6w_tf1s-v",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 7. Main part of the model\n",
+        "The main part of the model consists of **`Stage`** blocks. We first define the hyperparameters **`repetitions`**, **`initial_channels`** acoording to **[v]** and **`groups`**. Then for each number of repetitions we calculate the number of channels according to **[i]** and apply the `stage()` function on the tensor."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "YsksUjOG1qtD",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "repetitions = 3, 7, 3\n",
+        "initial_channels = 384\n",
+        "groups = 8\n",
+        "\n",
+        "for i, reps in enumerate(repetitions):\n",
+        "    channels = initial_channels * (2**i)\n",
+        "    x = stage(x, channels, reps, groups)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "XmKUOV5i1xKZ",
+        "colab_type": "text"
+      },
+      "source": [
+        "### 8. Rest of the model\n",
+        "The model closes with a Global Pool layer and a Fully Connected one with 1000 classes (**[v]**)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "rgl8DSfw1vWq",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "x = GlobalAvgPool2D()(x)\n",
+        "output = Dense(1000, activation='softmax')(x)\n",
+        "\n",
+        "from tensorflow.keras import Model\n",
+        "model = Model(input, output)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "YyCh_YAy1zS9",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensorflow.keras.utils import plot_model\n",
+        "plot_model(model, show_shapes=True)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "nAtmHbd714j4",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Final code"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "ej7vmMrD13pv",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "from tensorflow.keras.layers import Input, Conv2D, DepthwiseConv2D, \\\n",
+        "     Dense, Concatenate, Add, ReLU, BatchNormalization, AvgPool2D, \\\n",
+        "     MaxPool2D, GlobalAvgPool2D, Reshape, Permute, Lambda\n",
+        "\n",
+        "\n",
+        "def stage(x, channels, repetitions, groups):\n",
+        "    x = shufflenet_block(x, channels=channels, strides=2, groups=groups)\n",
+        "    for i in range(repetitions):\n",
+        "        x = shufflenet_block(x, channels=channels, strides=1, groups=groups)\n",
+        "    return x\n",
+        "\n",
+        "\n",
+        "def shufflenet_block(tensor, channels, strides, groups):\n",
+        "    x = gconv(tensor, channels=channels // 4, groups=groups)\n",
+        "    x = BatchNormalization()(x)\n",
+        "    x = ReLU()(x)\n",
+        " \n",
+        "    x = channel_shuffle(x, groups)\n",
+        "    x = DepthwiseConv2D(kernel_size=3, strides=strides, padding='same')(x)\n",
+        "    x = BatchNormalization()(x)\n",
+        " \n",
+        "    if strides == 2:\n",
+        "        channels = channels - tensor.get_shape().as_list()[-1]\n",
+        "    x = gconv(x, channels=channels, groups=groups)\n",
+        "    x = BatchNormalization()(x)\n",
+        " \n",
+        "    if strides == 1:\n",
+        "        x = Add()([tensor, x])\n",
+        "    else:\n",
+        "        avg = AvgPool2D(pool_size=3, strides=2, padding='same')(tensor)\n",
+        "        x = Concatenate()([avg, x])\n",
+        " \n",
+        "    output = ReLU()(x)\n",
+        "    return output\n",
+        "\n",
+        "\n",
+        "def gconv(tensor, channels, groups):\n",
+        "    input_ch = tensor.get_shape().as_list()[-1]\n",
+        "    group_ch = input_ch // groups\n",
+        "    output_ch = channels // groups\n",
+        "    groups_list = []\n",
+        " \n",
+        "    for i in range(groups):\n",
+        "        # group_tensor = tensor[:, :, :, i * group_ch: (i+1) * group_ch]\n",
+        "        group_tensor = Lambda(lambda x: x[:, :, :, i * group_ch: (i+1) * group_ch])(tensor)\n",
+        "        group_tensor = Conv2D(output_ch, 1)(group_tensor)\n",
+        "        groups_list.append(group_tensor)\n",
+        " \n",
+        "    output = Concatenate()(groups_list)\n",
+        "    return output\n",
+        "\n",
+        "\n",
+        "def channel_shuffle(x, groups):  \n",
+        "    _, width, height, channels = x.get_shape().as_list()\n",
+        "    group_ch = channels // groups\n",
+        " \n",
+        "    x = Reshape([width, height, group_ch, groups])(x)\n",
+        "    x = Permute([1, 2, 4, 3])(x)\n",
+        "    x = Reshape([width, height, channels])(x)\n",
+        "    return x\n",
+        "\n",
+        "\n",
+        "input = Input([224, 224, 3])\n",
+        "x = Conv2D(filters=24, kernel_size=3, strides=2, padding='same')(input)\n",
+        "x = BatchNormalization()(x)\n",
+        "x = ReLU()(x)\n",
+        "x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)\n",
+        "\n",
+        "\n",
+        "repetitions = 3, 7, 3\n",
+        "initial_channels = 384\n",
+        "groups = 8\n",
+        " \n",
+        "for i, reps in enumerate(repetitions):\n",
+        "    channels = initial_channels * (2**i)\n",
+        "    x = stage(x, channels, reps, groups)\n",
+        "\n",
+        "\n",
+        "x = GlobalAvgPool2D()(x)\n",
+        "output = Dense(1000, activation='softmax')(x)\n",
+        " \n",
+        "from tensorflow.keras import Model\n",
+        "model = Model(input, output)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CT33k-SY2D-K",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Model diagram\n",
+        "\n",
+        "<img src=\"https://raw.githubusercontent.com/Machine-Learning-Tokyo/CNN-Architectures/master/Implementations/ShuffleNet/ShuffleNet_diagram.svg?sanitize=true\">"
+      ]
+    }
+  ]
+}