
Commit d1a1d63

author
Yongyao Jiang
committed
Corrected typo
1 parent 4717588 commit d1a1d63

File tree

2 files changed

+8
-8
lines changed


guide/14-deep-learning/how-ssd-works.ipynb

Lines changed: 3 additions & 3 deletions
@@ -50,7 +50,7 @@
5050
"metadata": {},
5151
"source": [
5252
"It's natural to think of building an object detection model on top of an image classification model. Once we have a good image classifier, a simple way to detect objects is to slide a 'window' across the image and classify whether the image in that window (a cropped-out region of the image) is of the desired type. Sounds simple! Well, there are at least two problems: \n",
53-
"- (1) How do you know the **size of the window** so that it always contains the object? Different types of objects (palm tree and swimming pool), even the same type of objects (e.g. a smalle building and a large buidling) can be of varying sizes as well. \n",
53+
"- (1) How do you know the **size of the window** so that it always contains the object? Different types of objects (a palm tree and a swimming pool), and even objects of the same type (e.g. a small building and a large building), can be of varying sizes. \n",
5454
"- (2) **Aspect ratio** (the ratio of height to width of a bounding box). Objects come in various shapes; a building footprint, for example, has a different aspect ratio than a palm tree.\n",
5555
"\n",
5656
"To solve these problems, we would have to try out sliding windows of different sizes and shapes, which is very computationally intensive, especially with deep neural networks. \n",
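The cost of the naive sliding-window approach can be sketched with a quick count. A minimal sketch; the image size, window sizes, aspect ratios, and stride below are illustrative, not values from the notebook:

```python
# Sketch: count the crops an exhaustive sliding-window detector would have to
# classify. All numbers here are illustrative.
def count_windows(img_w, img_h, sizes, aspect_ratios, stride):
    total = 0
    for s in sizes:
        for ar in aspect_ratios:
            w = round(s * ar ** 0.5)   # wider window when ar > 1
            h = round(s / ar ** 0.5)   # taller window when ar < 1
            nx = (img_w - w) // stride + 1
            ny = (img_h - h) // stride + 1
            total += max(0, nx) * max(0, ny)
    return total

n = count_windows(224, 224, sizes=[32, 64, 128], aspect_ratios=[0.5, 1.0, 2.0], stride=8)
# n is in the thousands -- and every window needs a full classifier forward pass
```

Even this modest configuration produces thousands of crops per image, which is why running a deep classifier on each one is impractical.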
@@ -118,7 +118,7 @@
118118
"source": [
119119
"### Anchor box\n",
120120
"\n",
121-
"Each grid cell in SSD can be assigned with multiple anchor/prior boxes. These anchor boxes are pre-defined and each one is responsible for a size and shape within a grid cell. For example, the person in the image below corresponds to the taller anchor box while the car corresponds to the wider box.\n",
121+
"Each grid cell in SSD can be assigned with multiple anchor/prior boxes. These anchor boxes are pre-defined and each one is responsible for a size and shape within a grid cell. For example, the swimming pool in the image below corresponds to the taller anchor box while the building corresponds to the wider box.\n",
122122
"\n",
123123
"<img src=\"img/anchorbox.png\" height=\"480\" width=\"480\">\n",
124124
"<center>Figure 5. Example of two anchor boxes</center>\n",
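The taller-vs-wider assignment described above can be sketched as follows. The scales and ratios are illustrative assumptions, not the notebook's actual SSD configuration:

```python
# Sketch: anchor boxes for one grid cell as (cx, cy, w, h) in relative units.
# Each (scale, aspect ratio) pair yields one pre-defined box.
def anchors_for_cell(cx, cy, scales, aspect_ratios):
    boxes = []
    for s in scales:
        for ar in aspect_ratios:  # ar = width / height
            boxes.append((cx, cy, s * ar ** 0.5, s / ar ** 0.5))
    return boxes

boxes = anchors_for_cell(0.5, 0.5, scales=[0.25], aspect_ratios=[0.5, 1.0, 2.0])
# ar=0.5 -> taller box (h > w), e.g. a swimming pool;
# ar=2.0 -> wider box (w > h), e.g. a building
```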
@@ -149,7 +149,7 @@
149149
"<img src=\"img/receptive1.png\" height=\"500\" width=\"500\">\n",
150150
"<center>Figure 7. Visualizing CNN feature maps and receptive field</center>\n",
151151
"\n",
152-
"Receptive field is the central premise of the SSD architecture as it enables us to detect objects at different scales and output a tighter bounding box. Why? As you might still remember, the ResNet34 backbone outputs a 256 7x7 feature maps for an input image. If we specify a 4x4 grid, the simpliest approach is just to apply a convolution to this feature map and convert it to 4x4. This approach can actually work to some extent and is exatcly the idea of YOLO (You Only Look Once). The extra step taken by SSD is that it applies more convolutional layers to the backbone feature map and has each of these convolution layers output a object detection results. __As earlier layers bearing smaller receptive field can represent smaller sized objects, predictions from earlier layers help in dealing with smaller sized objects__.\n",
152+
"Receptive field is the central premise of the SSD architecture, as it enables us to detect objects at different scales and output tighter bounding boxes. Why? As you might still remember, the ResNet34 backbone outputs 256 7x7 feature maps for an input image. If we specify a 4x4 grid, the simplest approach is just to apply a convolution to this feature map and convert it to 4x4. This approach can actually work to some extent and is exactly the idea of YOLO (You Only Look Once). The extra step taken by SSD is that it applies more convolutional layers to the backbone feature map and has each of these convolutional layers output object detection results. __As earlier layers with smaller receptive fields can represent smaller objects, predictions from earlier layers help in dealing with smaller objects__.\n",
153153
"\n",
154154
"Because of this, SSD allows us to define __a hierarchy of grid cells__ at different layers. For example, we could use a 4x4 grid to find smaller objects, a 2x2 grid to find mid-sized objects, and a 1x1 grid to find objects that cover the entire image. "
155155
]
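The grid hierarchy falls out of simple convolution arithmetic. A sketch under the assumption of 3x3 kernels with stride 2 and padding 1 (a common choice; the notebook does not state the exact kernel configuration):

```python
# Sketch: feature-map sizes as SSD stacks stride-2 convolutions on the 7x7
# backbone output. Only the shape arithmetic is shown, no actual weights.
def conv_out(size, kernel=3, stride=2, padding=1):
    return (size + 2 * padding - kernel) // stride + 1

grids = [7]
while grids[-1] > 1:
    grids.append(conv_out(grids[-1]))
print(grids)  # [7, 4, 2, 1]: each successive grid handles larger objects
```

Each of these feature maps gets its own detection head, which is how SSD predicts at several scales in a single forward pass.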

guide/14-deep-learning/object-detection.ipynb

Lines changed: 5 additions & 5 deletions
@@ -129,7 +129,7 @@
129129
"\n",
130130
"The `export_training_data()` method generates training samples for training deep learning models, given the input imagery along with labeled vector data or classified images. Deep learning training samples are small subimages, called image chips, that contain the feature or class of interest. This tool creates folders containing image chips, labels, and metadata files for training the model, and stores them in the raster store of your enterprise GIS. The image chips are often small (e.g. 256x256), unless the training sample size is large. These training samples support model training workflows using the `arcgis.learn` package as well as third-party deep learning libraries, such as TensorFlow or PyTorch. The supported models in `arcgis.learn` accept the **[PASCAL_VOC_rectangles](http://host.robots.ox.ac.uk/pascal/VOC/databases.html)** format for object detection models, a standardized image dataset for object class recognition. The label files are XML files containing information about image name, class value, and bounding boxes.\n",
131131
"\n",
132-
"In order to take advantage of pretrained models that have been trained on large image collections (e.g. ImageNet), we have to pick 3 bands from a multispectral imagery as those pretrained models are trained with images that only 3 RGB channels. The `extract_bands()` method can be used to specify which 3 bands should be extracted for fine tuning the models:"
132+
"In order to take advantage of pretrained models that have been trained on large image collections (e.g. ImageNet), we have to pick 3 bands from multispectral imagery, as those pretrained models are trained with images that have only 3 RGB channels. The `extract_bands()` method can be used to specify which 3 bands should be extracted for fine-tuning the models:"
133133
]
134134
},
135135
{
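The effect of picking 3 bands can be sketched with a plain NumPy array standing in for an image chip. The band indices chosen here are illustrative assumptions, not a recommendation:

```python
import numpy as np

# Sketch: an 8-band multispectral chip reduced to 3 bands so an RGB-pretrained
# backbone can consume it; extract_bands() applies this kind of selection
# as part of the export workflow.
chip = np.zeros((8, 256, 256))       # (bands, height, width)
three_band = chip[[3, 2, 1], :, :]   # keep 3 of the 8 bands, in a chosen order
print(three_band.shape)              # (3, 256, 256)
```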
@@ -248,7 +248,7 @@
248248
"source": [
249249
"### Find a good learning rate\n",
250250
"\n",
251-
"Now we have define a model architecture, we can start to train it. This process involves setting a good [learning rate](https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10). Picking a very small learning rate leads to very slow training of the model, while picking one that is too high can prevent the model from converging and 'overshoot' the minima where the loss (or error rate) is lowest. `arcgis.learn` includes fast.ai's learning rate finder, accessible through the model's `lr_find()` method, that helps in picking a good learning rate, without needing to experiment with several learning rates and picking from among them. "
251+
"Now that we have defined a model architecture, we can start to train it. This process involves setting a good [learning rate](https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10). Picking a very small learning rate leads to very slow training of the model, while picking one that is too high can prevent the model from converging and 'overshoot' the minima where the loss (or error rate) is lowest. `arcgis.learn` includes fast.ai's learning rate finder, accessible through the model's `lr_find()` method, which helps in picking a good learning rate without needing to experiment with several learning rates and pick from among them. "
252252
]
253253
},
254254
{
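The idea behind a learning rate finder can be sketched on a toy 1-D problem. This shows only the principle; `lr_find()` runs the sweep on real mini-batches of the model:

```python
# Sketch: sweep exponentially increasing learning rates on a toy quadratic
# loss, recording the loss after one gradient step at each rate. Small rates
# barely move the loss, mid-range rates drive it down, large rates blow it up.
def loss(w):
    return (w - 3.0) ** 2

w, lr, history = 0.0, 1e-4, []
while lr < 10:
    w -= lr * 2 * (w - 3.0)      # one gradient step at this rate
    history.append((lr, loss(w)))
    lr *= 1.5

# a good pick is a rate from the steepest-descent region, well before divergence
```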
@@ -289,7 +289,7 @@
289289
"source": [
290290
"### Train the model\n",
291291
"\n",
292-
"As dicussed earlier, the idea of transfer learning is to fine-tune earlier layers of the pretrained model and focuses on training the newly added layers, meaning we need two different learning rates to better fit the model. We have already selected a good learning rate to train the later layers above (i.e. 0.02). An empirical value of lower learning rate for fine-tuning the ealier layers is usually one tenth of the higher rate. We choose 0.001 to be more careful not to disturb the weights of the pretrained backbone by too much. It can be adjusted depending upon how different the imagery is from natural images on which the backbone network is trained.\n",
292+
"As discussed earlier, the idea of transfer learning is to fine-tune the earlier layers of the pretrained model and focus on training the newly added layers, meaning we need two different learning rates to better fit the model. We have already selected a good learning rate to train the later layers above (i.e. 0.02). An empirical value for the lower learning rate, used for fine-tuning the earlier layers, is usually one tenth of the higher rate. We choose 0.001 to be more careful not to disturb the weights of the pretrained backbone too much. It can be adjusted depending upon how different the imagery is from the natural images on which the backbone network was trained.\n",
293293
"\n",
294294
"Training the network is an iterative process. We can keep training the model using its `fit()` method as long as the validation loss (or error rate) continues to go down with each training pass, also known as an epoch. This is indicative of the model learning the task. "
295295
]
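The two-rate scheme can be sketched with a toy SGD step. The 0.001/0.02 values come from the text above; the weights and gradients are illustrative:

```python
# Sketch: with the same gradient, backbone weights trained at the lower rate
# move 20x less than head weights trained at the higher rate, so pretrained
# features are only gently adjusted while the new layers learn quickly.
def sgd_step(weights, grads, lr):
    return [w - lr * g for w, g in zip(weights, grads)]

backbone_lr, head_lr = 0.001, 0.02            # lower rate for pretrained layers
grads = [0.1, -0.3]
backbone = sgd_step([0.5, 0.5], grads, backbone_lr)  # barely disturbed
head     = sgd_step([0.5, 0.5], grads, head_lr)      # moves 20x further
```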
@@ -611,7 +611,7 @@
611611
" conda install -c fastai fastai=1.0.39\n",
612612
" conda install -c arcgis arcgis=1.6.0 --no-pin \n",
613613
"\n",
614-
"The code below shows how we can use distributed raster analytics to automate the detection of well pade for different dates, across a large geographical area and create a feature layer of well pad detections that can be used for further analysis within ArcGIS. "
614+
"The code below shows how we can use distributed raster analytics to automate the detection of well pads for different dates, across a large geographical area, and create a feature layer of well pad detections that can be used for further analysis within ArcGIS. "
615615
]
616616
},
617617
{
@@ -713,7 +713,7 @@
713713
"name": "python",
714714
"nbconvert_exporter": "python",
715715
"pygments_lexer": "ipython3",
716-
"version": "3.7.2"
716+
"version": "3.6.7"
717717
},
718718
"toc": {
719719
"base_numbering": 1,
