Merge pull request #1954 from chrisdubois/minor-text-edits
Minor edits in documentation.
antinucleon committed Apr 25, 2016
2 parents 1124e17 + f71bae0 commit f317c96
Showing 4 changed files with 45 additions and 45 deletions.
48 changes: 24 additions & 24 deletions doc/python/io.md
@@ -1,30 +1,30 @@
MXNet Python Data Loading API
=============================
-* [Introduction](#introduction) introduces the main feature of data loader in MXNet.
-* [Parameters For Data Iterator](#parameters-for-data-iterator) clarifies the different usages for dataiter parameters.
+* [Introduction](#introduction) introduces the main features of data loaders in MXNet.
+* [Parameters For Data Iterator](#parameters-for-data-iterator) clarifies the different usages for data iterator parameters.
* [Create A Data Iterator](#create-a-data-iterator) introduces how to create a data iterator in MXNet python.
* [How To Get Data](#how-to-get-data) introduces the data resource and data preparation tools.
-* [IO API Reference](#io-api-reference) reference for the IO API and their explanation.
+* [IO API Reference](#io-api-reference) provides a useful reference for the IO API.

Introduction
------------
-This page will introduce data input method in MXNet. MXNet use iterator to provide data to the neural network. Iterators do some preprocessing and generate batch for the neural network.
+This page will introduce data input methods in MXNet. MXNet uses iterators to provide data to the neural network. Iterators do some preprocessing and generate batches for the neural network.

-* We provide basic iterators for MNIST image and RecordIO image.
-* To hide the IO cost, prefetch strategy is used to allow parallelism of learning process and data fetching. Data will automatically fetched by an independent thread.
+* We provide basic iterators for MNIST images and RecordIO images.
+* To hide the IO cost, a prefetch strategy is used to allow parallelism in the learning process as well as data fetching. Data is then automatically fetched by an independent thread.

-Parameters For Data Iterator
-----------------------------
+Parameters For Data Iterators
+-----------------------------

Generally to create a data iterator, you need to provide five kinds of parameters:

* **Dataset Param** gives the basic information for the dataset, e.g. file path, input shape.
* **Batch Param** gives the information to form a batch, e.g. batch size.
-* **Augmentation Param** tells which augmentation operations(e.g. crop, mirror) should be taken on an input image.
-* **Backend Param** controls the behavior of the backend threads to hide data loading cost.
-* **Auxiliary Param** provides options to help checking and debugging.
+* **Augmentation Param** tells which augmentation operations (e.g. crop, mirror) should be taken on an input image.
+* **Backend Param** controls the behavior of the backend threads to hide data loading costs.
+* **Auxiliary Param** provides options that are useful for debugging.

-Usually, **Dataset Param** and **Batch Param** MUST be given, otherwise data batch can't be create. Other parameters can be given according to algorithm and performance need. Examples and detail explanation of the options will be provided in the later Section.
+Usually, **Dataset Param** and **Batch Param** are required, otherwise a data batch can't be created. Other parameters can be given, depending on the needs of the algorithm. Examples and detailed explanation of the options will be provided in later sections.
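As an editorial aside, the required/optional split described above can be sketched in plain Python. This is a hypothetical helper, not MXNet code; the parameter names (`path_img`, `batch_size`, etc.) are illustrative only.

```python
# Hypothetical sketch (NOT MXNet's implementation) of how an iterator
# factory might enforce that Dataset Param and Batch Param are present
# while letting the other parameter kinds fall back to defaults.
def make_iter(**params):
    required = ("path_img", "batch_size")  # Dataset Param, Batch Param
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    # Augmentation, Backend and Auxiliary params are optional.
    defaults = {"rand_crop": False, "prefetch_buffer": 1, "verbose": False}
    return {**defaults, **params}
```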

Create A Data Iterator
----------------------
@@ -81,32 +81,32 @@ The following code gives an example of creating a Cifar data iterator.
>>> prefetch_buffer=1)
```

-From the above code, we could find how to create a data iterator. First, you need to explicitly point out what kind of data(MNIST, ImageRecord etc) to be fetched. Then provide the options about the dataset, batching, image augmentation, multi-tread processing and prefetching. Our code will automatically check the validity of the params, if a compulsary param is missing, an error will occur.
+The above code illustrates how to create a data iterator. First, you need to explicitly point out what kind of data (MNIST, ImageRecord etc) to fetch. Then you need to provide options about the dataset, batching, image augmentation, multi-thread processing and prefetching. MXNet will automatically check the validity of the params; if a required parameter is missing, an error will occur.

How To Get Data
---------------

-We provide the [script](../../tests/python/common/get_data.py) to download MNIST data and Cifar10 ImageRecord data. If you would like to create your own dataset, Image RecordIO data format is recommended.
+We provide the [script](../../tests/python/common/get_data.py) to download MNIST data and Cifar10 ImageRecord data. If you would like to create your own dataset, the Image RecordIO data format is recommended.

-## Create Dataset Using RecordIO
+## Create A Dataset Using RecordIO

-RecordIO implements a file format for a sequence of records. We recommend storing images as records and pack them together. The benefits are:
+RecordIO implements a file format for a sequence of records. We recommend storing images as records and packing them together. The benefits are:

-* Storing images in compacted format, e.g. JPEG, for records can have different size. Compacted format will greatly reduce the dataset size in disk.
-* Packing data together allow continous reading on the disk.
-* RecordIO has a simple way of partition, which makes it easier for distributed setting. Example about this will be provided later.
+* Storing images in compacted format, e.g. JPEG, for records can have different size. A compacted format will greatly reduce the dataset size on disk.
+* Packing data together allows continous reading from the disk.
+* RecordIO has a simple partitioning scheme, which makes it easier for distributed settings. Examples will be provided later.
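To make the packing idea concrete, here is a toy sketch of length-prefixed records. This is NOT the real RecordIO on-disk format (which has its own magic numbers and alignment); it only illustrates why packing variable-size payloads such as JPEGs into one file permits sequential reads.

```python
import struct

# Toy sketch of the packing idea behind RecordIO (not the real format):
# each variable-size record is stored as a 4-byte little-endian length
# prefix followed by its payload, so records of different sizes can be
# written to, and read back from, a single contiguous file.
def pack_records(records):
    return b"".join(struct.pack("<I", len(r)) + r for r in records)

def unpack_records(blob):
    records, offset = [], 0
    while offset < len(blob):
        (size,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + size])
        offset += size
    return records
```

Because every record carries its own length, a reader can scan the file front to back without any external index, which is what makes continuous disk reads possible.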

We provide the [im2rec tool](../../tools/im2rec.cc) to create Image RecordIO dataset by yourself. Here's the walkthrough:

-### 0.Before you start
+### 0. Before you start
Make sure you have downloaded the data. You don't need to resize the images by yourself, currently ```im2rec``` could resize it automatically. You could check the promoting message of ```im2rec``` for details.

-### 1.Make the image list
+### 1. Make the image list
After you get the data, you need to make a image list file first. The format is
```
integer_image_index \t label_index \t path_to_image
```
-In general, the program will take a list of names of all image, shuffle them, then separate them into training files name list and testing file name list. Write down the list in the format.
+In general, the program will take a list of names of all image, shuffle them, then separate them into training file name list and testing file name list. Write down the list in the format.
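As a sketch of this shuffle-and-split step (illustrative only; the real tool is `im2rec`, and the file names below are made up):

```python
import random

# Build a shuffled image list in the documented
# integer_image_index \t label_index \t path_to_image
# format, then split it into training and testing lists.
def make_image_lists(paths_with_labels, train_ratio=0.8, seed=0):
    items = list(paths_with_labels)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    lines = ["%d\t%d\t%s" % (i, label, path)
             for i, (path, label) in enumerate(items)]
    split = int(len(lines) * train_ratio)
    return lines[:split], lines[split:]
```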

A sample file is provided here
```bash
@@ -123,8 +123,8 @@ A sample file is provided here

```

-### 2.Make the binary file
-To generate binary image, you need to use *im2rec* in the tool folder. The im2rec will take the path of _image list file_ you generated just now, _root path_ of the images and the _output file path_ as input. These processes usually take several hours, so be patient. :)
+### 2. Make the binary file
+To generate binary images, you need to use *im2rec* in the tool folder. The im2rec will take the path of _image list file_ you generated just now, _root path_ of the images and the _output file path_ as input. These processes usually take several hours, so be patient. :)

A sample command:
```bash
8 changes: 4 additions & 4 deletions doc/python/model.md
@@ -14,8 +14,8 @@ modules to make neural network training easy.
Train a Model
-------------
To train a model, you can follow two steps, first a configuration using symbol,
-then call ```model.Feedforward.create``` to create a model for you.
-The following example creates a two layer neural networks.
+then call ```model.FeedForward.create``` to create a model for you.
+The following example creates a two layer neural network.

```python
# configure a two layer neuralnetwork
@@ -46,7 +46,7 @@ For more information, you can refer to [Model API Reference](#model-api-referenc
Save the Model
--------------
It is important to save your work after the job done.
-To save the model, you can directly pickle it if you like the pythonic way.
+To save the model, you can directly pickle it in a pythonic way.
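A minimal sketch of the pickle route, using a stand-in object rather than a real trained `mx.model.FeedForward` (the `DummyModel` class below is purely illustrative):

```python
import pickle

# Stand-in for a trained model; any picklable object works the same way.
class DummyModel(object):
    def __init__(self, weights):
        self.weights = weights

model = DummyModel(weights=[0.1, 0.2])
blob = pickle.dumps(model)      # serialize to bytes (or write to a file)
restored = pickle.loads(blob)   # deserialize into a new object
```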
We also provide a save and load function.

```python
@@ -76,7 +76,7 @@ model = mx.model.FeedForward.create(
iter_end_callback=mx.callback.do_checkpoint(prefix),
...)
```
-You can load the model checkpoint later using ```Feedforward.load```.
+You can load the model checkpoint later using ```FeedForward.load```.

Use Multiple Devices
--------------------
6 changes: 3 additions & 3 deletions doc/python/tutorial.md
@@ -196,7 +196,7 @@ can write a program as if it were only a single thread, and MXNet will
automatically dispatch it to multiple devices such as multiple GPU cards or multiple
machines.

-This is achieved by lazy evaluation. Any operation we write down is issued to a
+This is achieved by lazy evaluation. Any operation we write down is issued to an
internal engine, and then returned. For example, if we run `a += 1`, it
returns immediately after pushing the plus operation to the engine. This
asynchronicity allows us to push more operations to the engine, so it can determine
@@ -239,7 +239,7 @@ and may accept other hyperparameters such as the number of hidden neurons (*num_
or the activation type (*act_type*).

The symbol can be viewed simply as a function taking several arguments whose
-names are automatically generated and can be got by
+names are automatically generated and can be obtained by

```python
>>> net.list_arguments()
@@ -480,7 +480,7 @@ update on key: 9
```

### Multiple machines
-Base on parameter server. The `updater` will runs on the server nodes.
+Based on parameter server. The `updater` will run on the server nodes.
This section will be updated when the distributed version is ready.


28 changes: 14 additions & 14 deletions docs/packages/python/symbol_in_pictures.md
@@ -1,21 +1,21 @@
Symbolic Configuration and Execution in Pictures
================================================
This is a self-contained tutorial that explains the Symbolic construction and execution in pictures.
-You are recommend to read this together with [Symbolic API](symbol.md).
+You are recommended to read this together with [Symbolic API](symbol.md).

Compose Symbols
---------------
-The symbols are description of computation we want to do. The symbolic construction API generates the computation
-graph that describes the need of computation. The following picture is how we compose symbols to describe basic computations.
+Symbols are a description of computation we want to do. The symbolic construction API generates the computation
+graph that describes what computation is needed. The following picture shows how we compose symbols to describe basic computations.

![Symbol Compose](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/compose_basic.png)

-- The ```mxnet.symbol.Variable``` function creates argument nodes that represents inputs to the computation.
+- The ```mxnet.symbol.Variable``` function creates argument nodes that represent input to the computation.
- The Symbol is overloaded with basic element-wise arithmetic operations.
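The operator-overloading idea can be sketched with a toy class (this is not MXNet's `Symbol`; it only shows how `a + b` can build a graph node instead of computing a value):

```python
# Toy symbolic-composition sketch: __add__ returns a new graph node
# rather than performing arithmetic, and leaf nodes act as arguments.
class Sym(object):
    def __init__(self, name, inputs=()):
        self.name, self.inputs = name, tuple(inputs)

    def __add__(self, other):
        return Sym("_plus", (self, other))

    def list_arguments(self):
        if not self.inputs:          # a leaf is an argument node
            return [self.name]
        args = []
        for child in self.inputs:
            args.extend(child.list_arguments())
        return args

a, b = Sym("a"), Sym("b")
c = a + b   # builds a "_plus" node with a and b as inputs
```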

Configure Neural Nets
---------------------
-Besides fine-grained operations, mxnet also provide a way to perform big operations that is analogy to layers in neural nets.
+Besides fine-grained operations, mxnet also provide a way to perform big operations that is analogous to layers in neural nets.
We can use these operators to describe a neural net configuration.

![Net Compose](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/compose_net.png)
@@ -30,8 +30,8 @@ The following is an example of configuring multiple input neural nets.

Bind and Execute Symbol
-----------------------
-When we need to execute a symbol graph. We call bind function to bind ```NDArrays``` to the argument nodes
-to get a ```Executor```.
+When we need to execute a symbol graph, we call the bind function to bind ```NDArrays``` to the argument nodes
+to obtain an ```Executor```.

![Bind](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/bind_basic.png)

@@ -47,30 +47,30 @@ get outputs of both.

![MultiOut](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/executor_multi_out.png)

-But always remember, only bind what you need, so system can do more optimizations for you.
+But remember: only bind what you need, so that the system can do more optimizations for you.


-Calculate Gradient
+Calculate the Gradient
------------------
-You can specify gradient holder NDArrays in bind, then call ```Executor.backward``` after ```Executor.forward```
+In the bind function, you can specify NDArrays that will hold gradients. Calling ```Executor.backward``` after ```Executor.forward```
will give you the corresponding gradients.

![Gradient](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/executor_backward.png)


Simple Bind Interface for Neural Nets
-------------------------------------
-Sometimes it is tedious to pass the argument NDArrays to the bind function. Especially when you are binding a big
-graph like neural nets. ```Symbol.simple_bind``` provides a way to simplify
+Sometimes it is tedious to pass the argument NDArrays to the bind function, especially when you are binding a big
+graph. ```Symbol.simple_bind``` provides a way to simplify
the procedure. You only need to specify input data shapes, and the function will allocate the arguments, and bind
the Executor for you.

![SimpleBind](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/executor_simple_bind.png)

Auxiliary States
----------------
-Auxiliary states are just like arguments, except that you cannot take gradient of them. These are states that may
-not be part of computation, but can be helpful to track. You can pass the auxiliary state in the same way as arguments.
+Auxiliary states are just like arguments, except that you cannot take the gradient of them. These are states that may
+not be part of the computation, yet can be helpful to track. You can pass the auxiliary state in the same way as arguments.

![SimpleBind](https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/symbol/executor_aux_state.png)

