This repository was archived by the owner on Nov 17, 2023. It is now read-only.

DataBatch and NDArrayIter doc modified #6091

Merged 13 commits into apache:master on May 10, 2017.

@Roshrini (Member) commented May 3, 2017

@@ -80,7 +80,22 @@ def get_list(shapes, types):
return [DataDesc(x[0], x[1]) for x in shapes]

class DataBatch(object):
"""A data batch.
"""Returns a batch of data.

Contributor:

Since this is a class, you are not describing what it does but what it encapsulates.

"""A data batch.
"""Returns a batch of data.

MXNet's data iterator returns a batch of data in each `next` call.

Contributor:

in -> for

If not provided, the order of arg_names of the executor is assumed.
When working with Module this is the order of the data_names argument.
The *i*-th element describes the name and shape of ``data[i]``.
If not provided, by default the order of `arg_names` of the executor is assumed.

Contributor:

comma after default.

"""Returns a batch of data.

MXNet's data iterator returns a batch of data in each `next` call.
This data often contains `batch_size` number of examples.

Contributor:

Can this ever be different from batch_size?

examples read is less than the batch size.
The number of examples padded at the end of a batch. It is used when the
total number of examples read is not divisible by the `batch_size`.
These are ignored in the result.

Contributor:

"These extra padded examples are ignored during processing."

Is the above a correct statement to make?
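The padding behaviour being discussed can be illustrated with a small pure-Python sketch. It does not call MXNet; `batch_plan` is a hypothetical helper that assumes `'pad'` fills the short final batch up to `batch_size` and `'discard'` drops it, and it returns the pad count of each batch produced.

```python
def batch_plan(num_examples, batch_size, last_batch_handle="pad"):
    """Return one pad count per batch (hypothetical helper, not MXNet code).

    Models only two policies:
      'pad'     -- the short final batch is filled up to batch_size;
      'discard' -- the short final batch is dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if last_batch_handle not in ("pad", "discard"):
        raise ValueError("unsupported last_batch_handle: %r" % (last_batch_handle,))
    full, remainder = divmod(num_examples, batch_size)
    pads = [0] * full  # every complete batch carries no padding
    if remainder and last_batch_handle == "pad":
        # batch_size - remainder padded examples complete the last batch;
        # they are not real data and should be ignored downstream.
        pads.append(batch_size - remainder)
    return pads


print(batch_plan(10, 3, "pad"))      # [0, 0, 0, 2]
print(batch_plan(10, 3, "discard"))  # [0, 0, 0]
```

Under these assumptions, 10 examples with a batch size of 3 give four batches under `'pad'` (the last one carrying two padded examples) and three batches under `'discard'`, so a batch only differs from `batch_size` real examples at the end of the data.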

MXNet's data iterator returns a batch of data for each `next` call.
This data contains `batch_size` number of examples.

If the input data consists of images then, these images should be stored in a

Contributor:

comma issue -> "If the input data consists of images, then these images..."

>>> labels = np.ones([10, 1])
>>> dataiter = mx.io.NDArrayIter(datas, labels, 3, True, last_batch_handle='discard')
>>> dataiter
<mxnet.io.NDArrayIter object at 0x10bb2fd90>

Contributor:

Is it descriptive here to have an example where every component of every example has value 1?

>>> for batch in dataiter:
... batchidx += 1
...
>>> batchidx # Padding added after the examples read are over

Contributor:

can we make this more clear? "Padding added after the examples read are over"
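To spell out what the batch count in this loop reflects, the following pure-Python sketch (not MXNet code; `iterate_batches` is a hypothetical name) yields `(batch, pad)` pairs and fills the final batch by wrapping around to the start of the data, which is an assumption about the `'pad'` policy. A consumer can then drop the last `pad` entries of each batch so that only the examples actually read are processed.

```python
def iterate_batches(examples, batch_size):
    """Yield (batch, pad) pairs; the last batch wraps around to the start."""
    n = len(examples)
    for start in range(0, n, batch_size):
        batch = list(examples[start:start + batch_size])
        pad = batch_size - len(batch)
        # Fill the short final batch with examples from the beginning.
        batch.extend(examples[i % n] for i in range(pad))
        yield batch, pad


examples = list(range(10))
batches = list(iterate_batches(examples, 3))
print(len(batches))   # 4: three full batches plus one padded batch
print(batches[-1])    # ([9, 0, 1], 2)

# Strip the padded tail of each batch to recover exactly the examples read.
real = [x for batch, pad in batches for x in batch[:len(batch) - pad]]
print(real == examples)  # True
```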

This data contains `batch_size` number of examples.

If the input data consists of images then, these images should be stored in a
4-D matrix of shape ``(batch_size, num_channel, height, width)``.

Contributor:

This depends on the layout. If provide_data gives DataDesc(layout='NHWC'), then it's (batch_size, height, width, num_channel).
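The layout point can be made concrete with a short pure-Python sketch. `batch_shape` and `transpose_axes` are hypothetical helpers, not MXNet functions: the first maps a layout string such as `'NCHW'` or `'NHWC'` to a shape tuple, and the second computes the axis permutation that converts one layout into another (the tuple one would pass to a transpose routine such as `numpy.transpose`).

```python
def batch_shape(layout, batch_size, num_channel, height, width):
    """Return the 4-D batch shape implied by a layout string."""
    dims = {"N": batch_size, "C": num_channel, "H": height, "W": width}
    if sorted(layout) != sorted(dims):
        raise ValueError("layout must be a permutation of 'NCHW', got %r" % (layout,))
    return tuple(dims[axis] for axis in layout)


def transpose_axes(src, dst):
    """Axis order that rearranges an array laid out as src into dst."""
    if sorted(src) != sorted(dst):
        raise ValueError("layouts %r and %r use different axes" % (src, dst))
    return tuple(src.index(axis) for axis in dst)


print(batch_shape("NCHW", 32, 3, 224, 224))  # (32, 3, 224, 224)
print(batch_shape("NHWC", 32, 3, 224, 224))  # (32, 224, 224, 3)
print(transpose_axes("NCHW", "NHWC"))        # (0, 2, 3, 1)
```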


Example usage:
----------
>>> class CustomBatch(object):

Contributor:

No, we don't want users to do this. Use mx.io.DataBatch.

index : numpy.array, optional
The example indices in this batch.
bucket_key : int, optional
The key of the bucket, used for bucket IO.
The bucket key, used for bucketing module.
provide_data : list of (name, shape), optional

Contributor:

This is deprecated. It should be a list of DataDesc now.

@Roshrini (Member, Author) commented May 4, 2017

Addressed all comments.

When working with Module this is the order of the label_names argument.
The bucket key, used for bucketing module.
provide_data : list of `DataDesc`, optional
A list of `DataDesc` objects having attributes

Contributor:

explain what the DataDescs are for

>>> labels = np.ones([10, 1])
>>> dataiter = mx.io.NDArrayIter(data, labels, 3, True, last_batch_handle='discard')
>>> dataiter
<mxnet.io.NDArrayIter object at 0x10bb2fd90>

Contributor:

We generally don't need to show object pointers like this. Users know what they are because they were just constructed in the previous line.

name, shape, type and layout information of the data.
provide_label : list of `DataDesc`, optional
A list of `DataDesc` objects. `DataDesc` is used to store
name, shape, type and layout information of the data.

@piiswrong (Contributor), May 8, 2017:

data -> label
The i-th element describes the ... of data[i].
keep this sentence
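To illustrate the convention that the i-th description corresponds to the i-th array, here is a pure-Python sketch. `Desc` is a hypothetical namedtuple standing in for `mx.io.DataDesc` (it mirrors only the name, shape, type and layout information described above), `check_descs` is a hypothetical validator, and the names `'data'` and `'softmax_label'` are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for mx.io.DataDesc; mirrors only the four pieces of
# information discussed in the review: name, shape, type and layout.
Desc = namedtuple("Desc", ["name", "shape", "dtype", "layout"])


def check_descs(descs, arrays_shapes):
    """Raise if the i-th description does not match the i-th array shape."""
    if len(descs) != len(arrays_shapes):
        raise ValueError("expected %d arrays, got %d" % (len(descs), len(arrays_shapes)))
    for desc, shape in zip(descs, arrays_shapes):
        if tuple(desc.shape) != tuple(shape):
            raise ValueError("%s: expected shape %s, got %s"
                             % (desc.name, desc.shape, tuple(shape)))
    return True


provide_data = [Desc("data", (3, 3, 32, 32), "float32", "NCHW")]
provide_label = [Desc("softmax_label", (3,), "float32", "N")]

print(check_descs(provide_data, [(3, 3, 32, 32)]))  # True
print(check_descs(provide_label, [(3,)]))           # True
```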

If `layout` is set to 'NHWC' then, images should be stored in a 4-D matrix
of shape ``(batch_size, height, width, num_channel)``.
The channels are often in RGB order.

Parameters
----------
data : list of NDArray

@nswamy (Member), May 9, 2017:

list of NDArray, each array containing batch_size examples.

If `layout` is set to 'NHWC' then, images should be stored in a 4-D matrix
of shape ``(batch_size, height, width, num_channel)``.
The channels are often in RGB order.

Parameters
----------
data : list of NDArray
A list of input data.
label : list of NDArray, optional

@nswamy (Member), May 9, 2017:

list of NDArray, each array often containing a 1-dimensional array.

@piiswrong merged commit b798eca into apache:master May 10, 2017
bikestra pushed a commit to bikestra/mxnet that referenced this pull request May 10, 2017
* DataBatch and NDArrayIter doc modified

* fixes after review

* fixes after review

* wording changed

* some more fixes

* improvement

* desc fix

* Datadesc info added

* minor addition

* fix

* fix

* fix after review
rishita pushed a commit to rishita/mxnet that referenced this pull request May 10, 2017
piiswrong pushed a commit that referenced this pull request May 11, 2017
* updated docstring for set_lr_mult and set_wd_mult

* updated docstring per review

* Fixed imdecode crash bug when flag=0 (#6134)

* Fix (#6131)

* Docs for MXRecordIO, MXIndexedRecordIO modified (#6013)

* docs for MXIndexedRecordIO modified

* changes after review

* recordIO doc modified

* changes after review

* lint error

* minor change

* minor change after review

* empty commit to retrigger build

* changes after review

* Update documentation for mx.callback.Speedometer. (#6058)

* Update documentation for mx.callback.Speedometer.

* Minor doc changes.

* Use module instead of model in example code.

* update doc for Load (#6092)

* Installation instructions for MacOS and Cloud (#6012)

* Fix NDArray bool checking (#6130)

* fix shape order bug (#6136)

* TOC click unfold (#6133)

* [doc] new sphnix plugin  (#6105)

* update doc

* rm

* update

* update ndarray

* update mds

* update

* update

* update

* update

* update

* update

* update image.md and others

* update

* [doc] use debug mode to build (#6151)

* move ctc loss to contrib (#6154)

* Fix for invalid numpy float indexing (#6144)

* Fix python3 compatibilities (#6143)

* [doc] small changes to tutorials (#6164)

* [doc] Fix left toc link (#6162)

* [example]ADD practical functions and options for speech_recognition example (#6141)

* ADD practical functions and options for speech_recognition example

* add missing stt_bi_graphemes_util.py and deepspeech.cfg template

* Added reflection padding (#6123)

* Added reflection padding

* Lint fix

* Added 5d reflection padding

* Added failure in forward/backward for input dimensions other than 4 of 5

* Improved sanity check readability

* Fixing LICENSE file and adding NOTICE (#6172)

* Creating NOTICE. 

When code moves to Apache, it will need adjusting to the Apache format.

* Replacing source header with full license text

* doc improvement - softmax, metrics, and initializer (#5945)

* doc improvement, softmaxoutput, initializer-constant, minor fixes

* doc improvement, metrics

* fix softmax doc, fix metric lint

* softmax more fixes

* add doc change in initializer.py. some minor fix in softmax_cross_entropy

* doc change in initializer.py

* fix grammer

* fix

* fix

* fix

* minor fix

* fix

* minor fix

* DataBatch and NDArrayIter doc modified (#6091)

* DataBatch and NDArrayIter doc modified

* fixes after review

* fixes after review

* wording changed

* some more fixes

* improvement

* desc fix

* Datadesc info added

* minor addition

* fix

* fix

* fix after review

* [Scala] Change version to 0.9.5-SNAPSHOT (#6173)

* [scala] change version to 0.9.5-SNAPSHOT

* API doc improvement Dropout and SoftmaxActivation (#6088)

* doc improve for dropout oper

* doc improve for SoftmaxActivation oper

* fix

* fix

* Update documentation for mx.callback.do_checkpoint (#6059)

* Update documentation for mx.callback.do_checkpoint

* Use module instead of model for example code.

* Update documentation for plot_graph. (#6098)

* Update documentation for plot_graph.

* Minor doc fix.

* Restruct get started (#6167)

* Change get started page

* Small fix

* Improve

* Update documentation of Initializer.dumps() (#6128)

* Doc Improvement - RMSProp and RMSPropAlex (#6107)

* rmsprop

* rmsprop alex

* add link in optimizer.py

* fix

* fix

* missed fix..

* Docforcs,fft,ifft (#6145)

* fft.cc

* add all

* changed the description of set_lr_mult and set_wd_mult

* Explicitly specify quiet in R install_version (#6171)
saurabh3949 pushed a commit to saurabh3949/mxnet that referenced this pull request May 23, 2017
saurabh3949 pushed a commit to saurabh3949/mxnet that referenced this pull request May 23, 2017
@Roshrini deleted the io-docs branch July 14, 2017 05:19
Guneet-Dhillon pushed a commit to Guneet-Dhillon/mxnet that referenced this pull request Sep 13, 2017
Guneet-Dhillon pushed a commit to Guneet-Dhillon/mxnet that referenced this pull request Sep 13, 2017
5 participants