Shuffle examples in pipelines shown in the intro notebook.
This is good practice: it avoids always showing examples in the same order and breaks sequential correlations that may be present in the original dataset.

PiperOrigin-RevId: 340745971
lamblin committed Nov 5, 2020
1 parent 3512a82 commit c3f62a1
Showing 1 changed file with 9 additions and 4 deletions.
13 changes: 9 additions & 4 deletions Intro_to_Metadataset.ipynb
@@ -233,7 +233,8 @@
"- **use_bilevel_ontology_list**: This is a list of booleans indicating whether the corresponding dataset in `ALL_DATASETS` should use a bilevel ontology. Omniglot is set up with a two-level hierarchy: the alphabet (Latin, Inuktitut...) and the character (with 20 examples per character).\n",
"The flag means that each episode will contain classes from a single alphabet. \n",
"- **use_dag_ontology_list**: This is a list of booleans indicating whether the corresponding dataset in `ALL_DATASETS` should use a DAG ontology. The same idea applies to ImageNet, except it uses the hierarchical sampling procedure described in the article.\n",
"- **image_size**: All images from the various datasets are down- or upsampled to the same size. This flag controls the edge length of the square image."
"- **image_size**: All images from the various datasets are down- or upsampled to the same size. This flag controls the edge length of the square image.\n",
"- **shuffle_buffer_size**: Controls the amount of shuffling among examples from any given class."
]
},
{
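The `shuffle_buffer_size` flag has the usual `tf.data.Dataset.shuffle` semantics: examples flow through a fixed-size buffer, and each output element is drawn uniformly at random from the items currently buffered, so a larger buffer gives stronger shuffling. A minimal pure-Python sketch of this behavior (the function `buffered_shuffle` is illustrative, not part of the library):

```python
import random


def buffered_shuffle(stream, buffer_size, rng=None):
    """Yield items from `stream` in a randomized order using a fixed-size
    buffer: once the buffer is full, each output is drawn uniformly at
    random from the buffered items, then the buffer is refilled."""
    rng = rng or random.Random()
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    # Drain whatever is left in the buffer at end of stream.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))


items = list(range(10))
# A buffer of size 1 leaves the order unchanged.
assert list(buffered_shuffle(iter(items), 1)) == items
# A buffer larger than the stream yields a full random permutation.
shuffled = list(buffered_shuffle(iter(items), 300, random.Random(0)))
assert sorted(shuffled) == items
```

This is why a small `shuffle_buffer_size` only breaks local ordering, while a buffer at least as large as a class's example count shuffles that class fully.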
@@ -259,7 +260,9 @@
" use_dag_ontology_list=use_dag_ontology_list,\n",
" use_bilevel_ontology_list=use_bilevel_ontology_list,\n",
" episode_descr_config=variable_ways_shots,\n",
" split=SPLIT, image_size=84)"
" split=SPLIT,\n",
" image_size=84,\n",
" shuffle_buffer_size=300)"
]
},
{
@@ -358,7 +361,8 @@
"- `ADD_DATASET_OFFSET` controls whether the class_ids returned by the iterator overlap across different datasets. A dataset-specific offset is added in order to make the returned ids unique.\n",
"- `make_multisource_batch_pipeline()` creates a `tf.data.Dataset` object that returns elements of the form (Batch, data source ID) where, similarly to the\n",
"episodic case, the data source ID is an integer Tensor that identifies which\n",
"dataset the given batch originates from."
"dataset the given batch originates from.\n",
"- `shuffle_buffer_size` controls the amount of shuffling done among examples from a given dataset, rather than within each class as in the episodic pipeline."
]
},
{
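One natural way to realize the dataset-specific offset described above is to give each data source an offset equal to the total number of classes in all preceding sources, so `offset[source_id] + local_class_id` is globally unique. The sketch below is illustrative only; the helper name and the per-dataset class counts are hypothetical, not taken from the library:

```python
def class_id_offsets(num_classes_per_dataset):
    """Illustrative: offset for each dataset = cumulative number of
    classes in all preceding datasets, making global ids unique."""
    offsets = []
    total = 0
    for n in num_classes_per_dataset:
        offsets.append(total)
        total += n
    return offsets


# Hypothetical example: three sources with 5, 3, and 4 classes.
offsets = class_id_offsets([5, 3, 4])
assert offsets == [0, 5, 8]
# Local class 2 of the third source maps to global id 8 + 2 = 10.
assert offsets[2] + 2 == 10
```

With the offset disabled, local class 2 of every source would share the same id, which is why overlapping ids are only acceptable when batches are not mixed across sources.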
@@ -387,7 +391,8 @@
"source": [
"dataset_batch = pipeline.make_multisource_batch_pipeline(\n",
" dataset_spec_list=all_dataset_specs, batch_size=BATCH_SIZE, split=SPLIT,\n",
" image_size=84, add_dataset_offset=ADD_DATASET_OFFSET)\n",
" image_size=84, add_dataset_offset=ADD_DATASET_OFFSET,\n",
" shuffle_buffer_size=1000)\n",
"\n",
"for idx, ((images, labels), source_id) in iterate_dataset(dataset_batch, 1):\n",
" print(images.shape, labels.shape)"
