Commit

Merge pull request tensorflow#3608 from sachinprasadhs:patch-2
PiperOrigin-RevId: 417833128
copybara-github committed Dec 22, 2021
2 parents ac907cc + 92ac5f3 commit ef6cd07
Showing 4 changed files with 7 additions and 9 deletions.
2 changes: 1 addition & 1 deletion docs/determinism.ipynb
@@ -319,7 +319,7 @@
 "id": "gAJTLLsuFeuP"
 },
 "source": [
-"Note: Setting `shuffle_files=True` also [disable](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_builder.py?l=676\u0026rcl=354322021) `experimental_deterministic` in [`tf.data.Options`](https://www.tensorflow.org/api_docs/python/tf/data/Options) to give some performance boost. So even small datasets which only have a single shard (like mnist), become non-deterministic.\n",
+"Note: Setting `shuffle_files=True` also [disable](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/core/dataset_builder.py?l=676\u0026rcl=354322021) `deterministic` in [`tf.data.Options`](https://www.tensorflow.org/api_docs/python/tf/data/Options) to give some performance boost. So even small datasets which only have a single shard (like mnist), become non-deterministic.\n",
 "\n",
 "See recipe below to get deterministic file shuffling."
 ]
5 changes: 2 additions & 3 deletions docs/performances.md
@@ -116,11 +116,10 @@ ds = tfds.load('imagenet2012', split='train', shuffle_files=True)
 ```

 Additionally, when `shuffle_files=True`, TFDS disables
-[`options.experimental_deterministic`](https://www.tensorflow.org/api_docs/python/tf/data/Options#experimental_deterministic),
+[`options.deterministic`](https://www.tensorflow.org/api_docs/python/tf/data/Options#deterministic),
 which may give a slight performance boost. To get deterministic shuffling, it is
 possible to opt-out of this feature with `tfds.ReadConfig`: either by setting
-`read_config.shuffle_seed` or overwriting
-`read_config.options.experimental_deterministic`.
+`read_config.shuffle_seed` or overwriting `read_config.options.deterministic`.

 ### Auto-shard your data across workers (TF)

5 changes: 2 additions & 3 deletions tensorflow_datasets/core/dataset_builder.py
@@ -646,11 +646,10 @@ def lookup_nest(features: Dict[str, Any]) -> Tuple[Any, ...]:
 # non-deterministic
 # This code should probably be moved inside tfreader, such as
 # all the tf.data.Options are centralized in a single place.
-if (shuffle_files and
-    read_config.options.experimental_deterministic is None and
+if (shuffle_files and read_config.options.deterministic is None and
     read_config.shuffle_seed is None):
   options = tf.data.Options()
-  options.experimental_deterministic = False
+  options.deterministic = False
   ds = ds.with_options(options)
 # If shuffle is False, keep the default value (deterministic), which
 # allow the user to overwritte it.
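
The condition in the `dataset_builder.py` hunk can be modelled without TensorFlow to see which inputs end up non-deterministic. This is only a rough sketch: `OptionsSketch`, `ReadConfigSketch` and `resolve_deterministic` are hypothetical stand-ins written for illustration, not TFDS or `tf.data` APIs, and `None` is assumed to mean "not set by the user".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OptionsSketch:
    """Hypothetical stand-in for tf.data.Options; None means 'not set'."""
    deterministic: Optional[bool] = None


@dataclass
class ReadConfigSketch:
    """Hypothetical stand-in for tfds.ReadConfig, reduced to two fields."""
    options: OptionsSketch = field(default_factory=OptionsSketch)
    shuffle_seed: Optional[int] = None


def resolve_deterministic(shuffle_files: bool,
                          read_config: ReadConfigSketch) -> Optional[bool]:
    """Mirror the `if` from the hunk.

    Returns False when determinism would be switched off, otherwise the
    value the user set (None meaning the option is left untouched).
    """
    if (shuffle_files and read_config.options.deterministic is None
            and read_config.shuffle_seed is None):
        return False
    return read_config.options.deterministic


# Shuffling with nothing else set: determinism is switched off.
print(resolve_deterministic(True, ReadConfigSketch()))                 # False
# A shuffle seed opts out of the override.
print(resolve_deterministic(True, ReadConfigSketch(shuffle_seed=42)))  # None
# An explicit user value is kept as-is.
explicit = ReadConfigSketch(options=OptionsSketch(deterministic=True))
print(resolve_deterministic(True, explicit))                           # True
# No file shuffling: the option is left untouched.
print(resolve_deterministic(False, ReadConfigSketch()))                # None
```

The two opt-outs named in `docs/performances.md` correspond to the second and third calls.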
4 changes: 2 additions & 2 deletions tensorflow_datasets/core/utils/read_config.py
@@ -38,8 +38,8 @@ class ReadConfig:
 Attributes:
   options: `tf.data.Options()`, dataset options to use. Note that when
-    `shuffle_files` is True and no seed is defined, experimental_deterministic
-    will be set to False internally, unless it is defined here.
+    `shuffle_files` is True and no seed is defined, deterministic will be set
+    to False internally, unless it is defined here.
   try_autocache: If True (default) and the dataset satisfy the right
     conditions (dataset small enough, files not shuffled,...) the dataset will
     be cached during the first iteration (through `ds = ds.cache()`).
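
Since the option is renamed from `experimental_deterministic` to `deterministic`, caller code that sets it on `ReadConfig.options` and has to run against both spellings could pick the attribute at runtime. The sketch below is not part of this commit: `set_deterministic`, `OldOptions` and `NewOptions` are hypothetical names, and it assumes the options object exposes at least one of the two attributes.

```python
class OldOptions:
    """Hypothetical stand-in exposing only the pre-rename attribute."""
    experimental_deterministic = None


class NewOptions:
    """Hypothetical stand-in exposing the post-rename attribute."""
    deterministic = None


def set_deterministic(options, value: bool) -> str:
    """Set the determinism option under whichever name `options` exposes.

    Prefers the new `deterministic` spelling and falls back to
    `experimental_deterministic`. Returns the attribute name that was set.
    """
    if hasattr(options, "deterministic"):
        name = "deterministic"
    else:
        name = "experimental_deterministic"
    setattr(options, name, value)
    return name


old, new = OldOptions(), NewOptions()
print(set_deterministic(old, True))  # experimental_deterministic
print(set_deterministic(new, True))  # deterministic
```

Preferring the new name first means the fallback only triggers when `deterministic` is missing altogether.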
