
v4.1.0

@Conchylicultor Conchylicultor tagged this 04 Nov 11:40
* It is now possible to manually download the data for all datasets (if the automated download fails for any reason). See [doc](https://www.tensorflow.org/datasets/overview#load_a_dataset).
* Simplification of the dataset creation API.
  * We've made it easier to create datasets outside the TFDS repository (see our updated [dataset creation guide](https://www.tensorflow.org/datasets/add_dataset)).
  * `_split_generators` should now return `{'split_name': self._generate_examples(), ...}` (but current datasets are backward compatible).
  * All datasets inherit from `tfds.core.GeneratorBasedBuilder`. Converting a dataset to Beam now only requires changing `_generate_examples` (see [example and doc](https://www.tensorflow.org/datasets/beam_datasets#instructions)).
  * `tfds.core.SplitGenerator` and `tfds.core.BeamBasedBuilder` are deprecated and will be removed in a future version.

* Better `pathlib.Path`, `os.PathLike` compatibility:
  * `dl_manager.manual_dir` now returns a pathlib-like object. Example:

  ```python
  text = (dl_manager.manual_dir / 'downloaded-text.txt').read_text()
  ```

  * Note: other methods such as `dl_manager.download` and `.extract` will return pathlib-like objects in future versions.
  * `FeatureConnector` and most other functions should accept `PathLike` objects. Let us know if some functions you need are missing.
  * Add `tfds.core.as_path` to create `pathlib.Path`-like objects compatible with GCS (e.g. `tfds.core.as_path('gs://my-bucket/labels.csv').read_text()`).
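
To illustrate what accepting `PathLike` objects means in practice, the sketch below uses only the standard library. A temporary directory stands in for `dl_manager.manual_dir`, and the `read_labels` helper and `labels.txt` file name are hypothetical, so this shows the access pattern and not TFDS internals.

```python
import os
import pathlib
import tempfile
from typing import List, Union

PathLike = Union[str, os.PathLike]


def read_labels(path: PathLike) -> List[str]:
  """Hypothetical helper accepting either a str or any os.PathLike object."""
  # os.fspath() normalizes both str and PathLike inputs to a plain string path.
  return pathlib.Path(os.fspath(path)).read_text().splitlines()


with tempfile.TemporaryDirectory() as tmp:
  # Stand-in for `dl_manager.manual_dir` (an assumption for this sketch).
  manual_dir = pathlib.Path(tmp)
  (manual_dir / 'labels.txt').write_text('cat\ndog\n')

  # The same helper works with a pathlib object and with a plain string.
  labels_from_path = read_labels(manual_dir / 'labels.txt')
  labels_from_str = read_labels(str(manual_dir / 'labels.txt'))
```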

* Other bug fixes and improvements. E.g.
  * Add `verify_ssl=` option to `tfds.download.DownloadConfig` to disable SSL certificate verification during download.
  * `BuilderConfig`s are now compatible with Beam datasets #2348
  * `--record_checksums` now assumes the new dataset-as-folder model.
  * `tfds.features.Image` can accept encoded `bytes` images directly (useful when used with `for img_name, img_bytes in dl_manager.iter_archive('images.zip')`).
  * The API docs now show deprecated methods, and abstract methods to override are now documented.
  * You can generate `imagenet2012` with only a single split (e.g. only the validation data). Other splits are skipped if not present.
* And of course, new datasets...

Thank you to all our contributors for improving TFDS!

PiperOrigin-RevId: 340614460