Description
With the upcoming DatasetV2, a lot of the APIs are getting simplified. That also opens up possibilities beyond just passing the dataset to tf.keras.
One area of interest is that we already support many columnar datasets, e.g., Arrow, Avro, Parquet, JSON, HDF5, etc. These datasets could potentially be standardized on the same API so that we could treat them homogeneously. For example, ArrowDataset already exposes a columns() property method. We could apply the same to Avro, Parquet, JSON, HDF5, etc. Thoughts?
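To make the idea concrete, here is a rough sketch of what a shared interface might look like. `ColumnarDataset`, `InMemoryColumnarDataset`, and `describe` are hypothetical names used only for illustration. They are not existing classes or functions in this repository, and the actual ArrowDataset API may differ.

```python
from abc import ABC, abstractmethod


class ColumnarDataset(ABC):
    """Hypothetical common interface for column-oriented datasets."""

    @property
    @abstractmethod
    def columns(self):
        """Return the list of column names in the dataset."""


class InMemoryColumnarDataset(ColumnarDataset):
    """Toy stand-in for a real reader (Arrow, Parquet, ...), for illustration."""

    def __init__(self, data):
        # data maps column name -> list of values
        self._data = dict(data)

    @property
    def columns(self):
        return list(self._data)


def describe(dataset):
    # Works for any format that implements the shared interface.
    return ", ".join(dataset.columns)
```

With something like this, code that only needs the column names would not have to care which file format backs the dataset.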
Since those columnar datasets hold largely numeric values, I think we could also have a common base class for them and support additional operations. For example, dataset_1 + dataset_2 => dataset_3 (add), where dataset_3 could be passed to tf.keras. The implementation could start with zip + map in Python (C++ is not even needed). Maybe this could be one use case that will help users?
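A minimal sketch of the zip + map approach, assuming TensorFlow 2.x with eager execution and two datasets whose elements have matching shapes and dtypes. `add_datasets` is a hypothetical helper name, not an existing API:

```python
import tensorflow as tf


def add_datasets(dataset_1, dataset_2):
    """Element-wise add of two datasets (hypothetical helper)."""
    # zip pairs up corresponding elements; map adds each pair.
    return tf.data.Dataset.zip((dataset_1, dataset_2)).map(lambda x, y: x + y)


dataset_1 = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0])
dataset_2 = tf.data.Dataset.from_tensor_slices([10.0, 20.0, 30.0])
dataset_3 = add_datasets(dataset_1, dataset_2)

# dataset_3 is an ordinary tf.data.Dataset, so it can be consumed like any other.
values = [float(v) for v in dataset_3.as_numpy_iterator()]
```

A common base class could wrap this in `__add__` so that `dataset_1 + dataset_2` works directly, but that part is left open here.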