Description
Following the discussion in #366: batching can serve different purposes, and optimizing for each is not always done the same way.
Previously, we thought we would reuse the same batch concept for two purposes:

1. Batch as a way to improve read performance for record-based files such as Parquet, Feather, etc., since we really don't want to "read one integer or one float32 at a time". This is a caching/IO issue.
2. Batch as the number of samples fed into the neural network, so that the dataset can be passed directly to tf.keras, reusing tf.keras's batch concept. This is a model parameter.
But those two batch concepts are different. @BryanCutler I am wondering if it makes sense to split them out. In other words, for 1) we should "read as many records as possible, as long as they fit in memory", and for 2) we should do a rebatch() to adjust the batch_size that is fed to tf.keras?
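To make the proposed split concrete, here is a minimal plain-Python sketch (hypothetical helper names, not the actual tf.data or tensorflow-io API): records are read from the file in large I/O-sized chunks, and a separate rebatch step re-slices those chunks into the batch size the model expects.

```python
# Hedged sketch: decouple the I/O read batch ("read as much as fits in
# memory") from the model batch size via a rebatch step.
# read_in_large_chunks() and rebatch() are illustrative names only.

def read_in_large_chunks(records, io_batch_size):
    """Simulate reading a record-based file (e.g. Parquet) in large
    I/O-friendly chunks rather than one value at a time."""
    for i in range(0, len(records), io_batch_size):
        yield records[i:i + io_batch_size]

def rebatch(chunks, model_batch_size):
    """Re-slice large I/O chunks into the batch size the model expects,
    analogous to rebatching a dataset of large batches."""
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= model_batch_size:
            yield buffer[:model_batch_size]
            buffer = buffer[model_batch_size:]
    if buffer:  # final partial batch
        yield buffer

records = list(range(10))
chunks = read_in_large_chunks(records, io_batch_size=8)  # 2 file reads
batches = list(rebatch(chunks, model_batch_size=4))
# batches -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The point of the sketch is that `io_batch_size` can be tuned for memory and throughput independently of `model_batch_size`, which stays a pure model/training parameter.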