
Discuss Batch Standards in TFIO with Keras #382

@BryanCutler

Description


Following the discussion on #366, batching can serve different purposes, and optimizing for each is not always done the same way.

Previously, we thought we would reuse the same batch concept to serve two purposes:

1) Batch is a way to improve read performance for record-based files such as Parquet, Feather, etc., since we really don't want to "read one integer or one float32 at a time". This is a caching issue.

2) We also want to feed the dataset directly to tf.keras, so the batch concept in tf.keras (the number of samples fed to the neural network per step) is reused. This is a model parameter.

But those two batch concepts are different. @BryanCutler I am wondering if it makes sense to split them out. In other words, for 1) we should read as many records as possible, as long as they fit in memory, and for 2) we should do a rebatch() to adjust the batch_size fed to tf.keras?
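The proposed split can be sketched with plain Python generators (the helper names `read_chunks` and `rebatch` here are hypothetical illustrations, not TFIO APIs; in tf.data terms, the second step corresponds to something like `dataset.unbatch().batch(batch_size)`):

```python
def read_chunks(records, chunk_size):
    """1) I/O concern: read as many records per chunk as memory allows."""
    for i in range(0, len(records), chunk_size):
        yield records[i:i + chunk_size]

def rebatch(chunks, batch_size):
    """2) Model concern: regroup records into the batch_size tf.keras expects."""
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit full model-sized batches as soon as enough records are buffered.
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:
        yield buffer  # final partial batch

records = list(range(10))
chunks = read_chunks(records, chunk_size=4)    # large I/O chunks (caching)
batches = list(rebatch(chunks, batch_size=3))  # model batches for tf.keras
# batches == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

The point is that `chunk_size` is tuned to the file format and memory, while `batch_size` stays a model parameter, and the two never need to agree.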
