Release 0.8.0

@zoyahav released this 28 Jun 20:33

Major Features and Improvements

  • Add a TFTransformOutput utility class that wraps the output of tf.Transform
    for use in training, making it easier to consume the output written by
    tf.Transform (see the updated examples for usage).
  • Increase the efficiency of quantiles computation (and therefore of
    bucketize).
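To show why faster quantiles also speed up bucketize: bucketizing a feature means computing quantile boundaries in an analysis pass, then mapping each value to a bucket index. The plain-Python sketch below uses exact, in-memory quantiles. tf.Transform itself computes approximate quantiles over the full dataset, so this is not its implementation, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def quantile_boundaries(values: Sequence[float], num_buckets: int) -> List[float]:
    """Exact cut points that split the sorted values into num_buckets groups.

    Hypothetical helper for illustration only: it sorts everything in memory,
    unlike a distributed approximate-quantiles analyzer.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be >= 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    n = len(ordered)
    # Take the value at each interior rank k * n / num_buckets.
    return [ordered[(k * n) // num_buckets] for k in range(1, num_buckets)]


def bucketize(value: float, boundaries: Sequence[float]) -> int:
    """Return the index of the bucket that value falls into."""
    # In this sketch, bisect_right puts values equal to a boundary in the upper bucket.
    return bisect.bisect_right(boundaries, value)
```

For example, `quantile_boundaries(list(range(1, 11)), 4)` yields `[3, 6, 8]`, and `bucketize(5, [3, 6, 8])` returns `1`. The analysis step (finding boundaries) is the expensive, whole-dataset part, which is why speeding up quantiles speeds up bucketizing.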

Bug Fixes and Other Changes

  • Change tft.sum/tft.mean/tft.var to only support basic numeric types.
  • Widen the output type of tft.sum for some input types to avoid overflow
    and/or to preserve precision.
  • For int32 and int64 input types, change the output type of tft.mean/
    tft.var/tft.scale_to_z_score from float64 to float32.
  • Change the output type of tft.size to be always int64.
  • Context now accepts passthrough_keys, which can be used to attach
    additional information (for example, instance keys) to dataset instances
    in the pipeline without making it part of the transformation graph.
  • In addition to using TFTransformOutput, the examples demonstrate new workflows
    where a vocabulary is computed, but not applied, in the preprocessing_fn.
  • Add a dependency on the absl-py package.
  • TransformTestCase test cases can now be parameterized.
  • Add support for partitioned variables when loading a model.
  • Export the coders subpackage so that users can access it as tft.coders,
    e.g. tft.coders.ExampleProtoCoder.
  • Set dtypes for numpy arrays in tft.coders.ExampleProtoCoder and
    tft.coders.CsvCoder.
  • tft.mean, tft.max and tft.var now support tf.SparseTensor.
  • Update examples to use the "core" TensorFlow estimator API (tf.estimator).
  • Depend on protobuf>=3.6.0,<4.
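The reason for widening tft.sum's output type can be seen with plain fixed-width integer arithmetic: summing int32 values in an int32 accumulator can silently wrap around. The sketch below uses ctypes fixed-width integers to mimic two accumulator widths. It does not reproduce tf.Transform's actual type-promotion rules, and the helper name is hypothetical.

```python
import ctypes
from typing import Iterable, Type


def accumulate(values: Iterable[int], ctype: Type[ctypes._SimpleCData]) -> int:
    """Sum values in a fixed-width ctypes accumulator (hypothetical helper).

    ctypes integer types perform no overflow checking, so the running total
    wraps modulo 2**bits when it leaves the type's range.
    """
    total = ctype(0)
    for v in values:
        total = ctype(total.value + v)
    return total.value


data = [2**31 - 1, 1]  # Both values fit in int32; their sum does not.
narrow = accumulate(data, ctypes.c_int32)  # Wraps to -2**31.
wide = accumulate(data, ctypes.c_int64)    # Correct: 2**31.
```

The same data gives a negative total with the narrow accumulator and the correct total with the wider one, which is the kind of silent overflow a wider output type avoids.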

Breaking changes

  • apply_saved_transform is removed. See note on
    partially_apply_saved_transform in the Deprecations section.
  • No longer set vocabulary_file in IntDomain when using
    tft.compute_and_apply_vocabulary or tft.apply_vocabulary.
  • Requires pre-installed TensorFlow >=1.8,<2.

Deprecations

  • The expected_asset_file_contents argument of
    TransformTestCase.assertAnalyzeAndTransformResults has been deprecated; use
    expected_vocab_file_contents instead.
  • transform_fn_io.TRANSFORMED_METADATA_DIR and
    transform_fn_io.TRANSFORM_FN_DIR should no longer be used; they are now
    aliases for TFTransformOutput.TRANSFORMED_METADATA_DIR and
    TFTransformOutput.TRANSFORM_FN_DIR respectively.
  • partially_apply_saved_transform is deprecated; use the
    transform_raw_features method of TFTransformOutput instead. The two differ
    in that partially_apply_saved_transform can also return the input
    placeholders along with the outputs, but users typically do not need this
    because they create the input placeholders themselves from the feature
    spec.
  • Renamed tft.uniques to tft.vocabulary, tft.string_to_int to
    tft.compute_and_apply_vocabulary, and tft.apply_vocab to
    tft.apply_vocabulary. The old names will remain for a few more minor
    releases but are now deprecated; users should migrate to the new names.
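Migrating off the deprecated names is mostly a mechanical rename. The sketch below is a hypothetical helper, not part of tf.Transform, that rewrites the three old `tft.*` names in a string of source code. It assumes the module is imported as `tft`.

```python
import re

# Old name -> new name, taken from the 0.8.0 deprecation notes.
RENAMES = {
    "tft.uniques": "tft.vocabulary",
    "tft.string_to_int": "tft.compute_and_apply_vocabulary",
    "tft.apply_vocab": "tft.apply_vocabulary",
}


def migrate_source(source: str) -> str:
    """Rewrite deprecated tft.* names in Python source text (hypothetical helper)."""
    for old, new in RENAMES.items():
        # The trailing \b keeps tft.apply_vocab from matching inside
        # the already-migrated tft.apply_vocabulary.
        pattern = r"\b" + re.escape(old) + r"\b"
        source = re.sub(pattern, new, source)
    return source


before = "ids = tft.string_to_int(s)\nv = tft.uniques(s)\nout = tft.apply_vocab(s, v)\n"
after = migrate_source(before)
```

This only renames call sites written as `tft.<name>`. Code that imports the functions under other aliases, or via `from ... import`, still needs manual review.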